Ghost Pepper: The Open-Source Transcription App Competing With $80M Startups

Staff Writer

A free macOS utility with a published privacy audit runs Whisper and small language models entirely on Apple Silicon, turning your voice into text without sending any audio to the cloud.

matthartman/ghost-pepper

The Heat Is Local

In 2007, Guinness World Records certified the ghost pepper as the world’s hottest chili, a superhot Capsicum chinense measuring over one million Scoville Heat Units—more than 170 times hotter than Tabasco sauce, with a slow-building burn that lingers for up to half an hour. The macOS utility Ghost Pepper borrows the name for a different kind of intensity. Released by Matt Hartman as a free, open-source project, it performs speech-to-text and meeting transcription entirely on Apple Silicon Macs. No cloud APIs. No subscription. No data leaves the machine. The README offers the rationale with a smirk: it is spicy to offer something for free that other apps have raised $80M to build.

That line is more than a pun. It captures a genuine inflection point in the AI utility market. For years, voice transcription has followed the standard SaaS playbook: capture audio, stream it to a remote cluster, and return text behind a paywall. Commercial tools like Transcribe, Jamie, and the venture-backed ecosystem around Otter.ai have built formidable platforms on this model, offering speaker separation, calendar integration, and team workspaces. But they also require trust. The W3C’s Web Speech API explainer explicitly warns that voice is personally identifiable information capable of revealing gender, ethnic origin, and health conditions. Ghost Pepper’s response is not a better privacy policy; it is architectural elimination. If the audio never traverses a network interface, it cannot be intercepted, subpoenaed, or monetized by a third party.

Architecture by Subtraction

Ghost Pepper is not a research breakthrough. It is a native Swift application that orchestrates existing open-source inference engines into a coherent, privacy-first utility. Speech recognition is handled by WhisperKit, Argmax’s Core ML-optimized port of OpenAI’s Whisper models. Post-processing—scrubbing filler words, resolving self-corrections, and summarizing meetings—runs through LLM.swift, a lightweight local LLM framework. Audio capture uses Apple’s AVAudioEngine and ScreenCaptureKit; optical character recognition relies on the Apple Vision framework. The application is, at its core, integration code. But it is well-engineered integration that spares users from managing Python environments, CUDA drivers, or terminal-based model downloads.

The model selection reveals a pragmatic tiering strategy. For English, the default Whisper small.en occupies roughly 466 MB and balances accuracy against speed. Multilingual users can select Whisper small, Parakeet v3 via FluidAudio at roughly 1.4 GB for 25 languages, or the newer Qwen3-ASR 0.6B quantized model at roughly 900 MB for more than 50 languages on macOS 15 and later. Cleanup is handled by Qwen 3.5 models at 0.8B, 2B, or 4B parameters; the default 0.8B variant completes its pass in one to two seconds, while the 4B model takes five to seven seconds in exchange for higher-quality output. All models download once from Hugging Face and cache locally. The entire stack is feasible because Apple Silicon’s unified memory architecture lets the CPU, GPU, and Neural Engine share one memory pool, avoiding the PCIe copies between host and device memory that discrete GPU setups require, so model weights sit beside the application code with minimal transfer overhead.
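The tiering described above can be written down as a small lookup table. The following is a minimal Python sketch rather than anything taken from the app, which is written in Swift. The sizes, language coverage, and macOS 15 requirement come from the paragraph above, and the macOS 14 baseline is the app-wide minimum the README states. The `SpeechModel` type, its field names, and the `candidates` helper are hypothetical. No size is filled in for the multilingual Whisper small because the article does not give one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeechModel:
    name: str
    approx_size_mb: Optional[int]  # None where the article gives no figure
    multilingual: bool
    min_macos: int  # lowest major macOS version mentioned for this model


# Figures are the article's; the structure is an illustrative assumption.
SPEECH_MODELS = [
    SpeechModel("Whisper small.en", 466, multilingual=False, min_macos=14),
    SpeechModel("Whisper small", None, multilingual=True, min_macos=14),
    SpeechModel("Parakeet v3", 1400, multilingual=True, min_macos=14),
    SpeechModel("Qwen3-ASR 0.6B", 900, multilingual=True, min_macos=15),
]


def candidates(needs_multilingual: bool, macos_major: int) -> list[str]:
    """Return the model names usable for a language need and OS version."""
    return [
        m.name
        for m in SPEECH_MODELS
        if m.min_macos <= macos_major
        and (m.multilingual or not needs_multilingual)
    ]
```

Under these assumptions, a multilingual user on macOS 14 would be offered Whisper small and Parakeet v3, and Qwen3-ASR joins the list only from macOS 15.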

The user interface follows the same minimalist logic. Ghost Pepper lives in the menu bar, deliberately avoiding a dock icon. Activation is a physical gesture (hold the Control key, speak, release), and the transcription is pasted via simulated keystrokes into whichever text field is active. Meeting recordings are chunked, transcribed, and saved as Markdown files on the local filesystem. There is no web dashboard, no proprietary format, and no cross-device sync. The app treats transcription as infrastructure rather than a platform, which aligns with the argument in recent comparative reviews that workflow fit and developer philosophy are now the primary differentiators as model accuracy and speed plateau across the industry.
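The hold-to-talk gesture amounts to a two-state machine. The Python sketch below illustrates only that control flow and is not Ghost Pepper's Swift implementation. The `PushToTalk` class and its method names are hypothetical, and the `transcribe` and `paste` callbacks stand in for the speech model and the simulated keystrokes.

```python
from typing import Callable, List


class PushToTalk:
    """Hold-to-record state machine: key down starts, key up transcribes and pastes."""

    def __init__(self, transcribe: Callable[[List[bytes]], str],
                 paste: Callable[[str], None]) -> None:
        self._transcribe = transcribe
        self._paste = paste
        self._frames: List[bytes] = []
        self.recording = False

    def key_down(self) -> None:
        if not self.recording:  # ignore key auto-repeat while held
            self._frames = []
            self.recording = True

    def on_audio(self, frame: bytes) -> None:
        if self.recording:  # audio arriving outside a hold is dropped
            self._frames.append(frame)

    def key_up(self) -> None:
        if not self.recording:
            return
        self.recording = False
        text = self._transcribe(self._frames)
        if text:  # nothing is pasted for an empty result
            self._paste(text)
```

Keeping the recognizer and the paste step behind callbacks means the gesture logic can be exercised with stubs, without a microphone or Accessibility permissions.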

A Privacy Audit You Can Read

Most applications bury their privacy posture behind a lawyerly HTML page. Ghost Pepper attempts to make its claims falsifiable. The repository includes a PRIVACY_AUDIT.md file that maps every subsystem—speech-to-text, text cleanup, audio recording, meeting storage, summary generation, OCR, file storage, and analytics—to a local-only verdict. The audit explicitly states that no Firebase, Mixpanel, Sentry, or tracking SDK is present. Usage metrics are reduced to local counters stored in UserDefaults, powering an in-app report panel rather than a remote dashboard. Users are invited to verify the findings themselves by pointing an AI code review tool at the repository.
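A reader who wants a first-pass check of the "no tracking SDK" claim without an AI review tool could scan a local clone for the SDK names the audit lists. The Python sketch below is one hedged way to do that. The function name, the set of file extensions, and the case-insensitive substring match are all assumptions, and a clean result is evidence rather than proof, since a text search can miss obfuscated or binary dependencies.

```python
from pathlib import Path

# SDK names the audit says are absent; the extension list is an assumption.
TRACKER_NAMES = ("firebase", "mixpanel", "sentry")
SOURCE_SUFFIXES = {".swift", ".plist", ".json", ".resolved"}


def find_tracker_mentions(repo_root: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, tracker name) for each match."""
    root = Path(repo_root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            for name in TRACKER_NAMES:
                if name in lowered:
                    hits.append((str(path.relative_to(root)), number, name))
    return hits
```

Markdown files are deliberately left out of the scan, because the audit document itself names these SDKs while stating that they are absent.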

Optional cloud integrations exist, but they are disabled by default and require the user to supply their own API keys for Zo AI chat, Trello, or Granola meeting import. This design creates a hard boundary: once its models are downloaded, the core transcription pipeline makes no network calls, and any further network egress is opt-in and user-authenticated. The approach aligns with the enterprise on-device movement. Google Cloud’s Speech On-Device service, which reached general availability in 2022, keeps data local for automotive and healthcare environments; Toyota is an early adopter. Picovoice markets on-device recognition as a regulation-compliant alternative that does not require HIPAA Business Associate Agreements. Ghost Pepper brings that same ethos to individual knowledge workers, but without the enterprise licensing overhead or binary delivery restrictions.
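The opt-in boundary can be expressed as a single guard that every outbound request would have to pass. This Python sketch is an illustration of the design, not the app's code. The `Integration` type and the `may_use_network` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Integration:
    name: str
    enabled: bool = False  # off until the user switches it on
    api_key: str = ""      # supplied by the user, never bundled


def may_use_network(integration: Integration) -> bool:
    """Allow egress only for an enabled integration with a user-supplied key."""
    return integration.enabled and bool(integration.api_key.strip())
```

With both defaults set to the closed state, a freshly constructed integration cannot reach the network until the user takes two explicit actions: enabling it and entering a key.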

The Competitive Landscape and the $80M Question

Ghost Pepper enters a market crowded with well-funded alternatives. Transcribe, a commercially polished app with an App Store rating of 4.5 from roughly 11,000 reviews, offers live transcription in over 100 languages with rich export formats and device sync, but its free tier is capped at thirty minutes and its Pro tier is a recurring subscription. Jamie records meetings locally and generates AI notes, yet it still operates within a workspace model with controlled sharing and cloud-adjacent integrations. Superwhisper, a favorite among Mac power users, offers deep customization and a lifetime license, though its AI post-processing typically routes through cloud LLMs from OpenAI or Anthropic, and the upfront cost is significant enough that its most dedicated users write documentation and build automation utilities to justify the purchase.

Ghost Pepper undercuts all of them on price (it is MIT-licensed and free) and beats them on data residency by simply refusing to network the audio stream. What it sacrifices is the ecosystem. There is no speaker diarization comparable to cloud-native rivals, no mobile companion app, and no team collaboration layer. The README acknowledges its platform constraints: macOS 14.0 or later, Apple Silicon only, and Accessibility permissions that may require MDM pre-approval on corporate machines via a Privacy Preferences Policy Control payload. For users who need Zoom bots, cross-device continuity, or advanced video editing, Ghost Pepper is not a replacement. For users who want to dictate an email or transcribe a standup without creating a data trail, it is a narrow, surgical tool.

Outlook: The Local Edge

The broader technology trends favor Ghost Pepper’s model. Google Cloud has demonstrated that server-quality conformer models can shrink to a few hundred megabytes and run on a single ARM core with real-time latency, and its cloud STT API already processes over one billion minutes of speech monthly. The W3C is actively proposing a processLocally flag for the Web Speech API so that browsers can request on-device recognition without cloud fallback, citing data residency, video conferencing accuracy, and offline educational use. Apple is reportedly opening new speech engines to third-party developers, and local LLMs are advancing rapidly. The technical barriers to consumer-grade, offline voice AI are falling.

Ghost Pepper may not need to outrun these corporate initiatives. Its value is as a baseline and a provocation. A solo developer—who also invests in technical founders for a living—has assembled a utility that rivals the core function of venture-backed transcription startups using nothing more than open weights, Apple’s silicon, and a strict privacy architecture. Whether the project gains a maintainer community or remains a single-author tool, it demonstrates that the local-first movement is no longer confined to note-taking apps and databases. It now includes the entire speech pipeline.

The heat, in this case, is not merely metaphorical. It is the friction that Ghost Pepper removes from private computing, and the lingering discomfort it creates for competitors who have built business models on renting access to your own voice.