ggml-org/whisper.cpp

Whisper on a diet: 50K stars for C++ speech recognition

A dependency-free C/C++ port of OpenAI's Whisper that runs everywhere from iPhones to Raspberry Pis without calling home.

★ 50.5k stars | C++ | Image · Video · Audio | Inference · Serving

Velocity (7d): +37 ★ / day · Trend: steady

What it does

whisper.cpp is a plain C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition model. It transcribes speech to text without dragging in Python, PyTorch, or CUDA toolkits. The high-level model implementation fits in two files (whisper.h and whisper.cpp); the tensor math is delegated to the companion ggml library.

The interesting bit

The project treats Apple Silicon as a first-class citizen (ARM NEON, Accelerate, Metal, and Core ML are all wired in), yet it also ships VSX intrinsics for IBM POWER9/10 and builds for WebAssembly. That’s an unusually broad hardware church for a single inference engine. The “zero runtime memory allocations” claim is the kind of boring-sounding constraint that actually matters when you ship this on a phone or embedded device, because it keeps the memory footprint predictable.

Key highlights

No external dependencies; builds with just CMake and a C++ compiler
Hardware acceleration: Metal (Apple), Vulkan, NVIDIA (CUDA), AMD ROCm, OpenVINO (Intel CPUs and GPUs), plus Ascend NPUs and Moore Threads GPUs
Integer quantization (Q5_0, etc.) to shrink models and memory footprint
Runtime memory usage ranges from ~273 MB (tiny model) to ~3.9 GB (large model)
Bindings and examples for iOS, Android, Java, and browser via WASM
Voice Activity Detection (VAD) support for streaming scenarios
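
To put the quantization bullet in numbers, here is a rough sketch of the storage arithmetic for Q5_0. It assumes the block layout is 32 weights sharing one fp16 scale, with 4 bytes of packed high bits and 16 bytes of low nibbles; check this against the ggml version you build with. The constant and function names are illustrative and are not part of whisper.cpp.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed Q5_0 block layout (verify against your ggml sources):
constexpr std::size_t kBlockWeights   = 32;                 // weights per block
constexpr std::size_t kScaleBytes     = 2;                  // one fp16 scale per block
constexpr std::size_t kHighBitBytes   = kBlockWeights / 8;  // one 5th bit per weight
constexpr std::size_t kLowNibbleBytes = kBlockWeights / 2;  // low 4 bits per weight
constexpr std::size_t kBlockBytes =
    kScaleBytes + kHighBitBytes + kLowNibbleBytes;

// Average storage cost of a single weight, in bits.
constexpr double q5_0_bits_per_weight() {
    return 8.0 * kBlockBytes / kBlockWeights;
}

// Bytes needed for n_weights weights; a partial block still costs a full block.
constexpr std::uint64_t q5_0_bytes(std::uint64_t n_weights) {
    return (n_weights + kBlockWeights - 1) / kBlockWeights * kBlockBytes;
}

// Bytes for the same weights stored as plain fp16, for comparison.
constexpr std::uint64_t f16_bytes(std::uint64_t n_weights) {
    return n_weights * 2;
}
```

Under these assumptions a Q5_0 tensor costs 5.5 bits per weight, roughly a third of its fp16 size. Whole-model savings are likely smaller, since runtime memory also covers buffers that are not quantized.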

Caveats

The CLI example only accepts 16-bit WAV; convert MP3s and other formats first, for example with ffmpeg
Core ML and OpenVINO require extra Python tooling and model conversion steps—not plug-and-play
First runs on Apple Neural Engine or OpenVINO devices incur compilation overhead before caching kicks in
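
Given the 16-bit WAV restriction above, a pre-flight check can catch unsupported files before they reach the CLI. The following is a minimal sketch, not code from whisper.cpp: the struct and function names are hypothetical, it only walks the RIFF chunks to find the "fmt " chunk, and it conservatively rejects anything that is not tagged as plain integer PCM (format tag 1), including extensible-format files.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fields of interest from a WAV "fmt " chunk (hypothetical helper type).
struct WavFormat {
    std::uint16_t audio_format;     // 1 = integer PCM
    std::uint16_t channels;
    std::uint32_t sample_rate;      // Hz
    std::uint16_t bits_per_sample;
};

// WAV stores integers little-endian.
static std::uint16_t read_u16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t read_u32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Finds the "fmt " chunk in a RIFF/WAVE buffer and fills `out`.
// Returns false if the buffer is not a well-formed WAV header.
bool parse_wav_format(const std::vector<std::uint8_t>& buf, WavFormat& out) {
    if (buf.size() < 12 ||
        std::memcmp(buf.data(), "RIFF", 4) != 0 ||
        std::memcmp(buf.data() + 8, "WAVE", 4) != 0) {
        return false;
    }
    std::size_t pos = 12;  // first chunk starts right after "WAVE"
    while (pos + 8 <= buf.size()) {
        const std::uint8_t* chunk = buf.data() + pos;
        const std::uint32_t size = read_u32(chunk + 4);
        if (std::memcmp(chunk, "fmt ", 4) == 0) {
            if (size < 16 || pos + 8 + 16 > buf.size()) return false;
            const std::uint8_t* f = chunk + 8;
            out = WavFormat{read_u16(f), read_u16(f + 2),
                            read_u32(f + 4), read_u16(f + 14)};
            return true;
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        pos += 8 + static_cast<std::size_t>(size) + (size & 1u);
    }
    return false;
}

// True only for files tagged as integer PCM with 16-bit samples.
bool is_16bit_pcm_wav(const std::vector<std::uint8_t>& buf) {
    WavFormat fmt{};
    return parse_wav_format(buf, fmt) &&
           fmt.audio_format == 1 && fmt.bits_per_sample == 16;
}
```

Reading the first few kilobytes of a file into a byte vector is normally enough for this check. The parsed sample rate and channel count are exposed so a caller can add stricter rules, such as requiring the 16 kHz rate that Whisper models operate on.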

Verdict

Worth a look if you need offline speech recognition in a resource-constrained or non-Python environment. Skip it if you’re already happy with OpenAI’s Python stack and don’t care about binary size or dependency hygiene.