ggml-org/whisper.cpp

Whisper on a diet: 50K stars for C++ speech recognition

A dependency-free C/C++ port of OpenAI's Whisper that runs everywhere from iPhones to Raspberry Pis without calling home.

Velocity (7d): +37 ★/day · Trend: steady

What it does

whisper.cpp is a plain C/C++ reimplementation of OpenAI’s Whisper automatic speech recognition model. It transcribes speech to text without dragging in Python, PyTorch, or CUDA toolkits. The core model fits in two files (whisper.h and whisper.cpp); everything else delegates to the companion ggml tensor library.

The interesting bit

The project treats Apple Silicon as a first-class citizen, with ARM NEON, Accelerate, Metal, and Core ML all wired in, yet it also ships VSX intrinsics for IBM POWER9/10 and builds for WebAssembly. That’s an unusually broad hardware church for a single inference engine. The “zero runtime memory allocations” claim is the kind of boring-sounding constraint that actually matters if you’re running this on a phone or another embedded device.

Key highlights

  • No external dependencies; builds with just CMake and a C++ compiler
  • Acceleration backends: Metal (Apple), Vulkan, NVIDIA CUDA, AMD ROCm, OpenVINO (Intel), plus Ascend NPUs and Moore Threads GPUs
  • Integer quantization (Q5_0, etc.) to shrink models and memory footprint
  • Memory usage ranges from ~273 MB (tiny) to ~3.9 GB (large) at runtime
  • Bindings and examples for iOS, Android, Java, and browser via WASM
  • Voice Activity Detection (VAD) support for streaming scenarios
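
To make the quantization bullet concrete, here is a minimal size-arithmetic sketch in C++. It assumes the Q5_0 block layout used by ggml (32 weights per block, one fp16 scale, one high bit and one 4-bit nibble per weight); the constant and function names are hypothetical and chosen for illustration only. It covers weight storage, not the runtime memory figures quoted above.

```cpp
#include <cstdint>

// Assumed Q5_0 block layout (an assumption based on ggml, not taken from the review):
// 32 weights share one fp16 scale and store 5 bits each.
constexpr std::uint64_t kBlockWeights = 32;
constexpr std::uint64_t kScaleBytes   = 2;                  // one fp16 scale per block
constexpr std::uint64_t kHighBitBytes = kBlockWeights / 8;  // 1 high bit per weight  -> 4 bytes
constexpr std::uint64_t kNibbleBytes  = kBlockWeights / 2;  // 4 low bits per weight  -> 16 bytes
constexpr std::uint64_t kBlockBytes   = kScaleBytes + kHighBitBytes + kNibbleBytes;

// Bytes needed to store n weights as Q5_0 (n is rounded up to whole blocks).
constexpr std::uint64_t q5_0_bytes(std::uint64_t n_weights) {
    return (n_weights + kBlockWeights - 1) / kBlockWeights * kBlockBytes;
}

// Bytes needed to store n weights as plain fp16.
constexpr std::uint64_t f16_bytes(std::uint64_t n_weights) {
    return n_weights * 2;
}
```

Under these assumptions a block costs 22 bytes against 64 bytes for the same 32 weights in fp16, so a hypothetical tensor of 1,000,000 weights shrinks from 2,000,000 bytes to 687,500 bytes, roughly 34% of the original.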

Caveats

  • The CLI example only accepts 16-bit WAV; you’ll need ffmpeg to convert MP3s or other formats
  • Core ML and OpenVINO require extra Python tooling and model conversion steps, so they are not plug-and-play
  • First runs on Apple Neural Engine or OpenVINO devices incur compilation overhead before caching kicks in
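
The 16-bit WAV restriction in the first caveat can be checked up front, before a file is handed to the CLI. The following is a minimal C++ sketch, assuming the canonical RIFF/WAVE chunk layout; `is_pcm16_wav` and `make_wav_header` are hypothetical names for illustration and are not part of whisper.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Little-endian readers; callers are expected to bounds-check first.
static std::uint16_t read_le16(const Bytes& d, std::size_t off) {
    assert(off + 2 <= d.size());
    return static_cast<std::uint16_t>(d[off] | (d[off + 1] << 8));
}

static std::uint32_t read_le32(const Bytes& d, std::size_t off) {
    assert(off + 4 <= d.size());
    return static_cast<std::uint32_t>(d[off]) |
           (static_cast<std::uint32_t>(d[off + 1]) << 8) |
           (static_cast<std::uint32_t>(d[off + 2]) << 16) |
           (static_cast<std::uint32_t>(d[off + 3]) << 24);
}

static bool tag_is(const Bytes& d, std::size_t off, const char* tag) {
    return off + 4 <= d.size() && std::memcmp(d.data() + off, tag, 4) == 0;
}

// True if the buffer starts with a RIFF/WAVE header whose "fmt " chunk
// declares uncompressed PCM (format tag 1) at 16 bits per sample.
bool is_pcm16_wav(const Bytes& d) {
    if (!tag_is(d, 0, "RIFF") || !tag_is(d, 8, "WAVE")) return false;
    std::size_t off = 12;  // first chunk follows the 12-byte RIFF header
    while (off + 8 <= d.size()) {
        const std::uint32_t size = read_le32(d, off + 4);
        if (tag_is(d, off, "fmt ")) {
            if (size < 16 || off + 8 + 16 > d.size()) return false;
            const std::uint16_t format = read_le16(d, off + 8);
            const std::uint16_t bits   = read_le16(d, off + 8 + 14);
            return format == 1 && bits == 16;
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        off += 8 + static_cast<std::size_t>(size) + (size & 1);
    }
    return false;
}

// Hypothetical demo helper: a 44-byte header with an empty data chunk.
// Mono and 16000 Hz are arbitrary illustration values.
Bytes make_wav_header(std::uint16_t format, std::uint16_t bits) {
    const std::uint16_t channels = 1;
    const std::uint32_t rate = 16000;
    const auto block_align = static_cast<std::uint16_t>(channels * bits / 8);
    Bytes d;
    auto put_tag = [&](const char* t) { d.insert(d.end(), t, t + 4); };
    auto put16 = [&](std::uint16_t v) {
        d.push_back(static_cast<std::uint8_t>(v & 0xFF));
        d.push_back(static_cast<std::uint8_t>(v >> 8));
    };
    auto put32 = [&](std::uint32_t v) {
        for (int i = 0; i < 4; ++i) d.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
    };
    put_tag("RIFF"); put32(36); put_tag("WAVE");
    put_tag("fmt "); put32(16);
    put16(format); put16(channels); put32(rate);
    put32(rate * block_align); put16(block_align); put16(bits);
    put_tag("data"); put32(0);
    return d;
}
```

Calling `is_pcm16_wav(make_wav_header(1, 16))` accepts the header, while an 8-bit PCM or 32-bit float header is rejected. The check is deliberately strict: files that declare 16-bit audio through the extensible format tag would also be rejected and would need converting first.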

Verdict

Worth a look if you need offline speech recognition in a resource-constrained or non-Python environment. Skip it if you’re already happy with OpenAI’s Python stack and don’t care about binary size or dependency hygiene.
