Speech recognition from the 1970s that still fits in your pocket
A lightweight C library for offline speech recognition, recently unshackled from its ancient dependencies.

What it does
PocketSphinx takes single-channel 16-bit PCM audio and spits out text. It runs offline, handles live or batch input, and can force-align audio to known transcripts (down to phone and state level if you’re into that). There’s a command-line tool and C/Python APIs. The output is line-delimited JSON — “not the prettiest format, but it sure beats XML,” as the maintainer notes.
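To make the input and output formats concrete, here is a minimal sketch in Python that uses only the standard library and does not call PocketSphinx itself. It checks whether a WAV file is single-channel with 16-bit samples, and it reads line-delimited JSON one object per line. The helper names (is_mono_16bit_pcm, parse_json_lines, make_test_wav) are hypothetical, the 16 kHz rate in the test helper is an arbitrary choice, and nothing is assumed about which fields the tool's JSON records contain.

```python
import io
import json
import wave


def is_mono_16bit_pcm(wav_source):
    """Return True if the WAV data is single-channel with 16-bit samples.

    wav_source may be a file path or a binary file-like object.
    The wave module only opens PCM-encoded WAV data, so a file that
    opens successfully is already known to be PCM.
    """
    with wave.open(wav_source, "rb") as wav:
        return wav.getnchannels() == 1 and wav.getsampwidth() == 2


def parse_json_lines(text):
    """Parse line-delimited JSON: one JSON value per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def make_test_wav(channels=1, sample_width=2, rate=16000, frames=160):
    """Build a silent in-memory WAV file for trying out the check above.

    The default rate of 16000 Hz is an arbitrary value for illustration.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (frames * channels * sample_width))
    buf.seek(0)
    return buf
```

Running is_mono_16bit_pcm on a recording before handing it to the recognizer catches stereo or 8-bit files early. parse_json_lines turns captured command-line output into a list of Python objects, leaving the interpretation of each record to the caller.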
The interesting bit
The algorithms date to the 1970s, yet the project just had a spring cleaning: SphinxBase — its perennial companion dependency — has been fully absorbed and eliminated. “There is no SphinxBase anymore. This is not the SphinxBase you’re looking for.” The audio library, which “never really built or worked correctly on any platform at all,” was also mercifully killed. What’s left is a standalone CMake build with fewer moving parts.
Key highlights
- Offline recognition with compact, efficient models — no cloud required
- Force-alignment mode for timestamping words, phones, or HMM states in audio
- The soxflags command generates the right audio conversion arguments automatically
- Python bindings installable via pip; C library via standard CMake
- Regression and unit tests built and run with cmake --build build --target check
Caveats
- The README warns that recognition results “may not be wonderful” with default models
- macOS build status is uncertain (“I don’t have one of those”)
- Partial results in live mode aren’t implemented yet, and “don’t hold your breath”
- ReadTheDocs updates are still manual
Verdict
Good fit for embedded systems, Raspberry Pi projects, or anywhere you need speech recognition without network access or GPU muscle. Skip it if you need modern accuracy — this trades precision for portability and a small footprint.