Speech recognition that never phones home
PocketSphinx.js compiles a C speech recognizer to WebAssembly so your browser can transcribe audio without sending it to anyone else's server.

What it does
PocketSphinx.js is a browser-based speech recognizer built by compiling the C library PocketSphinx to JavaScript or WebAssembly with Emscripten. It includes an audio recorder built on the Web Audio API, a Web Worker wrapper that keeps recognition off the UI thread, and a callback utility for cleaner worker communication. The whole pipeline runs locally: no cloud APIs, no network latency, no data leaving the machine.
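The callback utility addresses a general constraint of Web Workers: messages can carry data but not functions, so the page has to match each reply to the request that triggered it. The sketch below shows one common way to do that by tagging messages with an id. The CallbackRegistry class, the message shape, and the command name are hypothetical and are not taken from the project's source.

```javascript
// Hypothetical helper, not the project's actual code: pairs each outgoing
// worker message with a callback by tagging it with a numeric id.
class CallbackRegistry {
  constructor() {
    this.nextId = 0;
    this.pending = new Map();
  }

  // Store a callback and return the id to send along with the message.
  add(callback) {
    const id = this.nextId++;
    this.pending.set(id, callback);
    return id;
  }

  // Retrieve and forget the callback for a reply; returns null if unknown.
  take(id) {
    const callback = this.pending.get(id) ?? null;
    this.pending.delete(id);
    return callback;
  }
}

// Simulated round trip. In a page, `send` would call worker.postMessage and
// `receive` would be wired to worker.onmessage.
const registry = new CallbackRegistry();
const outbox = [];
const send = (command, data, callback) =>
  outbox.push({ command, data, callbackId: registry.add(callback) });
const receive = (reply) => {
  const callback = registry.take(reply.callbackId);
  if (callback) callback(reply.data);
};

const results = [];
send("initialize", {}, (data) => results.push(data));
receive({ callbackId: outbox[0].callbackId, data: "ready" });
// A second reply with the same id finds no callback and is dropped.
receive({ callbackId: outbox[0].callbackId, data: "duplicate" });
```

Removing the callback on first use keeps the registry from growing without bound when many requests are in flight.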
The interesting bit
The project treats the browser as a full compilation target, not just a JavaScript runtime. You can embed acoustic models, language models, and dictionaries directly into the build output via CMake flags, or split them out to avoid a multi-megabyte initial download. There’s even a separate Chinese demo and keyword spotting support for wake-word-style detection.
Key highlights
- Compiles to either asm.js or WebAssembly; the WebAssembly build requires serving with the correct MIME type (application/wasm)
- recognizer.js wraps the heavy lifting in a Web Worker so the main thread stays responsive
- audioRecorder.js handles sample-rate conversion and can be reused for non-speech audio applications
- Supports custom acoustic models, statistical language models, and dictionaries at build time or runtime
- Includes live demos for English, Chinese, and keyword spotting
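Sample-rate conversion matters because browsers usually capture audio at 44.1 or 48 kHz, while PocketSphinx acoustic models are commonly trained on 16 kHz speech. The following is a rough sketch of one way to convert, by averaging each group of input samples and scaling to 16-bit integers. The function name and the averaging approach are assumptions made for illustration; the project's recorder may resample differently.

```javascript
// Illustrative downsampler (an assumption, not audioRecorder.js itself).
// Averages the input samples that fall into each output slot, then converts
// floats in [-1, 1] to 16-bit signed integers.
function downsampleToInt16(input, inputRate, outputRate) {
  if (outputRate > inputRate) {
    throw new RangeError("outputRate must not exceed inputRate");
  }
  const ratio = inputRate / outputRate;
  const outputLength = Math.floor(input.length / ratio);
  const output = new Int16Array(outputLength);
  for (let i = 0; i < outputLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    const mean = end > start ? sum / (end - start) : 0;
    // Clamp before scaling so out-of-range floats cannot overflow.
    const clamped = Math.max(-1, Math.min(1, mean));
    output[i] = Math.round(clamped * 32767);
  }
  return output;
}

// 48 kHz to 16 kHz produces one averaged sample for every three inputs.
const pcm = downsampleToInt16(
  new Float32Array([0, 0.5, 1, -1, -1, -1]),
  48000,
  16000
);
```

Block averaging is only a crude low-pass filter; a production resampler would normally apply a proper anti-aliasing filter before decimating.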
Caveats
- The compiled output is “a few MB” and loads synchronously, so the Web Worker wrapper is essentially mandatory for production use
- Build process requires Emscripten, CMake, and careful submodule initialization; Windows users get sent to the Emscripten manual
- The README warns that audio recording only works when the page is served over HTTPS or from localhost, and it suggests running Chrome with --disable-web-security for local testing, which is a security footgun if misunderstood
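A less risky route for local testing than the --disable-web-security flag is a small static server on localhost, which browsers treat as a secure context and which can also send the application/wasm type that the WebAssembly build needs. The sketch below covers only the decision logic such a server would use. Both function names and the extension table are illustrative assumptions, and the origin check is a simplified approximation of the browser's secure-context rules.

```javascript
// Illustrative header logic for a local static server (names are assumptions).
const CONTENT_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".wasm": "application/wasm",
};

// Pick a Content-Type from the file extension, falling back to a generic
// binary type for anything unlisted (model and dictionary files, for example).
function contentTypeFor(path) {
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot).toLowerCase();
  return CONTENT_TYPES[ext] ?? "application/octet-stream";
}

// Simplified approximation of the secure-context rule that gates microphone
// capture: HTTPS anywhere, or plain HTTP on the local machine.
function allowsMicrophone(origin) {
  const url = new URL(origin);
  return (
    url.protocol === "https:" ||
    url.hostname === "localhost" ||
    url.hostname === "127.0.0.1"
  );
}
```

A request handler would pass the requested path to contentTypeFor and set the result as the Content-Type header, while allowsMicrophone can serve as a quick sanity check on a deployment URL before debugging a silent recorder.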
Verdict
Worth a look if you need offline speech recognition in a web app and can tolerate the complexity of shipping your own models. Skip it if you want plug-and-play accuracy or modern neural-network-based recognition; this is classic HMM-GMM speech recognition, not Whisper.