syl22-00/pocketsphinx.js

Speech recognition that never phones home

PocketSphinx.js compiles a C speech recognizer to WebAssembly so your browser can transcribe audio without sending it to anyone else's server.

1.5k stars JavaScript Image · Video · Audio

What it does

PocketSphinx.js is a browser-based speech recognizer built by compiling the C library PocketSphinx to JavaScript (asm.js) or WebAssembly via Emscripten. It includes an audio recorder built on the Web Audio API, a Web Worker wrapper that keeps recognition off the UI thread, and a callback utility for cleaner worker communication. The whole pipeline runs locally: no cloud APIs, no network latency, no data leaving the machine.
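
The callback utility mentioned above addresses a general problem: a worker only exchanges messages, so each reply has to be matched back to the request that caused it. Below is a minimal sketch of that pattern. `CallbackRegistry`, `createClient`, and the `{ command, callbackId, data }` message shape are hypothetical names chosen for illustration and should not be read as the project's actual API.

```javascript
// Hypothetical helper (not the project's API): pairs each request sent to a
// worker with the callback that should receive its reply.
class CallbackRegistry {
  constructor() {
    this.nextId = 0;
    this.pending = new Map();
  }

  // Store a callback and return the id to send along with the request.
  add(callback) {
    const id = this.nextId++;
    this.pending.set(id, callback);
    return id;
  }

  // Run and forget the callback registered under `id`.
  // Returns false when no callback is waiting for that id.
  resolve(id, payload) {
    const callback = this.pending.get(id);
    if (!callback) return false;
    this.pending.delete(id);
    callback(payload);
    return true;
  }
}

// Wrap anything that has postMessage/onmessage (a Worker in the browser).
function createClient(worker) {
  const registry = new CallbackRegistry();
  worker.onmessage = (event) => {
    const { callbackId, data } = event.data;
    registry.resolve(callbackId, data);
  };
  return {
    // `command` values are placeholders; a real worker defines its own protocol.
    request(command, data) {
      return new Promise((resolve) => {
        const callbackId = registry.add(resolve);
        worker.postMessage({ command, callbackId, data });
      });
    },
  };
}
```

In a browser the `worker` argument would be a `Worker` instance. The message shape here is illustrative only and would have to match whatever protocol the worker script actually speaks.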

The interesting bit

The project treats the browser as a full compilation target, not just a JavaScript runtime. You can embed acoustic models, language models, and dictionaries directly into the build output via CMake flags, or keep them separate and load them at runtime to avoid a multi-megabyte initial download. There’s even a separate Chinese demo and keyword spotting support for wake-word-style detection.

Key highlights

  • Compiles to either asm.js or WebAssembly; the WebAssembly build requires the server to send .wasm files with the application/wasm MIME type
  • recognizer.js wraps the heavy lifting in a Web Worker so the main thread stays responsive
  • audioRecorder.js handles sample-rate conversion and can be reused for non-speech audio applications
  • Supports custom acoustic models, statistical language models, and dictionaries at build time or runtime
  • Includes live demos for English, Chinese, and keyword spotting
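
To make the sample-rate conversion point concrete: browsers usually capture audio at 44.1 or 48 kHz as floating-point samples, while a recognizer generally wants a lower fixed rate and integer PCM. The sketch below assumes a 16 kHz, 16-bit mono target and uses plain linear interpolation. The function names are hypothetical, and this is not how audioRecorder.js is necessarily implemented.

```javascript
// Downsample mono Float32 samples by linear interpolation.
// Caveat: there is no low-pass filter, so high-frequency content can alias.
function downsample(input, inputRate, outputRate) {
  if (outputRate > inputRate) {
    throw new RangeError('outputRate must not exceed inputRate');
  }
  const ratio = inputRate / outputRate;
  const outputLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outputLength);
  for (let i = 0; i < outputLength; i++) {
    const position = i * ratio;          // fractional index into the input
    const left = Math.floor(position);
    const right = Math.min(left + 1, input.length - 1);
    const fraction = position - left;
    output[i] = input[left] * (1 - fraction) + input[right] * fraction;
  }
  return output;
}

// Convert floats in [-1, 1] to signed 16-bit PCM, clamping out-of-range values.
function floatToInt16(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return pcm;
}
```

A caller might chain the two, for example `floatToInt16(downsample(buffer, 48000, 16000))`, before handing the result to a recognizer. A production resampler would normally filter before decimating.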

Caveats

  • The compiled output is “a few MB” and loads synchronously, so the Web Worker wrapper is essentially mandatory for production use
  • The build process requires Emscripten, CMake, and careful submodule initialization; Windows users are referred to the Emscripten manual
  • The README warns that you must serve over HTTPS or localhost for audio recording to work, and suggests running Chrome with --disable-web-security for local testing, which is a security footgun if misunderstood
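
For local testing, a small static server on localhost covers two of the points above at once: browsers treat localhost as a secure context for microphone access, and the server can send .wasm files with the application/wasm type. The following is a minimal Node.js sketch under those assumptions; the helper names, the file names, and the port in the usage note are illustrative and do not come from the project.

```javascript
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

// Extensions this sketch knows about; everything else is sent as binary.
const CONTENT_TYPES = {
  '.html': 'text/html; charset=utf-8',
  '.js': 'text/javascript; charset=utf-8',
  '.wasm': 'application/wasm',
};

function contentTypeFor(filePath) {
  const extension = path.extname(filePath).toLowerCase();
  return CONTENT_TYPES[extension] || 'application/octet-stream';
}

// Map a request URL to a file under rootDir.
// Returns null for malformed URLs or paths that escape rootDir.
function resolveRequestPath(rootDir, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(new URL(requestUrl, 'http://localhost').pathname);
  } catch (err) {
    return null;
  }
  const root = path.resolve(rootDir);
  const target = path.resolve(root, '.' + pathname);
  const inside = target === root || target.startsWith(root + path.sep);
  return inside ? target : null;
}

// Build (but do not start) a static file server rooted at rootDir.
function createStaticServer(rootDir) {
  return http.createServer((req, res) => {
    const filePath = resolveRequestPath(rootDir, req.url);
    if (!filePath) {
      res.writeHead(403).end('Forbidden');
      return;
    }
    fs.readFile(filePath, (err, body) => {
      if (err) {
        res.writeHead(404).end('Not found');
        return;
      }
      res.writeHead(200, { 'Content-Type': contentTypeFor(filePath) }).end(body);
    });
  });
}
```

Calling `createStaticServer('.').listen(8080, '127.0.0.1')` and opening http://localhost:8080 would serve the current directory. With a setup like this, relaxing browser security flags for local testing may not be necessary.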

Verdict

Worth a look if you need offline speech recognition in a web app and can tolerate the complexity of shipping your own models. Skip it if you want plug-and-play accuracy or modern neural-network-based recognition; this is classic HMM-GMM speech recognition, not Whisper.
