ccoreilly/vosk-browser

Speech recognition that stays off the server

Vosk-browser runs Kaldi-based ASR in a WebWorker via WebAssembly, so your voice data never leaves the tab.

517 stars · JavaScript · Image · Video · Audio · Velocity (7d): +0.3 ★/day · Trend: steady

What it does

Vosk-browser wraps the Vosk speech recognition engine in a WebAssembly build designed to run inside a browser WebWorker. You load a model (a .tar.gz file), wire it to microphone input through the Web Audio API, and get text back, all without a network round-trip to a speech API.

The interesting bit

The heavy lifting isn’t new; Denis Treskunov did the original WebAssembly port of Kaldi. This project packages an updated Vosk build and, crucially, makes the browser integration ergonomic. The WebWorker constraint matters: it keeps the main thread free while the recognizer chews on audio, which is the only sane way to do real-time ASR in a browser without jank.

Key highlights

  • 13 languages supported in the live demo
  • Loadable from the jsDelivr CDN as a global Vosk object
  • API surface is small: create model, instantiate KaldiRecognizer, feed it AudioBuffer chunks via acceptWaveform()
  • Explicitly browser-only; Node.js users are directed to the official bindings
  • Examples and API docs live in ./lib/README.md and ./examples/
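
The three-step flow in the highlights (create a model, instantiate KaldiRecognizer, feed it AudioBuffer chunks via acceptWaveform()) can be sketched roughly as follows. Treat it as a hedged illustration rather than the project's reference usage: the createModel method name, the "result" event and its { result: { text } } shape, and passing the sample rate to the constructor are assumptions to check against ./lib/README.md. The startRecognition helper and the "model.tar.gz" path are hypothetical names introduced here.

```javascript
// Sketch only: wires a vosk-browser recognizer to a microphone stream.
// `vosk`, `audioContext`, and `mediaStream` are passed in, so the helper makes
// no assumptions about how the page obtained them.
async function startRecognition({ vosk, audioContext, mediaStream, modelUrl, onText }) {
  // Assumption: the global Vosk object exposes createModel(url) for a .tar.gz model.
  const model = await vosk.createModel(modelUrl);

  // Assumption: the recognizer accepts the audio sample rate as its argument.
  const recognizer = new model.KaldiRecognizer(audioContext.sampleRate);

  // Assumption: final results arrive as events shaped like { result: { text } }.
  recognizer.on("result", (message) => onText(message.result.text));

  // ScriptProcessorNode is deprecated in the Web Audio API but keeps the sketch
  // short; each callback hands one AudioBuffer chunk to the recognizer.
  const node = audioContext.createScriptProcessor(4096, 1, 1);
  node.onaudioprocess = (event) => {
    try {
      recognizer.acceptWaveform(event.inputBuffer);
    } catch (error) {
      console.error("acceptWaveform failed", error);
    }
  };

  const source = audioContext.createMediaStreamSource(mediaStream);
  source.connect(node);
  // Some browsers only fire onaudioprocess once the node reaches a destination.
  node.connect(audioContext.destination);

  return { model, recognizer, node };
}

// Browser usage, assuming the library was loaded from the CDN as global `Vosk`:
//   const mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   await startRecognition({
//     vosk: Vosk,
//     audioContext: new AudioContext(),
//     mediaStream,
//     modelUrl: "model.tar.gz", // hypothetical path to a model you host yourself
//     onText: (text) => console.log(text),
//   });
```

Passing the browser objects in as parameters is a design choice made for this sketch: it keeps the wiring easy to exercise with stand-ins and makes clear which parts belong to the Web Audio API and which to the library.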

Caveats

  • No tests yet (listed in Todos)
  • Model files are your problem to host and serve; no cloud model delivery
  • Speaker identification models are not yet demonstrated

Verdict

Worth a look if you need offline, privacy-preserving speech-to-text in a web app and can stomach shipping multi-megabyte model files. If you want turnkey cloud accuracy or server-side ASR on Node.js, this isn’t your tool.
