Speech recognition that doesn't phone home
An offline server that wraps Vosk/Kaldi in four protocols so you can pick your poison.

What it does
vosk-server is a thin server layer around the Vosk-API and Kaldi speech recognition libraries. It exposes the same offline ASR backend through four protocols: WebSocket, gRPC, WebRTC, and MQTT. You run it locally or on your own server, point a client at it, and get transcripts back without shipping audio to a cloud API.
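To make "point a client at it" concrete, here is a minimal Python sketch of a WebSocket client. It rests on assumptions the review itself does not state, taken from our reading of the project's example client: the server listens on ws://localhost:2700 (for instance, when started from one of the project's Docker images), expects a JSON config message carrying the sample rate, then raw 16-bit mono PCM as binary frames, answers each audio chunk with a JSON result holding either "partial" or "text", and finishes on the literal message {"eof" : 1}. The helper names (config_message, pcm_chunks, extract_text, transcribe_wav) are hypothetical, and the protocol may differ between server versions, so check the official docs before relying on it.

```python
import json
import wave
from typing import Iterator, List, Optional

# Assumed defaults; verify against the docs for your server version.
DEFAULT_URI = "ws://localhost:2700"
# Sent verbatim: the reference server appears to compare this exact string,
# so json.dumps({"eof": 1}) (no space before the colon) may not match.
EOF_MESSAGE = '{"eof" : 1}'


def config_message(sample_rate: int) -> str:
    """Build the JSON message that announces the audio sample rate."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


def pcm_chunks(pcm: bytes, sample_rate: int, seconds: float = 0.2,
               sample_width: int = 2) -> Iterator[bytes]:
    """Split raw mono PCM into chunks of about `seconds` of audio each."""
    step = int(sample_rate * seconds) * sample_width
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


def extract_text(reply: str) -> Optional[str]:
    """Return the final text from a server reply, or None for a partial one."""
    msg = json.loads(reply)
    # Assumed format: final results carry "text", in-progress ones "partial".
    return msg.get("text")


async def transcribe_wav(path: str, uri: str = DEFAULT_URI) -> List[str]:
    """Stream a 16-bit mono WAV file to the server; return the final texts."""
    import websockets  # third-party dependency: pip install websockets

    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono PCM WAV file")
        rate = wf.getframerate()
        pcm = wf.readframes(wf.getnframes())

    finals: List[str] = []
    async with websockets.connect(uri) as ws:
        # Assumption: the server does not answer the config message.
        await ws.send(config_message(rate))
        for chunk in pcm_chunks(pcm, rate):
            await ws.send(chunk)  # binary frame of raw PCM
            text = extract_text(await ws.recv())  # assumed: one reply per chunk
            if text:
                finals.append(text)
        # End of stream: the reply should carry the last final result.
        await ws.send(EOF_MESSAGE)
        text = extract_text(await ws.recv())
        if text:
            finals.append(text)
    return finals
```

Usage, with a placeholder file name: `asyncio.run(transcribe_wav("test.wav"))` after `import asyncio`. The message builders and chunker are kept free of network code so the wire format can be checked without a running server, and the websockets package is imported only inside transcribe_wav.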
The interesting bit
The project is essentially protocol glue, but useful glue. The WebRTC path is the unusual one: most open-source ASR tools stop at HTTP or gRPC, leaving browser-based real-time audio as an exercise for the reader. Here it’s built in, which matters for telephony and web chatbots, where every bit of added latency is noticeable to the person talking.
Key highlights
- Four protocol servers in one repo: WebSocket, gRPC, WebRTC, MQTT
- Targets specific integration patterns: smart home, PBX (FreeSWITCH, Asterisk), web backends, chatbots
- Fully offline — runs the Vosk/Kaldi stack locally, no external API calls
- Docker-based deployment; docs live on the separate Vosk website
Caveats
- The README is sparse; actual setup instructions are off-repo at alphacephei.com/vosk/server
- No benchmarks, model sizes, or hardware requirements listed in the repository itself
Verdict
Worth a look if you’re building voice features into a web app or phone system and would rather not feed Google or AWS your audio stream. Skip it if you need managed scaling, detailed telemetry, or extensive in-repo documentation.