Is kaldi-gstreamer-server open source?

Yes — alumae/kaldi-gstreamer-server is open source, released under the BSD-2-Clause license.

What language is kaldi-gstreamer-server written in?

alumae/kaldi-gstreamer-server is primarily written in Python.

How popular is kaldi-gstreamer-server?

alumae/kaldi-gstreamer-server has 1.1k stars on GitHub.

Where can I find kaldi-gstreamer-server?

alumae/kaldi-gstreamer-server is on GitHub at https://github.com/alumae/kaldi-gstreamer-server.

alumae/kaldi-gstreamer-server

Kaldi speech recognition, served over WebSockets with a side of GStreamer

A real-time speech-to-text server that streams partial transcripts as you talk, built for scaling out rather than up.

★ 1.1k stars · Python · Image · Video · Audio

What it does

This is a Python server that takes live audio streams over WebSockets and returns speech recognition results incrementally: partial hypotheses while you are still talking, then final text once a segment is done. Recognition runs in Kaldi decoders packaged as GStreamer plugins, so GStreamer's media pipeline handles the incoming audio before it reaches the decoder. The server splits the work between a master process and independent worker processes, and the workers can live on separate machines.
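Here is a minimal client-side sketch of that loop in Python 3. It assumes 16 kHz, 16-bit mono raw PCM input and the JSON message shape described in the project README (`status`, `result.hypotheses[0].transcript`, `result.final`); treat those field names as assumptions to verify, and `audio_chunks` and `TranscriptCollector` as hypothetical helpers. The WebSocket transport is left out on purpose: a real client would send each chunk as a binary frame and then the text message `EOS` to signal end of stream.

```python
import json
from typing import Iterator, List

# Assumed input format: 16 kHz, 16-bit (2-byte) mono PCM = 32000 bytes per second.
BYTE_RATE = 16000 * 2


def audio_chunks(pcm: bytes, byte_rate: int = BYTE_RATE, per_second: int = 4) -> Iterator[bytes]:
    """Split raw audio into ~1/per_second-second pieces, as a live client would send them."""
    size = max(1, byte_rate // per_second)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


class TranscriptCollector:
    """Folds server messages into committed (final) text plus the current partial hypothesis.

    The message fields used here are assumptions modelled on the project README.
    """

    def __init__(self) -> None:
        self.finals: List[str] = []
        self.partial: str = ""

    def feed(self, raw: str) -> None:
        msg = json.loads(raw)
        status = msg.get("status", 0)
        if status != 0:
            # Any non-zero status is treated as a failure in this sketch.
            raise RuntimeError(f"recognition failed with status {status}")
        result = msg.get("result")
        if result is None:
            return  # message carries no transcript; nothing to update
        text = result["hypotheses"][0]["transcript"]  # first hypothesis = best
        if result.get("final"):
            self.finals.append(text)  # final text is only ever appended
            self.partial = ""
        else:
            self.partial = text  # each partial replaces the previous one

    def text(self) -> str:
        return " ".join(self.finals + ([self.partial] if self.partial else []))
```

Keeping partials separate from finals mirrors the behaviour described above: a partial hypothesis is overwritten by the next one, while final text only accumulates.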

The interesting bit

The architecture scales out in a deliberately old-school way: each worker handles one recognition session at a time, so supporting more concurrent users means starting more workers, on any machine that can reach the master. No GPU clustering magic, just Unix processes and WebSockets. The server can also save and restore acoustic model adaptation state between requests, so in theory a returning speaker gets better recognition over time.
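To make the one-worker-per-session idea concrete, here is a toy in-process model of the master's bookkeeping. It is not the project's Tornado code; `Master`, `register_worker`, `open_session` and `close_session` are hypothetical names. What it shows is that capacity equals the number of registered workers: when none is idle, a new session cannot start (the README documents a dedicated "not available" status for this case; check the exact code there).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class Master:
    """Toy model: one idle worker is bound to each new session, then released."""

    idle: Deque[str] = field(default_factory=deque)
    busy: Dict[str, str] = field(default_factory=dict)  # session id -> worker id

    def register_worker(self, worker_id: str) -> None:
        # A worker on any machine announces itself; capacity grows by one session.
        self.idle.append(worker_id)

    def open_session(self, session_id: str) -> Optional[str]:
        if not self.idle:
            return None  # no free worker: the client would be told "not available"
        worker = self.idle.popleft()
        self.busy[session_id] = worker
        return worker

    def close_session(self, session_id: str) -> None:
        # The worker becomes available again once its session ends.
        self.idle.append(self.busy.pop(session_id))
```

In this model nothing about a session is shared between workers, which is what makes adding capacity as cheap as starting another worker process.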

Key highlights

Supports both legacy GMM and newer DNN/i-vector models (nnet2/nnet3); the DNN path requires compiling a separate GStreamer plugin
Handles arbitrarily long audio via silence-based segmentation
Can rescore recognition lattices with larger language models for better accuracy
Post-processing hooks let you rewrite results through external programs (e.g., words-to-numbers conversion)
Sample clients in Python, Java, JavaScript, and Haskell; includes English and Estonian demo models
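The post-processing hook is easiest to picture as a plain stdin/stdout filter. A minimal sketch in Python 3, assuming the line-in, line-out contract the README describes for the worker's `post-processor` option (one transcript per line, output flushed immediately); the `DIGITS` table is a hypothetical, deliberately tiny stand-in for a real words-to-numbers converter. Because the hook runs as an external program, it does not need to share the server's Python 2.7 runtime.

```python
import sys
from typing import IO

# Hypothetical toy vocabulary: a real converter would also handle
# "twenty one", "hundred", ordinals, and so on.
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def rewrite(line: str) -> str:
    """Replace single spelled-out digits with numerals, leaving other words alone."""
    return " ".join(DIGITS.get(word, word) for word in line.split())


def run(inp: IO[str] = sys.stdin, out: IO[str] = sys.stdout) -> None:
    # One line in, one line out, flushed right away so the result is not
    # left sitting in an output buffer.
    for line in iter(inp.readline, ""):
        out.write(rewrite(line.rstrip("\n")) + "\n")
        out.flush()


if __name__ == "__main__":
    run()
```

It would be wired in with something like `post-processor: python3 digits_filter.py` in the worker YAML, where `digits_filter.py` is a hypothetical filename. Flushing after every line matters if the worker waits for each rewritten line before forwarding the result.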

Caveats

Requires Python 2.7, Tornado 4.x, and a specific ws4py version (0.3.2) due to a reported bug in 0.3.5
The postprocessing mechanism breaks with Tornado 5+; changelog recommends pinning to Tornado 4.5.3
Building Kaldi and its GStreamer plugins is “quite complicated”; a Docker image exists but is community-maintained
nnet3 support added in 2016, noted as “not tested very carefully”
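Given how tight those pins are, a small preflight check can save a confusing debugging session. The sketch below encodes only the constraints listed above (Python 2.7, Tornado 4.x, ws4py 0.3.2); `stack_problems` and `major` are hypothetical helpers, and the function takes versions as plain strings rather than inspecting the environment itself. It avoids type annotations and f-strings so it also parses under Python 2.7, where the server actually runs.

```python
import re


def major(version):
    """Leading integer of a version string: '4.5.3' -> 4, '5.1' -> 5, None -> None."""
    m = re.match(r"\d+", version or "")
    return int(m.group()) if m else None


def stack_problems(python, packages):
    """Compare an environment against the pins from the caveats list.

    python   -- (major, minor) tuple, e.g. sys.version_info[:2]
    packages -- dict mapping distribution name -> installed version string
    """
    problems = []
    if tuple(python[:2]) != (2, 7):
        problems.append("Python 2.7 required, found %d.%d" % tuple(python[:2]))
    tornado = packages.get("tornado")
    if major(tornado) != 4:
        problems.append("Tornado 4.x required (4.5.3 recommended), found %s" % tornado)
    ws4py = packages.get("ws4py")
    if ws4py != "0.3.2":
        problems.append("ws4py 0.3.2 required, found %s" % ws4py)
    return problems
```

Inside the server's environment, the versions could be collected with setuptools' `pkg_resources.get_distribution(name).version` and passed in alongside `sys.version_info[:2]`.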

Verdict

Worth a look if you need self-hosted, real-time speech recognition with explicit control over acoustic models and scaling logic. Skip it if you want managed APIs, modern Python, or whisper.cpp-style simplicity—the dependency stack here is substantial and showing its age.

Frequently asked

What is alumae/kaldi-gstreamer-server?: A real-time speech-to-text server that streams partial transcripts as you talk, built for scaling out rather than up.