julius-speech/julius

A speech recognizer that fits in 32MB of RAM

Julius is a decades-old C engine for real-time, large-vocabulary speech recognition that is still small enough to run on embedded devices.

1.9k stars · C · Image · Video · Audio
Star velocity (7d): +0.5 ★/day · Trend: steady

What it does

Julius decodes continuous speech in real time using word N-gram language models and context-dependent HMMs. It needs less than 32MB of working memory (under 64MB for 20k-word dictation with a 3-gram LM held in memory), runs on platforms ranging from Linux to Android, and accepts live microphone input, network streams, or pre-recorded audio files. Its output includes transcripts, phoneme alignments, confidence scores, and word graphs.
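As a rough illustration of the acoustic side of that decoding, here is a minimal Viterbi sketch over a single left-to-right HMM in log space. It is not Julius's decoder, which runs a two-pass tree-trellis search over a lexicon combined with N-gram scores; the function name, the three-state topology, and the toy probabilities are assumptions chosen for illustration. The recovered best state path is the same kind of result a state-level forced alignment reports.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Best-path search through one HMM, entirely in natural-log space.

    log_init[s]     -- log P(start in state s)
    log_trans[i][j] -- log P(next state j | current state i)
    log_emit[t][s]  -- log p(frame t | state s), e.g. from a GMM or a DNN
    Returns (best log score, state index for every frame).
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs: List[List[int]] = []
    for frame in log_emit[1:]:
        new_score, ptrs = [], []
        for j in range(n_states):
            # Pick the predecessor state that maximises the path score into j.
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + frame[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best final state to recover the full alignment.
    state = max(range(n_states), key=lambda s: score[s])
    best = score[state]
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return best, path


if __name__ == "__main__":
    half = math.log(0.5)
    # Hypothetical 3-state left-to-right model: stay or advance, never go back.
    trans = [[half, half, NEG_INF],
             [NEG_INF, half, half],
             [NEG_INF, NEG_INF, 0.0]]
    init = [0.0, NEG_INF, NEG_INF]
    # Toy emissions: each frame "prefers" one state (0.8) over the others (0.1).
    preferred = [0, 0, 1, 2, 2]
    emit = [[math.log(0.8 if s == p else 0.1) for s in range(3)] for p in preferred]
    score, path = viterbi(init, trans, emit)
    print(path, round(score, 4))
```

Working in log space turns products of many small probabilities into sums, which avoids floating-point underflow on long utterances.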

The interesting bit

The project dates to 1997, yet it has had DNN support bolted on through a front-end module that runs separately and talks to the decoder over a socket, so you can swap in modern neural acoustic models without touching the core decoder. It can also run multiple recognition instances (dictation, grammar-based, isolated-word) in a single thread, which feels almost aggressively frugal by current standards.
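The DNN front-end arrangement above follows the usual hybrid DNN-HMM recipe: the network emits per-frame state posteriors P(s|x), while an HMM decoder expects likelihoods p(x|s), so posteriors are typically divided by state priors (a subtraction in log space). The sketch below shows only that generic conversion; it does not reproduce Julius's front-end module or its socket protocol, and the function names, the probability floor, the optional `prior_scale` knob, and the use of natural logs are assumptions.

```python
import math
from typing import List, Sequence

# Floor that keeps log() finite when a softmax output underflows to 0 (assumed value).
PROB_FLOOR = 1e-10


def posteriors_to_log_likelihoods(posteriors: Sequence[float],
                                  priors: Sequence[float],
                                  prior_scale: float = 1.0) -> List[float]:
    """Convert one frame of DNN state posteriors P(s|x) into scaled
    log-likelihoods: log p(x|s) - log p(x) = log P(s|x) - log P(s).

    The dropped term log p(x) is identical for every state in the frame,
    so it does not change which path the HMM decoder prefers.
    """
    if len(posteriors) != len(priors):
        raise ValueError("posterior and prior vectors must have the same length")
    return [math.log(max(p, PROB_FLOOR)) - prior_scale * math.log(max(q, PROB_FLOOR))
            for p, q in zip(posteriors, priors)]


def frames_to_log_likelihoods(frames: Sequence[Sequence[float]],
                              priors: Sequence[float]) -> List[List[float]]:
    """Apply the conversion frame by frame, before scores reach a decoder."""
    return [posteriors_to_log_likelihoods(f, priors) for f in frames]


if __name__ == "__main__":
    # Hypothetical 3-state network output for two frames; priors would normally
    # be estimated from state frequencies in the training alignments.
    priors = [0.5, 0.25, 0.25]
    frames = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
    for row in frames_to_log_likelihoods(frames, priors):
        print([round(v, 3) for v in row])
```

Keeping this step on the front-end side is what lets the decoder consume "frame-wise probabilities" without knowing anything about the network that produced them.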

Key highlights

  • 2-pass tree-trellis search with pruning, N-gram factoring, and Gaussian selection
  • Supports HTK acoustic models, ARPA language models, and DNN frame-wise probability input
  • Library API (Rev. 4) plus server mode and control API
  • Forced alignment at word, phoneme, and state level
  • BSD 3-clause license; Japanese and English model kits available
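The ARPA format listed above stores each N-gram as a log10 probability plus, for all but the highest order, a log10 backoff weight. The following minimal sketch parses a tiny hand-written bigram model and scores a sentence with standard ARPA backoff: if a bigram is missing, add the context's backoff weight and fall back to the unigram. The model contents and function names are hypothetical, and this is a simplified reader, not Julius's own LM loader.

```python
from typing import Dict, Sequence, Tuple

NGram = Tuple[str, ...]

# Hypothetical toy bigram model in ARPA format (all values are log10).
TOY_ARPA = r"""
\data\
ngram 1=4
ngram 2=3

\1-grams:
-99.0 <s> -0.30
-1.00 </s>
-0.60 hello -0.20
-0.80 world -0.10

\2-grams:
-0.10 <s> hello
-0.25 hello world
-0.05 world </s>

\end\
"""


def parse_arpa(text: str) -> Tuple[Dict[NGram, float], Dict[NGram, float]]:
    """Return (log10 probabilities, log10 backoff weights) keyed by word tuples."""
    probs: Dict[NGram, float] = {}
    backoffs: Dict[NGram, float] = {}
    order = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line in ("\\data\\", "\\end\\") or line.startswith("ngram "):
            continue  # skip blank lines, section markers, and the count header
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])  # section header such as "\2-grams:"
            continue
        fields = line.split()
        words = tuple(fields[1:1 + order])
        probs[words] = float(fields[0])
        if len(fields) > 1 + order:  # highest-order entries carry no backoff weight
            backoffs[words] = float(fields[1 + order])
    return probs, backoffs


def log10_prob(probs: Dict[NGram, float], backoffs: Dict[NGram, float],
               context: NGram, word: str) -> float:
    """log10 P(word | context), backing off to shorter contexts when needed."""
    ngram = context + (word,)
    if ngram in probs:
        return probs[ngram]
    if not context:
        raise KeyError(f"out-of-vocabulary word: {word}")
    # Missing N-gram: pay the context's backoff weight (0 if absent), drop the oldest word.
    return backoffs.get(context, 0.0) + log10_prob(probs, backoffs, context[1:], word)


def sentence_log10(probs: Dict[NGram, float], backoffs: Dict[NGram, float],
                   words: Sequence[str], order: int = 2) -> float:
    """Total log10 probability of a sentence wrapped in <s> ... </s>."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - order + 1):i])
        total += log10_prob(probs, backoffs, context, tokens[i])
    return total


if __name__ == "__main__":
    probs, backoffs = parse_arpa(TOY_ARPA)
    for sent in (["hello", "world"], ["world", "hello"]):
        print(" ".join(sent), round(sentence_log10(probs, backoffs, sent), 2))
```

With this toy model, "hello world" scores far higher than "world hello" because the latter has to back off at every step; in a real decoder these LM scores are combined with the acoustic scores when ranking hypotheses.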

Caveats

  • Last release was 4.6 in September 2020; documentation updates are “work in progress”
  • English DNN models need manual config changes (add cvnstatic; set state_prior_log10nize to false) before they work with current Julius
  • The best-supported path is still Japanese dictation; English models live on SourceForge and a user fork

Verdict

Grab this if you need embeddable, low-footprint speech recognition with model format flexibility—especially for resource-constrained or research environments. Skip it if you want turnkey cloud-grade accuracy or active commercial support.
