Language models that fit in RAM without the drama
A battle-tested C++ toolkit for building, shrinking, and querying n-gram language models when memory and latency both matter.

What it does
KenLM is Kenneth Heafield’s toolkit for the full n-gram lifecycle: estimating unpruned models with modified Kneser-Ney smoothing (lmplz), filtering ARPA files down to the entries a task actually needs, and querying the resulting models quickly. It reads ARPA and its own binary format, ships Python bindings, and can mmap binary models so they load without being parsed and copied into separately allocated memory.
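Querying an ARPA model follows the standard backoff rule. Each listed n-gram carries a log10 probability, each context can carry a log10 backoff weight, and a word is scored with the longest matching n-gram plus the backoff weights of every longer context that had to be shortened. The sketch below applies that rule to a tiny trigram model held in Python dicts. All numbers, and the names PROB, BACKOFF, ORDER, score_word and score_sentence, are hypothetical illustrations; this is not KenLM's data structure or its Python API.

```python
# Toy trigram model in ARPA style: log10 probabilities and log10 backoff
# weights keyed by n-gram tuples. All values are made up for illustration.
ORDER = 3
PROB = {
    ("<unk>",): -3.0,
    ("<s>",): -99.0,  # ARPA convention: <s> is only ever a context
    ("</s>",): -1.5,
    ("the",): -1.0,
    ("cat",): -2.0,
    ("sat",): -2.5,
    ("<s>", "the"): -0.5,
    ("the", "cat"): -0.7,
    ("<s>", "the", "cat"): -0.2,
}
BACKOFF = {
    ("<s>",): -0.3,
    ("the",): -0.4,
    ("cat",): -0.6,
    ("<s>", "the"): -0.1,
    ("the", "cat"): -0.25,
}


def score_word(context, word):
    """Return log10 p(word | context) under the standard ARPA backoff rule."""
    if (word,) not in PROB:
        word = "<unk>"  # out-of-vocabulary words are scored as <unk>
    # Only the last ORDER - 1 words of history can matter.
    context = tuple(context)[-(ORDER - 1):]
    total_backoff = 0.0
    for start in range(len(context) + 1):
        suffix = context[start:]
        ngram = suffix + (word,)
        if ngram in PROB:
            return PROB[ngram] + total_backoff
        # No match: pay this context's backoff (0.0 if none is listed),
        # then retry with the oldest context word dropped.
        total_backoff += BACKOFF.get(suffix, 0.0)
    raise KeyError(word)  # unreachable: every scored word has a unigram


def score_sentence(sentence):
    """Sum log10 scores of each word and </s>, starting from <s>."""
    history = ["<s>"]
    total = 0.0
    for word in sentence.split() + ["</s>"]:
        total += score_word(history, word)
        history.append(word)
    return total


print(round(score_sentence("the cat sat"), 2))  # -5.55
```

Working in log10 matches the ARPA convention, so a sentence score is a sum of per-word scores rather than a product of probabilities.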
The interesting bit
The speed/memory tradeoff is explicit and tunable. Two query backends coexist: a probing hash table (fastest, but largest) and a bit-packed trie (58% of the memory of IRST’s smallest option and 81% of the CPU time of IRST’s fastest). The trie stores word indices and pointers in the minimum number of bits each one needs. This is old-school systems programming applied to one very specific bottleneck: language model scoring inside machine translation decoders.
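The "minimum bit width" idea is easy to sketch: if the largest value a field must hold is m, each entry needs only as many bits as m's binary representation, and entries can be laid end to end in one byte buffer. The helpers below (bits_needed, pack and unpack_at are hypothetical names, not KenLM functions) read a field by loading 8 bytes starting at the byte that holds its first bit, then shifting and masking. This is a simplified illustration of the technique, not a transcription of KenLM's bit_packing.hh.

```python
# Hypothetical helpers illustrating minimum-width bit packing.
# This is a simplified sketch, not a transcription of KenLM's bit_packing.hh.


def bits_needed(max_value):
    """Smallest bit width able to hold every integer in [0, max_value]."""
    return max(1, max_value.bit_length())


def pack(values, width):
    """Lay values end to end, width bits each, in little-endian bit order."""
    # An 8-byte read covers a field only if (bit offset <= 7) + width <= 64.
    if not 1 <= width <= 57:
        raise ValueError("width must be between 1 and 57 bits")
    data_bytes = (len(values) * width + 7) // 8
    acc = 0
    for i, v in enumerate(values):
        if v < 0 or v >> width:
            raise ValueError(f"{v} does not fit in {width} bits")
        acc |= v << (i * width)
    # 7 bytes of zero padding keep an 8-byte read at any entry in bounds.
    return acc.to_bytes(data_bytes, "little") + bytes(7)


def unpack_at(buf, index, width):
    """Read entry index: load 8 bytes at its first byte, then shift and mask."""
    bit = index * width
    start = bit >> 3
    word = int.from_bytes(buf[start:start + 8], "little")
    return (word >> (bit & 7)) & ((1 << width) - 1)


# Example: word IDs up to 1000 need 10 bits each instead of a 32-bit int.
ids = [3, 999, 0, 512, 1000]
width = bits_needed(max(ids))
packed = pack(ids, width)
assert width == 10
assert [unpack_at(packed, i, width) for i in range(len(ids))] == ids
print(len(packed))  # 14: 7 data bytes (50 bits) + 7 padding bytes
```

Reading a whole 8-byte word and shifting keeps each lookup to one load plus a shift and a mask. In C or C++, that load starts at an arbitrary byte address, which is the kind of unaligned read the caveats below refer to.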
Key highlights
- On-disk estimation with user-specified memory limits, so you can train on text larger than RAM
- Binary format with mmap support for near-instant model loading
- Python module via Cython (pip install from the GitHub zip)
- Query-only builds possible without Boost; estimation still needs it
- Explicitly designed to be vendored: copy the code into your decoder, but send patches upstream
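The mmap point is about skipping a parse-and-copy step: once a model is laid out on disk exactly as the query code wants it in memory, the file can be mapped and read in place, and the operating system only has to page in what is touched. The sketch below shows that idea with Python's standard mmap and struct modules on a toy file of fixed-width records (float32 log10 probability, float32 backoff). The record layout, the file name toy.bin, and the helpers write_table and lookup are assumptions for illustration and have nothing to do with KenLM's actual binary format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: little-endian float32 log10 probability
# followed by float32 log10 backoff. Not KenLM's binary format.
RECORD = struct.Struct("<ff")


def write_table(path, records):
    """Serialize once, in exactly the layout the reader will use."""
    with open(path, "wb") as f:
        for prob, backoff in records:
            f.write(RECORD.pack(prob, backoff))


def lookup(path, index):
    """Map the file read-only and decode one record in place.

    Only the pages containing that record need to be brought in by the
    operating system; the rest of the file is never read or parsed.
    """
    with open(path, "rb") as f, mmap.mmap(
        f.fileno(), 0, access=mmap.ACCESS_READ
    ) as mapped:
        return RECORD.unpack_from(mapped, index * RECORD.size)


records = [(-1.0, -0.5), (-2.0, -0.25), (-2.5, 0.0)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.bin")
    write_table(path, records)
    second = lookup(path, 1)
print(second)  # (-2.0, -0.25)
```

A real query process would map the file once and keep the mapping open for many lookups; re-mapping on every call here only keeps the sketch short.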
Caveats
- Unaligned reads in murmur_hash.cc and bit_packing.hh make it architecture-dependent; ARM support exists but is “reportedly working, at least on the iphone”
- The README warns decoder developers to grab the latest version from the project website rather than copying from other decoders, suggesting drift in downstream copies
- Three build systems (CMake, compile.sh, bjam) coexist, so getting the feature flags you need may take some attention
Verdict
Worth a look if you’re building or maintaining a decoder, speech pipeline, or anything else that needs fast n-gram scoring with tight memory constraints. Skip it if you’re already happy with a neural language model and don’t need the efficiency or interpretability of n-grams.