kpu/kenlm

Language models that fit in RAM without the drama

A battle-tested C++ toolkit for building, shrinking, and querying n-gram language models when memory and latency both matter.

2.8k stars · C++ · Language Models
7-day velocity: +0.5 ★/day · Trend: steady

What it does

KenLM is Kenneth Heafield’s toolkit for the full n-gram lifecycle: estimating unpruned models with modified Kneser-Ney smoothing (lmplz), filtering ARPA files to strip unneeded entries, and querying the resulting models fast. It reads ARPA and its own binary format, ships Python bindings, and can mmap binary models so they load without first being copied into process memory.
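For readers new to the ARPA format: each entry is a log10 probability, the n-gram itself, and optionally a log10 backoff weight. A query for a word tries the longest history first and falls back to shorter ones, adding the backoff weight of each history it gives up. The sketch below parses a tiny hand-written model and applies that rule. The toy numbers and the names `parse_arpa`, `log10_prob`, and `score_sentence` are hypothetical; this is plain dictionaries, not KenLM's code or API.

```python
# Hypothetical toy model in ARPA layout: log10 prob, n-gram, optional log10 backoff.
TOY_ARPA = """
\\data\\
ngram 1=5
ngram 2=2

\\1-grams:
-99\t<s>\t-0.5
-0.7\tthe\t-0.3
-0.9\tcat\t-0.2
-1.2\t</s>
-1.5\t<unk>

\\2-grams:
-0.2\t<s> the
-0.4\tthe cat

\\end\\
"""


def parse_arpa(text):
    """Return (probs, backoffs): dicts keyed by n-gram tuples, values in log10."""
    probs, backoffs = {}, {}
    order = 0  # n of the section being read; 0 means header or end marker
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # A line like \2-grams: opens the bigram section; \data\ and \end\ do not.
            order = int(line[1:line.index("-")]) if line.endswith("-grams:") else 0
            continue
        if order == 0:
            continue  # header lines such as "ngram 1=5"
        fields = line.split()
        ngram = tuple(fields[1:1 + order])
        probs[ngram] = float(fields[0])
        if len(fields) > 1 + order:
            backoffs[ngram] = float(fields[1 + order])
    return probs, backoffs


def log10_prob(probs, backoffs, context, word):
    """log10 p(word | context) under the standard ARPA backoff rule."""
    if (word,) not in probs:
        word = "<unk>"  # assumption: unknown words are scored as <unk>
    context = tuple(context)
    penalty = 0.0
    while context and (*context, word) not in probs:
        # n-gram missing: pay this history's backoff weight (0 if none listed)
        # and retry with the oldest word dropped.
        penalty += backoffs.get(context, 0.0)
        context = context[1:]
    return penalty + probs[(*context, word)]


def score_sentence(probs, backoffs, sentence):
    """Total log10 probability of a whitespace-tokenized sentence with <s> ... </s>."""
    order = max(len(ngram) for ngram in probs)
    history, total = ["<s>"], 0.0
    for word in sentence.split() + ["</s>"]:
        context = history[max(0, len(history) - (order - 1)):]
        total += log10_prob(probs, backoffs, context, word)
        history.append(word)
    return total


probs, backoffs = parse_arpa(TOY_ARPA)
print(round(score_sentence(probs, backoffs, "the cat"), 6))  # -2.0
```

Here "the cat" scores -0.2 for p(the | &lt;s&gt;), -0.4 for p(cat | the), and -0.2 + -1.2 for p(&lt;/s&gt; | cat), where the missing bigram forces a backoff to the unigram.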

The interesting bit

The speed/memory tradeoff is explicit and tunable. Two query backends coexist: a probing hash table (fastest, fattest) and a bit-packed trie (58% of the memory of IRST’s smallest model and 81% of the CPU time of IRST’s fastest). The trie packs word indices and pointers down to the minimum bit width. This is old-school systems programming applied to a very specific bottleneck: language model scoring in machine translation decoders.
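The "minimum bit width" trick is easy to see in isolation. This sketch packs non-negative integers (stand-ins for word indices or pointers) at exactly as many bits as the largest value needs, back to back in one byte buffer, and reads an entry back by shifting and masking. `BitPackedArray` is a hypothetical name, and KenLM's actual trie layout and bit_packing.hh helpers differ in detail.

```python
class BitPackedArray:
    """Fixed-width unsigned integers packed back to back, least significant bit first."""

    def __init__(self, values):
        values = list(values)
        if any(v < 0 for v in values):
            raise ValueError("only non-negative integers are supported")
        # Narrowest width that can hold the largest value (at least 1 bit).
        self.width = max(max(values, default=0).bit_length(), 1)
        self.length = len(values)
        self.buf = bytearray((self.width * self.length + 7) // 8)
        for i, v in enumerate(values):
            self._write(i * self.width, v)

    def _write(self, bit_offset, value):
        # Set each 1-bit of value at absolute bit position bit_offset + b.
        for b in range(self.width):
            if (value >> b) & 1:
                pos = bit_offset + b
                self.buf[pos // 8] |= 1 << (pos % 8)

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        start = i * self.width
        # Load the whole bytes covering this entry, then shift and mask.
        first, last = start // 8, (start + self.width - 1) // 8
        chunk = int.from_bytes(self.buf[first:last + 1], "little")
        return (chunk >> (start % 8)) & ((1 << self.width) - 1)

    def __len__(self):
        return self.length


ids = [5, 0, 1023, 77, 300]
packed = BitPackedArray(ids)
print(packed.width, len(packed.buf))  # 10 7
print([packed[i] for i in range(len(packed))])  # [5, 0, 1023, 77, 300]
```

Five ids that would take 20 bytes as 32-bit integers fit in 7 bytes here. In C or C++, the cheap way to read a field that straddles byte boundaries is one wide load from an arbitrary byte address, which is presumably the kind of unaligned access the bit_packing.hh caveat refers to.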

Key highlights

  • On-disk estimation with user-specified memory limits, so you can train on text larger than RAM
  • Binary format with mmap support for near-instant model loading
  • Python module via Cython (pip install from GitHub zip)
  • Query-only builds possible without Boost; estimation still needs it
  • Explicitly designed to be vendored: copy the code into your decoder, but send patches upstream
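Why mmap makes loading near-instant: mapping a file does not read it, and the operating system pages in only the parts a lookup actually touches. A minimal sketch using Python's standard `mmap` and `struct` modules over a made-up layout (a uint32 count followed by one float32 log10 probability per word id). The layout, `write_table`, and `MappedTable` are hypothetical; KenLM's real binary format, produced by its `build_binary` tool, is different.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<I")  # hypothetical layout: entry count (uint32)
ENTRY = struct.Struct("<f")   # ...then one float32 log10 prob per word id


def write_table(path, log_probs):
    """Write the toy binary table once (the expensive 'build' step)."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(log_probs)))
        for lp in log_probs:
            f.write(ENTRY.pack(lp))


class MappedTable:
    """Read-only view of the table; mapping it does not read the file up front."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (self.count,) = HEADER.unpack_from(self._map, 0)

    def log_prob(self, word_id):
        if not 0 <= word_id < self.count:
            raise IndexError(word_id)
        # Typically only the page(s) holding this entry get read from disk.
        (value,) = ENTRY.unpack_from(self._map, HEADER.size + word_id * ENTRY.size)
        return value

    def close(self):
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "toy.bin")
write_table(path, [-0.5, -1.25, -2.0, -3.75])
table = MappedTable(path)
print(table.count, table.log_prob(2))  # 4 -2.0
table.close()
```

Opening `MappedTable` costs roughly the same whether the file is kilobytes or gigabytes; the I/O is paid lazily, per page, on first access.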

Caveats

  • Unaligned reads in murmur_hash.cc and bit_packing.hh make it architecture-dependent; ARM support exists but is “reportedly working, at least on the iphone”
  • The README warns decoder developers to grab the latest version from the project website rather than copying from other decoders, suggesting drift in downstream copies
  • With several build systems in play (cmake, compile.sh, bjam), getting the feature flags you need may take some digging

Verdict

Worth a look if you’re building or maintaining a decoder, speech pipeline, or anything else that needs fast n-gram scoring with tight memory constraints. Skip it if you’re already happy with a neural language model and don’t need the efficiency or interpretability of n-grams.
