Is kenlm open source?

Yes — kpu/kenlm is an open-source project tracked on heatdrop.

What language is kenlm written in?

kpu/kenlm is primarily written in C++.

How popular is kenlm?

kpu/kenlm has 2.8k stars on GitHub.

Where can I find kenlm?

kpu/kenlm is on GitHub at https://github.com/kpu/kenlm.

kpu/kenlm

Language models that fit in RAM without the drama

A battle-tested C++ toolkit for building, shrinking, and querying n-gram language models when memory and latency both matter.

★ 2.8k stars · C++ · Language Models

What it does

KenLM is Kenneth Heafield’s toolkit for the full n-gram lifecycle: estimating unpruned models with modified Kneser-Ney smoothing (lmplz), filtering ARPA files to strip entries a given task will never query, and answering probability queries quickly. It reads ARPA and its own binary format, and it ships Python bindings. Binary models can be memory-mapped (mmap), so the file is mapped from disk on demand instead of being read and parsed up front.
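To make the query step concrete, here is a minimal pure-Python sketch of how a backoff n-gram model stored in ARPA format can be scored. It is an illustration under stated assumptions, not KenLM's implementation: the toy model text, its numbers, and the names `parse_arpa` and `log10_prob` are all made up for this example.

```python
# Toy bigram model in ARPA layout: log10 probability, n-gram, optional log10 backoff,
# separated by tabs. All values here are invented for illustration.
TOY_ARPA = "\n".join([
    "\\data\\",
    "ngram 1=4",
    "ngram 2=2",
    "",
    "\\1-grams:",
    "-1.0\t<unk>",
    "-0.7\tthe\t-0.3",
    "-0.9\tcat\t-0.2",
    "-1.1\tsat",
    "",
    "\\2-grams:",
    "-0.25\tthe cat",
    "-0.5\tcat sat",
    "",
    "\\end\\",
])


def parse_arpa(text):
    """Return {ngram_tuple: (log10_prob, log10_backoff)} from ARPA text."""
    table = {}
    in_ngrams = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # Section headers look like \1-grams: ; \data\ and \end\ hold no entries.
            in_ngrams = line.endswith("-grams:")
            continue
        if not in_ngrams:
            continue  # skip the "ngram N=count" lines under \data\
        fields = line.split("\t")
        prob = float(fields[0])
        words = tuple(fields[1].split())
        backoff = float(fields[2]) if len(fields) > 2 else 0.0
        table[words] = (prob, backoff)
    return table


def log10_prob(table, context, word):
    """log10 p(word | context): shorten the context until a stored n-gram matches."""
    context = tuple(context)
    penalty = 0.0
    while True:
        ngram = context + (word,)
        if ngram in table:
            return penalty + table[ngram][0]
        if not context:
            # Word is not in the vocabulary: use the <unk> entry.
            return penalty + table[("<unk>",)][0]
        # Charge the backoff weight of the context being dropped (0.0 if not stored).
        penalty += table.get(context, (0.0, 0.0))[1]
        context = context[1:]


table = parse_arpa(TOY_ARPA)
seen = log10_prob(table, ["the"], "cat")    # bigram is stored, read directly
unseen = log10_prob(table, ["the"], "sat")  # backs off to the unigram for "sat"
```

A dictionary lookup per n-gram is the simplest possible backend; the point of a toolkit like KenLM is to do the same lookups with far less memory and time per query.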

The interesting bit

The speed/memory tradeoff is explicit and tunable. Two query backends coexist: a probing hash table (fastest, but the largest in memory) and a bit-packed trie (58% of the memory of IRSTLM’s smallest model, 81% of the CPU of IRSTLM’s fastest). The trie packs word indices and pointers down to the minimum bit width that can hold them. This is old-school systems programming applied to a very specific bottleneck: language model scoring in machine translation decoders.
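The bit-packing idea can be sketched in a few lines. This is an illustration only, not KenLM's code (which is C++): each value is stored in the fewest bits that can hold the largest one, with a Python integer standing in for the bit buffer. The function names and the sample word ids are hypothetical.

```python
def bits_needed(max_value):
    """Smallest bit width that can represent every value in 0..max_value."""
    return max(1, max_value.bit_length())


def pack(values, width):
    """Concatenate fixed-width fields into one integer, first value in the low bits."""
    packed = 0
    for i, v in enumerate(values):
        if v < 0 or v >= (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        packed |= v << (i * width)
    return packed


def unpack(packed, width, index):
    """Read the field at position `index` by shifting and masking."""
    return (packed >> (index * width)) & ((1 << width) - 1)


word_ids = [3, 17, 42, 5, 63]
width = bits_needed(max(word_ids))            # the largest id decides the field width
packed = pack(word_ids, width)
n_bytes = (len(word_ids) * width + 7) // 8    # bytes needed for the packed fields
restored = [unpack(packed, width, i) for i in range(len(word_ids))]
```

The saving comes from the field width tracking the data: a vocabulary that needs 20 bits per word id does not pay for 32, at the cost of shift-and-mask work on every read.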

Key highlights

On-disk estimation with user-specified memory limits, so you can train on text larger than RAM
Binary format with mmap support for near-instant model loading
Python module via Cython (pip install from GitHub zip)
Query-only builds possible without Boost; estimation still needs it
Explicitly designed to be vendored: copy the code into your decoder, but send patches upstream
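For the Python module, a common use is rescoring candidate sentences. The sketch below assumes the `kenlm` package is installed and that you supply a path to an ARPA or binary model; `load_scorer` and `rank_candidates` are hypothetical helper names, not part of KenLM. The import is deferred so the ranking logic can be tried without KenLM present. `kenlm.Model` and its `score` method come from the bindings, and scores are log10 probabilities.

```python
def load_scorer(model_path):
    """Return a function mapping a sentence to its log10 probability under a KenLM model."""
    import kenlm  # deferred import: needs the KenLM Python module installed

    model = kenlm.Model(model_path)
    # bos/eos wrap the input in the sentence boundary tokens <s> and </s>.
    return lambda sentence: model.score(sentence, bos=True, eos=True)


def rank_candidates(score, candidates):
    """Sort candidate sentences from most to least probable under `score`."""
    return sorted(candidates, key=score, reverse=True)


# Example use (the path is a placeholder for a model file you have built):
#   scorer = load_scorer("path/to/model.binary")
#   best = rank_candidates(scorer, ["this is a sentence .", "sentence a is this ."])[0]
```

Input is expected to be whitespace-tokenized in the same way as the training text, so any tokenization or casing step used at training time should be repeated before scoring.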

Caveats

Unaligned reads in murmur_hash.cc and bit_packing.hh make it architecture-dependent; ARM support exists but is “reportedly working, at least on the iphone”
The README warns decoder developers to grab the latest version from the project website rather than copying from other decoders, suggesting drift in downstream copies
Three build paths coexist (cmake, compile.sh, bjam), so getting the feature flags you need may take some attention

Verdict

Worth a look if you’re building or maintaining a decoder, speech pipeline, or anything else that needs fast n-gram scoring with tight memory constraints. Skip it if you’re already happy with a neural language model and don’t need the efficiency or interpretability of n-grams.
