PHP language detection without calling Google Translate
A self-contained n-gram library that ships with pre-trained models for 110 languages and runs entirely offline.

What it does
Feed it a string of text, get back ranked language guesses with confidence scores. It ships with pre-trained models for 110 languages and a Trainer class to roll your own—whether that’s Klingon, spam vs. ham, or something more practical.
The interesting bit
The library compiles n-gram frequency data into plain PHP arrays rather than JSON (since v4), which is a blunt-force but effective way to dodge parse overhead. You can also cap the n-gram count to trade accuracy for speed, or whitelist specific languages to skip comparisons you don’t need.
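To make the n-gram idea concrete, here is a toy sketch of rank-based n-gram matching in plain PHP. It is not the library's code: the function names (ngramProfile, outOfPlaceDistance, detectLanguage), the two tiny training sentences, and the default cap of 300 are all made up for illustration. It only shows the general technique of ranking n-grams by frequency, capping the list, and comparing ranks.

```php
<?php
declare(strict_types=1);

/**
 * Build a ranked profile: n-gram => rank (0 = most frequent),
 * capped at $maxNgrams entries. Hypothetical helper, not the library's API.
 */
function ngramProfile(string $text, int $maxNgrams = 300, int $n = 3): array
{
    $counts = [];
    // Split on anything that is not a letter, then pad each word with "_".
    $words = preg_split('/[^\pL]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $padded = '_' . $word . '_';
        $length = mb_strlen($padded);
        // Count every substring of length 1..$n.
        for ($size = 1; $size <= $n; $size++) {
            for ($i = 0; $i + $size <= $length; $i++) {
                $gram = mb_substr($padded, $i, $size);
                $counts[$gram] = ($counts[$gram] ?? 0) + 1;
            }
        }
    }
    arsort($counts); // most frequent first
    // Keep only the top $maxNgrams n-grams and map each one to its rank.
    return array_flip(array_keys(array_slice($counts, 0, $maxNgrams, true)));
}

/** Sum of rank differences; n-grams missing from the model cost $maxNgrams. */
function outOfPlaceDistance(array $sample, array $model, int $maxNgrams): int
{
    $distance = 0;
    foreach ($sample as $gram => $rank) {
        $distance += isset($model[$gram]) ? abs($rank - $model[$gram]) : $maxNgrams;
    }
    return $distance;
}

/** Returns language => distance, smallest (best match) first. */
function detectLanguage(string $text, array $models, int $maxNgrams = 300): array
{
    $sample = ngramProfile($text, $maxNgrams);
    $scores = [];
    foreach ($models as $lang => $model) {
        $scores[$lang] = outOfPlaceDistance($sample, $model, $maxNgrams);
    }
    asort($scores);
    return $scores;
}

// Toy "training" data; a real model would be built from far more text.
$models = [
    'en' => ngramProfile('the quick brown fox jumps over the lazy dog and this is the house that we have'),
    'nl' => ngramProfile('de snelle bruine vos springt over de luie hond en dit is het huis dat wij hebben'),
];

$scores = detectLanguage('this is the dog', $models);
echo array_key_first($scores), PHP_EOL; // prints "en"
```

A smaller cap means fewer n-grams to build and compare per language, which is the accuracy-for-speed trade described above. Because a profile is just an array, it can be stored as a PHP file that returns that array, so nothing has to be parsed from JSON at load time.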
Key highlights
- 110 built-in languages, with trainable support for custom ones
- Method chaining: detect()->blacklist('de')->limit(3)->close()
- ArrayAccess lets you pluck scores like $result['nl']
- Custom tokenizers via TokenizerInterface for domain-specific text
- Requires PHP ≥ 7.4 and the mbstring extension
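A rough usage sketch of the chaining and ArrayAccess highlights might look like the following. The LanguageDetection namespace, the Language class name, and the Composer autoload path are assumptions not stated in this write-up, so check them against the README before relying on them.

```php
<?php
// Assumption: the package was installed with Composer and exposes
// a Language class in the LanguageDetection namespace.
require __DIR__ . '/vendor/autoload.php';

use LanguageDetection\Language;

$detector = new Language();
$text = 'Mag het een onsje meer zijn?';

// Chained filters: drop German from the candidates, then close()
// to turn the result object into a plain array of language => score.
$scores = $detector->detect($text)
    ->blacklist('de')
    ->close();

// Only compare against the languages you care about.
$shortlist = $detector->detect($text)
    ->whitelist('nl', 'af', 'en')
    ->close();

// ArrayAccess on the result object itself, before calling close().
$result = $detector->detect($text);
echo $result['nl'], PHP_EOL;
```

Scores depend on the bundled models, so treat the printed value as a relative ranking signal, not an absolute probability.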
Caveats
- Needs “some sentences” for reliable detection; short strings are dicey
- Training with large n-gram counts (the README suggests ~9,000 for better accuracy) is slow, though detection speed stays flat
- Upgrading from v3 requires regenerating any custom training files, because v4 stores them as PHP instead of JSON
Verdict
Worth a look if you’re building a PHP app that needs offline language detection without pulling in heavy ML dependencies. Skip it if you’re already running Python or need real-time detection on single words.