quickwit-oss/whichlang
A Rust library that uses multiclass logistic regression over n-grams to detect language in text at high throughput.

This library provides fast, lightweight language detection for 16 languages, including Arabic, Chinese, Hindi, Japanese, Korean, and European languages. It uses the hashing trick to project its features (letter 2-, 3-, and 4-grams, plus codepoint features) into a space of size 4,096. The logistic regression model is trained in Python, and the learned weights are emitted as Rust code. The project was created to serve the high-throughput text processing needs of the Quickwit search engine.
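
To make the hashing trick concrete, here is a minimal sketch of how n-gram features might be hashed into 4,096 buckets and scored by a linear multiclass model. It is an illustration under stated assumptions, not whichlang's actual code: the FNV-1a hash, the names `fnv1a`, `ngram_buckets`, and `predict`, the lowercasing step, and the weight layout are all hypothetical. It takes n-grams over every character rather than letters only, and it omits the codepoint features.

```rust
/// Size of the hashed feature space (4,096, as stated above).
const DIM: usize = 4096;

/// FNV-1a over bytes, mixed with a seed. A stand-in hash chosen for this
/// sketch; the hash used by the real library may differ.
fn fnv1a(bytes: &[u8], seed: u32) -> u32 {
    let mut h: u32 = 0x811c_9dc5 ^ seed;
    for &b in bytes {
        h ^= b as u32;
        h = h.wrapping_mul(0x0100_0193);
    }
    h
}

/// Map a text to the bucket indices of its character 2-, 3-, and 4-grams.
/// The n-gram length is used as the hash seed so that n-grams of different
/// lengths are hashed independently.
fn ngram_buckets(text: &str) -> Vec<usize> {
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    let mut buckets = Vec::new();
    for n in 2..=4usize {
        for window in chars.windows(n) {
            let gram: String = window.iter().collect();
            buckets.push(fnv1a(gram.as_bytes(), n as u32) as usize % DIM);
        }
    }
    buckets
}

/// Score every class as bias + sum of the weights of the active buckets,
/// and return the index of the highest-scoring class.
/// `weights[class]` is assumed to hold DIM entries.
fn predict(weights: &[Vec<f32>], bias: &[f32], text: &str) -> usize {
    let buckets = ngram_buckets(text);
    let mut best_class = 0;
    let mut best_score = f32::NEG_INFINITY;
    for (class, (w, b)) in weights.iter().zip(bias).enumerate() {
        let score = *b + buckets.iter().map(|&i| w[i]).sum::<f32>();
        if score > best_score {
            best_score = score;
            best_class = class;
        }
    }
    best_class
}
```

Because softmax is monotonic, taking the argmax of the raw linear scores gives the same prediction as taking the argmax of the class probabilities, so inference in this sketch needs no exponentials. Emitting the trained weights as Rust source, as described above, would let a table like `weights` be compiled in as constant data instead of being loaded at runtime.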