
quickwit-oss/whichlang

A Rust library that uses multiclass logistic regression over n-grams to detect language in text at high throughput.

447 stars Rust Data Tooling
Velocity (7d): +0.4 ★/day · Trend: steady

This library provides fast, lightweight language detection for 16 languages, including Arabic, Chinese, Hindi, Japanese, Korean, and several European languages. It uses the hashing trick to project its features (letter 2-, 3-, and 4-grams plus codepoint features) into a 4,096-dimensional space. The logistic regression model is trained in Python, and its weights are emitted as Rust code. The project was created to meet the high-throughput text processing needs of the Quickwit search engine.
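The hashing trick and the linear scoring step can be sketched in a few lines of Rust. This is an illustration of the general technique only, not whichlang's actual code: the function names (`bucket`, `features`, `classify`), the FNV-style hash, the lowercasing step, and the weight layout are all assumptions, and the codepoint features are left out.

```rust
// Illustrative sketch only. All names, the hash function, and the weight
// layout are assumptions for this example, not whichlang's real code.

/// Size of the hashed feature space (4,096, as stated in the description).
const DIMENSION: usize = 4096;

/// Map an n-gram to a bucket index using FNV-style mixing over code points.
/// (Hypothetical hash choice; any well-mixed hash would serve the example.)
fn bucket(ngram: &[char]) -> usize {
    let mut hash: u32 = 0x811c_9dc5;
    for &c in ngram {
        hash ^= c as u32;
        hash = hash.wrapping_mul(0x0100_0193);
    }
    (hash as usize) % DIMENSION
}

/// Count the 2-, 3-, and 4-grams of the lowercased text into a fixed-size
/// vector. Distinct n-grams may collide in the same bucket; that is the
/// trade-off the hashing trick accepts in exchange for a fixed dimension.
fn features(text: &str) -> Vec<f32> {
    let chars: Vec<char> = text.chars().flat_map(char::to_lowercase).collect();
    let mut counts = vec![0.0f32; DIMENSION];
    for n in 2..=4 {
        for ngram in chars.windows(n) {
            counts[bucket(ngram)] += 1.0;
        }
    }
    counts
}

/// Multiclass logistic regression at inference time: one weight vector and
/// one bias per class. Softmax is monotonic, so the most probable class is
/// simply the one with the largest linear score.
fn classify(x: &[f32], weights: &[Vec<f32>], biases: &[f32]) -> usize {
    let mut best_class = 0;
    let mut best_score = f32::NEG_INFINITY;
    for (class, (w, b)) in weights.iter().zip(biases).enumerate() {
        let dot: f32 = w.iter().zip(x).map(|(wi, xi)| wi * xi).sum();
        let score = dot + *b;
        if score > best_score {
            best_score = score;
            best_class = class;
        }
    }
    best_class
}
```

In this sketch the weights are passed in as slices so the example stays self-contained. A library that emits its trained weights as Rust source, as the description says whichlang does, would more plausibly hold them in constant arrays compiled into the binary.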
