Is polyglot open source?

Yes — aboSamoor/polyglot is an open-source project tracked on heatdrop.

What language is polyglot written in?

aboSamoor/polyglot is primarily written in Python.

How popular is polyglot?

aboSamoor/polyglot has 2.4k stars on GitHub.

Where can I find polyglot?

aboSamoor/polyglot is on GitHub at https://github.com/aboSamoor/polyglot.

aboSamoor/polyglot

NLP for the other 6.5 billion people

A Python toolkit that treats 165-language tokenization as table stakes, not a stretch goal.

★ 2.4k stars · Python · Language Models · ML Frameworks

What it does

Polyglot is a Python NLP pipeline built for breadth over depth. It detects 196 languages, tokenizes 165 of them, and offers sentiment analysis (136 languages), word embeddings (137), and morphological analysis (135), plus transliteration (69), named entity recognition (40), and POS tagging (16). The API is deliberately simple: wrap a string in Text() or Word() and read properties such as .pos_tags, .entities, or .polarity.
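As a rough illustration of that single-object style, here is a minimal sketch that reads the properties named above from a Text object. The helper names summarize and analyze are hypothetical, not part of polyglot, and analyze assumes polyglot plus the model packages for the input language are already installed.

```python
def summarize(doc):
    """Collect common annotations from a polyglot Text-like object.

    Only the properties named in the description are read:
    .words, .pos_tags, .entities and .polarity.
    """
    return {
        "words": [str(w) for w in doc.words],
        # pos_tags yields (word, tag) pairs
        "pos_tags": [(str(w), tag) for w, tag in doc.pos_tags],
        # each entity is a chunk of words carrying a .tag such as "I-LOC"
        "entities": [(chunk.tag, [str(w) for w in chunk]) for chunk in doc.entities],
        "polarity": doc.polarity,
    }


def analyze(string):
    """Wrap a string in polyglot's Text and summarize it (hypothetical helper)."""
    # Imported lazily: polyglot and its per-language models must be installed.
    from polyglot.text import Text

    return summarize(Text(string))
```

Keeping summarize separate from the import means the formatting logic can be exercised with any object that exposes the same four properties; only analyze touches the library.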

The interesting bit

The project inverts the usual NLP hierarchy. Instead of perfecting English first, it ships baseline features widely: tokenization reaches 165 languages and word embeddings 137, while POS tagging covers only 16. The README’s German NER example outputs raw I-LOC and I-PER tags with escaped Unicode, no polish, just proof it works. That’s the aesthetic: coverage first, refinement later.
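To make those raw tags a little friendlier, a small sketch like the following could map them to readable labels. The TAG_NAMES table and both function names are assumptions made for illustration; the German sentence is the one recalled from the README's NER example, and running german_example requires polyglot with its German models installed.

```python
# Readable names for the NER tag types (I-LOC, I-PER, I-ORG); this mapping is
# an illustrative assumption, not something polyglot provides.
TAG_NAMES = {"I-LOC": "location", "I-PER": "person", "I-ORG": "organization"}


def readable_entities(chunks):
    """Turn entity chunks into (label, text) pairs.

    Each chunk is expected to be an iterable of words with a .tag attribute,
    as in the README's I-LOC([...]) output. Unknown tags pass through as-is.
    """
    return [(TAG_NAMES.get(chunk.tag, chunk.tag), " ".join(chunk)) for chunk in chunks]


def german_example():
    """Run NER on a German sentence (hypothetical helper)."""
    # Lazy import so readable_entities stays usable without polyglot.
    from polyglot.text import Text

    text = Text("In Großbritannien war Gandhi mit dem westlichen Lebensstil vertraut geworden")
    return readable_entities(text.entities)
```

Under Python 3 the escaped Unicode seen in the README output should not appear, since strings print as text rather than as u"" literals.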

Key highlights

196-language detection, 165-language tokenization, 137-language word embeddings
Single-object API: Text(string).words, Word(string, language="en").neighbors
Morphological decomposition (“Preprocessing” → ['Pre', 'process', 'ing'])
Transliteration into other scripts: English “preprocessing” becomes Russian препрокессинг
GPLv3 licensed, Travis CI + ReadTheDocs infrastructure
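The highlights above map onto a handful of entry points. The sketch below wraps each in a thin function so the calls are easy to compare; the wrapper names are hypothetical, and every call assumes polyglot and the relevant model packages for the chosen languages are installed.

```python
def detect_language(string):
    """Return (code, name, confidence) for the most likely language."""
    from polyglot.detect import Detector

    lang = Detector(string).language
    return lang.code, lang.name, lang.confidence


def morphemes(word, language="en"):
    """Split a word into morphemes, as in the "Preprocessing" highlight."""
    from polyglot.text import Word

    return [str(m) for m in Word(word, language=language).morphemes]


def neighbors(word, language="en"):
    """Return the word's nearest neighbours in the embedding space."""
    from polyglot.text import Word

    return list(Word(word, language=language).neighbors)


def transliterate(word, source_lang="en", target_lang="ru"):
    """Transliterate a word between scripts, English to Russian by default."""
    from polyglot.transliteration import Transliterator

    engine = Transliterator(source_lang=source_lang, target_lang=target_lang)
    return engine.transliterate(word)
```

Each wrapper imports lazily, so a missing model for one feature does not block the others. Word() takes an explicit language because a single word is usually too short for reliable detection.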

Caveats

POS tagging covers only 16 languages and NER 40, so coverage is uneven across features
The README appears not to have changed significantly since before 2017: it still carries a Travis CI badge and Python 2 u"" string literals in its examples
The repository provides no screenshots or example images

Verdict

Grab this if you’re prototyping multilingual pipelines and need broad language detection or transliteration without training custom models. Skip it if you need state-of-the-art accuracy on English-only tasks—spaCy or Stanza have overtaken it there.
