A field guide to Persian NLP, because nobody should parse Farsi alone
A curated index of tools, datasets, and papers for Persian-language NLP and IR research.

What it does
This is an “awesome list”: a community-curated index of Persian NLP and information retrieval resources. It catalogs tools, datasets, models, code repositories, and academic papers in five flat sections. Think of it as a card catalog for a library that happens to be written right-to-left.
The interesting bit
Persian NLP sits in a resource gap: multilingual toolkits tend to cover it only shallowly, and tools built for English do not transfer to its script and morphology. The list deliberately covers the long tail (morphological analyzers, shallow parsers, Persian-specific stemmers, and language detection) rather than just collecting generic BERT links.
Key highlights
- Five sections: Tools, Datasets, Models, Repositories, Papers and Books
- Covers tools for specialized tasks: normalization, dependency parsing, POS tagging, NER, and spell checking
- CC0 license, so reuse carries no attribution requirement
- Accepts community contributions via documented guidelines
- 767 GitHub stars suggest real interest from Persian NLP researchers, though stars measure attention rather than active use
Caveats
- The entries live in separate markdown files (tools.md, datasets.md, etc.); the README is only a skeleton, so you have to click through to see what is listed
- No indication of how often the list is maintained or when it was last updated
Verdict
Worth bookmarking if you’re building or evaluating Persian-language NLP systems. Skip it if you need a searchable, filterable database: this is a hand-curated flat list, not a registry with metadata.