mhbashari/awesome-persian-nlp-ir

A field guide to Persian NLP, because nobody should parse Farsi alone

A curated index of tools, datasets, and papers for Persian-language NLP and IR research.

Velocity (7d): +0.2 ★ / day · Trend: steady

What it does This is an “awesome list” — a community-curated index of Persian NLP and information retrieval resources. It catalogs tools, datasets, models, code repositories, and academic papers in five flat sections. Think of it as a card catalog for a library that happens to be written right-to-left.

The interesting bit Persian NLP sits in a resource gap: too niche for most multilingual toolkits, too complex for off-the-shelf English solutions. The list explicitly covers the long tail — morphological analyzers, shallow parsers, Persian-specific stemmers, and language detection — rather than dumping generic BERT links.

Key highlights

  • Five sections: Tools, Datasets, Models, Repositories, Papers and Books
  • Covers specialized tasks: normalizers, dependency parsers, POS taggers, NER, spell checkers
  • CC0 license — no attribution friction for reuse
  • Accepts community contributions via documented guidelines
  • 767 stars, which suggests sustained interest from Persian NLP researchers

Caveats

  • The README is only a skeleton: the listings live in separate markdown files (tools.md, datasets.md, etc.), so you have to click through to see what is included
  • No indication of how frequently the list is maintained or when it was last updated

Verdict Worth bookmarking if you’re building or evaluating Persian-language NLP systems. Skip it if you need a searchable, filterable database — this is a hand-curated flat list, not a registry with metadata.
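
If you do want a rough searchable view, the per-section markdown files could be flattened into records with a few lines of Python. This is a minimal sketch, not part of the repository: it assumes entries are bulleted markdown links of the form `- [Name](url) description`, which is a guess about the file layout, and the sample text, names, and URLs below are made up for illustration.

```python
import re

# Matches markdown inline links such as [Name](https://example.org).
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def extract_entries(markdown_text, section):
    """Return one record per bulleted line that contains a markdown link."""
    entries = []
    for line in markdown_text.splitlines():
        stripped = line.lstrip()
        # Only bullet lines are treated as list entries (assumed layout).
        if not stripped.startswith(("-", "*", "+")):
            continue
        match = LINK_RE.search(stripped)
        if match is None:
            continue
        name, url = match.groups()
        # Whatever follows the link is kept as a free-text description.
        description = stripped[match.end():].strip(" -:\u2013\u2014")
        entries.append(
            {"section": section, "name": name, "url": url, "description": description}
        )
    return entries


def search(entries, keyword):
    """Case-insensitive substring search over names and descriptions."""
    needle = keyword.casefold()
    return [
        e for e in entries
        if needle in e["name"].casefold() or needle in e["description"].casefold()
    ]


# Hypothetical sample standing in for the contents of a file like tools.md.
SAMPLE = """\
## Hypothetical heading
- [ExampleNormalizer](https://example.org/normalizer) - a made-up normalizer entry
- [ExampleTagger](https://example.org/tagger): a made-up POS tagger entry
Plain paragraph with no link.
"""

entries = extract_entries(SAMPLE, "tools")
hits = search(entries, "tagger")
```

In practice you would read each section file from a local clone and pass its text to `extract_entries`. If the real files use tables or nested lists, the bullet check would need adjusting.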
