anoopkunchukuttan/indic_nlp_library

One toolbox for 22 official languages

A Python library that treats Hindi, Tamil, Bengali, and friends as a family rather than isolated problems.

★ 639 stars · Python · Language Models · Data Tooling

What it does

The Indic NLP Library handles bread-and-butter text processing for Indian languages: normalization, tokenization, sentence splitting, word segmentation, syllabification, and script conversion, including romanization and its reverse (“indicization”). It also exposes a unified command-line interface alongside its Python API.
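To make the Python API concrete, here is a minimal sketch that chains normalization and tokenization. The helper name `preprocess` is hypothetical, and the example assumes the package is installed (`pip install indic-nlp-library`); the module paths and calls reflect the library's documented usage as best understood here, so check them against the version you install.

```python
def preprocess(text: str, lang: str = "hi") -> list:
    """Normalize `text`, then split it into tokens (hypothetical helper)."""
    # Imported lazily so that defining this function does not require the
    # package; calling it does.
    from indicnlp.normalize.indic_normalize import IndicNormalizerFactory
    from indicnlp.tokenize import indic_tokenize

    # The factory returns a normalizer suited to the language's script.
    normalizer = IndicNormalizerFactory().get_normalizer(lang)
    normalized = normalizer.normalize(text)

    # Rule-based tokenizer: separates punctuation and splits on whitespace.
    return indic_tokenize.trivial_tokenize(normalized, lang)


if __name__ == "__main__":
    print(preprocess("यह एक वाक्य है।", "hi"))
```

Passing a different language code (for example `"ta"` or `"bn"`) reuses the same two calls, which is the point of the unified API.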

The interesting bit

The core insight is that Indian languages share enough DNA (scripts derived from Brahmi, similar phonology, overlapping syntax) to make a generalised toolkit feasible. Rather than building 22 separate pipelines, you get one library that exploits those commonalities. The author has since moved on to neural models at AI4Bharat, but this remains the pragmatic baseline.
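One way to see the shared-ancestry point in practice: the Unicode blocks for the major Brahmi-derived scripts are laid out in parallel, so a letter usually sits at the same offset in each block. The sketch below uses only the standard library to convert between scripts by offset arithmetic. It illustrates the idea and is not the library's implementation; `convert_script`, `SCRIPT_BASE`, and `SHARED_OFFSETS` are names invented for this example.

```python
# Start of each script's Unicode block. The blocks follow a common layout,
# so most letters share the same offset from the block start.
SCRIPT_BASE = {
    "hi": 0x0900,  # Devanagari
    "bn": 0x0980,  # Bengali
    "pa": 0x0A00,  # Gurmukhi
    "gu": 0x0A80,  # Gujarati
    "or": 0x0B00,  # Odia
    "ta": 0x0B80,  # Tamil
    "te": 0x0C00,  # Telugu
    "kn": 0x0C80,  # Kannada
    "ml": 0x0D00,  # Malayalam
}
BLOCK_SIZE = 0x80

# Danda and double danda live in the Devanagari block but are shared by
# the other scripts, so they are left untouched.
SHARED_OFFSETS = {0x64, 0x65}


def convert_script(text: str, src: str, tgt: str) -> str:
    """Map each character of `src` script to the same offset in `tgt`."""
    src_base, tgt_base = SCRIPT_BASE[src], SCRIPT_BASE[tgt]
    out = []
    for ch in text:
        offset = ord(ch) - src_base
        if 0 <= offset < BLOCK_SIZE and offset not in SHARED_OFFSETS:
            out.append(chr(tgt_base + offset))
        else:
            out.append(ch)  # spaces, Latin text, punctuation, etc.
    return "".join(out)


if __name__ == "__main__":
    print(convert_script("नमस्ते", "hi", "te"))
```

This naive mapping breaks wherever a script lacks a letter (Tamil has no aspirated consonants, for instance), which is the kind of special case a real toolkit has to handle. The library's own entry point for this job is, to my understanding, `UnicodeIndicTransliterator.transliterate(text, "hi", "ta")` in `indicnlp.transliterate.unicode_transliterate`.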

Key highlights

Covers text normalization through script conversion in a single API
Command-line wrapper for quick shell workflows
Resources (models, data files) live in a separate repo: indic_nlp_resources
Used by Microsoft NLP Recipes, Facebook’s M2M-100, and CLTK
MIT licensed since 2019

Caveats

Translation and transliteration APIs were dropped; users are pointed to newer AI4Bharat models instead
Requires manual environment setup (INDIC_RESOURCES_PATH) even for pip installs
Urdu normalization pulls in TensorFlow via Urduhack, which is a heavy dependency for one language
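The resource-path caveat is easy to trip over, so a small guard helps. The sketch below resolves `INDIC_RESOURCES_PATH` (or an explicit path) and fails early with a readable error before handing the path to the library. `resolve_resources_path` and `init_indic_nlp` are hypothetical helpers; `common.set_resources_path` and `loader.load` are the library calls as understood from its documentation, so verify them against your installed version.

```python
import os
from pathlib import Path

ENV_VAR = "INDIC_RESOURCES_PATH"  # variable name from the caveat above


def resolve_resources_path(explicit=None) -> Path:
    """Return the resources directory, or raise a clear error early."""
    raw = explicit or os.environ.get(ENV_VAR)
    if not raw:
        raise RuntimeError(
            f"Set {ENV_VAR} or pass the path to a local copy of "
            "indic_nlp_resources"
        )
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Resources directory not found: {path}")
    return path


def init_indic_nlp(explicit=None) -> Path:
    """Point the library at its resources and load them (sketch)."""
    path = resolve_resources_path(explicit)
    # Imported lazily so the path check works even without the package.
    from indicnlp import common, loader

    common.set_resources_path(str(path))
    loader.load()
    return path
```

Calling `init_indic_nlp()` once at program start keeps the setup in one place instead of relying on every shell having the variable exported.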

Verdict

Worth a look if you’re building Indian-language pipelines and need battle-tested preprocessing without reaching for heavyweight neural models. Skip it if you need end-to-end translation or state-of-the-art transliteration; those have migrated to AI4Bharat’s newer tools.

Frequently asked

What is anoopkunchukuttan/indic_nlp_library? A Python library that treats Hindi, Tamil, Bengali, and friends as a family rather than isolated problems.
Is indic_nlp_library open source? Yes. anoopkunchukuttan/indic_nlp_library is open source, released under the MIT license.
What language is indic_nlp_library written in? anoopkunchukuttan/indic_nlp_library is primarily written in Python.
How popular is indic_nlp_library? anoopkunchukuttan/indic_nlp_library has 639 stars on GitHub.
Where can I find indic_nlp_library? anoopkunchukuttan/indic_nlp_library is on GitHub at https://github.com/anoopkunchukuttan/indic_nlp_library.