Is indonlu open source?

Yes. IndoNLP/indonlu is open source, released under the Apache-2.0 license.

What language is indonlu written in?

IndoNLP/indonlu consists primarily of Jupyter Notebook files.

How popular is indonlu?

IndoNLP/indonlu has 652 stars on GitHub.

Where can I find indonlu?

IndoNLP/indonlu is on GitHub at https://github.com/IndoNLP/indonlu.

IndoNLP/indonlu

12 tasks, 4 billion words, one Indonesian NLP benchmark

It exists because Indonesian NLP needed a standard place to compare BERT-family models across a dozen real tasks.

★ 652 stars · Jupyter Notebook · Language Models · Data Tooling

What it does

IndoNLU is a benchmark suite and resource bundle for Bahasa Indonesia. It packages twelve downstream NLU tasks with train, validation, and masked test splits, plus eight pre-trained IndoBERT and IndoBERT-lite models trained on the 23 GB Indo4B corpus. A consortium of Indonesian universities and industry labs built it to give the field a common evaluation floor rather than a collection of one-off experiments.

The interesting bit

The test labels are deliberately hidden: to get a test score you submit predictions to a CodaLab portal, and the result appears on the public leaderboard. That design choice makes the repo a living competition as much as a dataset drop, and it keeps reported scores comparable because nobody can tune against the test labels.

Key highlights

Twelve downstream tasks covering sequence classification and tagging, with data splits maintained by the authors
IndoBERT and IndoBERT-lite model zoo (eight variants) hosted on Hugging Face, pre-trained on roughly four billion words
Indo4B pretraining corpus: 23 GB of uncompressed Indonesian text; the released version excludes Twitter content to comply with Twitter's developer policy
Joint academic-industry effort involving Institut Teknologi Bandung, HKUST, Gojek, and Prosa.AI
AACL-IJCNLP 2020 publication with a public leaderboard and CodaLab submission portal
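
Since the model zoo is hosted on Hugging Face, pulling a checkpoint could look roughly like the sketch below. It assumes the `transformers` and `torch` packages are installed, and the Hub id `indobenchmark/indobert-base-p1` is an assumption that should be confirmed on the Hub before use. The helper name `embed` is hypothetical and not part of the repository.

```python
"""Sketch: encode one Indonesian sentence with an IndoBERT checkpoint."""

# Assumed Hub id for one of the eight variants; confirm it on the Hugging Face Hub.
MODEL_ID = "indobenchmark/indobert-base-p1"


def embed(sentence: str, model_id: str = MODEL_ID):
    """Return the final-layer hidden states for a single sentence.

    Requires `transformers` and `torch`, plus network access the first time
    a checkpoint is downloaded, so both are imported lazily here.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # inference only, disables dropout

    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Tensor of shape (1, number_of_tokens, hidden_size)
    return outputs.last_hidden_state
```

Calling `embed("aku adalah anak gembala")` would download the checkpoint on first use and return one vector per token; a task-specific classification or tagging head still has to be fine-tuned on top of these representations.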

Caveats

Test labels are masked, so test-set scores cannot be computed locally; you can evaluate on the validation split yourself, but official numbers require submitting to the CodaLab portal.
The model zoo is strictly BERT-based; the README does not mention newer architectures.
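
Because the test labels are masked, a common workaround is to score a model on the validation split before submitting. The sketch below computes macro-averaged F1 in plain Python. Treating macro F1 as the metric is an assumption made here for illustration, and the label names and predictions are hypothetical; check the benchmark's own evaluation code for the metric each task actually uses.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 over every label seen in either list."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1  # p was predicted but is wrong
            false_neg[g] += 1  # g was the answer but was missed

    labels = set(gold) | set(predicted)
    scores = []
    for label in labels:
        tp = true_pos[label]
        denom = 2 * tp + false_pos[label] + false_neg[label]
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical validation labels and model predictions.
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
print(round(macro_f1(gold, pred), 3))  # prints 0.6
```

Macro averaging weights every label equally, so a rare class that the model never predicts (here `neutral`) pulls the score down as much as a frequent one would.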

Verdict

Worth bookmarking if you work on Indonesian text classification or tagging and need a rigorous baseline. Skip it if you need generative models or modern encoder-decoder architectures; this is a BERT-family benchmark from 2020.

Frequently asked

What is IndoNLP/indonlu?: A benchmark suite and model collection for Indonesian natural language understanding, built to give the field a standard place to compare BERT-family models across twelve tasks.