IndoNLP/indonlu

A natural language understanding benchmark for the Indonesian language, featuring the IndoBERT and IndoBERT-lite models pre-trained on over 20GB of text.

646 stars · Jupyter Notebook · Language Models · Data Tooling
Velocity (7d): +0.3 ★ / day · Trend: steady

IndoNLU is a collection of NLU resources for Bahasa Indonesia covering 12 downstream tasks. It provides code to reproduce results, along with large pre-trained models, including IndoBERT and IndoBERT-lite, trained on the Indo4B corpus of approximately 4 billion words (over 20GB of text). The project serves both as a benchmark for evaluating Indonesian language understanding and as a source of ready-to-use Indonesian language models.
