KLUE-benchmark/KLUE

Korean NLP finally gets its GLUE

A proper benchmark for Korean language models, because comparing models on vibes wasn't cutting it anymore.

What it does

KLUE is an 8-task benchmark for Korean NLP: think GLUE or SuperGLUE, but for a language that existing benchmarks mostly ignore. It ships with datasets, evaluation metrics, fine-tuning recipes, and two pretrained model families (KLUE-BERT and KLUE-RoBERTa), so you can reproduce baselines instead of guessing.

The interesting bit

The project was built with unusual care for a benchmark, with explicit design principles covering accessibility, annotation quality, and even AI ethics. The baseline table is refreshingly honest: KLUE’s own models don’t sweep every category, and you can see exactly where XLM-R-large still wins or where KoELECTRA edges them out.

Key highlights

  • 8 tasks covering classification, similarity, inference, NER, relation extraction, dependency parsing, reading comprehension, and dialogue state tracking
  • Four pretrained checkpoints on Hugging Face Hub: one BERT and three RoBERTa sizes (including a deliberately small RoBERTa for resource-constrained work)
  • CC BY-SA 4.0 license: actually open, not “open” with a 47-page clickthrough
  • Active leaderboard with submission guidelines
  • Backed by a consortium of industry and academic groups, most of them Korean (Upstage, NAVER, KAIST, NYU, etc.)
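
For a feel of how the tasks and checkpoints might be pulled in practice, here is a minimal sketch. It assumes the benchmark is published on the Hugging Face Hub under the dataset name `klue` with one config per task, and that the checkpoints live under the `klue/` namespace; these identifiers, and the helper names `load_task` and `load_encoder`, are assumptions for illustration and do not come from the README. The lookup tables work offline; the two loaders need the `datasets` and `transformers` packages plus network access.

```python
# Assumed Hub identifiers: not taken from the KLUE README, verify before relying on them.
KLUE_TASKS = {
    "topic classification": "ynat",
    "semantic textual similarity": "sts",
    "natural language inference": "nli",
    "named entity recognition": "ner",
    "relation extraction": "re",
    "dependency parsing": "dp",
    "machine reading comprehension": "mrc",
    "dialogue state tracking": "wos",
}

KLUE_CHECKPOINTS = [
    "klue/bert-base",
    "klue/roberta-small",
    "klue/roberta-base",
    "klue/roberta-large",
]


def load_task(task_name, split="validation"):
    """Load one split of a KLUE task (hypothetical helper; needs `datasets` and network)."""
    config = KLUE_TASKS[task_name]  # raises KeyError for an unknown task name
    from datasets import load_dataset  # lazy import so the tables above work offline
    return load_dataset("klue", config, split=split)


def load_encoder(checkpoint="klue/roberta-small"):
    """Load tokenizer and encoder for one checkpoint (hypothetical helper; needs `transformers`)."""
    if checkpoint not in KLUE_CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {checkpoint}")
    from transformers import AutoModel, AutoTokenizer  # lazy import, same reason as above
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    return tokenizer, model
```

Calling `load_task("topic classification")` would fetch the validation split of the classification task, and `load_encoder()` defaults to the small RoBERTa, which is the sensible starting point on limited hardware.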

Caveats

  • The README is sparse on dataset construction details; the paper is the real source of truth
  • No code visible in the repo itself; this appears to be a documentation and results hub, not an implementation

Verdict

Worth bookmarking if you work on Korean NLP or need to evaluate multilingual models fairly on Korean. Skip it if you’re looking for novel architectures or training code; this is infrastructure, not invention.
