datanada/Awesome-Korean-NLP

A field guide to not getting lost in Korean NLP

A curated list of tools, datasets, and papers for processing Korean text, because agglutinative morphology doesn't solve itself.

What it does

This is a curated awesome-list that catalogs resources for Korean-language NLP: morphological analyzers, datasets such as the Sejong corpus and NamuWiki dumps, papers, lectures, and community links. It covers both Korean-specific tools (Hannanum, Kkma, Komoran, Mecab-ko) and Python-facing or language-agnostic packages that can be used with Korean (KoNLPy, FastText, gensim).

The interesting bit

The list explicitly separates “NLP of Korean text” from “NLP information written in Korean”, a useful distinction if you’re hunting for tools versus hunting for tutorials you can actually read. The maintainer also keeps a live collabedit link for casual contributions, which feels charmingly retro.

Key highlights

  • Morpheme analyzers: 12+ options including Java stalwarts (Hannanum, Kkma), C++ workhorses (Mecab-ko), and newer entrants (Rouzeta, seunjeon)
  • Datasets: Government corpora (Sejong, KAIST), web dumps (Wikipedia, NamuWiki), and sentiment-labeled data (Naver movie corpus)
  • Bindings matter: KoNLPy wraps multiple Java analyzers for Python; kroman provides Hangul romanization in five programming languages
  • Community links: Korean-language NLP conferences since 1989, plus active Facebook groups (Tensorflow KR, AI Korea)
  • Odd gems: A crowdsourced Korean profanity dictionary and a TextRank summarizer demo running on Heroku
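
To make the romanization entry above a little more concrete: any tool that romanizes Hangul has to get from a syllable block to its component letters (jamo) somehow. The sketch below shows the standard Unicode arithmetic for doing that. It is not taken from kroman or any other project on the list, and the names `decompose` and `to_jamo` are hypothetical, chosen only for illustration.

```python
# Sketch only: standard Unicode arithmetic for precomposed Hangul syllables.
# Not the implementation used by kroman or any other tool on the list.

HANGUL_BASE = 0xAC00  # code point of "가", the first precomposed syllable
HANGUL_LAST = 0xD7A3  # code point of "힣", the last precomposed syllable

# Jamo in Unicode order: 19 initials, 21 medials, 27 finals plus "no final".
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(ch):
    """Return (initial, medial, final) for one Hangul syllable, else None."""
    code = ord(ch)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return None  # not a precomposed Hangul syllable
    index = code - HANGUL_BASE
    block = len(MEDIALS) * len(FINALS)  # 588 syllables per initial consonant
    initial = index // block
    medial = (index % block) // len(FINALS)
    final = index % len(FINALS)
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


def to_jamo(text):
    """Decompose every syllable in text; pass other characters through."""
    out = []
    for ch in text:
        parts = decompose(ch)
        out.append(ch if parts is None else "".join(parts))
    return out


if __name__ == "__main__":
    print(to_jamo("한글 NLP"))
```

Here `decompose("한")` yields the jamo ㅎ, ㅏ, ㄴ. A real romanizer would still need a jamo-to-Latin table and pronunciation rules on top of this step, which is where the tools on the list earn their keep.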

Caveats

  • Several paper links are dead (marked with strikethrough), and the English papers section is empty
  • Some tool links point to Korean-only pages or SourceForge projects that may be unmaintained
  • The “collabedit” contribution method suggests the list may not see frequent structured updates

Verdict

Worth bookmarking if you’re doing Korean NLP and tired of re-discovering that Mecab-ko exists. Skip it if you need actively maintained, benchmarked comparisons: this is a directory, not a review site.
