A Korean tokenizer that outruns its rivals and fixes your typos
Kiwi is a fast, open-source Korean morphological analyzer with built-in typo correction and bindings for nearly every language you might actually use.

What it does
Kiwi segments Korean text into morphemes (nouns, verbs, particles, endings, and the rest) and labels them with the Sejong tag set. It claims ~87% accuracy on web text and ~94% on written text, and since version 0.13.0 it can auto-correct simple typos during analysis. The core is C++, but the project has accumulated wrappers for Python, Java, C#, Go, R, Rust, Flutter, WebAssembly, and even an Android AAR.
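To make the tag set concrete, here is a hand-written analysis of a short sentence in Sejong-style tags, plus a helper that collapses tags into coarse classes by their first letter. The sentence, the `Morpheme` type, and the helper functions are hypothetical illustrations. They are not Kiwi's API or captured Kiwi output, and Kiwi's own forms may differ (for example, in how it writes the ending ㄴ다).

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Morpheme:
    form: str  # surface form of the morpheme
    tag: str   # Sejong-style part-of-speech tag

# Hand-written (hypothetical) analysis of "나는 학교에 간다" ("I go to school").
ANALYSIS = [
    Morpheme("나", "NP"),     # pronoun
    Morpheme("는", "JX"),     # auxiliary (topic) particle
    Morpheme("학교", "NNG"),  # common noun
    Morpheme("에", "JKB"),    # adverbial case particle
    Morpheme("가", "VV"),     # verb stem
    Morpheme("ㄴ다", "EF"),   # sentence-final ending
]

# Sejong tags share a leading letter by family (NNG/NNP/NP are nominal,
# JKS/JKB/JX are particles, EP/EF/EC are endings, and so on).
COARSE = {
    "N": "nominal",
    "V": "predicate",
    "M": "modifier",
    "I": "interjection",
    "J": "particle",
    "E": "ending",
    "X": "affix/root",
    "S": "symbol/foreign",
}

def coarse_class(tag: str) -> str:
    """Map a fine-grained tag to its family by first letter."""
    return COARSE.get(tag[0], "unknown")

def content_words(morphs: List[Morpheme]) -> List[str]:
    """Keep nominals and predicates, dropping particles and endings."""
    return [m.form for m in morphs
            if coarse_class(m.tag) in ("nominal", "predicate")]

if __name__ == "__main__":
    for m in ANALYSIS:
        print(f"{m.form}/{m.tag} -> {coarse_class(m.tag)}")
    print(content_words(ANALYSIS))
```

Filtering down to nominals and predicates like this is a common preprocessing step for keyword extraction or search indexing, and it only works because the tagger has already split particles and endings off their stems.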
The interesting bit
The project ships its own lightweight language model for disambiguation, which is unusual for a “fast” tokenizer. The README shows benchmark charts suggesting it keeps pace with or outruns competitors while still resolving ambiguous splits. Multithreading is built into the library itself, not bolted on by wrappers.
Key highlights
- Core library in C++17 with prebuilt binaries for Windows, Linux, macOS, Android, plus ARM64 and PPC64LE
- Auto typo correction (0.13.0+), with evaluation data showing recovery on web_with_typos.txt
- Sentence splitting and tokenization benchmarks published, with links to reproduce
- Web demo at kiwi.bab2min.pe.kr for quick testing
- Active CI across x86_64, ARM64, PPC64LE, and WASM
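One way to picture typo correction at the analysis stage is to generate alternative spellings, each with a cost, and let the analyzer weigh that cost against how plausible the resulting analysis is. The sketch below does only the candidate-generation half: it decomposes Hangul syllables with the standard Unicode arithmetic and swaps vowels that often sound alike (ㅐ/ㅔ, ㅙ/ㅚ/ㅞ). The confusion table, the `PENALTY` value, and the function names are hypothetical; they are not Kiwi's actual rules or costs.

```python
from typing import Dict, List, Tuple

HANGUL_BASE = 0xAC00   # 가, first precomposed syllable
HANGUL_LAST = 0xD7A3   # 힣, last precomposed syllable
N_MEDIALS = 21
N_FINALS = 28

# Hypothetical confusion table over medial (vowel) indices in Unicode order:
# ㅏ0 ㅐ1 ㅑ2 ㅒ3 ㅓ4 ㅔ5 ... ㅙ10 ㅚ11 ... ㅞ15 ...
CONFUSABLE: Dict[int, List[int]] = {
    1: [5],        # ㅐ -> ㅔ
    5: [1],        # ㅔ -> ㅐ
    10: [11, 15],  # ㅙ -> ㅚ, ㅞ
    11: [10, 15],  # ㅚ -> ㅙ, ㅞ
    15: [10, 11],  # ㅞ -> ㅙ, ㅚ
}
PENALTY = 1.0  # hypothetical cost of one vowel substitution

def is_syllable(ch: str) -> bool:
    return HANGUL_BASE <= ord(ch) <= HANGUL_LAST

def decompose(ch: str) -> Tuple[int, int, int]:
    """Split a precomposed syllable into (initial, medial, final) indices."""
    code = ord(ch) - HANGUL_BASE
    initial, rest = divmod(code, N_MEDIALS * N_FINALS)
    medial, final = divmod(rest, N_FINALS)
    return initial, medial, final

def compose(initial: int, medial: int, final: int) -> str:
    """Inverse of decompose."""
    return chr(HANGUL_BASE + (initial * N_MEDIALS + medial) * N_FINALS + final)

def typo_candidates(word: str) -> List[Tuple[str, float]]:
    """Return the word itself (cost 0) plus every single vowel-swap variant."""
    out = [(word, 0.0)]
    for i, ch in enumerate(word):
        if not is_syllable(ch):
            continue
        initial, medial, final = decompose(ch)
        for alt in CONFUSABLE.get(medial, []):
            fixed = word[:i] + compose(initial, alt, final) + word[i + 1:]
            out.append((fixed, PENALTY))
    return out

if __name__ == "__main__":
    # "되요" is a common misspelling of "돼요".
    for cand, cost in typo_candidates("되요"):
        print(cand, cost)
```

Every extra candidate enlarges the search space the analyzer has to score, which is one plausible reason a typo-aware mode would cost more time and memory.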
Caveats
- Swift wrapper is “coming soon” as of the README
- Model files live in Git LFS; clone without it and you will have a bad time
- The typo-correction mode loads slower and uses ~2.5× the memory (693 MB vs 278 MB in the sample run)
Verdict
Worth a look if you process Korean text at scale and need speed without sacrificing accuracy. Skip it if you only need English tokenization or if you are allergic to downloading large model files.