Java's answer to "just give me Chinese NLP that works"
A modular, Maven-friendly toolkit that ships perceptron-based segmentation, NER, pinyin, and BM25 without dragging in Python's ecosystem.

What it does
Mynlp is a Java-native Chinese NLP toolkit built for production use. It covers the standard bases (word segmentation, part-of-speech tagging, named entity recognition, pinyin conversion, traditional/simplified Chinese conversion, and BM25 scoring), packaged as discrete Maven modules so you pull in only what you need.
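Since BM25 is the one item on that list that is a plain formula, here is a generic, self-contained sketch of Okapi BM25 scoring over text that has already been segmented into words. It does not use Mynlp's API. The class name `Bm25`, the defaults k1 = 1.2 and b = 0.75, and the IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)) are assumptions for illustration, not Mynlp's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Generic Okapi BM25 scorer over pre-tokenized documents (illustrative sketch, not Mynlp's API). */
public class Bm25 {
    private final List<List<String>> docs;
    private final Map<String, Integer> docFreq = new HashMap<>();
    private final double avgDocLen;
    private final double k1;
    private final double b;

    public Bm25(List<List<String>> docs, double k1, double b) {
        this.docs = docs;
        this.k1 = k1;
        this.b = b;
        long totalLen = 0;
        for (List<String> doc : docs) {
            totalLen += doc.size();
            // Document frequency: count each term at most once per document.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        this.avgDocLen = docs.isEmpty() ? 0.0 : (double) totalLen / docs.size();
    }

    /** IDF with +1 inside the log, which keeps every term's weight non-negative. */
    public double idf(String term) {
        int n = docFreq.getOrDefault(term, 0);
        int total = docs.size();
        return Math.log(1.0 + (total - n + 0.5) / (n + 0.5));
    }

    /** BM25 score of the document at {@code index} for a tokenized query. */
    public double score(List<String> query, int index) {
        List<String> doc = docs.get(index);
        Map<String, Integer> tf = new HashMap<>();
        for (String t : doc) {
            tf.merge(t, 1, Integer::sum);
        }
        // Length normalization: longer-than-average documents are penalized when b > 0.
        double lengthNorm = 1.0 - b + b * doc.size() / avgDocLen;
        double sum = 0.0;
        for (String term : query) {
            int f = tf.getOrDefault(term, 0);
            if (f == 0) {
                continue;
            }
            sum += idf(term) * (f * (k1 + 1)) / (f + k1 * lengthNorm);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Documents are assumed to be segmented already, e.g. by a Chinese word segmenter.
        List<List<String>> docs = List.of(
                List.of("自然", "语言", "处理"),
                List.of("中文", "分词", "工具"),
                List.of("中文", "自然", "语言"));
        Bm25 bm25 = new Bm25(docs, 1.2, 0.75);
        List<String> query = List.of("中文", "语言");
        for (int i = 0; i < docs.size(); i++) {
            System.out.printf("doc %d: %.4f%n", i, bm25.score(query, i));
        }
    }
}
```

In the toy corpus every document has the same length, so the third document, which contains both query terms, scores exactly twice as high as each of the others.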
The interesting bit
The resource splitting is unusually sane. Core dictionaries and models (some over 60MB) ship as separate artifacts instead of being bundled into the main JAR. You can take the “lazy” mynlp-all convenience package or cherry-pick resources à la carte, which is useful if you're counting megabytes or want to keep unused models out of container images.
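One practical consequence of cherry-picking resources is that a missing artifact only shows up at runtime. A minimal sketch of a fail-fast startup check using the standard `ClassLoader.getResource` lookup follows; the class name `ResourceProbe` and the resource path are hypothetical placeholders, not Mynlp's real file names.

```java
import java.net.URL;

/** Fail-fast check that an optional resource artifact is on the classpath (hypothetical paths). */
public class ResourceProbe {

    /** Returns true if the given classpath resource can be located. */
    public static boolean isOnClasspath(String resourcePath) {
        // Prefer the context class loader (common in app servers), fall back to this class's loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ResourceProbe.class.getClassLoader();
        }
        URL url = loader.getResource(resourcePath);
        return url != null;
    }

    public static void main(String[] args) {
        // Hypothetical path standing in for a large dictionary or model shipped in its own artifact.
        String[] required = {"example/nlp/core-dict.bin"};
        for (String path : required) {
            if (isOnClasspath(path)) {
                System.out.println("Found: " + path);
            } else {
                System.err.println("Missing resource: " + path + " (add the matching resource artifact)");
            }
        }
    }
}
```

Running a check like this at service startup turns a forgotten dependency into an immediate, readable error instead of a failure on the first request.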
Key highlights
- Perceptron-based segmentation and tagging (not purely dictionary-driven)
- fastText and StarSpace integration for word/label representations
- Custom dictionary support with correction capabilities
- New word discovery and person-name recognition as built-in modules
- Openly acknowledges its lineage from HanLP and ansj_seg, reusing proven algorithms rather than quietly reinventing them
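Perceptron segmenters in the HanLP family typically frame segmentation as character-level sequence labeling with B/M/E/S tags (begin, middle, end, single-character word), then convert the predicted tags into words. Whether Mynlp uses exactly this tag set is an assumption here. The sketch below shows only that decoding step, with a hard-coded tag string standing in for a trained model's output; the class `BmesDecoder` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Converts per-character B/M/E/S tags into words (hypothetical helper, not Mynlp's API). */
public class BmesDecoder {

    /**
     * Decodes a tag string aligned one-to-one with the text's chars.
     * Assumes BMP-only input, which holds for common Chinese characters.
     */
    public static List<String> decode(String text, String tags) {
        if (text.length() != tags.length()) {
            throw new IllegalArgumentException("text and tags must have equal length");
        }
        List<String> words = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char tag = tags.charAt(i);
            // A B or S while a word is still open means the model's tags were inconsistent:
            // close the open word first so no characters are lost.
            if ((tag == 'B' || tag == 'S') && i > start) {
                words.add(text.substring(start, i));
                start = i;
            }
            // E or S ends the current word; the last character always flushes a dangling B/M.
            if (tag == 'E' || tag == 'S' || i == text.length() - 1) {
                words.add(text.substring(start, i + 1));
                start = i + 1;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Tags a perceptron model might predict for "I love natural language processing".
        System.out.println(decode("我爱自然语言处理", "SSBEBEBE"));
    }
}
```

Keeping decoding separate from scoring is what lets a model learn word boundaries from context: the perceptron only ever predicts one tag per character, and words fall out afterwards.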
Caveats
- Documentation and community presence (QQ group, Chinese-language docs) assume Chinese fluency; English support appears minimal
- 690 stars suggests modest adoption outside its target ecosystem; battle-testing at scale is unclear from the README alone
Verdict
Worth a look if you’re running JVM-based services and need Chinese text processing without bridging to Python. Skip it if your pipeline is already invested in HanLP’s newer iterations or if you need extensive multilingual support.