SeanLee97/xmnlp

A Swiss-Army knife for Chinese text that fits in one import

xmnlp bundles a dozen Chinese NLP tasks—segmentation, NER, sentiment, pinyin, even radicals—behind a single pip install, with ONNX models you download separately.

1.3k stars · Python · Language Models · Velocity (7d): +0.4 ★/day · Trend: steady

What it does

xmnlp is an all-in-one Chinese NLP toolkit. It handles word segmentation, part-of-speech tagging, named-entity recognition, sentiment analysis, text correction, keyword/keyphrase extraction, pinyin conversion, and even Chinese character radical lookup. Most heavy lifting runs through RoBERTa + CRF models exported to ONNX, with faster rule-based fallbacks (reverse maximum matching) when you don’t need neural precision.
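
To make the rule-based fallback concrete, here is a minimal sketch of reverse maximum matching in plain Python. It is not xmnlp's code: the function name `reverse_max_match`, the `max_len` parameter, and the four-word toy vocabulary are assumptions made for illustration. The idea is to scan from the end of the string and repeatedly take the longest dictionary word that ends at the current position.

```python
def reverse_max_match(text, vocab, max_len=4):
    """Segment text right to left, taking the longest known word each time.

    Hypothetical helper for illustration; not part of xmnlp's API.
    """
    tokens = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the window from the left until it is a known word
        # or only a single character remains.
        while start < end - 1 and text[start:end] not in vocab:
            start += 1
        tokens.append(text[start:end])
        end = start
    tokens.reverse()
    return tokens


# Toy vocabulary (assumed); a real dictionary would be far larger.
vocab = {"研究", "研究生", "生命", "起源"}
print(reverse_max_match("研究生命起源", vocab))  # ['研究', '生命', '起源']
```

A forward scan over the same vocabulary would grab 研究生 first and strand 命, which is one reason the reverse direction is a common choice for Chinese.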

The interesting bit

The “speed vs. accuracy” dial is explicit: core tasks such as segmentation and POS tagging expose both fast_* and deep_* variants, so you can trade neural nuance for throughput without swapping libraries. The radical lookup and pinyin features are just HashMap and Trie lookups: simple, but oddly hard to find bundled with modern transformer-based tools.
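
As a rough picture of what a trie-backed pinyin lookup can look like, the sketch below stores multi-character phrases in a trie so that polyphonic characters get a phrase-level reading, and falls back to a per-character table otherwise. Everything here is assumed for illustration (the class names, `to_pinyin`, and the three-character toy table); xmnlp's actual data structures and tables may differ.

```python
class TrieNode:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = {}
        self.value = None  # reading stored at the end of a phrase


class PhraseTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, phrase, value):
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def longest_prefix(self, text, start):
        """Return (length, value) of the longest stored phrase at text[start:], or (0, None)."""
        node = self.root
        best = (0, None)
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.value is not None:
                best = (i - start + 1, node.value)
        return best


def to_pinyin(text, phrase_trie, char_table):
    """Prefer phrase-level readings from the trie, else a per-character lookup."""
    out = []
    i = 0
    while i < len(text):
        length, reading = phrase_trie.longest_prefix(text, i)
        if length:
            out.extend(reading)
            i += length
        else:
            # Unknown characters pass through unchanged.
            out.append(char_table.get(text[i], text[i]))
            i += 1
    return out


# Toy data (assumed): 行 defaults to "xing" but reads "hang" inside 银行.
CHAR_TABLE = {"银": "yin", "行": "xing", "人": "ren"}
PHRASES = PhraseTrie()
PHRASES.insert("银行", ["yin", "hang"])

print(to_pinyin("银行行人", PHRASES, CHAR_TABLE))  # ['yin', 'hang', 'xing', 'ren']
```

A radical lookup needs even less: a single dictionary from character to radical, with no phrase context.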

Key highlights

  • Segmentation, POS tagging, and NER via RoBERTa + CRF finetuning, with custom dictionary support (jieba-compatible format)
  • Sentiment analysis and spell-checking (detector + corrector) included
  • Keyword/keyphrase extraction via TextRank
  • Sentence embeddings and similarity calculation
  • ONNX Runtime inference; supports Python 3.6–3.8 on Linux, Windows, macOS
  • Models downloaded separately via Feishu or Baidu Netdisk—version-locked to the package
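
For readers unfamiliar with TextRank, the keyword-extraction highlight above can be sketched in a few lines of plain Python: build a co-occurrence graph over already-segmented tokens, then run a PageRank-style iteration on it. This is a generic illustration, not xmnlp's implementation; the function name `textrank_keywords` and its parameters (`window`, `damping`, `iterations`, `top_k`) are assumptions.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=30, top_k=3):
    """Rank words by a PageRank-style score over a co-occurrence graph.

    Hypothetical helper for illustration; expects pre-segmented tokens.
    """
    # Link each word to the distinct words that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's current score.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


tokens = ["模型", "训练", "模型", "推理", "模型", "部署"]
print(textrank_keywords(tokens, window=1, top_k=1))  # ['模型']
```

In this toy input 模型 is adjacent to every other word, so it ends up as the hub of the graph and ranks first. In practice the tokens would come from a segmenter, with stop words filtered out beforehand.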

Caveats

  • Deep model interfaces are Simplified-Chinese only; no Traditional Chinese support
  • Model weights are hosted on Chinese cloud services (Feishu/Baidu), not HuggingFace or GitHub releases
  • Python 3.6–3.8 support suggests the project may not be actively tracking newer releases

Verdict

Good fit if you need one library to cover the full Chinese NLP pipeline without orchestrating multiple dependencies. Skip it if you require Traditional Chinese, want models pip-installable from PyPI, or need the bleeding-edge accuracy of dedicated single-task libraries.
