crownpku/Rasa_NLU_Chi

Rasa NLU, but it finally understands your aunt's medical questions

A 2017 fork of Rasa NLU that wires Chinese tokenization (Jieba) and MITIE word features into the standard intent-and-entity pipeline.

1.5k stars · Python · Chat Assistants · Language Models
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

This is a fork of the pre-1.0 Rasa NLU library with Chinese language support bolted on. It takes Chinese text, tokenizes it with Jieba, extracts named entities and classifies intent using MITIE (optionally with scikit-learn for classification), then returns structured JSON you can act on. The README shows a medical query, “我发烧了该吃什么药?” (“I have a fever; what medicine should I take?”), returning the entity “disease: 发烧” (fever) and the intent “medical”.
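To act on that JSON, a client mostly needs the top intent and the extracted entities. The sketch below assumes the response shape used by 0.x Rasa NLU: an intent object with name and confidence, plus an entities list whose items carry entity and value. summarize_parse is a hypothetical helper, not part of the library; the example input is modeled on the README query, and 0.54 is the top-intent score the Caveats section cites.

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_parse(result: Dict[str, Any]) -> Tuple[Optional[str], float, Dict[str, List[str]]]:
    """Pull (intent name, confidence, {entity type: [values]}) out of a parse result.

    Assumes the 0.x Rasa NLU response shape: an "intent" object with "name" and
    "confidence", and an "entities" list whose items carry "entity" and "value".
    """
    intent = result.get("intent") or {}
    name = intent.get("name")
    confidence = float(intent.get("confidence", 0.0))
    slots: Dict[str, List[str]] = {}
    for ent in result.get("entities", []):
        # Group values by entity type, keeping the order the extractor returned them in.
        slots.setdefault(ent["entity"], []).append(ent["value"])
    return name, confidence, slots


# Illustrative input modeled on the README query; start/end index into "text".
example = {
    "text": "我发烧了该吃什么药?",
    "intent": {"name": "medical", "confidence": 0.54},
    "entities": [{"start": 1, "end": 3, "value": "发烧", "entity": "disease"}],
}

print(summarize_parse(example))  # ('medical', 0.54, {'disease': ['发烧']})
```

Grouping entity values by type gives a bot something like slots to fill, which is usually what the downstream dialogue logic wants.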

The interesting bit

The heavy lifting isn’t the fork itself; it’s the pre-trained word feature extractor built from Chinese Wikipedia and Baidu Baike. Training one from scratch with MITIE’s wordrep tool takes 2–3 days, so the project offers a downloadable model. Choosing general encyclopedia text as the corpus is a pragmatic bet: it works for broad domains but may represent narrow, jargon-heavy vocabularies less well.

Key highlights

  • Two pipeline presets: MITIE+Jieba, or MITIE+Jieba+sklearn (the latter recommended)
  • Supports custom Jieba user dictionaries for domain-specific tokenization
  • Pre-trained Chinese word features available via the author’s blog
  • Standard Rasa NLU server interface: train, serve, curl against localhost:5000/parse
  • Its 1,532 stars suggest it filled a real gap in 2017-era Chinese chatbot tooling
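The user-dictionary and server bullets above can be sketched with the standard library alone. This is a minimal sketch under assumptions: it renders a Jieba user dictionary in Jieba's documented "word [freq] [pos]" line format, and builds a POST to localhost:5000/parse with a {"q": ...} JSON body, which is how 0.x Rasa NLU servers accepted queries; the optional project and model fields are likewise assumed from that era's server. The helper names, the sample term 布洛芬 (ibuprofen), and its frequency are hypothetical, and how the fork itself is pointed at the dictionary file is not shown here; check its sample configs.

```python
import json
import urllib.request
from typing import Dict, Iterable, Optional, Tuple


def render_jieba_userdict(entries: Iterable[Tuple[str, Optional[int], Optional[str]]]) -> str:
    """Render Jieba user-dictionary text: one "word [freq] [pos]" entry per line.

    Frequency and part-of-speech tag are optional, matching Jieba's documented format.
    """
    lines = []
    for word, freq, pos in entries:
        # Fields are space-separated, so a word containing whitespace would be misparsed.
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"dictionary word must be non-empty with no spaces: {word!r}")
        fields = [word]
        if freq is not None:
            fields.append(str(freq))
        if pos is not None:
            fields.append(pos)
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


def build_parse_request(text: str,
                        host: str = "http://localhost:5000",
                        project: Optional[str] = None,
                        model: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /parse request with a {"q": ...} JSON body (assumed 0.x Rasa NLU style)."""
    payload: Dict[str, str] = {"q": text}
    if project is not None:
        payload["project"] = project
    if model is not None:
        payload["model"] = model
    # ensure_ascii=False keeps the Chinese query readable; the body is sent as UTF-8 bytes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        host.rstrip("/") + "/parse",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical domain terms: one bare word, one with a frequency and POS tag.
USERDICT_TEXT = render_jieba_userdict([("发烧", None, None), ("布洛芬", 10, "n")])
PARSE_REQUEST = build_parse_request("我发烧了该吃什么药?")
```

Nothing is sent over the network here. With a server running, json.load(urllib.request.urlopen(PARSE_REQUEST)) should return the parse result, and outside Rasa, jieba.load_userdict(path) loads a file written from USERDICT_TEXT (saved as UTF-8).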

Caveats

  • README explicitly points to the official Rasa NLU docs for “newest instructions,” implying this fork may lag behind upstream
  • No version compatibility notes; Rasa has since rearchitected entirely (Rasa Open Source 2.x/3.x)
  • Confidence scores in the example are modest (0.54 top intent), with four other intents getting non-trivial probability mass
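One way to act on modest scores like these is a fallback: accept the top intent only when it is confident enough and clearly ahead of the runner-up, and otherwise ask the user to rephrase. A minimal sketch, assuming the 0.x response carries an intent_ranking list of {"name", "confidence"} entries; pick_intent and both thresholds are hypothetical and would need tuning on held-out data.

```python
from typing import Any, Dict, List, Optional

# Hypothetical thresholds; tune them on your own held-out examples.
MIN_CONFIDENCE = 0.6
MIN_MARGIN = 0.2


def pick_intent(result: Dict[str, Any],
                min_confidence: float = MIN_CONFIDENCE,
                min_margin: float = MIN_MARGIN) -> Optional[str]:
    """Return the top intent name, or None when the prediction looks too uncertain.

    Uses "intent_ranking" if present, otherwise falls back to the single "intent" object.
    """
    ranking: List[Dict[str, Any]] = result.get("intent_ranking") or (
        [result["intent"]] if result.get("intent") else [])
    if not ranking:
        return None
    # Sort defensively rather than trusting the server's ordering.
    ranked = sorted(ranking, key=lambda r: r["confidence"], reverse=True)
    top = ranked[0]
    runner_up = ranked[1]["confidence"] if len(ranked) > 1 else 0.0
    # Reject a weak winner, and also a winner that barely beats the next intent.
    if top["confidence"] < min_confidence or top["confidence"] - runner_up < min_margin:
        return None
    return top["name"]
```

Returning None instead of a guess leaves the caller free to choose the fallback, such as a clarifying question. The margin check matters here because the caveat concerns probability spread across several intents, not only a low top score.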

Verdict

Worth studying if you’re maintaining a legacy Chinese bot built on 2017-era Rasa, or if you need a fully self-hosted NLU without cloud APIs. Skip it if you’re starting fresh: modern Rasa, spaCy, or dedicated Chinese NLP frameworks (HanLP, LTP) have superseded this approach.
