smoothnlp/SmoothNLP

A Chinese NLP toolkit that phones home for answers

SmoothNLP wraps Java-backed NLP pipelines in a Python API, with a freemium cloud service doing the heavy lifting.

623 stars · Java · Language Models
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

SmoothNLP provides standard Chinese NLP pipelines (tokenization, POS tagging, NER including financial entities, dependency parsing, and sentence splitting) through a Python interface. The core algorithms are implemented in Java (in smoothnlp_maven), but typical usage routes requests through cloud microservices, with a 5 QPS rate limit for free users. It also includes knowledge graph extraction with typed edges and nodes, plus some unsupervised features such as new-word discovery.

The interesting bit

The “explainable inference” branding is mostly aspirational: the README doesn’t actually explain how the models work. What is unusual is how openly hybrid the setup is: open-source glue code, Java backends you can theoretically compile yourself, and a commercial cloud layer with a Pro tier for enterprise. The knowledge graph module at least ships with a built-in visualization and concrete node/edge taxonomies.

Key highlights

  • Tokenization, POS tagging, dependency parsing, NER, and financial entity recognition via smoothnlp.segment, postag, ner, company_recognize, etc.
  • Knowledge graph extraction with 4 edge types (event trigger, plus state, attribute, and numeric descriptions) and 8 node categories including products, companies, and people
  • Unsupervised new-word mining algorithm with a linked Zhihu explanation
  • Multi-threading support configurable via config.setNumThreads()
  • 200-character string limit on basic pipeline calls; longer text requires manual sentence splitting first
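
The 200-character limit in the last bullet means long documents need to be cut up before they reach a pipeline call. Here is a minimal sketch of one way to do that, not the project's own splitter: `split_sentences`, `chunk_text`, and `run_pipeline` are hypothetical helper names, the punctuation set is an assumption, and the limit is assumed to count Python string characters rather than bytes. The `pipeline` argument stands in for any callable that takes a string, such as the `smoothnlp.segment` function named above.

```python
import re
from typing import Any, Callable, List

MAX_CHARS = 200  # per-call limit quoted in the highlights; assumed to be characters

# A run of non-terminator characters followed by any trailing sentence-ending
# punctuation (Chinese or ASCII). This punctuation set is an assumption.
_SENTENCE = re.compile(r"[^。！？!?；;]+[。！？!?；;]*")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Return sentence-level chunks, none longer than max_chars."""
    chunks: List[str] = []
    for sentence in split_sentences(text):
        # Hard-wrap a single sentence that still exceeds the limit.
        # This can cut through a word, so results near the cut may degrade.
        for start in range(0, len(sentence), max_chars):
            chunks.append(sentence[start:start + max_chars])
    return chunks


def run_pipeline(text: str, pipeline: Callable[[str], Any],
                 max_chars: int = MAX_CHARS) -> List[Any]:
    """Apply a per-string pipeline callable to each chunk, in order."""
    return [pipeline(chunk) for chunk in chunk_text(text, max_chars)]
```

Passing the pipeline in as a parameter keeps the sketch independent of any particular SmoothNLP call signature; something like `run_pipeline(long_text, smoothnlp.segment)` would be the intended use, assuming that function accepts a single string.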

Caveats

  • Most “advanced” features (event clustering, supervised event classification) are commercial-only; the README literally says “contact business@smoothnlp.com”
  • The free cloud tier is capped at 5 QPS, and the project appears lightly maintained (version 0.4, 623 stars)
  • Matplotlib Chinese font issues are acknowledged with a manual SimHei workaround
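
If you stay on the free tier, the 5 QPS cap is easiest to respect on the client side. The following is a rough sketch of a simple request spacer, assuming the cap behaves like "at most five calls per second" for a single client; how the service actually counts or enforces the limit is not documented here. `RateLimiter` is a hypothetical helper, not part of SmoothNLP, and it is not thread-safe as written.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Space calls at least 1 / max_per_second apart (single-threaded use)."""

    def __init__(self, max_per_second: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # earliest time for the next call

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # Schedule the following call one interval after this one.
        self._next_allowed = now + self.min_interval
        return delay
```

Calling `limiter.wait()` immediately before each cloud-backed call keeps a sequential loop under the cap. With multiple threads (for example after raising the thread count via `config.setNumThreads()`), the limiter would need a lock around `wait()`, which this sketch omits.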

Verdict

Worth a look if you need quick Chinese NLP prototypes and don’t mind API latency or rate limits. Skip it if you need fully offline, auditable models or serious throughput. The Java core exists, but the project clearly steers you toward its paid cloud.
