Is ckiptagger open source?

Yes — ckiplab/ckiptagger is open source, released under the GPL-3.0 license.

What language is ckiptagger written in?

ckiplab/ckiptagger is primarily written in Python.

How popular is ckiptagger?

ckiplab/ckiptagger has 1.7k stars on GitHub.

Where can I find ckiptagger?

ckiplab/ckiptagger is on GitHub at https://github.com/ckiplab/ckiptagger.

ckiplab/ckiptagger

A neural Chinese NLP pipeline that won't mangle your text

CKIP's tagger segments, tags, and recognizes entities in Chinese while preserving every character you feed it.

★1.7k stars Python ML Frameworks Other AI

What it does

CkipTagger runs word segmentation (WS), part-of-speech tagging (POS), and named entity recognition (NER) on Chinese text. It is a Python library from Taiwan’s Academia Sinica: you install it via pip, download roughly 2GB of model files, and call the WS, POS, and NER objects in sequence. The pipeline targets Traditional Chinese, claims to support indefinitely long sentences, and promises not to delete, change, or add characters.
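
A minimal sketch of the three-stage pipeline described above, assuming the `ckiptagger` package is installed and the model files are already unpacked into a `./data` directory. That path and the helper names `run_pipeline` and `pair_words_with_tags` are illustrative assumptions; the `WS`, `POS`, and `NER` calls follow the usage pattern shown in the project README, which remains the authority on install and download steps.

```python
def run_pipeline(sentences, model_dir="./data"):
    """Run WS -> POS -> NER over a list of raw sentences.

    model_dir is an assumed location for the downloaded model files.
    """
    # Imported lazily so this module loads even without ckiptagger installed.
    from ckiptagger import WS, POS, NER

    ws = WS(model_dir)    # word segmentation model
    pos = POS(model_dir)  # part-of-speech model
    ner = NER(model_dir)  # named entity recognition model

    words = ws(sentences)        # one list of words per sentence
    tags = pos(words)            # one list of POS tags per sentence
    entities = ner(words, tags)  # entities found in each sentence
    return words, tags, entities


def pair_words_with_tags(words, tags):
    """Zip each sentence's words with its POS tags for easier reading."""
    return [list(zip(w, t)) for w, t in zip(words, tags)]
```

With the models in place, `run_pipeline(["傅達仁今將執行安樂死"])` would return the three parallel results, and `pair_words_with_tags(words, tags)` turns the first two into (word, tag) pairs.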

The interesting bit

The project is explicitly conservative: it promises not to “auto delete/change/add characters,” which is a subtler brag than it sounds. Chinese segmentation tools often normalize or strip punctuation silently; this one treats your input as immutable. It also lets you nudge the segmenter with weighted word lists (a “recommend” dictionary of soft hints and a stricter “coerce” dictionary) rather than forcing you to accept the neural network’s best guess.
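
Both ideas can be sketched briefly. The first function passes weighted word lists to the segmenter using `construct_dictionary` and the `recommend_dictionary` / `coerce_dictionary` keyword arguments from the project README; the second is a hypothetical check (not part of the library) that a segmentation kept every character, under the assumption that concatenating the output words should reproduce the input exactly. The `./data` path and both function names are illustrative.

```python
def segment_with_hints(sentences, recommend, coerce, model_dir="./data"):
    """Segment sentences while nudging the model with weighted word lists.

    recommend and coerce are plain dicts mapping word -> weight.
    model_dir is an assumed location for the downloaded model files.
    """
    # Imported lazily so this module loads even without ckiptagger installed.
    from ckiptagger import WS, construct_dictionary

    ws = WS(model_dir)
    return ws(
        sentences,
        recommend_dictionary=construct_dictionary(recommend),  # soft hints
        coerce_dictionary=construct_dictionary(coerce),        # stricter hints
    )


def preserves_characters(sentence, words):
    """Return True if joining the segmented words gives back the input."""
    return "".join(words) == sentence
```

Running `preserves_characters` over every (input, output) pair is a cheap way to confirm the no-mutation promise holds on your own data before relying on character offsets downstream.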

Key highlights

F1 of 97.33% on ASBC 4.0 word segmentation, beating the classic CKIPWS (95.91%) and Jieba-zh_TW (89.80%)
POS accuracy of 94.59% on the same corpus
GPU support via TensorFlow/CUDA; CPU-only use works out of the box
User-defined dictionaries with per-word weights for segmentation hints
Published model architecture: BiLSTM with attention, from an AAAI 2020 paper
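
For readers unfamiliar with how a segmentation F1 like the one above is typically computed, here is a small sketch of the common word-level definition: a predicted word counts as correct only when its character span matches a gold word exactly. Whether the reported 97.33% was computed in precisely this way is an assumption; the function names are illustrative.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold_words, predicted_words):
    """Word-level F1 between a gold and a predicted segmentation."""
    gold = to_spans(gold_words)
    predicted = to_spans(predicted_words)
    correct = len(gold & predicted)  # spans with identical boundaries
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, scoring the prediction `["安樂", "死", "合法"]` against the gold `["安樂死", "合法"]` gives one matching span out of three predicted and two gold, so precision is 1/3, recall is 1/2, and F1 is 0.4.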

Caveats

Requires ~2GB model download from Google Drive or an IIS mirror; no clear versioning of model files
Backend is tf-keras on TensorFlow, so you’re inheriting that dependency stack
GPL-3.0 license, which may complicate commercial use if you distribute derivatives

Verdict

Worth a look if you need accurate Traditional Chinese NLP and care about text fidelity: the no-mutation guarantee matters for downstream tasks that rely on character offsets. Skip it if you want something lightweight or permissively licensed; this is a research-grade tool with research-grade baggage.
