zjy-ucas/ChineseNER

A 2017-era NER demo that won't run on modern Python

A reference implementation of BiLSTM-CRF for Chinese named entity recognition, frozen in time by TensorFlow 1.2.

1.8k stars · Python · Language Models · ML Frameworks
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

This is a straightforward demo of character-level Chinese named entity recognition using a bidirectional LSTM with a CRF output layer. Chinese characters get projected to dense vectors, concatenated with word-boundary features (one-hot vectors), and fed through the BiLSTM-CRF pipeline. The README calls it “simple” and means it.
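As a rough illustration of that input construction, here is a minimal pure-Python sketch. It assumes the sentence is already segmented into words (the repo lists jieba as a dependency, presumably for this step) and that boundary features follow a four-way single/start/middle/end scheme. The tag numbering, the helper names `boundary_tags`, `one_hot`, and `build_inputs`, and the toy 2-dimensional vectors are all hypothetical, not taken from the repository.

```python
def boundary_tags(words):
    """Map each character to its position inside its word.

    Assumed scheme (not confirmed by the source):
    0 = single-character word, 1 = word start, 2 = word middle, 3 = word end.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append(0)
        else:
            tags.extend([1] + [2] * (len(word) - 2) + [3])
    return tags


def one_hot(index, size=4):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_inputs(words, char_vectors, dim):
    """Concatenate each character's dense vector with its one-hot boundary tag."""
    chars = [ch for word in words for ch in word]
    unknown = [0.0] * dim  # fallback vector for characters without an embedding
    return [
        char_vectors.get(ch, unknown) + one_hot(tag)
        for ch, tag in zip(chars, boundary_tags(words))
    ]


# Toy example: a pre-segmented sentence and 2-dimensional stand-in embeddings.
words = ["我", "在", "北京"]
char_vectors = {
    "我": [0.1, 0.2],
    "在": [0.3, 0.4],
    "北": [0.5, 0.6],
    "京": [0.7, 0.8],
}
inputs = build_inputs(words, char_vectors, dim=2)
```

Each element of `inputs` is one time step for the recurrent layer: the character vector followed by four boundary slots. With the 100-dimensional embeddings the project ships, each step would be 104 values wide under this assumed scheme.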

The interesting bit

The project ships with word2vec embeddings pre-trained on Chinese Wikipedia, which is genuinely useful scaffolding for anyone who wants to see how NER worked before transformers ate the world. The model architecture closely follows a 2016 SIGHAN paper on radical-level features, though this implementation uses word-boundary features instead of radical-level ones.

Key highlights

  • BiLSTM-CRF architecture, the standard neural approach pre-BERT
  • Pre-trained 100-dimensional word vectors from a Chinese Wikipedia corpus (gensim word2vec)
  • One-hot word boundary features as the sole extra input beyond character embeddings
  • Adam optimizer, gradient clipping at 5, 0.5 dropout — sensible 2017 defaults
  • Single-command training and evaluation via main.py
  • Heavy bibliography: eight suggested papers, from Collobert 2011 to the SIGHAN 2016 state-of-the-art
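For readers who have not met a CRF output layer before, the sketch below shows what it buys at decoding time: instead of picking the best tag per character, Viterbi search picks the best whole sequence under learned transition scores. This is a generic pure-Python illustration, not the repository's TensorFlow code. The function name `viterbi_decode`, the three-tag O/B-LOC/I-LOC set, and every score are made up for the example.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for landing on tag j at this position.
            best_prev = max(
                range(num_tags), key=lambda i: scores[i] + transitions[i][j]
            )
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
        scores = new_scores
        backpointers.append(pointers)

    # Walk the backpointers from the best final tag to recover the path.
    best_last = max(range(num_tags), key=lambda j: scores[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[best_last]


# Hypothetical tag set and scores for the four characters of 我在北京.
tags = ["O", "B-LOC", "I-LOC"]
emissions = [
    [2.0, 0.0, 0.0],  # 我
    [2.0, 0.0, 0.0],  # 在
    [0.5, 1.0, 1.2],  # 北: the per-character winner is I-LOC
    [0.0, 0.0, 2.0],  # 京
]
transitions = [
    [0.0, 0.0, -100.0],  # from O: jumping straight to I-LOC is heavily penalized
    [0.0, 0.0, 0.0],     # from B-LOC
    [0.0, 0.0, 0.0],     # from I-LOC
]

path, score = viterbi_decode(emissions, transitions)
greedy = [max(range(len(tags)), key=lambda j: row[j]) for row in emissions]
```

With these toy numbers, per-character argmax (`greedy`) yields O, O, I-LOC, I-LOC, which opens an entity with an inside tag. The Viterbi path is O, O, B-LOC, I-LOC, because the transition penalty makes the malformed sequence more expensive than giving up a little emission score on 北.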

Caveats

  • Requires TensorFlow 1.2.0 and Python 3 with jieba 0.37; good luck installing that stack today
  • README contains multiple typos (“chainese”, a misspelled “bidirectional”, “backword”), suggesting limited maintenance
  • No mention of test scores, dataset details, or reproducibility numbers
  • “Simple demo” is the author’s own description — this is teaching code, not a maintained library

Verdict

Worth a skim if you’re writing a literature review on Chinese NER evolution or need to explain to a junior dev what we did before BERT. Skip it if you need working code for a production pipeline; modern Chinese NER has moved to pretrained transformers and this repo won’t run without archaeology.
