hemingkx/CLUENER2020

Chinese NER: when BERT meets CRF and nobody gets hurt

A straightforward PyTorch baseline for CLUENER2020 that stacks BiLSTM, BERT, and RoBERTa with optional CRF layers to see what actually moves the needle on fine-grained Chinese entity recognition.

516 stars · Python · ML Frameworks · Language Models · CLUENER2020
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

This repo implements four baseline architectures for the CLUENER2020 Chinese NER task: vanilla BiLSTM-CRF, BERT with softmax, BERT-CRF, and BERT-BiLSTM-CRF. Swap in RoBERTa-wwm-ext-large for BERT and you get the RoBERTa variants. It is essentially a clean, runnable reference implementation with a results table.
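As a rough illustration of what separates the softmax variants from the CRF ones: a softmax head picks each token's label on its own, while a CRF layer adds label-to-label transition scores and decodes the best whole sequence with Viterbi. The sketch below is plain Python with made-up scores and a three-label tag set chosen for illustration; it is not the repo's code, and a trained CRF would learn its transition matrix rather than have it hard-coded.

```python
def softmax_decode(emissions):
    """Pick the highest-scoring label for each token independently."""
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


def viterbi_decode(emissions, transitions):
    """Return the label sequence maximising emission + transition scores.

    emissions[t][j]   : score of label j at position t
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each label
    backpointers = []
    for row in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + row[j])
        score = new_score
        backpointers.append(pointers)
    # Walk the backpointers from the best final label to recover the path.
    best = max(range(n_labels), key=score.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Illustrative (assumed) setup: labels 0="O", 1="B-name", 2="I-name".
LABELS = ["O", "B-name", "I-name"]
emissions = [[2.0, 1.5, 0.1],   # token 1 slightly prefers "O"
             [0.5, 0.2, 1.2]]   # token 2 prefers "I-name"
transitions = [[0.0, 0.0, -1e4],  # "O" -> "I-name" is heavily penalised
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]

independent = [LABELS[i] for i in softmax_decode(emissions)]
joint = [LABELS[i] for i in viterbi_decode(emissions, transitions)]
```

With these numbers the independent decode yields `O, I-name`, an invalid tag sequence, while the joint decode settles on `B-name, I-name`. That kind of repair is the main thing a CRF layer can add on top of a strong encoder.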

The interesting bit

The README is admirably honest about its own limitations. The author notes the dataset has quality issues, admits to using the validation set as a test set because the real test set is locked behind a limited-submission leaderboard, and even flags that you must manually move train.log before re-running or it gets overwritten. This is baseline code that knows it is baseline code.
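The train.log overwrite is easy to sidestep with a few lines run before each training session. The helper below is a hypothetical sketch, not part of the repo: `archive_log` and its timestamped naming scheme are assumptions, and it only assumes the log lives at a known path.

```python
from datetime import datetime
from pathlib import Path


def archive_log(log_path="train.log", now=None):
    """Rename an existing log to a timestamped name.

    Returns the new path, or None when there is no log to archive.
    """
    src = Path(log_path)
    if not src.exists():
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # e.g. train.log -> train-20200102-030405.log, in the same directory
    dst = src.with_name(f"{src.stem}-{stamp}{src.suffix}")
    src.rename(dst)
    return dst
```

Calling `archive_log()` at the top of a run script would keep earlier logs around instead of relying on a manual move.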

Key highlights

  • Four model variants with clear F1 score breakdowns across 10 entity types (address, book, company, game, government, movie, name, organization, position, scene)
  • RoBERTa-wwm-ext-large + BiLSTM + CRF edges out pure RoBERTa-CRF overall (79.64 vs 79.34 F1), though the gap is narrow and category-dependent
  • Requires manual BERT/RoBERTa model download and TensorFlow-to-PyTorch conversion, with a Baidu Netdisk link provided for the impatient
  • Built on transformers==2.2.2 and PyTorch 1.5.1, versions that feel increasingly archaeological
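For readers unsure what a per-entity-type F1 breakdown measures, here is a minimal sketch of entity-level scoring: a predicted entity counts only if its type and exact span both match the gold annotation. It assumes plain BIO tags for simplicity (the repo's actual tagging scheme and metric code may differ), and the function names are illustrative.

```python
from collections import Counter


def extract_entities(tags):
    """Return a set of (type, start, end) spans from a BIO sequence (end exclusive)."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        continues = tag.startswith("I-") and kind == tag[2:]
        if kind is not None and not continues:
            spans.add((kind, start, i))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def f1_by_type(gold_seqs, pred_seqs):
    """Exact-match entity F1 for each entity type."""
    tp, n_gold, n_pred = Counter(), Counter(), Counter()
    for gold_tags, pred_tags in zip(gold_seqs, pred_seqs):
        gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
        for kind, _, _ in gold:
            n_gold[kind] += 1
        for kind, _, _ in pred:
            n_pred[kind] += 1
        for kind, _, _ in gold & pred:  # same type and same boundaries
            tp[kind] += 1
    scores = {}
    for kind in sorted(set(n_gold) | set(n_pred)):
        p = tp[kind] / n_pred[kind] if n_pred[kind] else 0.0
        r = tp[kind] / n_gold[kind] if n_gold[kind] else 0.0
        scores[kind] = 2 * p * r / (p + r) if p + r else 0.0
    return scores


# Toy example: the "company" span is predicted one token too short.
gold = [["B-name", "I-name", "O", "B-company", "I-company"]]
pred = [["B-name", "I-name", "O", "B-company", "O"]]
scores = f1_by_type(gold, pred)
```

In the toy example the truncated company span earns no credit at all, which is why small boundary errors can move category-level numbers noticeably.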

Caveats

  • transformers==2.2.2 is pinned; running against a current release will likely require code changes
  • The “test set” is really the validation set, so numbers are not directly comparable to official leaderboard submissions
  • Data quality issues in CLUENER2020 are acknowledged but not mitigated in code
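Given the pinned versions, a quick environment check before training can save a confusing stack trace. This is a hypothetical guard, not something the repo ships: `PINS` and `check_pins` are illustrative names, and the PyPI distribution name `torch` is assumed for PyTorch.

```python
from importlib import metadata

# Versions named in the review; adjust if the repo's requirements differ.
PINS = {"transformers": "2.2.2", "torch": "1.5.1"}


def check_pins(pins, get_version=metadata.version):
    """Return a list of human-readable mismatches (empty when all pins match)."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed, expected {wanted}")
            continue
        # Accept local build suffixes such as "1.5.1+cu101".
        if found != wanted and not found.startswith(wanted + "+"):
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems
```

Printing the result of `check_pins(PINS)` at startup, or aborting when it is non-empty, makes the version dependency explicit rather than implicit.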

Verdict

Grab this if you need a working Chinese NER starter in PyTorch and want to see how much CRF and BiLSTM stacking actually help on top of a large pretrained model. Skip it if you need production-ready code, modern dependency versions, or rigorous evaluation against the true test set.
