Is CLUENER2020 open source?

Yes — hemingkx/CLUENER2020 is an open-source project tracked on heatdrop.

What language is CLUENER2020 written in?

hemingkx/CLUENER2020 is primarily written in Python.

How popular is CLUENER2020?

hemingkx/CLUENER2020 has 514 stars on GitHub.

Where can I find CLUENER2020?

hemingkx/CLUENER2020 is on GitHub at https://github.com/hemingkx/CLUENER2020.

hemingkx/CLUENER2020

Chinese NER: when BERT meets CRF and nobody gets hurt

A straightforward PyTorch baseline for CLUENER2020 that combines BiLSTM and BERT/RoBERTa encoders with optional CRF layers to see what actually moves the needle on fine-grained Chinese entity recognition.

★514 stars · Python · ML Frameworks · Language Models

What it does

This repo implements four baseline architectures for the CLUENER2020 Chinese NER task: vanilla BiLSTM-CRF, BERT with softmax, BERT-CRF, and BERT-BiLSTM-CRF. Swap in RoBERTa-wwm-ext-large for BERT and you get the RoBERTa variants. It is essentially a clean, runnable reference implementation with a results table.
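What the CRF layer changes is decoding. A softmax head picks each character's highest-scoring tag independently, while a linear-chain CRF adds a transition score between adjacent tags and picks the jointly best sequence with Viterbi. The toy sketch below is not taken from the repository: the three-tag BIO label set, the emission scores, and the transition scores are all made up for illustration, and the start/end transition scores that a full CRF layer usually carries are omitted.

```python
from typing import List


def softmax_decode(emissions: List[List[float]], tags: List[str]) -> List[str]:
    """Per-character argmax, as a softmax output layer does at inference time."""
    return [tags[max(range(len(tags)), key=lambda j: row[j])] for row in emissions]


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]],
                   tags: List[str]) -> List[str]:
    """Best tag sequence under emission + transition scores (linear-chain CRF decoding).

    transitions[i][j] is the score for moving from tag i to tag j.
    """
    n_tags = len(tags)
    # score[j]: best total score of any path ending in tag j at the current position
    score = list(emissions[0])
    backpointers = []
    for row in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i to come from when moving into tag j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + row[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[j] for j in path]


# Hypothetical scores for a three-character span
TAGS = ["O", "B-name", "I-name"]
EMISSIONS = [
    [2.0, 1.8, 0.0],  # char 0: "O" narrowly beats "B-name"
    [0.0, 0.5, 1.0],  # char 1: looks like "I-name"
    [0.0, 0.0, 1.0],  # char 2: looks like "I-name"
]
NEG = -10.0  # strongly discourages the invalid transition O -> I-name
TRANSITIONS = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.5],  # from B-name
    [0.0, 0.0, 0.5],  # from I-name
]

print(softmax_decode(EMISSIONS, TAGS))               # ['O', 'I-name', 'I-name']
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))  # ['B-name', 'I-name', 'I-name']
```

Softmax yields an invalid sequence (an I- tag with no preceding B- tag); the transition penalty lets Viterbi trade a slightly worse first emission for a well-formed entity.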

The interesting bit

The README is admirably honest about its own limitations. The author notes the dataset has quality issues, admits to using the validation set as a test set because the real test set is locked behind a limited-submission leaderboard, and even flags that you must manually move train.log before re-running or it gets overwritten. This is baseline code that knows it is baseline code.
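If you re-run training often, the manual "move train.log first" step is easy to script. A minimal sketch, assuming the log is written to train.log in the working directory; `archive_log` and its timestamped naming scheme are hypothetical, not part of the repository.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def archive_log(log_path: str = "train.log") -> Optional[Path]:
    """Hypothetical helper: move an existing log aside so the next run can't overwrite it.

    Returns the archived path, or None if there was no log to move.
    """
    src = Path(log_path)
    if not src.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.stem}.{stamp}{src.suffix}")
    # Avoid clobbering an archive created within the same second
    counter = 1
    while dst.exists():
        dst = src.with_name(f"{src.stem}.{stamp}-{counter}{src.suffix}")
        counter += 1
    shutil.move(str(src), str(dst))
    return dst
```

Calling `archive_log()` at the top of your own launch script (before the training entry point runs) keeps every previous run's log, e.g. as train.20240101-120000.log.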

Key highlights

Four model variants with clear F1 score breakdowns across 10 entity types (address, book, company, game, government, movie, name, organization, position, scene)
RoBERTa-wwm-ext-large + BiLSTM + CRF edges out RoBERTa-wwm-ext-large + CRF overall (79.64 vs. 79.34 F1), though the 0.3-point gap is narrow and varies by entity category
Requires manual BERT/RoBERTa model download and TensorFlow-to-PyTorch conversion, with a Baidu Netdisk link provided for the impatient
Built on transformers==2.2.2 and PyTorch 1.5.1 — versions that feel increasingly archaeological
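Per-type F1 numbers like these are normally entity-level scores: a predicted entity counts only if both its span and its type exactly match a gold entity. A minimal stdlib sketch of that calculation, assuming entities are (type, start, end) tuples per sentence and that the overall number is micro-averaged; this follows the common NER convention and is not necessarily the repo's exact scorer. The sample data are made up.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, int, int]  # (entity_type, start, end), end inclusive


def f1_by_type(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> Dict[str, float]:
    """Exact-match entity F1 per type, plus a micro-averaged score under 'overall'."""
    tp, n_gold, n_pred = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for etype, _, _ in g:
            n_gold[etype] += 1
        for etype, _, _ in p:
            n_pred[etype] += 1
        # A true positive needs the same type AND the same boundaries
        for etype, _, _ in g & p:
            tp[etype] += 1

    def f1(t: int, ng: int, np_: int) -> float:
        precision = t / np_ if np_ else 0.0
        recall = t / ng if ng else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    scores = {e: f1(tp[e], n_gold[e], n_pred[e]) for e in sorted(set(n_gold) | set(n_pred))}
    scores["overall"] = f1(sum(tp.values()), sum(n_gold.values()), sum(n_pred.values()))
    return scores


# Hypothetical gold/predicted entities for two sentences
GOLD = [{("name", 9, 11), ("company", 0, 3)}, {("game", 2, 5)}]
PRED = [{("name", 9, 11), ("company", 0, 2)}, {("game", 2, 5), ("movie", 7, 8)}]

scores = f1_by_type(GOLD, PRED)
print(scores["company"])            # 0.0  (boundary off by one character)
print(round(scores["overall"], 4))  # 0.5714
```

Because a one-character boundary error zeroes out an entity, small architectural changes (such as adding a BiLSTM) can shift per-type F1 noticeably even when overall F1 barely moves.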

Caveats

transformers==2.2.2 is pinned; upgrading likely breaks things
The “test set” is really the validation set, so numbers are not directly comparable to official leaderboard submissions
Data quality issues in CLUENER2020 are acknowledged but not mitigated in code
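One cheap check you could add yourself is to confirm that every annotated span actually matches the text it points to. The sketch below assumes the CLUENER2020 JSON-lines layout, where each line has a `text` field and a `label` field mapping entity type to entity string to a list of [start, end] character offsets with an inclusive end; verify that against your copy of the data. `find_bad_spans` and the sample lines are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def find_bad_spans(lines: Iterable[str]) -> List[Tuple[int, str, str, List[int]]]:
    """Return (line_no, entity_type, entity_text, span) for spans that don't match the text."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        text = record["text"]
        # Records without a "label" field (e.g. unlabeled data) are skipped
        for etype, entities in record.get("label", {}).items():
            for surface, spans in entities.items():
                for start, end in spans:
                    # Offsets are assumed to be character indices with an inclusive end
                    if text[start:end + 1] != surface:
                        problems.append((line_no, etype, surface, [start, end]))
    return problems


# Hypothetical sample: line 2 has an off-by-one span
SAMPLE = [
    '{"text": "浙商银行企业信贷部叶老桂博士", "label": {"name": {"叶老桂": [[9, 11]]}, "company": {"浙商银行": [[0, 3]]}}}',
    '{"text": "玩家在游戏里", "label": {"game": {"游戏": [[2, 4]]}}}',
]

print(find_bad_spans(SAMPLE))  # [(2, 'game', '游戏', [2, 4])]
```

Flagged lines can then be fixed or dropped before training rather than silently teaching the model wrong boundaries.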

Verdict

Grab this if you need a working Chinese NER starter in PyTorch and want to see how much CRF and BiLSTM stacking actually help on top of a large pretrained model. Skip it if you need production-ready code, modern dependency versions, or rigorous evaluation against the true test set.