A 2019 Chinese NLP competition solution that still pipelines along
A TensorFlow/BERT pipeline for schema-constrained entity and relation extraction, built for a 2019 Baidu competition and left as a reproducible artifact.

What it does
This is a two-stage pipeline for extracting subject-predicate-object triples from Chinese text. First, a multi-label BERT classifier guesses which relations a sentence might contain. Then a sequence-labeling BERT model finds the actual entity spans for those relations. The output is structured triples like (entity1, relation, entity2) that must obey a predefined schema — e.g., only a “图书作品” can be the subject of an “作者” relation.
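To make the two stages concrete, here is a minimal Python sketch of how their raw outputs might be decoded. It is not the repository's code: the 0.5 threshold, the `B-SUB`/`I-SUB`/`B-OBJ`/`I-OBJ` tag names, and both function names are assumptions chosen for illustration.

```python
import math

def predicted_relations(logits, relation_names, threshold=0.5):
    """Stage 1 (multi-label): every relation gets its own sigmoid, so zero,
    one, or several relations can fire for the same sentence.
    The threshold value is an assumption, not taken from the repository."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [name for name, p in zip(relation_names, probs) if p >= threshold]

def decode_spans(tokens, tags):
    """Stage 2 (sequence labeling): collect (role, text) spans from
    BIO-style tags. The tag set used here is hypothetical."""
    spans, role, buffer = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if role:  # close the span that was still open
                spans.append((role, "".join(buffer)))
            role, buffer = tag[2:], [token]
        elif tag.startswith("I-") and role == tag[2:]:
            buffer.append(token)  # continue the current span
        else:
            if role:  # an "O" or mismatched tag ends the span
                spans.append((role, "".join(buffer)))
            role, buffer = None, []
    if role:  # flush a span that runs to the end of the sentence
        spans.append((role, "".join(buffer)))
    return spans
```

Tokens are joined without spaces because the text is Chinese; a whitespace-delimited language would need a different join.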
The interesting bit
The pipeline mirrors the architecture used by the winning team in the 2019 competition (89.3% F1), yet the README is admirably blunt about its own limits: the two models can be trained in parallel, but inference must run strictly in sequence, because the labeling model consumes the classifier’s predictions. The schema constraint is the real taskmaster: the model isn’t free-associating relations, it’s slot-filling against 50 predefined templates.
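A schema constraint of this kind can be pictured as a lookup table from predicate to the entity types it accepts. The sketch below is illustrative only: the table name, the function name, and the object type “人物” are assumptions, and only one of the 50 entries is shown.

```python
# Hypothetical schema table: predicate -> (subject type, object type).
# The task defines 50 such entries; only "作者" appears here, and its
# object type "人物" is an assumption made for this example.
SCHEMA = {"作者": ("图书作品", "人物")}

def obeys_schema(subject_type, predicate, object_type, schema=SCHEMA):
    """Return True only if the predicate is known and both types match."""
    return schema.get(predicate) == (subject_type, object_type)
```

A candidate triple whose subject is not a “图书作品” would be rejected for “作者”, as would any predicate missing from the table.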
Key highlights
- Built on TensorFlow 1.12+ and Chinese BERT-base, with the usual BERT hyperparameters (learning rate 2e-5, maximum sequence length 128)
- Targets the SKE dataset: 430K triples, 210K sentences, 50 schemas drawn from Baidu Baike and feed text
- Published results top out around 79.7% F1 — solid but not championship-grade
- Includes data prep scripts, evaluation utilities, and a direct link to the winning team’s report for comparison
- README is bilingual and notably honest about data availability (“There is no longer a raw data download”)
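For readers unfamiliar with how an F1 figure such as the ones above is typically computed for triples, here is a minimal sketch. It assumes exact-match scoring over sets of (subject, predicate, object) tuples; the competition's official scorer and the repository's own evaluation utilities may differ in details, and the function name is hypothetical.

```python
def triple_f1(predicted, gold):
    """Exact-match precision, recall, and F1 over sets of triples.
    This is an assumed scoring rule, not the official scorer."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Under this rule a triple counts only if all three elements match, so a correct relation attached to a slightly wrong entity span earns nothing.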
Caveats
- Requires TensorFlow 1.x, a legacy release line that is no longer actively developed
- Raw competition data is no longer officially available; contact the author directly if you want to reproduce the results
- Inference is explicitly sequential, not end-to-end — you run classification, convert outputs, then run labeling
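The sequential inference flow described in the last caveat can be sketched as a small orchestration function. This is an illustration under assumptions, not the repository's scripts: `classify` and `label` stand in for the two trained models, and the idea of creating one labeling example per predicted relation is an assumption about what the conversion step does.

```python
def expand_for_labeling(sentence, relations):
    """Conversion step (assumed): one labeling example per predicted
    relation, so stage 2 knows which relation it is filling slots for."""
    return [{"text": sentence, "relation": r} for r in relations]

def run_pipeline(sentence, classify, label):
    """Run stage 1, convert its output, then run stage 2.
    `classify(sentence)` returns relation names; `label(text, relation)`
    returns (subject, object) pairs. Both are hypothetical callables."""
    relations = classify(sentence)                       # stage 1
    examples = expand_for_labeling(sentence, relations)  # conversion
    triples = []
    for example in examples:                             # stage 2
        for subj, obj in label(example["text"], example["relation"]):
            triples.append((subj, example["relation"], obj))
    return triples
```

Because stage 2 only sees relations that stage 1 proposed, any relation the classifier misses cannot be recovered later, which is the practical cost of a sequential design.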
Verdict
Worth studying if you’re building schema-constrained Chinese IE systems or need a concrete BERT pipeline reference from the pre-transformers-as-platforms era. Skip it if you need modern PyTorch, multilingual support, or an actively maintained codebase.