BERT for NER: a clean rewrite of the 2018 classic
A tidier reference implementation for fine-tuning BERT on CoNLL-2003 named entity recognition, with pluggable CRF or softmax heads.

What it does
This repo fine-tunes Google’s original BERT on the CoNLL-2003 NER task, tagging tokens as person, organization, location, or miscellaneous entities. It is essentially a cleaned-up rewrite of an earlier version: the author removed hard-coded paths, added annotations, and packaged everything into a single shell script that trains, evaluates, and predicts in one shot.
The interesting bit
The author explicitly flags that the code is designed for quick experimentation with the output layer. You can swap between a CRF layer and a plain softmax layer with a single flag (--crf=True/False), which makes it useful as a teaching scaffold or a baseline rather than a production system.
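To make the CRF-versus-softmax difference concrete, here is a minimal sketch in plain Python. It is not code from the repo: the three-label tag set, the emission scores, and the hand-set transition penalty are all assumptions chosen for illustration, whereas a real CRF layer learns its transition scores during training. A softmax head picks the best label for each token independently, while a CRF-style decoder scores whole label sequences and can therefore avoid an invalid jump such as O followed by I-PER.

```python
# Illustrative only: labels, scores, and the transition penalty are made up.
LABELS = ["O", "B-PER", "I-PER"]

def softmax_decode(emissions):
    """Pick the highest-scoring label independently at each token."""
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]

def viterbi_decode(emissions, transitions):
    """Return the label sequence maximizing emission + transition scores."""
    n_labels = len(transitions)
    score = list(emissions[0])   # best score of any path ending in each label
    backpointers = []            # per step: best previous label for each label
    for row in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + row[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Walk back from the best final label to recover the full path.
    best = max(range(n_labels), key=score.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]

# Per-token scores for a three-token sentence (columns follow LABELS).
EMISSIONS = [
    [1.0, 0.9, 0.0],
    [0.5, 0.4, 2.0],
    [1.0, 0.0, 0.2],
]
# transitions[i][j] is the score of moving from label i to label j.
# Assumption: only O -> I-PER is penalized; everything else is neutral.
TRANSITIONS = [
    [0.0, 0.0, -10.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

print([LABELS[i] for i in softmax_decode(EMISSIONS)])
print([LABELS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)])
```

With these toy numbers the per-token decoder yields O, I-PER, O, which is not a valid BIO sequence, while the sequence decoder yields B-PER, I-PER, O.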
Key highlights
- Single-file entry point (BERT_NER.py) with a bash runner (run_ner.sh)
- Supports both CRF and softmax decoding heads out of the box
- Includes the standard CoNLL-2003 evaluation script (conlleval.pl) for comparable F1 scores
- Achieves ~89.7 F1 on the test set with default settings; author notes this trails the paper’s 92.4 by roughly 2.7 points and suggests “tricks” are needed to close the gap
- Author now points to a successor project, NLPGNN, for better performance
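The F1 figures above are entity-level scores: a predicted entity counts as correct only when both its span and its type match the gold annotation exactly. The sketch below shows that idea for a single BIO-tagged sentence. It is a simplified stand-in, not a port of conlleval.pl, which aggregates counts over the whole corpus and handles more edge cases; the function names and the example tags are assumptions made for illustration.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) spans in a BIO tag sequence."""
    entities, start, etype = set(), None, None
    # A trailing "O" sentinel flushes an entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            entities.add((etype, start, i))  # end index is exclusive
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag with no matching open entity starts a new one.
            start, etype = i, tag[2:]
        else:
            start, etype = None, None
    return entities

def entity_f1(gold, pred):
    """Exact-match entity F1 between gold and predicted tag sequences."""
    gold_spans, pred_spans = extract_entities(gold), extract_entities(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]  # right span, wrong type on the last token
print(entity_f1(gold, pred))
```

In this example the person entity matches but the final entity has the wrong type, so one of two predictions is correct and one of two gold entities is found, giving precision, recall, and F1 of 0.5 each.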
Caveats
- Requires manual download of the original BERT checkpoint and the CoNLL-2003 data; nothing is fetched automatically
- Built for the 2018 TensorFlow 1.x BERT implementation, so expect legacy dependency friction
- The 2.7-point gap to published results is left as an exercise; no ablation or guidance is provided beyond “maybe some tricks”
Verdict
Worth a look if you need a minimal, readable BERT-NER baseline to hack on, especially for comparing CRF vs. softmax behavior. Skip it if you want a batteries-included, state-of-the-art NER package; modern libraries like Hugging Face transformers or the author’s own NLPGNN have superseded it.