A neural NER toolkit that predates the transformer era
NeuroNER wraps TensorFlow 1.x and GloVe embeddings into a command-line tool for named-entity recognition, complete with pretrained models and BRAT integration.

What it does
NeuroNER trains and runs named-entity recognition using a character-level LSTM, token embeddings, and optionally a CRF layer. It accepts CoNLL-2003 or BRAT format datasets, handles train/valid/test/deploy splits, and can operate in three modes: train from scratch, fine-tune a pretrained model, or predict on unlabeled text.
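To make the CoNLL-2003 input format concrete, here is a minimal sketch of a reader for that column layout. It is not NeuroNER's own loader: the function name `read_conll` and the sample sentences are made up for illustration, and it assumes whitespace-separated columns with the token first and the entity tag last.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # (token, NER tag) pairs


def read_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-2003 style text into sentences of (token, tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        # A blank line or a -DOCSTART- marker closes the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # Assumption: first column is the token, last column is the NER tag.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample; the tag scheme shown here is illustrative.
sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""

for sentence in read_conll(sample):
    print(sentence)
```

Keeping only the first and last columns means the same reader copes with files that drop the part-of-speech and chunk columns.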
The interesting bit
The project treats reproducibility and sharing as first-class concerns. It ships with a prepare_pretrained_model.py script that strips dataset-specific token mappings for privacy, and it pins architecture hyperparameters so loaded models don’t silently mismatch. That care for provenance was less common in 2017 than it is now.
Key highlights
- Pretrained models available for CoNLL-2003, i2b2 2014 de-identification, and MIMIC clinical text
- BRAT format support lets you annotate or review predictions in a web UI
- TensorBoard logging built in; training F1 on CoNLL-2003 reported around 0.90
- Command-line interface with --fetch_data and --fetch_trained_model helpers
- Published at EMNLP 2017 with a medical de-identification variant in JAMIA
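The BRAT support mentioned in the highlights rests on BRAT's standoff format, where each entity is a tab-separated line in an .ann file that points at character offsets in a companion text file. The sketch below reads such lines; it is an illustration rather than NeuroNER's code, the names `Entity` and `parse_brat_entities` are hypothetical, and it deliberately skips discontinuous spans and non-entity annotations.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_brat_entities(ann_text: str) -> List[Entity]:
    """Extract contiguous text-bound annotations from BRAT .ann content."""
    entities: List[Entity] = []
    for line in ann_text.splitlines():
        # Text-bound (entity) annotations have IDs that start with "T".
        if not line.startswith("T"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # malformed line; ignored in this sketch
        ann_id, type_and_span, surface = parts[0], parts[1], parts[2]
        if ";" in type_and_span:
            continue  # discontinuous span; not handled here
        label, start, end = type_and_span.split()
        entities.append(Entity(ann_id, label, int(start), int(end), surface))
    return entities


# Hypothetical document and annotations for illustration.
document = "Sony opened an office in Boston."
annotations = "T1\tORG 0 4\tSony\nT2\tLOC 25 31\tBoston\n"

for entity in parse_brat_entities(annotations):
    # The offsets should slice the original text back to the surface form.
    print(entity.label, document[entity.start:entity.end])
```

Checking that the offsets slice back to the recorded surface text is a cheap way to catch annotation files that have drifted out of sync with their source text.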
Caveats
- Requires TensorFlow 1.0+ and Python 3; the README explicitly warns it does not work with Python 2.x
- Word embeddings must be downloaded separately (GloVe 100d is the documented default)
- Several architecture parameters must be manually synced to the pretrained model’s original values; mismatches will break loading
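One way to guard against the mismatch described in the last caveat is to compare the architecture settings before loading a model. The sketch below is an assumption-laden illustration, not part of NeuroNER: the key names in `ARCHITECTURE_KEYS` and the helper `find_mismatches` are hypothetical, so check them against the project's own parameter file before relying on them.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical list of settings that define the network's shape.
# The names are assumptions for illustration only.
ARCHITECTURE_KEYS = (
    "use_character_lstm",
    "character_embedding_dimension",
    "character_lstm_hidden_state_dimension",
    "token_embedding_dimension",
    "token_lstm_hidden_state_dimension",
    "use_crf",
)


def find_mismatches(
    current: Dict[str, Any],
    pretrained: Dict[str, Any],
    keys: Sequence[str] = ARCHITECTURE_KEYS,
) -> List[str]:
    """Return a description of every architecture key whose values differ."""
    problems: List[str] = []
    for key in keys:
        if current.get(key) != pretrained.get(key):
            problems.append(
                f"{key}: current={current.get(key)!r}, "
                f"pretrained={pretrained.get(key)!r}"
            )
    return problems


# Made-up values for illustration.
pretrained_params = {
    "use_character_lstm": True,
    "character_embedding_dimension": 25,
    "character_lstm_hidden_state_dimension": 25,
    "token_embedding_dimension": 100,
    "token_lstm_hidden_state_dimension": 100,
    "use_crf": True,
}
current_params = dict(pretrained_params, token_embedding_dimension=300)

for problem in find_mismatches(current_params, pretrained_params):
    print("Mismatch ->", problem)
```

Running a check like this up front turns a confusing weight-loading failure into a readable list of settings to fix.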
Verdict
Worth a look if you need a stable, well-documented baseline for NER on small datasets or clinical text, or if you’re reproducing 2016–2017 neural NER papers. Skip it if you want modern transformer-based models or out-of-the-box multilingual support.