TensorFlow NER from 2016: still instructive, mostly historical
A reference implementation of multilayered bidirectional LSTMs for named entity recognition, with a side of embedding archaeology.

What it does
Implements a two-layer bidirectional LSTM in TensorFlow for named entity recognition, trained on CoNLL-2003 (English) and ICON-2013 (Hindi). The pipeline covers word embedding generation, input preparation with POS/chunk/capitalization features, and sequence tagging with softmax output.
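As a rough illustration of the input-preparation step, the sketch below appends 11 extra feature dimensions to a word vector and pads a sentence to a fixed length. The tag buckets, the 5/5/1 split between POS, chunk, and capitalization, and all helper names are assumptions made for this example; they are not taken from the repository.

```python
# Hypothetical tag inventories: the 5/5/1 split below is an assumption
# chosen to total 11 extra dimensions, not the repository's actual scheme.
POS_BUCKETS = ["NOUN", "VERB", "ADJ", "ADV", "OTHER"]
CHUNK_BUCKETS = ["NP", "VP", "PP", "O", "OTHER"]


def one_hot(value, buckets):
    """Return a one-hot list over buckets, falling back to the last bucket."""
    index = buckets.index(value) if value in buckets else len(buckets) - 1
    return [1.0 if i == index else 0.0 for i in range(len(buckets))]


def token_vector(word, pos, chunk, embedding):
    """Concatenate a word embedding with POS, chunk, and capitalization features."""
    capital = [1.0 if word[:1].isupper() else 0.0]
    return (
        list(embedding)
        + one_hot(pos, POS_BUCKETS)
        + one_hot(chunk, CHUNK_BUCKETS)
        + capital
    )


def pad_sentence(vectors, max_len, dim):
    """Truncate or zero-pad a list of token vectors to exactly max_len rows."""
    padded = vectors[:max_len]
    padded += [[0.0] * dim for _ in range(max_len - len(padded))]
    return padded


# Toy 4-dimensional embedding standing in for a pretrained vector.
vec = token_vector("London", "NOUN", "NP", [0.5, 0.5, 0.5, 0.5])
sentence = pad_sentence([vec], max_len=3, dim=len(vec))
print(len(vec), len(sentence))
```

With a 4-dimensional toy embedding each token vector has 4 + 11 = 15 entries; a real run would substitute the pretrained embedding width.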
The interesting bit
The authors ran a controlled comparison of three embedding methods—Word2Vec, GloVe, and their own “RnnVec”—on identical 100MB corpora at 111 dimensions. GloVe edged ahead on CoNLL test_a; RnnVec lagged significantly, which they report without hedging. The Hindi support via hindi_util.py and the WX converter is a genuine, if narrow, addition.
Key highlights
- Reproduces the architecture from an ICON-16 paper (arXiv:1610.09756)
- Supports three embedding backends: Word2Vec, GloVe, and corpus-trained LSTM embeddings
- Adds 11 hand-engineered features (POS, chunk, capitalization) concatenated to word vectors
- Includes F1/accuracy/recall evaluation and TensorFlow model checkpointing
- Hindi NER via transliteration to Latin script before processing
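For readers unfamiliar with the evaluation metrics listed above, here is a minimal token-level sketch of accuracy, precision, recall, and F1 for a single label. It is a simplification under stated assumptions: the function names are hypothetical, and it scores individual tokens rather than whole entity spans, so it may not match how the repository or the official CoNLL scorer counts matches.

```python
def accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold) if gold else 0.0


def precision_recall_f1(gold, predicted, label):
    """Token-level precision, recall, and F1 for one label (hypothetical helper)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up tags for illustration only.
gold = ["PER", "O", "LOC", "O", "PER"]
pred = ["PER", "O", "O", "O", "LOC"]
print(accuracy(gold, pred))
print(precision_recall_f1(gold, pred, "PER"))
```

In this toy example the PER label has one true positive, no false positives, and one false negative, giving precision 1.0 and recall 0.5.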
Caveats
- Built for TensorFlow circa 2016; the r0.9 word2vec tutorial link in the README signals likely API drift
- No requirements.txt or setup.py; dependency resolution is manual
- The final 311-dimensional results use embeddings trained on “a small 100mb corpus”, so it is unclear how the numbers would change with a larger corpus
Verdict
Worth a look if you’re studying NER evolution or need a baseline LSTM implementation to modernize. Skip it if you want production-ready tooling; transformers and modern frameworks have superseded this stack.