monikkinom/ner-lstm

TensorFlow NER from 2016: still instructive, mostly historical

A reference implementation of multilayered bidirectional LSTMs for named entity recognition, with a side of embedding archaeology.

538 stars · Python · ML Frameworks · Language Models
Velocity (7d): +0.1 ★/day · Trend: steady

What it does

Implements a two-layer bidirectional LSTM in TensorFlow for named entity recognition, trained on CoNLL-2003 (English) and ICON-2013 (Hindi). The pipeline covers word embedding generation, input preparation with POS/chunk/capitalization features, and sequence tagging with softmax output.
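The input-preparation step can be pictured roughly as follows. This is a minimal sketch, not the repo's code: the toy `EMBEDDINGS` table, the 3-dimensional vectors, and the three-way capitalization encoding are all assumptions made for illustration, and the POS and chunk features are left out.

```python
from typing import Dict, List

# Hypothetical toy embedding table; the repo trains real vectors instead.
EMBEDDINGS: Dict[str, List[float]] = {
    "john": [0.1, 0.3, -0.2],
    "lives": [0.0, -0.1, 0.4],
    "in": [0.2, 0.2, 0.2],
    "paris": [-0.3, 0.5, 0.1],
}
EMBED_DIM = 3
UNKNOWN = [0.0] * EMBED_DIM  # fallback for out-of-vocabulary words


def capitalization_feature(token: str) -> List[float]:
    """One-hot over (all caps, initial cap, other); an illustrative encoding."""
    if token.isupper():
        return [1.0, 0.0, 0.0]
    if token[:1].isupper():
        return [0.0, 1.0, 0.0]
    return [0.0, 0.0, 1.0]


def featurize(tokens: List[str]) -> List[List[float]]:
    """Concatenate each token's word vector with its hand-engineered features."""
    rows = []
    for token in tokens:
        vector = EMBEDDINGS.get(token.lower(), UNKNOWN)
        rows.append(vector + capitalization_feature(token))
    return rows


sentence = ["John", "lives", "in", "PARIS"]
inputs = featurize(sentence)  # one row of EMBED_DIM + 3 floats per token
```

Each row is what a sequence tagger would consume per time step; in the repo the same idea applies with larger embeddings and 11 extra features.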

The interesting bit

The authors ran a controlled comparison of three embedding methods (Word2Vec, GloVe, and their own “RnnVec”) on identical 100MB corpora at 111 dimensions. GloVe edged ahead on CoNLL test_a, while RnnVec lagged significantly, a result they report without hedging. The Hindi support via hindi_util.py and the WX converter is a genuine, if narrow, addition.

Key highlights

  • Reproduces the architecture from an ICON-16 paper (arXiv:1610.09756)
  • Supports three embedding backends: Word2Vec, GloVe, and corpus-trained LSTM embeddings
  • Adds 11 hand-engineered features (POS, chunk, capitalization) concatenated to word vectors
  • Includes F1/accuracy/recall evaluation and TensorFlow model checkpointing
  • Hindi NER via transliteration to Latin script before processing
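For readers unfamiliar with how NER output is usually scored, here is a minimal sketch of entity-level precision, recall, and F1. It assumes BIO tags and exact span matching in the CoNLL style; the function names are hypothetical, and the repo's own evaluation code may compute its metrics differently.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.add((start, i, label))
            # A stray I- tag with no matching open span starts a new span.
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Score predicted spans against gold spans; a span counts only on exact match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]
precision, recall, f1 = precision_recall_f1(gold, pred)
```

In this example the person span matches and the location is mislabeled, so one of two predicted spans is correct and one of two gold spans is found.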

Caveats

  • Built for TensorFlow circa 2016; the README links to the r0.9 word2vec tutorial, so expect API drift against current releases
  • No requirements.txt or setup.py; dependency resolution is manual
  • The 311-dimensional final results use embeddings trained on “a small 100mb corpus”, so how the scores scale with a larger corpus is unclear

Verdict

Worth a look if you’re studying NER evolution or need a baseline LSTM implementation to modernize. Skip it if you want production-ready tooling; transformers and modern frameworks have superseded this stack.
