richliao/textClassifier

Three ways to classify text, circa 2016 Keras

A reference implementation of attention-based document classification, back when hierarchical attention was still novel.

1.1k stars · Python · Language Models · ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

This repo contains three standalone Keras scripts that classify text using different neural architectures from mid-2010s NLP research: a hierarchical attention network (word-level and sentence-level attention), a CNN for sentence classification, and a bidirectional LSTM with single-level attention. All three target the same IMDB sentiment dataset.
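The hierarchical attention network applies the same attention-pooling step twice: first over the words of each sentence, then over the resulting sentence vectors. The sketch below shows that step in plain Python, following the formulation in the 2016 HAN paper (u_t = tanh(W h_t + b), weights from a softmax over u_t · u_w, output as the weighted sum of the h_t). It is not the repo's Keras code. The function names are hypothetical, and the sketch skips the bidirectional GRU encoders that the paper runs before each pooling step.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], W: Matrix, b: Vector,
                   context: Vector) -> Tuple[Vector, Vector]:
    """Attention pooling in the style of Yang et al. (2016).

    u_t = tanh(W h_t + b); alpha = softmax(u_t . context);
    pooled = sum_t alpha_t * h_t. Returns (pooled, alpha).
    """
    keys = [[math.tanh(z + bias) for z, bias in zip(matvec(W, h), b)]
            for h in hidden]
    alpha = softmax([sum(k * c for k, c in zip(key, context)) for key in keys])
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return pooled, alpha


def han_document_vector(doc: List[List[Vector]], word_params, sent_params):
    """Two-level pooling: words -> sentence vectors -> document vector.

    word_params and sent_params are (W, b, context) tuples, one per level.
    The real HAN encodes each level with a BiGRU first; this sketch
    pools the raw vectors directly.
    """
    sentence_vectors = [attention_pool(s, *word_params)[0] for s in doc]
    return attention_pool(sentence_vectors, *sent_params)
```

The second value returned at each level is the attention distribution, which is what the interpretability experiments inspect.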

The interesting bit

The hierarchical attention implementation is the draw — it mirrors the NAACL 2016 paper that showed attention weights could surface which words and sentences mattered for a document’s label. The author even added a forward-pass hook to extract those weights, though they candidly note the results “are not very promising.”
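Reading those weights mostly comes down to ranking tokens by weight. Here is a minimal sketch, assuming you already have one sentence's tokens and the per-token weights taken from the word-level attention layer. `top_attended_tokens` and the `<pad>` marker are hypothetical, not part of the repo. Fixed-size Keras inputs are usually padded, so the sketch drops padding positions and renormalizes the remaining weights before ranking.

```python
from typing import List, Sequence, Tuple


def top_attended_tokens(
    tokens: Sequence[str],
    weights: Sequence[float],
    k: int = 3,
    pad_token: str = "<pad>",
) -> List[Tuple[str, float]]:
    """Return the k real tokens with the highest attention weight.

    Padding positions are removed and the remaining weights are
    renormalized so they sum to 1 over real tokens only.
    """
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    pairs = [(t, w) for t, w in zip(tokens, weights) if t != pad_token]
    total = sum(w for _, w in pairs)
    if total <= 0:
        return []
    ranked = sorted(((t, w / total) for t, w in pairs),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy weights made up for illustration, not model output.
    tokens = ["the", "acting", "was", "superb", "<pad>"]
    weights = [0.05, 0.30, 0.05, 0.50, 0.10]
    print(top_attended_tokens(tokens, weights, k=2))
```

Renormalizing matters when a lot of probability mass lands on padding. Otherwise the real tokens' weights look uniformly small and are hard to compare across sentences.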

Key highlights

  • Three reference architectures in one repo: HAN, CNN (Yoon Kim), and attentional BiLSTM
  • Self-contained scripts with blog posts walking through each implementation
  • Includes attention-weight extraction for interpretability experiments
  • Updated fork fixes compatibility issues for Python 2.7 and Keras 2.0.8
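The CNN entry takes its name from Yoon Kim's 2014 sentence-classification design. In that design, filters of several widths slide over the word embeddings, each is followed by a ReLU and max-over-time pooling, and the pooled values are concatenated into one feature vector for the classifier. Below is a dependency-free sketch of that feature extractor. The names are hypothetical, the repo itself uses Keras layers rather than this code, and Kim's full model adds dropout and a softmax output on top.

```python
from typing import List, Tuple

Vector = List[float]


def conv1d_max_over_time(embeddings: List[Vector], filt: List[Vector],
                         bias: float) -> float:
    """Apply one filter (width = len(filt)) across the sequence.

    Each window's dot product plus bias goes through a ReLU, and only the
    maximum activation over all positions is kept (max-over-time pooling).
    """
    width = len(filt)
    if len(embeddings) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        z = bias + sum(
            f * x
            for f_row, x_row in zip(filt, window)
            for f, x in zip(f_row, x_row)
        )
        activations.append(max(0.0, z))  # ReLU
    return max(activations)


def kim_cnn_features(embeddings: List[Vector],
                     filters: List[Tuple[List[Vector], float]]) -> Vector:
    """One pooled value per (filter, bias) pair, concatenated.

    Filters of different widths (Kim used 3, 4 and 5) let the model
    respond to n-grams of different lengths.
    """
    return [conv1d_max_over_time(embeddings, f, b) for f, b in filters]
```

Because pooling keeps a single maximum per filter, the output length depends only on the number of filters, not on the sentence length.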

Caveats

  • Stuck on Python 2.7 and Keras 2.0.8; you’ll need to downgrade or port for modern use
  • Setup is manual: download Kaggle IMDB data, fetch GloVe vectors, install NLTK punkt
  • The attention-weight results are explicitly described as underwhelming by the author
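Fetching GloVe is only half the manual setup. The vectors also have to become an embedding matrix whose rows line up with the tokenizer's word indices. This sketch rests on two assumptions: the GloVe file uses its usual plain-text layout (a word followed by space-separated floats), and the word index is 1-based, as Keras's `Tokenizer` produces, so row 0 stays free for padding. Words missing from GloVe keep a zero vector. The function names are hypothetical.

```python
import io
from typing import Dict, List, TextIO


def load_glove(handle: TextIO) -> Dict[str, List[float]]:
    """Parse GloVe's text format: one 'word v1 v2 ... vd' entry per line."""
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index: Dict[str, int],
                           vectors: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """Row i holds the vector for the word with index i.

    Row 0 (padding) and words without a GloVe vector stay all-zero.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
    return matrix


if __name__ == "__main__":
    # In-memory stand-in for a real GloVe file such as a 100d text file.
    glove = load_glove(io.StringIO("good 0.1 0.2\nbad -0.3 0.4\n"))
    print(build_embedding_matrix({"good": 1, "bad": 2, "meh": 3}, glove, 2))
```

A matrix like this would typically be converted to an array and passed to a Keras `Embedding` layer through its `weights` argument.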

Verdict

Worth a look if you’re studying how attention mechanisms were first implemented in Keras, or need a teaching example of hierarchical attention. Skip it if you want production-ready text classification — transformers and modern libraries have made this largely a historical reference.
