Three ways to classify text, circa 2016 Keras
A reference implementation of attention-based document classification, back when hierarchical attention was still novel.

What it does
This repo contains three standalone Keras scripts that classify text using neural architectures from mid-2010s NLP research: a hierarchical attention network (HAN) with word-level and sentence-level attention, Yoon Kim's CNN for sentence classification, and a bidirectional LSTM with single-level attention. All three target the same IMDB sentiment dataset.
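To make the CNN branch concrete, here is a minimal pure-Python sketch of the core operation in Yoon Kim's sentence-classification CNN. A filter spanning a few consecutive word embeddings slides along the sentence, each window gets a ReLU activation, and max-over-time pooling keeps the strongest one. This is not the repo's Keras code: the function names, toy embeddings, and filter values are hypothetical, and a real model learns many filters per height instead of using hand-set ones.

```python
from typing import List

Vector = List[float]


def conv_max_over_time(embeddings: List[Vector], filt: List[Vector], bias: float = 0.0) -> float:
    """Slide one filter (len(filt) words tall) over the embedded sentence,
    apply ReLU to each window's activation, and keep the maximum."""
    height = len(filt)
    if len(embeddings) < height:
        raise ValueError("sentence is shorter than the filter height")
    activations = []
    for start in range(len(embeddings) - height + 1):
        window = embeddings[start:start + height]
        # Elementwise product of filter and window, summed, plus bias
        z = sum(w * x for f_row, x_row in zip(filt, window)
                for w, x in zip(f_row, x_row)) + bias
        activations.append(max(0.0, z))  # ReLU
    return max(activations)  # max-over-time pooling


def kim_features(embeddings: List[Vector], filters: List[List[Vector]]) -> Vector:
    """One pooled feature per filter. In the full model, features from filters of
    several heights are concatenated and fed to a dropout + softmax layer."""
    return [conv_max_over_time(embeddings, f) for f in filters]


# Toy example: 4 words with 2-d embeddings, one 2-word filter and one 1-word filter
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]
unigram_filter = [[-1.0, 1.0]]
features = kim_features(sentence, [bigram_filter, unigram_filter])  # [2.0, 1.0]
```

Because pooling keeps only the maximum, each filter contributes exactly one number regardless of sentence length, so the classifier layer always sees a fixed-size feature vector.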
The interesting bit
The hierarchical attention implementation is the draw. It follows the NAACL 2016 paper by Yang et al., which showed that attention weights could surface the words and sentences that mattered most for a document’s label. The author even added a forward-pass hook to extract those weights, though they candidly note the results “are not very promising.”
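The attention step from the paper can be sketched in a few lines. Each hidden state h_t is projected as u_t = tanh(W h_t + b), scored against a learned context vector, normalized with a softmax into weights α_t, and pooled as Σ α_t h_t. The HAN applies this once over the words in each sentence, then again over the resulting sentence vectors, with a separate context vector per level. This pure-Python illustration assumes the hidden states are already given. It omits the bidirectional GRU encoders the paper runs at both levels, and `attend`/`han_pool` are hypothetical names, not functions from the repo. The returned weights are the kind of quantities an extraction hook like the author's would expose.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
Params = Tuple[Matrix, Vector, Vector]  # (W, b, context vector) for one level


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden: List[Vector], W: Matrix, b: Vector, context: Vector) -> Tuple[Vector, Vector]:
    """Attention pooling: returns (weighted sum of hidden states, attention weights)."""
    # u_t = tanh(W h_t + b)
    projected = [[math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)] for h in hidden]
    # alpha_t = softmax over t of (u_t . context)
    weights = softmax([sum(u * c for u, c in zip(u_t, context)) for u_t in projected])
    # s = sum_t alpha_t * h_t
    dim = len(hidden[0])
    pooled = [sum(a * h[k] for a, h in zip(weights, hidden)) for k in range(dim)]
    return pooled, weights


def han_pool(doc: List[List[Vector]], word_params: Params, sent_params: Params):
    """Word-level attention inside each sentence, then sentence-level attention.
    Assumes encoder outputs are given; the real HAN runs a BiGRU at each level."""
    sentence_vecs, word_weights = [], []
    for sentence in doc:
        vec, weights = attend(sentence, *word_params)
        sentence_vecs.append(vec)
        word_weights.append(weights)
    doc_vec, sentence_weights = attend(sentence_vecs, *sent_params)
    return doc_vec, word_weights, sentence_weights
```

Given one `(W, b, context)` tuple per level, `han_pool` returns the document vector plus the per-sentence word weights and the sentence weights. Ranking those weights is how words and sentences get highlighted for interpretation.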
Key highlights
- Three reference architectures in one repo: HAN, CNN (Yoon Kim), and attentional BiLSTM
- Self-contained scripts with blog posts walking through each implementation
- Includes attention-weight extraction for interpretability experiments
- An updated fork fixes compatibility issues with Python 2.7 and Keras 2.0.8
Caveats
- Stuck on Python 2.7 and Keras 2.0.8; you’ll need to downgrade or port for modern use
- Setup is manual: download the IMDB data from Kaggle, fetch pretrained GloVe vectors, and install NLTK’s punkt tokenizer
- The attention-weight results are explicitly described as underwhelming by the author
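Of the setup steps, the GloVe one carries the most logic: the text file has to be parsed and turned into an embedding matrix aligned with the tokenizer's word indices. Below is a minimal pure-Python sketch, assuming a Keras-style 1-based `word_index` (row 0 left for padding) and zero vectors for words GloVe does not cover; random initialization is an equally common choice. The function names are hypothetical, not taken from the repo.

```python
import io
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe's plain-text format: a token, then its space-separated components."""
    index = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        index[parts[0]] = [float(x) for x in parts[1:]]
    return index


def build_embedding_matrix(word_index: Dict[str, int],
                           glove: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """Row i holds the vector for the word with index i; row 0 stays as padding.
    Words missing from GloVe (or with the wrong dimension) keep a zero vector."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        vec = glove.get(word)
        if vec is not None and len(vec) == dim:
            matrix[i] = list(vec)
    return matrix


# Tiny in-memory stand-in for a real GloVe file
sample = io.StringIO("the 0.1 0.2\nmovie 0.3 0.4\n")
demo = build_embedding_matrix({"the": 1, "movie": 2, "zzz": 3}, load_glove(sample), dim=2)
# demo == [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
```

In Keras 2.x code of that era, a matrix like this was typically converted to a NumPy array and handed to an `Embedding` layer through its `weights` argument, often with `trainable=False`.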
Verdict
Worth a look if you’re studying how attention mechanisms were implemented in early Keras, or need a teaching example of hierarchical attention. Skip it if you want production-ready text classification; transformers and modern libraries have made this largely a historical reference.