EdGENetworks/attention-networks-for-classification

Attention, but make it bureaucratic: a PyTorch doc classifier

A readable PyTorch re-implementation of the CMU hierarchical attention paper, with honest notes on where it falls short.

607 stars · Jupyter Notebook · Language Models, ML Frameworks
Velocity (7 days): +0.2 ★/day · Trend: steady

What it does

This repo implements a neural document classifier that pays attention twice: once to pick important words in each sentence, again to pick important sentences in the document. It’s a PyTorch port of Yang et al.’s NAACL 2016 paper, applied to the IMDB movie review dataset.
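Both attention passes follow the same recipe. Restated from Yang et al. (2016), with primed indices in the denominators for readability: a bidirectional GRU produces an annotation h_{it} for word t of sentence i, attention against a learned word-context vector u_w pools those annotations into a sentence vector s_i, and the sentence level repeats the pattern with its own context vector u_s before a softmax classifier. This is the paper's formulation; the notebook may differ in details.

```latex
\begin{aligned}
h_{it} &= [\overrightarrow{h}_{it};\, \overleftarrow{h}_{it}] && \text{(BiGRU over the words of sentence } i\text{)}\\
u_{it} &= \tanh(W_w h_{it} + b_w)\\
\alpha_{it} &= \frac{\exp(u_{it}^\top u_w)}{\sum_{t'} \exp(u_{it'}^\top u_w)}\\
s_i &= \sum_{t} \alpha_{it}\, h_{it}\\
h_i &= [\overrightarrow{h}_i;\, \overleftarrow{h}_i] && \text{(BiGRU over the sentence vectors } s_i\text{)}\\
u_i &= \tanh(W_s h_i + b_s)\\
\alpha_i &= \frac{\exp(u_i^\top u_s)}{\sum_{i'} \exp(u_{i'}^\top u_s)}\\
v &= \sum_{i} \alpha_i\, h_i\\
p &= \operatorname{softmax}(W_c v + b_c)
\end{aligned}
```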

The interesting bit

The author is admirably frank. They note that they jointly optimize the word- and sentence-level attention with a single optimizer (the paper used two), admit that their padding strategy is inefficient, and report a best accuracy of ~0.35 on a 10-class subset, well below what you'd hope. The struck-through note about PyTorch's missing mask support, updated when pack_padded_sequence arrived, is a small time capsule of 2017 framework churn.
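Masking is why padding matters here. pack_padded_sequence lets the GRU skip pad tokens, but the attention softmax over its outputs still has to ignore the padded slots, or they soak up probability mass. A minimal sketch of a masked softmax in plain Python, assuming one score per position and a boolean mask; `masked_softmax` is a hypothetical helper, not code from the repo:

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores`, giving zero weight where `mask` is False.

    scores: list of floats, one attention score per position
    mask:   list of bools, True for real tokens, False for padding
    """
    if len(scores) != len(mask):
        raise ValueError("scores and mask must have the same length")
    real = [s for s, m in zip(scores, mask) if m]
    if not real:
        raise ValueError("mask selects no positions")
    # Subtract the max over real positions for numerical stability.
    top = max(real)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# A 3-word sentence padded to length 5: the last two slots are padding.
scores = [2.0, 1.0, 0.5, 3.0, 3.0]
mask = [True, True, True, False, False]
weights = masked_softmax(scores, mask)
print([round(w, 3) for w in weights])
```

Without the mask, the two pad slots (score 3.0 here) would take most of the weight. In PyTorch the usual equivalent is `scores.masked_fill(~mask, float('-inf'))` followed by `torch.softmax`.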

Key highlights

  • Clean hierarchical attention architecture: word → sentence → document
  • Bidirectional GRU encoders at both the word and sentence levels, each followed by an attention layer
  • Preprocessed IMDB data provided via Google Drive link
  • Training loss curve included for reference
  • Acknowledges divergence from paper: single optimizer, padded minibatches
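The word → sentence → document flow can be sketched without any framework. This is a hedged, dependency-free illustration in plain Python, not the repo's PyTorch code: the BiGRU encoders are omitted (the input vectors stand in for their outputs), and `attention_pool` and `encode_document` are hypothetical names. One pooling function is reused at both levels, each level with its own parameters.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Tuple[Matrix, Vector, Vector]  # (W, b, context vector)


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(h: List[Vector], w: Matrix, b: Vector, context: Vector):
    """Attention pooling over annotations h, in the style of Yang et al.

    u_t = tanh(W h_t + b); alpha = softmax(u_t . context); out = sum_t alpha_t h_t
    Returns (pooled vector, attention weights).
    """
    u = [[math.tanh(z + bi) for z, bi in zip(matvec(w, ht), b)] for ht in h]
    alpha = softmax([sum(ui * ci for ui, ci in zip(ut, context)) for ut in u])
    dim = len(h[0])
    pooled = [sum(a * ht[k] for a, ht in zip(alpha, h)) for k in range(dim)]
    return pooled, alpha


def encode_document(doc: List[List[Vector]], word_params: Params, sent_params: Params):
    """doc: list of sentences, each a list of word annotation vectors.

    Assumption: the vectors stand in for BiGRU outputs; the GRUs are left
    out so the sketch stays dependency-free.
    """
    sentence_vecs, word_weights = [], []
    for sentence in doc:
        s, alpha = attention_pool(sentence, *word_params)
        sentence_vecs.append(s)
        word_weights.append(alpha)
    # In the full model a second BiGRU would run over sentence_vecs here.
    doc_vec, sent_weights = attention_pool(sentence_vecs, *sent_params)
    return doc_vec, word_weights, sent_weights


# Toy 2-d example: identity W, zero bias, context vector favoring dimension 0.
identity = [[1.0, 0.0], [0.0, 1.0]]
params = (identity, [0.0, 0.0], [1.0, 0.0])
doc = [
    [[0.1, 0.0], [2.0, 0.5], [0.0, 1.0]],  # sentence 1: the second word scores highest
    [[0.3, 0.2], [0.1, 0.9]],              # sentence 2
]
v, word_w, sent_w = encode_document(doc, params, params)
```

Here `v` plays the role of the document vector that would feed the final softmax classifier. In PyTorch, each level would typically be its own `nn.Module` wrapping an `nn.GRU(..., bidirectional=True)` plus an attention layer.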

Caveats

  • ~35% accuracy on 10-class IMDB is quite poor, well short of the 49.4% the paper reports for its model on its own IMDB data; the author doesn't explain the gap (data mismatch? hyperparameters?)
  • Uses a substitute dataset (84,919 samples) because the original paper’s IMDB split wasn’t available
  • No code structure beyond a single notebook; not packaged for reuse

Verdict

Worth a skim if you’re learning attention mechanisms and want to see the concept in readable PyTorch. Skip it if you need a production document classifier or a faithful reproduction of the paper’s reported results.
