The 2014 paper that made CNNs work for text
Reference implementation of Yoon Kim's EMNLP paper applying convolutional networks to sentence classification, warts and all.

What it does
This is the original code for Kim’s 2014 EMNLP paper, which showed that 1-D convolutions over word embeddings, followed by max-over-time pooling, could classify sentences well with little hyperparameter tuning. It runs three model variants (random embeddings, static word2vec, and fine-tuned word2vec) on the Pang & Lee movie review dataset. The README is refreshingly honest about what was and wasn’t tested.
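To make the core idea concrete, here is a minimal pure-Python sketch of one convolutional filter sliding over a sentence's word vectors, followed by max-over-time pooling. It is not the repo's Theano code: the function name `conv1d_max_over_time`, the toy 2-dimensional embeddings, and the filter weights are all made up for illustration, and ReLU is assumed as the nonlinearity.

```python
def conv1d_max_over_time(embeddings, filt, bias=0.0):
    """Apply one filter to every window of consecutive word vectors,
    pass each response through ReLU, and keep only the largest value."""
    h = len(filt)          # window size: how many words the filter spans
    n = len(embeddings)    # sentence length in words
    if n < h:
        raise ValueError("sentence is shorter than the filter window")

    features = []
    for i in range(n - h + 1):
        window = embeddings[i:i + h]
        # Dot product between the filter and the h stacked word vectors.
        response = sum(
            w * x
            for filt_row, word_vec in zip(filt, window)
            for w, x in zip(filt_row, word_vec)
        )
        features.append(max(0.0, response + bias))  # ReLU (assumed)

    # Max-over-time pooling: one number per filter, whatever the sentence length.
    return max(features)


# Hypothetical toy embeddings (real models use e.g. 300-dimensional word2vec).
toy_embeddings = {
    "the": [0.0, 0.1],
    "film": [0.2, 0.0],
    "was": [0.0, 0.0],
    "great": [1.0, 0.5],
}
sentence = ["the", "film", "was", "great"]
vectors = [toy_embeddings[word] for word in sentence]

filt = [[0.5, 0.5], [1.0, 1.0]]  # one filter spanning 2 words
feature = conv1d_max_over_time(vectors, filt)
print(feature)  # 1.5 for this toy input, from the window ("was", "great")
```

A full model would run many such filters with several window sizes, concatenate the pooled values into one feature vector, and feed that to a softmax classifier.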
The interesting bit
The author openly admits the paper’s limitations: no GPU access during the original experiments, missing ablation studies, and “premature” conclusions about regularization. That’s unusual candor in ML research code. The repo also serves as a historical marker: Theano 0.7 and Python 2.7 place it firmly in deep learning’s Jurassic period.
Key highlights
- Reproduces CNN-rand, CNN-static, and CNN-nonstatic from the paper
- Expect >81% cross-validation accuracy with CNN-nonstatic, though the folds differ from those used in the paper
- GPU gives 10–20× speedup over CPU (219s → 16s per epoch, roughly 14×)
- Links to TensorFlow and Torch reimplementations by others
- Author cites follow-up work (Ye Zhang 2015) that properly explored hyperparameters
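The three variants in the first bullet differ only in how the word embeddings are initialized and whether training updates them. A small sketch of that distinction follows; the names `EmbeddingSetup`, `VARIANTS`, and `update_embedding` are hypothetical, and the plain SGD step is a stand-in for illustration rather than the optimizer the repo actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSetup:
    init: str        # "random" or "word2vec"
    trainable: bool  # are embeddings updated during training?


# Illustrative summary of the three variants (names of fields are assumptions).
VARIANTS = {
    "CNN-rand": EmbeddingSetup(init="random", trainable=True),
    "CNN-static": EmbeddingSetup(init="word2vec", trainable=False),
    "CNN-nonstatic": EmbeddingSetup(init="word2vec", trainable=True),
}


def update_embedding(vector, gradient, setup, lr=0.1):
    """Take one plain SGD step on a word vector, but only if the
    variant fine-tunes its embeddings; otherwise return it unchanged."""
    if not setup.trainable:
        return list(vector)
    return [v - lr * g for v, g in zip(vector, gradient)]


vec, grad = [1.0, 2.0], [1.0, 1.0]
frozen = update_embedding(vec, grad, VARIANTS["CNN-static"])
tuned = update_embedding(vec, grad, VARIANTS["CNN-nonstatic"])
print(frozen, tuned)
```

In the static case the pre-trained vectors act as fixed features; in the non-static case they drift toward whatever helps the classification task.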
Caveats
- Requires Python 2.7 and Theano 0.7, both long unmaintained
- Fold assignments differ from the paper, so accuracies are not directly comparable to the published numbers
- No modern framework support; this is strictly for historical reference
Verdict
Worth a look if you’re tracing the evolution of NLP architectures or teaching the history of the field. Skip it if you need working, modern code; use one of the linked reimplementations instead.