Teaching neural nets to write like the Wall Street Journal circa 1995
A clean PyTorch redo of the 2015 paper that first squeezed sentences through a continuous latent space.

What it does This repo re-implements Bowman et al.’s Sentence-VAE, which trains a variational autoencoder to compress English sentences into a smooth Gaussian latent space and decode them back. You can sample new sentences, or interpolate between two sentence embeddings and watch the grammar morph gradually. It runs on the Penn Treebank (PTB) dataset and tracks ELBO, NLL, and KL divergence through training.
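For readers who want the three tracked quantities spelled out, here is a minimal sketch of how they relate for one sentence, assuming a diagonal Gaussian posterior and a standard normal prior. It is written in plain Python rather than PyTorch, and the function names (gaussian_kl, negative_elbo) and the kl_weight argument are illustrative, not the repo's own API.

```python
import math

def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), in nats."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def negative_elbo(token_log_probs, mu, logvar, kl_weight=1.0):
    """Return (loss, nll, kl) for a single sentence.

    token_log_probs: log-probability the decoder gave each target token.
    kl_weight: 1.0 yields the true negative ELBO; a smaller value is the
    annealed training loss (hypothetical parameter name).
    """
    nll = -sum(token_log_probs)          # reconstruction term
    kl = gaussian_kl(mu, logvar)         # regularization term
    return nll + kl_weight * kl, nll, kl

# Four tokens, each predicted with probability 0.5, and a posterior
# that coincides with the prior (so the KL term vanishes).
loss, nll, kl = negative_elbo([math.log(0.5)] * 4, mu=[0.0, 0.0], logvar=[0.0, 0.0])
```

The ELBO is the negative of the loss at kl_weight=1.0, which is why NLL and KL are usually logged separately: a falling loss can hide a KL term that has collapsed to zero.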
The interesting bit The “n n n n n” in the samples isn’t a bug—it’s the model’s honest admission that it hasn’t learned those words yet. The interpolation results are more telling: the latent space actually preserves some grammatical structure, morphing from “the company said…” to “they were n’t paid” through plausible (if stilted) intermediate steps.
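The interpolation itself is simple to picture. The sketch below assumes straight-line (linear) interpolation between two latent vectors, which is a common choice but not confirmed to be what the repo does; the interpolate function is a hypothetical helper, and each returned point would then be handed to the decoder.

```python
def interpolate(z_start, z_end, steps):
    """Return `steps` evenly spaced points on the line from z_start to z_end,
    endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 to 1.0
        points.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return points

# Toy 2-dimensional latent codes standing in for two encoded sentences.
path = interpolate([0.0, 2.0], [1.0, 0.0], steps=5)
```

Decoding every point on the path is what produces the gradual morph between the two endpoint sentences.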
Key highlights
- Clean PyTorch re-implementation with RNN and GRU support; LSTM notably absent
- KL annealing (logistic or linear) to prevent the latent space from collapsing early in training
- Word dropout and embedding dropout on the decoder input for regularization
- TensorBoard logging and checkpointing built in
- Includes a dowloaddata.sh script (sic; typo preserved from upstream) to fetch PTB data
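To make the KL annealing and word dropout highlights concrete, here is a rough stdlib-only sketch of both. The parameter names k and x0, their default values, and the special-token names are assumptions chosen for illustration, not settings read from the repo.

```python
import math
import random

def kl_weight(step, schedule="logistic", k=0.0025, x0=2500):
    """KL weight in [0, 1] at a given training step.

    k (logistic steepness) and x0 (logistic midpoint, or linear ramp length,
    in steps) are illustrative values only.
    """
    if schedule == "logistic":
        return 1.0 / (1.0 + math.exp(-k * (step - x0)))
    if schedule == "linear":
        return min(1.0, step / x0)
    raise ValueError(f"unknown schedule: {schedule}")

def word_dropout(tokens, rate, unk="<unk>", keep=("<sos>", "<eos>", "<pad>"), rng=random):
    """Replace each decoder-input token with `unk` with probability `rate`,
    leaving the (assumed) special tokens in `keep` untouched."""
    return [unk if tok not in keep and rng.random() < rate else tok for tok in tokens]

# Early in training the KL term is nearly switched off; later it counts fully.
early, late = kl_weight(0), kl_weight(10000)
```

Both tricks serve the same goal: the weight schedule delays the KL penalty, and word dropout weakens the decoder's reliance on previous tokens, so the model has a reason to use the latent code.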
Caveats
- Training stopped after just 4 epochs; reported ELBO was only properly optimized for ~1 epoch
- Samples are heavily degraded with <unk> tokens (n in the output), suggesting the vocabulary cutoff or training duration is stingy
- No LSTM support despite the original paper using it; this may limit reproducibility
Verdict Useful if you need a minimal, hackable VAE-for-text baseline in modern PyTorch. Skip it if you want production-quality generation or faithful reproduction of the 2015 results.