timbmg/Sentence-VAE

Teaching neural nets to write like the Wall Street Journal circa 1995

A clean PyTorch redo of the 2015 paper that first squeezed sentences through a continuous latent space.

591 stars · Python · Language Models · ML Frameworks
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

This repo re-implements Bowman et al.’s Sentence-VAE, a variational autoencoder that compresses English sentences into a smooth Gaussian latent space and decodes them back. You can sample new sentences, or interpolate between two sentence embeddings and watch the grammar morph gradually. It trains on the Penn Treebank (PTB) dataset and tracks ELBO, NLL, and KL divergence throughout training.
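The three tracked quantities fit together simply: the training loss is the reconstruction NLL plus a weighted KL term, and with the weight at 1 it is the negative ELBO. Here is a minimal sketch in plain Python, assuming the usual VAE setup of a diagonal Gaussian posterior and a standard normal prior. The function names are illustrative, not the repo's API.

```python
import math

def kl_to_standard_normal(mean, logvar):
    """KL( N(mean, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mean, logvar))

def negative_elbo(nll, kl, kl_weight=1.0):
    """Reconstruction NLL plus the (optionally down-weighted) KL term."""
    return nll + kl_weight * kl

# A posterior identical to the prior carries no information about the sentence.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
# Shifting one mean to 1.0 costs 0.5 * 1.0**2 nats.
print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))  # 0.5
```

A KL near zero alongside a decent NLL is the classic sign that the decoder is ignoring the latent code, which is why the two are worth watching separately.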

The interesting bit

The “n n n n n” runs in the samples probably aren’t a bug. PTB replaces every number with the placeholder token N, so n is most likely that placeholder (lowercased) rather than a word the model failed to learn. The interpolation results are more telling: the latent space preserves some grammatical structure, morphing from “the company said…” to “they were n’t paid” through plausible (if stilted) intermediate steps.
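Interpolation itself is simple: take the latent codes of two sentences, walk a straight line between them, and decode each point. A minimal sketch of the walking part, assuming plain linear interpolation and made-up 2-d latent codes (real codes would come from the encoder and have more dimensions):

```python
def interpolate(z_start, z_end, steps):
    """Return `steps` evenly spaced points from z_start to z_end, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 to 1.0
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

# Hypothetical latent codes for two sentences.
z_a = [0.0, 2.0]
z_b = [4.0, 0.0]
for z in interpolate(z_a, z_b, steps=5):
    print(z)
```

Each intermediate vector would then be passed to the decoder to produce one of the in-between sentences.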

Key highlights

  • Clean PyTorch re-implementation with RNN and GRU support; LSTM notably absent
  • KL annealing (logistic or linear) to keep the decoder from ignoring the latent code early in training
  • Word dropout and embedding dropout on the decoder input for regularization
  • TensorBoard logging and checkpointing built in
  • Includes a dowloaddata.sh script (sic; the typo is upstream's) to fetch the PTB data
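
The KL annealing and word dropout bullets above can be sketched in a few lines of plain Python. This is an illustration of the two techniques, not the repo's code: the function names, the steepness k, and the midpoint x0 are assumptions chosen for the example.

```python
import math
import random

def kl_weight(step, anneal="logistic", k=0.0025, x0=2500):
    """KL weight in [0, 1] at a given training step.

    k (steepness) and x0 (logistic midpoint / linear ramp length) are
    illustrative values, not necessarily the repo's defaults.
    """
    if anneal == "logistic":
        return 1.0 / (1.0 + math.exp(-k * (step - x0)))
    if anneal == "linear":
        return min(1.0, step / x0)
    raise ValueError(f"unknown anneal function: {anneal}")

def word_dropout(tokens, rate, unk="<unk>", rng=random):
    """Replace each decoder-input token with `unk` with probability `rate`."""
    return [unk if rng.random() < rate else tok for tok in tokens]

for step in (0, 2500, 5000):
    print(step, round(kl_weight(step), 3), kl_weight(step, "linear"))
print(word_dropout("the company said it".split(), rate=0.5, rng=random.Random(0)))
```

Both tricks push in the same direction: a small KL weight early on lets the encoder put information into the latent code cheaply, and hiding input words from the decoder forces it to rely on that code instead of the previous token. A fuller version would typically exempt special tokens such as start-of-sentence and padding from dropout.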

Caveats

  • Training stopped after just 4 epochs, and the true ELBO (with the KL term at full weight) was optimized for only about 1 of them
  • Samples often degrade into runs of n (most likely PTB's number placeholder N rather than <unk>), suggesting the training duration is stingy
  • No LSTM support despite the original paper using it; this may limit reproducibility

Verdict

Useful if you need a minimal, hackable VAE-for-text baseline in modern PyTorch. Skip it if you want production-quality generation or faithful reproduction of the 2015 results.
