NVlabs/NVAE

A VAE that actually scales to 256×256 faces without melting

NVIDIA's NeurIPS 2020 spotlight paper fixes the main reason variational autoencoders fall apart when you make them deep.

NVAE · Velocity (7d): +0.5 ★/day · Trend: steady

What it does

NVAE is a deep hierarchical variational autoencoder: a likelihood-based generative model for images, trained at scales from MNIST up to 256×256 faces. The repo contains the official PyTorch implementation with training scripts for six datasets and a small zoo of hyperparameter presets.

The interesting bit

Deep hierarchical VAEs tend to become unstable as you stack more latent layers, largely because the KL terms in the training objective get hard to keep under control. NVAE handles this with a deep hierarchy of latent variables and some architectural elbow grease: normalizing flows, residual distributions (the approximate posterior is parameterized relative to the prior), and a carefully designed encoder-decoder structure that keeps training stable and the variational lower bound tight even at depth.
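
To make "residual distributions" concrete, here is a minimal sketch of the idea as described in the NVAE paper: for one latent dimension the prior is N(mu_p, sigma_p²) and the approximate posterior is N(mu_p + delta_mu, (sigma_p · delta_sigma)²), so the encoder only has to predict an offset and a scale relative to the prior. The function and argument names are illustrative and are not taken from the repo; the code is plain Python rather than PyTorch to keep it self-contained.

```python
import math


def kl_residual_normal(mu_p, sigma_p, delta_mu, delta_sigma):
    """KL(q || p) for one latent dimension under the residual parameterization.

    p = N(mu_p, sigma_p^2)
    q = N(mu_p + delta_mu, (sigma_p * delta_sigma)^2)

    mu_p is accepted for symmetry with the parameterization, but it cancels
    out of the KL: only the relative offset and relative scale matter.
    """
    return 0.5 * (
        delta_mu ** 2 / sigma_p ** 2
        + delta_sigma ** 2
        - math.log(delta_sigma ** 2)
        - 1.0
    )


def kl_normal(mu_q, sigma_q, mu_p, sigma_p):
    """Textbook closed-form KL(q || p) between two univariate Normals."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )


# Sanity check: the residual form agrees with the textbook formula.
mu_p, sigma_p = 0.3, 2.0
delta_mu, delta_sigma = 0.5, 0.8
residual = kl_residual_normal(mu_p, sigma_p, delta_mu, delta_sigma)
textbook = kl_normal(mu_p + delta_mu, sigma_p * delta_sigma, mu_p, sigma_p)
print(residual, textbook)
```

With delta_mu = 0 and delta_sigma = 1 the posterior equals the prior and the KL is zero. So when the prior shifts during training, the posterior shifts with it, which is the intuition for why this parameterization is easier to optimize in a deep hierarchy.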

Key highlights

  • Reproduces Table 1 from the paper: exact training commands for MNIST, CIFAR-10, CelebA 64, ImageNet 32×32, CelebA-HQ 256, and FFHQ 256
  • Multi-node training via mpirun for the larger models (up to 24 V100s)
  • LMDB conversion scripts provided for I/O efficiency on large datasets
  • Smaller model variants give up about 0.01 bits per dimension (bpd) in exchange for fitting on 8 GPUs instead of 24
  • PyTorch 1.6.0, Python 3.7
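
For readers who don't think in bpd: bits per dimension is the negative log-likelihood of an image, converted from nats to bits and divided by the number of pixel values. The sketch below shows the conversion and what a 0.01 bpd gap amounts to per image. The 256×256 RGB image size is an assumption chosen for illustration, and the helper names are hypothetical, not from the repo.

```python
import math


def nats_to_bpd(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2.0))


def bpd_to_nats(bpd, num_dims):
    """Inverse conversion: bits per dimension back to nats per image."""
    return bpd * num_dims * math.log(2.0)


# Assumed image size for illustration: 256x256 pixels, 3 colour channels.
num_dims = 256 * 256 * 3

# Per-image likelihood gap (in nats) that corresponds to 0.01 bpd.
gap_nats = bpd_to_nats(0.01, num_dims)
print(f"0.01 bpd on a 256x256 RGB image = {gap_nats:.1f} nats per image")
```

Because bpd is normalized by dimensionality, a gap that looks tiny in bpd still covers a sizeable absolute likelihood difference on high-resolution images.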

Caveats

  • Training times are substantial: 21 hours for MNIST (2 GPUs) up to 160 hours for FFHQ 256 (24 GPUs)
  • Hardware requirements are steep; the README assumes V100 clusters and mpirun familiarity
  • No pre-trained checkpoints are mentioned in the README—you’re training from scratch
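
Wall-clock hours understate the gap between the smallest and largest runs, because the GPU count changes too. A back-of-the-envelope sketch using only the two figures quoted above (the dictionary and function names are illustrative):

```python
# (number of GPUs, wall-clock hours) as quoted in the caveats above.
runs = {
    "MNIST": (2, 21),
    "FFHQ 256": (24, 160),
}


def gpu_hours(num_gpus, wall_clock_hours):
    """Total compute as GPUs multiplied by wall-clock hours."""
    return num_gpus * wall_clock_hours


totals = {name: gpu_hours(*spec) for name, spec in runs.items()}
for name, total in totals.items():
    print(f"{name}: {total} GPU-hours")

# How many times more compute the largest run needs than the smallest.
ratio = totals["FFHQ 256"] / totals["MNIST"]
print(f"FFHQ 256 uses about {ratio:.0f}x the GPU-hours of MNIST")
```

That works out to 42 GPU-hours for MNIST against 3,840 for FFHQ 256, so budget accordingly before launching the big configurations.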

Verdict

Researchers working on likelihood-based generative modeling or VAE architecture design should grab this. If you need a quick off-the-shelf image generator or lack GPU clusters, look elsewhere.
