hustvl/LightningDiT
LightningDiT is a latent diffusion system that achieves an FID of 1.35 on ImageNet 256×256 and trains 21.8× faster than the standard DiT.

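This repository presents VA-VAE and LightningDiT, which address an optimization dilemma in latent diffusion models. Increasing the per-token feature dimension of the visual tokenizer improves reconstruction quality, but it makes generation harder: matching generation quality then requires larger diffusion models and longer training. VA-VAE (Vision foundation model Aligned Variational AutoEncoder) aligns the tokenizer's latent space with a pre-trained vision foundation model, so high-dimensional latents keep their reconstruction quality while becoming easier to generate. LightningDiT adds training-strategy and architectural improvements that speed up diffusion transformer training. Together they reach an FID of 1.35 on ImageNet 256×256 image generation at a fraction of the training cost of standard DiT.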