Vchitect/Latte
Latte is a latent diffusion transformer for generating videos from text or image conditions, published in TMLR 2025.

Velocity (7d): +2.0 ★/day · Trend: steady
Latte implements a latent diffusion transformer for video synthesis. The repository provides PyTorch model definitions, pre-trained checkpoints hosted on Hugging Face, and complete training and sampling pipelines. It supports text-to-video and image-to-video generation and is the official implementation of the corresponding research paper.