
wilson1yan/VideoGPT

VideoGPT is a video generation model that uses VQ-VAE for discrete latent representations and a GPT-like transformer architecture for autoregressive generation.

1.1k stars · Jupyter Notebook · Image · Video · Audio
Velocity (7d): +0.6 ★/day · Trend: steady

VideoGPT is a generative model for video. It uses a VQ-VAE, built from 3D convolutions and axial self-attention, to learn downsampled discrete latent representations of raw video. A GPT-like transformer then models these discrete latents autoregressively, using spatio-temporal position encodings. The model generates samples competitive with state-of-the-art GAN models for video generation on the BAIR Robot dataset, and high-fidelity natural videos from UCF-101 and TGIF.
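
To make the "discrete latent" idea concrete, here is a minimal sketch of the vector-quantization step at the heart of a VQ-VAE: each continuous latent vector is replaced by the index of its nearest codebook entry. This is not the repository's code. The function names (`quantize`, `dequantize`), the toy codebook, and the plain-Python lists are all assumptions made for illustration; the real model learns its codebook and produces latents with a 3D-convolutional encoder.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each continuous latent to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look up the codebook vector for each discrete index."""
    return [list(codebook[i]) for i in indices]


# Toy values (hypothetical): three codebook entries and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

codes = quantize(latents, codebook)            # discrete tokens
reconstructed = dequantize(codes, codebook)    # what a decoder would receive
print(codes)
```

In this framing, the sequence of integer codes is what the GPT-like transformer models autoregressively, one token at a time, before a decoder maps the sampled codes back to video frames.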
