google-research/magvit
A masked generative video transformer, implemented in JAX, that generates video with a video tokenizer and a transformer.

Velocity (7d): +0.8 ★/day · Trend: steady
MAGVIT (Masked Generative Video Transformer) generates video in two stages: a tokenizer converts video frames into discrete tokens, and a transformer is trained with masked token modeling to predict those tokens. The authors report state-of-the-art results on video generation and prediction benchmarks including UCF-101, Kinetics-600, and BAIR robot pushing. The official JAX implementation supports both training and inference.
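The masked token modeling step described above can be illustrated with a small JAX sketch. It is a generic illustration, not code from the repository: the function names `mask_tokens` and `masked_cross_entropy`, the 1024-entry codebook, the `MASK_ID` value, and the uniform random masking are all assumptions made for this example, and the random token ids and zero logits merely stand in for real tokenizer and transformer outputs.

```python
import jax
import jax.numpy as jnp

VOCAB_SIZE = 1024      # hypothetical codebook size
MASK_ID = VOCAB_SIZE   # hypothetical id reserved for the [MASK] token


def mask_tokens(rng, tokens, mask_ratio):
    """Replace a random subset of token ids with MASK_ID.

    Returns the corrupted tokens and a boolean array marking masked positions.
    """
    # One uniform score per token; positions scoring below mask_ratio are masked.
    scores = jax.random.uniform(rng, tokens.shape)
    is_masked = scores < mask_ratio
    corrupted = jnp.where(is_masked, MASK_ID, tokens)
    return corrupted, is_masked


def masked_cross_entropy(logits, targets, is_masked):
    """Mean cross-entropy computed over masked positions only."""
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    # Select the log-probability assigned to each target token id.
    target_log_probs = jnp.take_along_axis(
        log_probs, targets[..., None], axis=-1
    )[..., 0]
    weights = is_masked.astype(log_probs.dtype)
    # Guard against division by zero when no position was masked.
    return -(target_log_probs * weights).sum() / jnp.maximum(weights.sum(), 1.0)


# Usage with stand-in data (batch of 2 sequences, 16 tokens each).
tokens = jax.random.randint(jax.random.PRNGKey(0), (2, 16), 0, VOCAB_SIZE)
corrupted, is_masked = mask_tokens(jax.random.PRNGKey(1), tokens, 0.5)
logits = jnp.zeros((2, 16, VOCAB_SIZE))  # placeholder for transformer output
loss = masked_cross_entropy(logits, tokens, is_masked)
```

The loss is averaged only over masked positions so that the model is scored on the tokens it had to reconstruct, not on the ones it could simply copy from the input.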