lucidrains/magvit2-pytorch
PyTorch implementation of MagViT2, a state-of-the-art video tokenizer for visual generation models.

This repository provides a PyTorch implementation of the MagViT2 tokenizer from the paper ‘Language Model Beats Diffusion - Tokenizer is Key to Visual Generation’. The tokenizer encodes video frames with an encoder-decoder network and maps the resulting latents to discrete tokens using a lookup-free quantizer (LFQ). The output is a token sequence that a language model can consume for video generation and understanding tasks. Image size, codebook size, and the composition of the network's layers are all configurable, so the code can be used to train custom video tokenizers.
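The core idea of the lookup-free quantizer can be illustrated with a small, dependency-free sketch. It assumes the binary formulation described in the MagViT2 paper: each latent channel is quantized independently to -1 or +1 by its sign, so an implicit codebook of size 2^d needs no embedding table or nearest-neighbour lookup. The function names `lfq_quantize`, `lfq_dequantize`, and `codebook_dim` are hypothetical and are not part of the magvit2-pytorch API. The entropy penalty that the paper applies during training is also omitted.

```python
from typing import List, Tuple


def codebook_dim(codebook_size: int) -> int:
    """Number of latent channels needed for an implicit LFQ codebook (hypothetical helper).

    LFQ uses one binary channel per bit, so the codebook size must be a power of 2.
    """
    if codebook_size < 2 or codebook_size & (codebook_size - 1):
        raise ValueError("codebook_size must be a power of 2 (>= 2) for LFQ")
    return codebook_size.bit_length() - 1


def lfq_quantize(z: List[float]) -> Tuple[List[float], int]:
    """Quantize one latent vector with lookup-free quantization (sketch).

    Each channel is snapped to +1 if positive, else -1, and the resulting
    bit pattern (channel i -> bit i) is read as a binary number: the token id.
    """
    codes = [1.0 if v > 0 else -1.0 for v in z]
    index = sum(1 << i for i, v in enumerate(z) if v > 0)
    return codes, index


def lfq_dequantize(index: int, dim: int) -> List[float]:
    """Recover the {-1, +1} code vector from a token id by unpacking its bits."""
    return [1.0 if (index >> i) & 1 else -1.0 for i in range(dim)]


if __name__ == "__main__":
    dim = codebook_dim(1024)  # 1024 = 2**10, so 10 latent channels
    z = [0.3, -1.2, 0.8, -0.1, 2.0, -0.5, 0.0, 0.7, -0.9, 1.1]
    codes, token = lfq_quantize(z)
    print(dim, token)  # 10 661
    assert lfq_dequantize(token, dim) == codes  # round trip is lossless
```

In this sketch, a codebook of 1024 entries needs only 10 latent channels, and converting between a code vector and its token id is just bit packing and unpacking. This is what lets LFQ scale to very large vocabularies without a learned lookup table.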