Is magvit2-pytorch open source?

Yes — lucidrains/magvit2-pytorch is open source, released under the MIT license.

What language is magvit2-pytorch written in?

lucidrains/magvit2-pytorch is primarily written in Python.

How popular is magvit2-pytorch?

lucidrains/magvit2-pytorch has 668 stars on GitHub.

Where can I find magvit2-pytorch?

lucidrains/magvit2-pytorch is on GitHub at https://github.com/lucidrains/magvit2-pytorch.

lucidrains/magvit2-pytorch

The video tokenizer behind the 'language model beats diffusion' paper

Compresses video into discrete tokens so language models can generate frames instead of diffusion.

★ 668 stars · Python · Image, Video, Audio · ML Frameworks

What it does

This is a PyTorch implementation of MagViT2, the video tokenizer from the paper Language Model Beats Diffusion — Tokenizer is Key to Visual Generation. It encodes video (or images) into a compact grid of discrete tokens and decodes them back, effectively giving transformers a vocabulary for visual generation. The repo includes a full training harness with adversarial losses, multi-scale discriminators, and EMA tracking.
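To make "a compact grid of discrete tokens" concrete, here is a rough back-of-the-envelope sketch. The downsampling factors and clip size are hypothetical values chosen for illustration, not read from this repository's configuration, and the helper name `token_grid` is invented here; the actual grid depends on the layer stack you configure.

```python
def token_grid(frames, height, width, time_factor, space_factor):
    """Shape of the token grid for a clip, assuming plain floor-division
    downsampling along time and along each spatial axis (an assumption)."""
    return (frames // time_factor, height // space_factor, width // space_factor)


# Hypothetical clip: 16 RGB frames at 128x128, compressed 4x in time, 8x in space.
grid = token_grid(frames=16, height=128, width=128, time_factor=4, space_factor=8)
num_tokens = grid[0] * grid[1] * grid[2]

# Number of raw values in the clip (frames * height * width * 3 colour channels).
raw_values = 16 * 128 * 128 * 3

print(grid)                     # (4, 16, 16)
print(num_tokens)               # 1024
print(raw_values / num_tokens)  # 768.0
```

Under these assumed factors, a downstream language model would see 1,024 tokens per clip rather than roughly 786,000 raw values.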

The interesting bit

The README notes that Tencent built its Open-MAGVIT2 release on top of this code, a useful external signal that the architecture can be trained to convergence. The project also treats image pretraining as a first-class use case: you can warm up the tokenizer on still images before moving on to video.

Key highlights

Implements space-time compression with residual blocks, linear attention, and configurable 3D convolutions.
Bundles a VideoTokenizerTrainer that handles discriminator management, gradient accumulation, and EMA averaging.
Supports mixed image-and-video pretraining; the README cites prior work showing image pretraining helps video synthesis.
Includes multi-scale temporal discriminators for the adversarial loss, plus adaptive RMSNorm for optional conditioning.
Weights & Biases integration is built in for experiment tracking.
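For readers unfamiliar with the EMA averaging mentioned above, the idea can be sketched in a few lines. This is a generic illustration of an exponential moving average over named parameters, not the trainer's actual code; the function name `ema_update`, the dictionary layout, and the decay value are all assumptions made for the example.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """Blend the current weights into a running average, in place.

    Both arguments are dicts mapping a parameter name to a float
    (a stand-in for real tensors in this sketch).
    """
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Toy example: the averaged copy drifts slowly toward the live weights.
ema = {"w": 1.0}
for _ in range(3):
    ema_update(ema, {"w": 0.0}, decay=0.9)

print(ema["w"])
```

The averaged copy changes more smoothly than the live weights, which is why EMA weights are commonly the ones used for evaluation and sampling.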

Caveats

The todo list still has open items, including axial rotary embeddings for spatial attention and helper utilities for multi-resolution temporal discriminators.
The Lookup Free Quantizer (LFQ) core lives in a separate repository, so this package relies on an external dependency for the actual quantization step.
It is unclear from the README whether the unchecked roadmap items are nice-to-haves or blockers for full reproduction.
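The core idea behind lookup-free quantization, as described in the paper, is small enough to sketch without the external dependency: each latent dimension is quantized to one of two values by its sign, and the resulting bits form the token index directly, so no learned codebook lookup is needed. The sketch below is a plain-Python illustration of that idea only; the function names are hypothetical, and it leaves out the training losses and gradient handling that the real implementation needs.

```python
def lfq_quantize(latent):
    """Quantize each latent dimension to -1.0 or +1.0 by its sign and
    return (codes, token_index), where bit i of the index is set when
    dimension i is positive."""
    bits = [1 if x > 0 else 0 for x in latent]
    codes = [1.0 if b else -1.0 for b in bits]
    index = sum(b << i for i, b in enumerate(bits))
    return codes, index


def lfq_lookup(index, num_dims):
    """Recover the +/-1 code vector from a token index (inverse of the above)."""
    return [1.0 if (index >> i) & 1 else -1.0 for i in range(num_dims)]


codes, index = lfq_quantize([0.3, -1.2, 0.7])
print(codes)                 # [1.0, -1.0, 1.0]
print(index)                 # 5
print(lfq_lookup(index, 3))  # [1.0, -1.0, 1.0]
```

With this scheme a latent of k dimensions gives a vocabulary of 2**k tokens, so the vocabulary can grow without storing or searching an embedding table.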

Verdict

Worth a look if you are building or reproducing autoregressive video-generation pipelines and need a tokenizer with a training loop already wired up. Skip it if you want a turnkey, fully validated end-to-end model without assembly.
