Is LlamaGen open source?

Yes — FoundationVision/LlamaGen is open source, released under the MIT license.

What language is LlamaGen written in?

FoundationVision/LlamaGen is primarily written in Python.

How popular is LlamaGen?

FoundationVision/LlamaGen has 2k stars on GitHub.

Where can I find LlamaGen?

FoundationVision/LlamaGen is on GitHub at https://github.com/FoundationVision/LlamaGen.

FoundationVision/LlamaGen

Proof that Llama can paint, given three billion parameters

LlamaGen repurposes plain Llama next-token prediction for images, betting that autoregressive models can beat diffusion simply by scaling to billions of parameters.

★ 2k stars | Python | Image · Video · Audio | Language Models | Inference · Serving

What it does

LlamaGen is a family of image generation models that repurposes the standard Llama architecture for visual token prediction. It trains autoregressive transformers on discrete image tokens produced by custom VQ-VAE tokenizers, and releases pre-trained weights for both class-conditional ImageNet generation and text-to-image synthesis. The repository includes PyTorch training and sampling code, plus integration with the vLLM serving stack.
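
To make "discrete image tokens" concrete, here is a minimal sketch of the nearest-codebook lookup at the heart of a vector-quantized tokenizer. It is not LlamaGen's tokenizer code: the function name `quantize`, the four-entry codebook, and the 2-dimensional feature vectors are all hypothetical, chosen only to keep the example small. A real tokenizer would get its feature vectors from a learned encoder and use a far larger codebook.

```python
from typing import List, Sequence


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry.

    Nearest is measured by squared Euclidean distance. The returned indices
    are the discrete tokens an autoregressive model would be trained on.
    """
    tokens = []
    for v in vectors:
        best_index = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(v, codebook[k])),
        )
        tokens.append(best_index)
    return tokens


# Hypothetical toy codebook: 4 entries, each a 2-dimensional vector.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Hypothetical encoder outputs, one vector per image patch.
features = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.95, 0.9)]

tokens = quantize(features, codebook)
print(tokens)  # [0, 1, 2, 3]
```

Each patch collapses to a single integer, which is what lets a language-model-style transformer treat an image as a sequence of vocabulary items.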

The interesting bit

The project deliberately strips out visual inductive biases: no diffusion, no U-Net, just next-token prediction scaled to 3B parameters. The authors claim that re-examining tokenizers, scalability, and data quality is enough to make a vanilla LLM architecture competitive at generating images. They also borrow LLM serving infrastructure (vLLM) to speed up inference.
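
The "just next-token prediction" idea can be sketched as a plain sampling loop. This is an illustration under loud assumptions, not the repository's sampler: `toy_logits` is a hypothetical stand-in for a transformer forward pass, the vocabulary of 8 tokens is arbitrary, and the class label is simply placed at the start of the sequence as a conditioning token.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities, with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(next_logits: Callable[[List[int]], List[float]],
                  class_token: int,
                  num_tokens: int,
                  rng: random.Random,
                  temperature: float = 1.0) -> List[int]:
    """Generate image tokens one at a time, each conditioned on the prefix."""
    sequence = [class_token]  # conditioning token comes first
    for _ in range(num_tokens):
        probs = softmax(next_logits(sequence), temperature)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        sequence.append(token)
    return sequence[1:]  # drop the conditioning token


VOCAB_SIZE = 8  # hypothetical; real codebooks are much larger


def toy_logits(prefix: List[int]) -> List[float]:
    """Hypothetical model: mildly favors the token after the last one."""
    favored = (prefix[-1] + 1) % VOCAB_SIZE
    return [2.0 if k == favored else 0.0 for k in range(VOCAB_SIZE)]


image_tokens = sample_tokens(toy_logits, class_token=3, num_tokens=16,
                             rng=random.Random(0))
print(len(image_tokens))  # 16
```

In a real pipeline the sampled token sequence would then be passed to the tokenizer's decoder to reconstruct pixels; that step is omitted here.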

Key highlights

Seven class-conditional checkpoints from 111M to 3.1B parameters, with reported ImageNet FID scores down to 2.18
Two text-conditional models trained on LAION COCO and internal data for 256×256 and 512×512 output
Custom VQ-VAE image tokenizers with 8× and 16× downsampling ratios
vLLM integration for serving, with claimed 300%–400% throughput speedups
Hugging Face demo and MIT-licensed code
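
The downsampling ratios above determine how long the token sequence is for a given image. The following is a small worked calculation, assuming square images and one token per cell of the downsampled grid; the helper name `token_count` is hypothetical, and the size and ratio pairings are illustrative rather than a list of released configurations.

```python
def token_count(image_size: int, downsample: int) -> int:
    """Number of tokens for a square image at a given downsampling ratio."""
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by downsample")
    side = image_size // downsample  # tokens along one edge of the grid
    return side * side


for size in (256, 512):
    for ratio in (8, 16):
        print(f"{size}x{size} at {ratio}x -> {token_count(size, ratio)} tokens")
# 256x256 at 8x -> 1024 tokens
# 256x256 at 16x -> 256 tokens
# 512x512 at 8x -> 4096 tokens
# 512x512 at 16x -> 1024 tokens
```

Halving the downsampling ratio quadruples the sequence length, which is the trade-off between reconstruction detail and autoregressive generation cost.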

Caveats

Text-to-image generation requires additional language-model dependencies documented in a separate readme
The repository offers far more class-conditional checkpoints (seven) than text-conditional ones (two)
The “beats diffusion” claim rests on the authors’ FID tables; direct baseline comparisons are not shown in the README

Verdict

Worth a look if you’re curious whether LLM architectures can transfer cleanly to vision, or if you want pre-trained autoregressive image generators with vLLM serving. Less compelling if you need a wide selection of text-to-image models out of the box.
