FoundationVision/VAR
The Visual Autoregressive (VAR) model generates images with GPT-style next-scale prediction, achieving state-of-the-art results and demonstrating scaling laws in visual generation.

VAR reframes image generation as next-scale prediction rather than next-token prediction. Instead of emitting tokens one at a time in raster order, the model predicts the entire token map at the next, higher resolution, conditioned on all coarser maps. This coarse-to-fine formulation lets GPT-style transformers compete with, and surpass, diffusion models. The official implementation provides training and inference code for autoregressive image generation across multiple model sizes. The work won the NeurIPS 2024 Best Paper Award for demonstrating scaling laws in visual generation comparable to those observed for language models.
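The next-scale ordering described above can be illustrated with a toy generation loop. This is a minimal sketch, not the repository's code. It assumes the image has already been tokenized into square token maps r_1, ..., r_K at increasing resolutions, so the likelihood factorizes over scales as p(r_1, ..., r_K) = ∏_k p(r_k | r_1, ..., r_{k-1}). The scale schedule `SCALES` is an assumed coarse-to-fine schedule, and `generate_next_scale` and `toy_predictor` are hypothetical names. The toy predictor stands in for the transformer and does no learning.

```python
from typing import Callable, List, Sequence

# A token map is a grid of discrete codebook indices (square here for simplicity).
TokenMap = List[List[int]]

# Assumed coarse-to-fine schedule: side length of each square token map.
SCALES = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)


def generate_next_scale(
    predict_scale: Callable[[List[TokenMap], int], TokenMap],
    scales: Sequence[int] = SCALES,
) -> List[TokenMap]:
    """Generate token maps coarse-to-fine with one model call per scale.

    Each call is conditioned on every previously generated (coarser) map and
    returns a whole side x side map at once, so all tokens within a scale are
    produced in parallel while the dependency runs across scales.
    """
    maps: List[TokenMap] = []
    for side in scales:
        new_map = predict_scale(maps, side)
        if len(new_map) != side or any(len(row) != side for row in new_map):
            raise ValueError(f"predictor returned wrong shape for scale {side}")
        maps.append(new_map)
    return maps


def toy_predictor(prev_maps: List[TokenMap], side: int) -> TokenMap:
    """Hypothetical stand-in for the transformer: fills the new map with a
    token equal to the number of coarser maps it was conditioned on."""
    token = len(prev_maps)
    return [[token] * side for _ in range(side)]


if __name__ == "__main__":
    maps = generate_next_scale(toy_predictor)
    total_tokens = sum(side * side for side in SCALES)
    print(f"autoregressive steps (next-scale): {len(maps)}")
    print(f"tokens generated across all scales: {total_tokens}")
    final = SCALES[-1]
    print(f"raster next-token steps for the final {final}x{final} map alone: {final ** 2}")
```

Under this assumed schedule, the loop makes 10 sequential model calls while producing 680 tokens in total, whereas generating only the final 16x16 map in raster order would already take 256 sequential steps. That trade-off, with parallelism within a scale and autoregression across scales, is the intuition behind next-scale prediction.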