FoundationVision/Infinity
An autoregressive transformer that generates high-resolution images by predicting discrete visual tokens bit by bit.

Infinity is a bitwise autoregressive model for high-resolution image synthesis. A transformer predicts discrete visual tokens bit by bit, treating image generation as a sequence prediction problem similar to language modeling. The project includes an image tokenizer that produces the discrete representation, and it supports text-to-image generation, with results reported as competitive with diffusion-based approaches.
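
To make the idea of a bitwise token more concrete, here is a minimal sketch. It assumes a simple sign-based quantizer, and the function names are hypothetical; none of it is taken from the Infinity codebase, whose tokenizer may work differently. The point it illustrates is that a token made of d bits corresponds to one of 2^d codebook entries, so a model can emit d binary predictions instead of a single choice over 2^d classes.

```python
from typing import List


def quantize_to_bits(latent: List[float]) -> List[int]:
    """Map each latent channel to one bit by its sign (1 if >= 0, else 0).

    Assumption: sign-based quantization, used here only for illustration.
    """
    return [1 if x >= 0 else 0 for x in latent]


def bits_to_index(bits: List[int]) -> int:
    """Pack bits (most significant first) into the equivalent codebook index."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


def bits_to_latent(bits: List[int]) -> List[float]:
    """Map bits back to a +/-1 vector, i.e. the quantized latent."""
    return [1.0 if b else -1.0 for b in bits]


latent = [0.7, -1.2, 0.1, -0.3]
bits = quantize_to_bits(latent)      # one bit per channel
index = bits_to_index(bits)          # same token viewed as a codebook index
vocab_size = 2 ** len(bits)          # size of the implied vocabulary
print(bits, index, vocab_size)
```

With 4 bits the implied vocabulary has only 16 entries, but the same scheme with, say, 32 bits would imply over four billion entries, which is why predicting the bits individually is attractive compared with a single softmax over every index.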