kakaobrain/rq-vae-transformer
Two-stage autoregressive image generation framework: RQ-VAE encodes an image with residual quantization, and RQ-Transformer generates the resulting codes for high-resolution synthesis.

This repository provides the official implementation of a two-stage framework for high-resolution image generation. In the first stage, RQ-VAE quantizes an image into a stack of discrete codes. In the second stage, an autoregressive RQ-Transformer generates these codes, conditioned on a class label or a text prompt. The repository supports both class-conditional and text-conditional image synthesis and includes pretrained checkpoints for reproducing the published results.
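To make the "stack of discrete codes" idea concrete, here is a minimal NumPy sketch of residual quantization. It is an illustration only, not the repository's code: the function name `residual_quantize`, the toy sizes, and the choice of a single codebook shared across all depths are assumptions made for this example. Each feature vector is quantized to its nearest codebook entry, the chosen entry is subtracted, and the leftover residual is quantized again, so every vector ends up with `depth` code indices.

```python
import numpy as np


def residual_quantize(z, codebook, depth):
    """Hypothetical helper: quantize each row of `z` into `depth` code indices.

    z:        (n, d) array of feature vectors
    codebook: (k, d) array of code embeddings (assumed shared across depths)
    Returns (codes, approx): codes has shape (n, depth), and approx is the
    sum of the selected code embeddings for each row.
    """
    residual = z.copy()
    approx = np.zeros_like(z)
    codes = []
    for _ in range(depth):
        # Squared Euclidean distance from each residual to each codebook entry.
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        chosen = codebook[idx]
        approx += chosen    # accumulate the running approximation
        residual -= chosen  # what is left for the next depth to encode
        codes.append(idx)
    return np.stack(codes, axis=1), approx


# Toy data: 8 feature vectors of dimension 4, a codebook of 16 entries.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))
z = rng.normal(size=(8, 4))
codes, approx = residual_quantize(z, codebook, depth=4)
```

Here `codes` holds four indices per vector, and summing the corresponding codebook entries reproduces `approx`. In the framework described above, stacks of this kind, computed for every position of the encoded feature map, are what the second-stage transformer would be trained to predict.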