bytetriper/RAE
A two-stage image synthesis pipeline combining Representation Autoencoders (RAE) with Diffusion Transformers (DiT) for high-fidelity image generation.

This repository provides a PyTorch implementation of Representation Autoencoders (RAE), which pair a frozen pretrained encoder such as DINOv2 or SigLIP2 with a trained ViT decoder. In Stage 2, a diffusion model (DiT) is trained on the RAE latent space and used for image synthesis. The codebase supports both GPU and TPU training and includes pretrained weights, training and sampling scripts, and evaluation utilities.
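
The split between a frozen encoder and a trained decoder can be sketched in PyTorch as below. This is a minimal illustration under stated assumptions, not the repository's code: `ToyRAE`, `stage1_step`, and all layer sizes are hypothetical names chosen for this example, and small linear layers stand in for the pretrained encoder (DINOv2 or SigLIP2) and the ViT decoder.

```python
import torch
import torch.nn as nn


class ToyRAE(nn.Module):
    """Toy stand-in for an RAE: frozen encoder, trainable decoder (hypothetical)."""

    def __init__(self, image_dim: int = 3 * 16 * 16, latent_dim: int = 32):
        super().__init__()
        # Stand-in for a pretrained encoder; a real setup would load weights here.
        self.encoder = nn.Linear(image_dim, latent_dim)
        for p in self.encoder.parameters():
            p.requires_grad_(False)  # frozen: never updated during training
        # Stand-in for the ViT decoder, the only part trained in Stage 1.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.GELU(), nn.Linear(64, image_dim)
        )

    @torch.no_grad()
    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Flatten each image and map it to a latent vector without tracking grads.
        return self.encoder(x.flatten(1))

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.decoder(z)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(x)).view_as(x)


def stage1_step(rae: ToyRAE, optimizer: torch.optim.Optimizer, images: torch.Tensor) -> float:
    """One reconstruction step; plain MSE is an assumed, simplified loss."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(rae(images), images)
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
rae = ToyRAE()
# Only decoder parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(rae.decoder.parameters(), lr=1e-3)
images = torch.randn(4, 3, 16, 16)  # random data in place of real images
loss = stage1_step(rae, optimizer, images)

# Stage 2 would train a diffusion model on latents like these.
latents = rae.encode(images)
```

In this sketch, the diffusion model of Stage 2 never sees pixels: it would be trained on `latents`, and generated latents would be passed through `rae.decode` to produce images.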