sihyun-yu/REPA
A method that aligns diffusion transformer hidden states of noisy inputs with pretrained visual encoder representations to improve training efficiency and generation quality.

REPA (Representation Alignment for Generation) is a simple regularization for diffusion models. It aligns projections of the hidden states that the denoising network computes from noisy inputs with representations of the corresponding clean images from a pretrained visual encoder. This substantially improves training efficiency, speeding up training of the SiT diffusion transformer by 17.5x, while achieving state-of-the-art image generation quality on the ImageNet 256x256 benchmark. The work targets the core training methodology of generative diffusion models.
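The alignment idea can be illustrated with a minimal, dependency-free sketch of a REPA-style regularizer, assuming per-patch vectors are already available: the diffusion model's hidden states `hidden_states` (computed from the noisy input) are passed through a projection head and compared with the encoder's features `encoder_features` (computed from the clean image) by cosine similarity, and the negative mean similarity over patches is added to the usual diffusion loss with a weight `lam`. All names here are hypothetical, the linear `linear_projector` is a simplified stand-in for a trainable projection head (such as a small MLP), and the default `lam=0.5` is an assumed value; this is not the repository's actual code.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def linear_projector(weights: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical stand-in for the trainable projection head: returns x -> W x."""
    def project(x: Sequence[float]) -> Vector:
        # Each row of W produces one output coordinate.
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return project


def alignment_loss(
    hidden_states: Sequence[Sequence[float]],
    encoder_features: Sequence[Sequence[float]],
    project: Callable[[Sequence[float]], Vector],
) -> float:
    """Negative mean per-patch cosine similarity between projected diffusion
    hidden states (noisy input) and encoder features (clean image)."""
    if not hidden_states or len(hidden_states) != len(encoder_features):
        raise ValueError("need the same non-zero number of patches on both sides")
    sims = [cosine_similarity(project(h), y) for h, y in zip(hidden_states, encoder_features)]
    # Minimizing this value maximizes the average alignment across patches.
    return -sum(sims) / len(sims)


def total_loss(diffusion_loss: float, align_loss: float, lam: float = 0.5) -> float:
    """Diffusion objective plus the weighted alignment regularizer (lam is assumed)."""
    return diffusion_loss + lam * align_loss
```

For example, with an identity projector and features that point in the same directions as the hidden states, every patch has similarity 1, so `alignment_loss` returns -1.0, and `total_loss(0.8, -1.0)` evaluates to approximately 0.3. Using cosine similarity makes the regularizer insensitive to the scale of the two feature spaces, which matters when the diffusion model and the encoder produce activations of very different magnitudes.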