willisma/SiT
Scalable Interpolant Transformers (SiT) is a generative model family built on Diffusion Transformers for high-resolution image synthesis.

The repository provides the official PyTorch implementation of SiT, which explores interpolant-based approaches that bridge diffusion and flow-based generative models. It includes model definitions, pre-trained weights, and training and sampling code for class-conditional image generation on ImageNet 256x256. The work builds on the Diffusion Transformer (DiT) architecture while introducing a more flexible interpolant framework for connecting two distributions, and it reports state-of-the-art FID scores among diffusion models of equivalent size.
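
To make the idea of an interpolant connecting two distributions concrete, here is a minimal sketch in plain Python. It is not the repository's code. It assumes the common convention x_t = alpha(t) * x_data + sigma(t) * noise, with t = 0 at the data and t = 1 at pure noise, and it shows two example schedules, one linear and one trigonometric. The function names (`linear_interpolant`, `trig_interpolant`, `interpolate`) are hypothetical and chosen only for illustration.

```python
import math
import random


def linear_interpolant(t):
    """Assumed linear schedule: alpha = 1 - t, sigma = t.

    Returns (alpha, sigma, d_alpha/dt, d_sigma/dt).
    """
    return 1.0 - t, t, -1.0, 1.0


def trig_interpolant(t):
    """Assumed trigonometric schedule with alpha^2 + sigma^2 = 1.

    Returns (alpha, sigma, d_alpha/dt, d_sigma/dt).
    """
    half_pi = 0.5 * math.pi
    alpha = math.cos(half_pi * t)
    sigma = math.sin(half_pi * t)
    return alpha, sigma, -half_pi * sigma, half_pi * alpha


def interpolate(x_data, noise, t, schedule=linear_interpolant):
    """Blend a data sample with a noise sample at time t in [0, 1].

    Returns the interpolated point x_t and the time derivative of the
    path (a velocity), which is the kind of quantity a flow-style model
    could be trained to regress.
    """
    alpha, sigma, d_alpha, d_sigma = schedule(t)
    x_t = [alpha * x + sigma * e for x, e in zip(x_data, noise)]
    velocity = [d_alpha * x + d_sigma * e for x, e in zip(x_data, noise)]
    return x_t, velocity


if __name__ == "__main__":
    random.seed(0)
    x_data = [1.0, -2.0, 0.5]                    # stand-in for an image or latent
    noise = [random.gauss(0.0, 1.0) for _ in x_data]
    for t in (0.0, 0.5, 1.0):
        x_t, v = interpolate(x_data, noise, t)
        print(t, x_t, v)
```

With the linear schedule the velocity is simply `noise - x_data` at every t, while the trigonometric schedule keeps alpha^2 + sigma^2 equal to 1, closer in spirit to variance-preserving diffusion. Swapping the `schedule` argument is the only change needed to move between the two, which illustrates the kind of flexibility an interpolant framework provides.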