bytetriper/RAE

Diffusion transformers that treat DINOv2 as an autoencoder

RAE repurposes frozen vision encoders like DINOv2 and SigLIP as the front end of an autoencoder, then trains a diffusion transformer on the resulting latents.

★ 2k stars · Python · Image · Video · Audio · ML Frameworks

What it does

RAE is a two-stage image generation pipeline. Stage one freezes a pretrained vision encoder (DINOv2, SigLIP2, or MAE) and trains a ViT decoder to reconstruct images from the encoder's fixed representations. Stage two trains a diffusion transformer, DiTDH, to generate new images directly in that latent space.
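To make stage one concrete, here is a minimal PyTorch sketch of the frozen-encoder, trainable-decoder idea. It is not RAE's code: the tiny linear layers are hypothetical stand-ins for a pretrained encoder and a ViT decoder, and the random tensors stand in for images. Stage two would then train a diffusion model on the `latents` this produces.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: a real setup would load a pretrained encoder
# (e.g. DINOv2) and a ViT decoder. Tiny linear layers keep this runnable.
encoder = nn.Linear(48, 16)  # "pretrained" encoder: image -> latent
decoder = nn.Linear(16, 48)  # trainable decoder: latent -> image

# Freeze the encoder so that only the decoder is updated.
encoder.requires_grad_(False)
encoder.eval()

optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-2)
images = torch.randn(32, 48)  # fake flattened images

encoder_before = encoder.weight.clone()
losses = []
for _ in range(200):
    with torch.no_grad():
        latents = encoder(images)  # fixed representation, no gradients
    recon = decoder(latents)
    loss = nn.functional.mse_loss(recon, images)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

After the loop, the encoder weights are unchanged and the reconstruction loss has dropped, which is the behavior stage one relies on.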

The interesting bit

Instead of building an autoencoder from scratch, the project treats foundation models as read-only compression engines. The decoder essentially learns to “unzip” rich semantic features back into pixels, and the diffusion model operates on a latent space already shaped by those pretrained representations.

Key highlights

Released pretrained weights for DINOv2-B, SigLIP-B, and MAE-B decoders, plus DiTDH-XL generators at 256×256 and 512×512
Reproduces reported ImageNet FID scores closely: rFID-50k of 0.54 versus 0.57 reported, and gFID-50k of 2.16
Entire pipeline—autoencoder, diffusion model, sampler, and guidance—is driven by a single OmegaConf YAML file
Supports both PyTorch/GPU and TorchXLA/TPU backends
Includes utilities for online evaluation, W&B logging, training resumption, and latent statistic calculation

Caveats

The codebase underwent a major refactor in December 2025 with API changes; the previous release remains on a deprecated branch
Latent normalization statistics rely on a momentum-based update rule, so recalculated stats differ slightly from the released versions depending on batch size and shuffling
fp32 is recommended for Stage 2 training for stability, even though bf16 is supported
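The caveat about latent statistics is easier to see with a toy example. The sketch below uses an exponential moving average of per-batch means as an assumed momentum rule; the exact update in the repository may differ, and the scalar "latents" are synthetic. It shows that the same data gives slightly different statistics when the batch size or the ordering changes.

```python
import random


def running_mean(batches, momentum=0.9):
    """Exponential moving average of per-batch means (assumed update rule)."""
    mean = None
    for batch in batches:
        batch_mean = sum(batch) / len(batch)
        if mean is None:
            mean = batch_mean  # initialize from the first batch
        else:
            mean = momentum * mean + (1.0 - momentum) * batch_mean
    return mean


def make_batches(values, batch_size):
    """Split a list into consecutive batches of the given size."""
    return [values[i:i + batch_size] for i in range(0, len(values), batch_size)]


rng = random.Random(0)
latents = [rng.gauss(0.0, 1.0) for _ in range(1024)]  # synthetic scalar latents

exact_mean = sum(latents) / len(latents)
mean_in_order = running_mean(make_batches(latents, 64))

shuffled = latents[:]
rng.shuffle(shuffled)
mean_shuffled = running_mean(make_batches(shuffled, 64))

mean_small_batches = running_mean(make_batches(latents, 16))

print(exact_mean, mean_in_order, mean_shuffled, mean_small_batches)
```

All three running estimates land near the exact mean, but none of them matches it or each other exactly, because recent batches are weighted more heavily than early ones.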

Verdict

A solid starting point for researchers exploring latent diffusion with foundation-model representations. Less appealing if you want a compact, single-stage generator that works out of the box.

Frequently asked

What is bytetriper/RAE?: RAE repurposes frozen vision encoders like DINOv2 and SigLIP as the front end of an autoencoder, then trains a diffusion transformer on the resulting latents.
Is RAE open source?: Yes — bytetriper/RAE is open source, released under the MIT license.
What language is RAE written in?: bytetriper/RAE is primarily written in Python.
How popular is RAE?: bytetriper/RAE has 2k stars on GitHub.
Where can I find RAE?: bytetriper/RAE is on GitHub at https://github.com/bytetriper/RAE.