apple/ml-mdm
A Python framework for training high-resolution text-to-image diffusion models, up to 1024x1024 pixels, using a nested multi-resolution (Matryoshka) approach.

This repository implements Matryoshka Diffusion Models, a technique for training high-resolution diffusion models efficiently by scaling progressively from low to high resolution. It provides an end-to-end pipeline for text-conditioned image synthesis and supports resolutions from 64 to 1024 pixels with a single unified model. The framework is implemented in PyTorch and includes pretrained checkpoints and training tutorials on the CC12M dataset.
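
To make the nested-resolution idea concrete, here is a minimal, dependency-free sketch of a resolution ladder between 64 and 1024 pixels, and of one image represented at several nested sizes. It is an illustration only, not this repository's API: the function names `nested_resolutions`, `average_pool`, and `build_pyramid` are hypothetical, and the scaling factor between levels and the use of average pooling for downsampling are assumptions.

```python
from typing import List

Image = List[List[float]]  # square, single-channel image as nested lists


def nested_resolutions(base: int, target: int, factor: int = 2) -> List[int]:
    """Return resolutions from `base` up to `target`, multiplying by `factor`.

    The factor between levels is an assumption made for illustration.
    """
    if base <= 0 or factor < 2:
        raise ValueError("base must be positive and factor at least 2")
    sizes = [base]
    while sizes[-1] < target:
        sizes.append(sizes[-1] * factor)
    if sizes[-1] != target:
        raise ValueError("target must equal base * factor**k for an integer k")
    return sizes


def average_pool(image: Image, factor: int) -> Image:
    """Downsample by averaging non-overlapping factor x factor blocks."""
    size = len(image)
    if size % factor != 0:
        raise ValueError("image size must be divisible by factor")
    out_size = size // factor
    return [
        [
            sum(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ) / (factor * factor)
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def build_pyramid(image: Image, levels: int, factor: int = 2) -> List[Image]:
    """Return the same image at `levels` nested sizes, coarsest first."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(average_pool(pyramid[-1], factor))
    return pyramid[::-1]


if __name__ == "__main__":
    print(nested_resolutions(64, 1024, factor=4))
    tiny = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    for level in build_pyramid(tiny, levels=3):
        print(len(level), level)
```

Running the script prints the ladder of sizes and a toy 4x4 image at three nested sizes. In a real training setup the same ladder would be built from image tensors in PyTorch rather than Python lists.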