
lucidrains/mmdit

PyTorch implementation of the Multi-Modal Diffusion Transformer (MMDiT) layer from Stable Diffusion 3.

Velocity (7d): +0.7 ★/day
Trend: steady

This repository provides a PyTorch implementation of the MMDiT (Multi-Modal Diffusion Transformer) architecture introduced in the Stable Diffusion 3 paper by Esser et al. It implements the core attention mechanism that lets the model process text and image tokens jointly during diffusion-based image generation. The implementation includes a single-block version and a generalized version that supports more than two modalities (text, image, audio, video). It also offers an adaptive attention variant that uses learned gating to select its weights dynamically.
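To make the joint attention idea concrete, here is a minimal sketch in plain PyTorch. It is not the repository's API: the class name `JointAttentionSketch` and its parameters are hypothetical, and it leaves out the conditioning, normalization, and feedforward layers that a full MMDiT block would include. It only shows the central step, assuming each modality keeps its own projection weights while attention runs over the concatenated text and image tokens.

```python
import torch
from torch import nn


class JointAttentionSketch(nn.Module):
    """Hypothetical illustration of joint text/image attention (not the mmdit API)."""

    def __init__(self, dim_text, dim_image, dim_attn, heads=4):
        super().__init__()
        assert dim_attn % heads == 0, "dim_attn must be divisible by heads"
        self.heads = heads
        self.head_dim = dim_attn // heads
        # Each modality has its own query/key/value and output projections.
        self.text_qkv = nn.Linear(dim_text, dim_attn * 3, bias=False)
        self.image_qkv = nn.Linear(dim_image, dim_attn * 3, bias=False)
        self.text_out = nn.Linear(dim_attn, dim_text, bias=False)
        self.image_out = nn.Linear(dim_attn, dim_image, bias=False)

    def forward(self, text, image):
        batch, n_text, _ = text.shape
        # Project each modality separately, then concatenate along the sequence axis.
        qkv = torch.cat([self.text_qkv(text), self.image_qkv(image)], dim=1)
        q, k, v = qkv.chunk(3, dim=-1)

        def split_heads(t):
            # (batch, seq, dim_attn) -> (batch, heads, seq, head_dim)
            return t.view(batch, -1, self.heads, self.head_dim).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        # Every token attends to every text and image token.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        out = scores.softmax(dim=-1) @ v
        # (batch, heads, seq, head_dim) -> (batch, seq, dim_attn)
        out = out.transpose(1, 2).reshape(batch, -1, self.heads * self.head_dim)
        # Split back into modalities and project to each modality's own width.
        return self.text_out(out[:, :n_text]), self.image_out(out[:, n_text:])


attn = JointAttentionSketch(dim_text=768, dim_image=512, dim_attn=256)
text_tokens = torch.randn(2, 16, 768)
image_tokens = torch.randn(2, 64, 512)
text_out, image_out = attn(text_tokens, image_tokens)
```

Each output keeps the sequence length and feature width of its input, so a block like this can be stacked. Extending the sketch to more modalities would mean adding one projection pair per modality and concatenating all of them before attention.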
