
Gen-Verse/MMaDA

Multimodal diffusion language model that unifies textual reasoning, visual understanding, and text-to-image generation in a single 8B-parameter architecture.

Velocity (7d): +4.3 ★ / day · Trend: steady

MMaDA is a family of multimodal diffusion foundation models that replaces autoregressive generation with diffusion-based token prediction across modalities. A single, modality-agnostic diffusion architecture handles text, images, and combinations of the two. The models are trained with mixed chain-of-thought reasoning and a unified reinforcement learning stage to improve reasoning on both textual and visual tasks.
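To make "diffusion-based token prediction" concrete, here is a minimal toy sketch of masked-token decoding, one common form of discrete diffusion. It is not taken from the MMaDA code: `toy_predict`, `diffusion_decode`, the confidence heuristic, and the unmasking schedule are all hypothetical, and whether MMaDA uses this exact procedure is an assumption. The point is only the contrast with autoregressive decoding: the sequence starts fully masked and is filled in over a few parallel refinement steps, most-confident positions first, rather than left to right.

```python
MASK = "<mask>"

def toy_predict(tokens, target):
    """Hypothetical stand-in for a model: for every masked position,
    return a (token, confidence) guess. Confidence grows with the number
    of already-revealed neighbours, mimicking easier predictions with
    more context. A real model would not see `target`."""
    preds = {}
    for i, tok in enumerate(tokens):
        if tok == MASK:
            neighbours = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
            revealed = sum(t != MASK for t in neighbours)
            preds[i] = (target[i], 0.5 + 0.25 * revealed)
    return preds

def diffusion_decode(length, target, steps):
    """Start from an all-mask sequence and unmask it in `steps` rounds."""
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = toy_predict(tokens, target)
        # Reveal an equal share of the remaining masks each round
        # (ceiling division), so the final round clears everything.
        k = -(-len(masked) // (steps - step))
        chosen = sorted(masked, key=lambda i: (-preds[i][1], i))[:k]
        for i in chosen:
            tokens[i] = preds[i][0]
    return tokens

if __name__ == "__main__":
    sentence = "a cat sits on the mat".split()
    print(diffusion_decode(len(sentence), sentence, steps=3))
```

Because every position is predicted at each step, the same loop could in principle run over text tokens and discretized image tokens alike, which is the intuition behind a modality-agnostic design.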
