AIDC-AI/Awesome-Unified-Multimodal-Models
A curated collection of papers, datasets, and benchmarks on unified multimodal AI models combining vision and language.

Velocity (7d): +3.2 ★/day · Trend: steady
An awesome list tracking advances in unified multimodal models that handle both image and text as inputs and outputs. It groups work into diffusion-based, autoregressive (MLLM), and hybrid architectures, and lists benchmarks and datasets for evaluating multimodal comprehension and generation. It is designed to help researchers and practitioners explore, compare, and build state-of-the-art unified multimodal systems.