
showlab/Show-o

A single transformer architecture that unifies multimodal understanding and generation by combining LLMs with diffusion models.

Velocity (7d): +2.9 ★/day · Trend: steady

Show-o is a research repository for a unified multimodal model that handles both understanding and generation in one transformer. The architecture combines large language model capabilities with diffusion-based generation, covering visual understanding tasks such as visual question answering (VQA) and captioning as well as image synthesis. Its main contribution is replacing separate encoder-decoder pipelines with a single unified model.
