thu-ml/unidiffuser

A unified diffusion framework that performs image generation, text generation, text-to-image, and image-to-text synthesis in a single transformer model.

Velocity (7d): +1.3 ★ / day · Trend: steady

This repository implements a multi-modal diffusion model that covers the marginal, conditional, and joint distributions of image-text data with a single network. The approach perturbs the data in all modalities simultaneously, giving each modality its own timestep, and uses a transformer backbone to predict the noise for every modality at once. The model is trained on large-scale paired image-text data, and the generation task is selected purely by the choice of timesteps, with no architectural changes.
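To illustrate how a task can be chosen through timesteps alone, here is a minimal sketch. It is not taken from the repository: the function name `timesteps_for_task`, the task labels, and the value `T_MAX = 1000` are hypothetical. It also assumes the convention that timestep 0 means a clean (given) modality and the maximum timestep means pure noise (an ignored modality).

```python
from typing import Tuple

# Assumed number of diffusion steps; the text above does not state it.
T_MAX = 1000


def timesteps_for_task(task: str, t: int, t_max: int = T_MAX) -> Tuple[int, int]:
    """Return (t_image, t_text) for a hypothetical task label.

    Assumed convention: 0 = modality is clean (used as a condition),
    t_max = modality is pure noise (effectively ignored).
    """
    if not 0 <= t <= t_max:
        raise ValueError(f"t must be in [0, {t_max}], got {t}")

    if task == "joint":            # both modalities denoised together
        return (t, t)
    if task == "text_to_image":    # text is clean, image is being denoised
        return (t, 0)
    if task == "image_to_text":    # image is clean, text is being denoised
        return (0, t)
    if task == "image":            # marginal over images: text fully noised
        return (t, t_max)
    if task == "text":             # marginal over text: image fully noised
        return (t_max, t)
    raise ValueError(f"unknown task: {task!r}")


if __name__ == "__main__":
    for name in ("joint", "text_to_image", "image_to_text", "image", "text"):
        print(name, timesteps_for_task(name, 500))
```

Under these assumptions, the same noise-prediction network would be called with a different timestep pair for each task, which is why no architectural change is needed.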