SHI-Labs/Versatile-Diffusion
A unified multimodal diffusion framework that handles text-to-image, image-to-text, and image-variation tasks in a single model.

Velocity (7d): +1.0 ★/day · Trend: steady
Versatile Diffusion implements what its authors describe as the first unified multi-flow multimodal diffusion architecture. It combines VAEs, a diffuser, and context encoders so that a single model can handle several generation tasks across modalities. It natively supports cross-modal generation and can be extended to semantic-style disentanglement and dual-guided synthesis. The model is implemented in PyTorch and includes a WebUI for inference.
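The multi-flow idea can be illustrated with a minimal sketch: each task pairs an output modality with a conditioning modality, and that pairing decides which VAE and context encoder are routed through the shared diffuser. The names below (`TASKS`, `Flow`, `select_flow`, and the component labels) are hypothetical and chosen for illustration only; they are not the repository's API.

```python
from dataclasses import dataclass

# Hypothetical task table (illustrative, not the repo's API): each task maps
# to (output modality, context modality).
TASKS = {
    "text-to-image": ("image", "text"),
    "image-to-text": ("text", "image"),
    "image-variation": ("image", "image"),
}


@dataclass(frozen=True)
class Flow:
    vae: str              # VAE that encodes/decodes the output modality
    context_encoder: str  # encoder that embeds the conditioning input
    diffuser: str         # label for the shared diffuser path used


def select_flow(task: str) -> Flow:
    """Pick the components one task would use, under the assumptions above."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    data_modality, context_modality = TASKS[task]
    return Flow(
        vae=f"{data_modality}_vae",
        context_encoder=f"{context_modality}_context_encoder",
        diffuser=f"diffuser[{data_modality}<-{context_modality}]",
    )


if __name__ == "__main__":
    for name in TASKS:
        print(name, select_flow(name))
```

In this sketch, supporting a new task only means adding a row to the table, which loosely mirrors how a single model can serve several generation tasks.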