
baofff/U-ViT

A PyTorch implementation of U-ViT, a Vision Transformer backbone for diffusion models used in image generation tasks.

1.1k stars · Jupyter Notebook · Image · Video · Audio
Velocity (7d): +0.9 ★/day · Trend: steady

The repository provides the official implementation of U-ViT, a ViT-based backbone that replaces the CNN-based U-Net traditionally used in diffusion models. It treats all inputs, including the time, the condition, and the noisy image patches, as tokens, and it adds long skip connections between shallow and deep layers. The model is evaluated on unconditional and class-conditional image generation and on text-to-image generation. It reaches an FID of 2.29 for class-conditional generation on ImageNet 256x256 and an FID of 5.48 for text-to-image generation on MS-COCO.
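The token-and-skip design described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from the repository: `ToyConfig`, `token_layout`, and `skip_schedule` are hypothetical names, and the sizes (a 32x32 input, patch size 2, 13 blocks, one condition token) are assumed values picked only for illustration. The sketch also assumes an odd number of blocks split into shallow blocks, one middle block, and deep blocks, with shallow outputs kept on a stack and popped in reverse order, so the shallowest block connects to the deepest one. It only tracks which tokens enter the model and which blocks are linked, not the tensors themselves.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical settings for illustration only; not taken from the repository.
    image_size: int = 32      # side length of the (possibly latent) input
    patch_size: int = 2       # side length of each square patch
    depth: int = 13           # total blocks: shallow + 1 middle + deep (must be odd)
    num_cond_tokens: int = 1  # e.g. a single class-label token


def token_layout(cfg: ToyConfig) -> list[str]:
    """List the token kinds in input order: time, condition(s), then image patches."""
    patches_per_side = cfg.image_size // cfg.patch_size
    num_patches = patches_per_side * patches_per_side
    return ["time"] + ["cond"] * cfg.num_cond_tokens + ["patch"] * num_patches


def skip_schedule(depth: int) -> list[tuple[int, int]]:
    """Return (shallow_block, deep_block) pairs joined by long skip connections."""
    if depth % 2 == 0:
        raise ValueError("depth must be odd: shallow blocks + 1 middle block + deep blocks")
    half = depth // 2
    stack = list(range(half))  # shallow blocks 0..half-1 push their outputs in order
    pairs = []
    # Block index `half` is the middle block and has no skip connection.
    for deep in range(half + 1, depth):
        pairs.append((stack.pop(), deep))  # most recently pushed shallow block pops first
    return pairs


cfg = ToyConfig()
tokens = token_layout(cfg)
print(len(tokens), tokens[:3])   # 258 ['time', 'cond', 'patch']
print(skip_schedule(cfg.depth))  # [(5, 7), (4, 8), (3, 9), (2, 10), (1, 11), (0, 12)]
```

With these assumed settings the transformer sees 258 tokens (1 time, 1 condition, 256 patches), and the 6 deep blocks pair with the 6 shallow blocks in mirrored order. In an actual model, each pair would carry a shallow-layer activation that is merged into the deep block's input; the stack makes the U-Net-like mirroring explicit without needing any spatial downsampling.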
