microsoft/NUWA
A collection of visual synthesis models for generating and manipulating images and videos using transformer and diffusion architectures.
★ 2.8k stars · Image · Video · Audio

Velocity (7d): +1.7 ★/day · Trend: steady
Microsoft Research’s NUWA is a suite of multimodal pretrained models for visual synthesis tasks, including image generation, video generation, image inpainting, and 3D photography. The repository implements transformer-based and diffusion-based approaches (NUWA-Infinity, NUWA-LIP, NUWA-XL) for generating high-resolution images and long-duration videos from text and other input modalities.