Wan-Video/Wan2.1

Open video generation that fits on a single GPU

Alibaba's Wan2.1 ships 14B and 1.3B parameter models for text-to-video, image-to-video, and editing, claiming state-of-the-art results without the usual high-end hardware requirement.

★ 16.2k stars · Python · Image · Video · Audio


Velocity (7d): +35 ★/day · Trend: steady

What it does

Wan2.1 is a suite of open video foundation models from Alibaba’s Wan team. It generates video from text prompts, still images, first-and-last frames, or editing instructions, and also handles text-to-image and video-to-audio. The repo provides inference code, model weights, and integrations for ComfyUI, Diffusers, and Gradio.

The interesting bit

The 1.3B text-to-video model needs about 8.2 GB of VRAM, which puts it within reach of most consumer GPUs. On a single RTX 4090 it generates 5 seconds of 480p video in about 4 minutes without quantization tricks. The project also claims to be the first video model that generates readable Chinese and English text inside the video itself, which is rarer than it sounds.

Key highlights

14B and 1.3B parameter variants for T2V, I2V, first-last-frame-to-video, and VACE (all-in-one editing)
Wan-VAE encodes/decodes 1080p video of arbitrary length while preserving temporal information
ComfyUI and Diffusers integrations shipped; Gradio demos included
Active ecosystem: community projects include motion control (Wan-Move), virtual try-on (MagicTryOn), autonomous driving world models (DriVerse), and acceleration frameworks (TeaCache claims ~2x speedup)
Weights hosted on both Hugging Face and ModelScope
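
To put the speed figures quoted above in perspective, here is a minimal back-of-envelope sketch in Python. It uses only the numbers stated in this summary (a 5-second clip in about 4 minutes, and TeaCache's claimed ~2x speedup). The function name is hypothetical, and the speedup is treated as an assumption, not a measured result.

```python
# Back-of-envelope throughput from the figures quoted in this summary.
CLIP_SECONDS = 5.0         # clip length stated in the summary
WALL_CLOCK_MINUTES = 4.0   # approximate generation time stated in the summary
ASSUMED_SPEEDUP = 2.0      # TeaCache's claimed ~2x; an assumption, not a benchmark


def compute_seconds_per_video_second(clip_seconds, wall_clock_minutes, speedup=1.0):
    """Return how many seconds of compute one second of output video costs."""
    if clip_seconds <= 0 or wall_clock_minutes <= 0 or speedup <= 0:
        raise ValueError("all inputs must be positive")
    return wall_clock_minutes * 60.0 / speedup / clip_seconds


baseline = compute_seconds_per_video_second(CLIP_SECONDS, WALL_CLOCK_MINUTES)
accelerated = compute_seconds_per_video_second(
    CLIP_SECONDS, WALL_CLOCK_MINUTES, speedup=ASSUMED_SPEEDUP
)

print(f"baseline: ~{baseline:.0f} s of compute per second of video")
print(f"with an assumed 2x speedup: ~{accelerated:.0f} s per second of video")
```

At roughly 48 seconds of compute per second of output, this is batch generation, not anything interactive; even if the claimed 2x holds, a 5-second clip would still take about 2 minutes.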

Caveats

The 1.3B model at 720p is described as “less stable” than at 480p due to limited training at that resolution
First-last-frame-to-video is trained primarily on Chinese text-video pairs, so Chinese prompts are recommended for better results
Several Diffusers + multi-GPU inference items remain unchecked on the todo list

Verdict

Worth a look if you want open video generation with a consumer GPU option and a growing tooling ecosystem. Skip if you need guaranteed production reliability or mature multi-GPU Diffusers support today.