Wan-Video/Wan2.2

Alibaba's open video model grows up, adds experts

Wan2.2 brings Mixture-of-Experts to video diffusion, plus a 5B model that runs 720P@24fps on a consumer GPU.

Velocity (7d): +51 ★ / day · Trend: steady

What it does

Wan2.2 is a family of open-source video generation models from Alibaba’s Wan team. It handles text-to-video, image-to-video, speech-to-video, and character animation. The flagship 14B models use a Mixture-of-Experts architecture; a smaller 5B model targets 720P at 24fps on hardware like an RTX 4090.

The interesting bit

The MoE design splits the denoising process across timesteps, handing different stages of denoising to specialized “expert” sub-models. The README claims this expands capacity without increasing compute cost, since only one expert is active at any given step. It is a trick more common in LLMs than in diffusion. The 5B TI2V model also packs a VAE with 16×16×4 compression, which is how it squeezes high-res generation into consumer VRAM.
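To make the two ideas above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the project's code: the two toy experts, the `boundary` value, and the helper names `pick_expert`, `denoise`, and `latent_grid` are all hypothetical, and the 4 in 16×16×4 is read as the temporal factor with ceiling division assumed at the edges.

```python
import math

# Hypothetical stand-ins for two expert denoisers. The real experts are
# large diffusion models, not one-line functions.
def early_expert(latent, t):
    return [x * 0.9 for x in latent]

def late_expert(latent, t):
    return [x * 0.99 for x in latent]

def pick_expert(t, boundary=0.5):
    """Route a normalized timestep t in [0, 1] (1.0 = pure noise) to one expert.

    The boundary is an illustrative assumption, not the project's setting.
    """
    return early_expert if t >= boundary else late_expert

def denoise(latent, num_steps=10):
    """Run a toy denoising loop and count how often each expert is called."""
    calls = {"early_expert": 0, "late_expert": 0}
    for i in range(num_steps):
        t = 1.0 - i / num_steps      # walk from noisy (1.0) toward clean
        expert = pick_expert(t)
        latent = expert(latent, t)   # exactly one expert runs per step
        calls[expert.__name__] += 1
    return latent, calls

def latent_grid(frames, height, width, ct=4, ch=16, cw=16):
    """Latent grid (T, H, W) under ct x ch x cw compression (ceil division assumed)."""
    return (math.ceil(frames / ct), math.ceil(height / ch), math.ceil(width / cw))

_, calls = denoise([1.0, -1.0])
print(calls)                                           # expert calls sum to num_steps
print(latent_grid(frames=24, height=720, width=1280))  # one second of 720P at 24fps
```

The call counts always sum to the number of steps, which is the point of the capacity-without-compute claim: adding an expert adds parameters, not per-step work. On the VAE side, 16×16×4 compression means the denoiser sees 1,024 times fewer spatio-temporal positions than the raw pixel grid.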

Key highlights

  • Five model variants: T2V-A14B, I2V-A14B, TI2V-5B, S2V-14B (speech-driven), and Animate-14B (character animation)
  • 14B models need ~80GB VRAM for single-GPU inference; the 5B model is the consumer-friendly option
  • Integrations already shipped for ComfyUI, Diffusers, and community tools like LightX2V and FastVideo
  • Training data grew by 65.6% for images and 83.2% for videos versus Wan2.1
  • Apache 2.0 weights hosted on both Hugging Face and ModelScope

Caveats

  • The “TOP performance among all open-sourced and closed-sourced models” claim is stated but not substantiated with benchmarks in the README
  • 80GB VRAM for the 14B models puts serious generation out of reach for most individual developers
  • The speech-to-video path requires an additional CosyVoice dependency

Verdict

Worth a look if you’re building video generation pipelines and need open weights with broad modality coverage. Skip if you’re hoping to train from scratch or run top-tier models on modest hardware: the 5B model is usable, but the 14B variants are firmly workstation-or-cloud territory.
