Open-source Sora clone trains on Huawei chips, not NVIDIA
A Beijing lab's text-to-video model now runs entirely on Ascend NPUs, with sparse attention that cuts compute while keeping quality close to dense attention.

What it does
Open-Sora Plan is a text-to-video diffusion model built to replicate OpenAI’s Sora. It generates clips up to 121 frames at 576×1024 resolution, with image-to-video and transition-generation modes. The project is led by a Peking University (PKU) lab with contributors from Huawei and Peng Cheng Laboratory.
The interesting bit
Version 1.5.0 runs both training and inference entirely on Huawei Ascend 910 NPUs using the MindSpeed framework, with no NVIDIA hardware required. The team also developed a sparse attention architecture called SUV, which it reports is 35% faster than dense attention at near-equivalent quality, plus a custom WFVAE with 8×8×8 compression that it says beats Wan2.1’s VAE on PSNR.
Key highlights
- 8B-parameter model trained on 40 million video samples, benchmarked against HunyuanVideo
- Sparse 3D attention (SUV) replaces dense 2+1D architectures from earlier versions
- Custom causal video VAE handles arbitrary resolutions with 8×8×8 downsampling
- GPU support listed as “coming soon”; current weights only run on Ascend + MindSpeed-MM
- Published technical reports for v1.3, v1.5, and a separate Helios model for minute-scale generation
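To make the 8×8×8 figure concrete, here is a rough sketch of the latent grid a clip shrinks to. It assumes the common causal-VAE convention in which the first frame is encoded on its own and every further 8 frames become one latent frame; the blurb does not confirm that detail, and the function name `latent_shape` is made up for illustration.

```python
def latent_shape(frames: int, height: int, width: int,
                 t_stride: int = 8, s_stride: int = 8) -> tuple[int, int, int]:
    """Latent grid (frames, height, width) for a causal video VAE.

    Assumption: the first frame is encoded alone, then each group of
    `t_stride` frames maps to one latent frame. Height and width are
    divided by `s_stride`.
    """
    if frames < 1 or (frames - 1) % t_stride != 0:
        raise ValueError(f"frames must be {t_stride}*n + 1, got {frames}")
    if height % s_stride != 0 or width % s_stride != 0:
        raise ValueError(f"height and width must be multiples of {s_stride}")
    return ((frames - 1) // t_stride + 1, height // s_stride, width // s_stride)


# The largest clip mentioned above: 121 frames at 576x1024.
print(latent_shape(121, 576, 1024))  # (16, 72, 128)
```

Under that assumption, the diffusion model attends over 16×72×128 latent positions rather than 121×576×1024 pixels, which is why the VAE's compression ratio matters so much for attention cost.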
Caveats
- Some v1.2.0 weights trained on Panda70M lack final fine-tuning, so their outputs may contain watermarks
- Inference frame counts must follow 4n+1 arithmetic (93, 77, 61…), a quirk of the stride-32 training
- GPU code branch is not yet available; most developers will need NPU access to run the latest model
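The 4n+1 rule in the caveats is easy to get wrong when a frame count comes from user input. The sketch below checks a requested count and rounds it down to the nearest valid value; both helper names are hypothetical and are not part of the project's code.

```python
def is_valid_frame_count(frames: int) -> bool:
    """True when frames can be written as 4n + 1 for some integer n >= 0."""
    return frames >= 1 and (frames - 1) % 4 == 0


def snap_frame_count(requested: int) -> int:
    """Round a requested frame count down to the nearest 4n + 1 value."""
    if requested < 1:
        raise ValueError("frame count must be at least 1")
    return requested - (requested - 1) % 4


# The counts listed in the caveat all pass the check.
print([is_valid_frame_count(f) for f in (93, 77, 61)])  # [True, True, True]
# A request for 80 frames snaps down to 77.
print(snap_frame_count(80))  # 77
```

Rounding down rather than up keeps the clip within whatever length the caller asked for; rounding up would be an equally reasonable choice.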
Verdict
Worth watching if you’re building video generation infrastructure outside the CUDA ecosystem, or researching sparse attention trade-offs. Skip for now if you need GPU-ready weights today: GPU support has been listed as “coming soon” since June 2025.