PyTorch on Huawei NPUs: the adapter you didn't ask for
torch_npu lets PyTorch scripts run on Ascend AI chips with minimal code changes—if you've already bought into Huawei's stack.

What it does
torch_npu is a backend adapter that bridges PyTorch to Huawei’s Ascend NPUs. Install it alongside Huawei’s CANN toolkit, add import torch_npu (optional from 2.5.1 onward), call .npu() on your tensors, and your matrix multiplications run on Ascend silicon instead of an NVIDIA GPU. The project tracks PyTorch versions from 1.11.0 through 2.9.0, each paired with a matching CANN release.
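A minimal sketch of that workflow, assuming torch, torch_npu, and a working CANN install are all present. The helper name npu_ready is made up for this example, and the guard around the imports is there only so the snippet degrades gracefully on a machine without the Ascend stack.

```python
import importlib.util


def npu_ready() -> bool:
    """Return True only if torch and torch_npu import and report an NPU.

    Hypothetical helper for illustration; not part of torch_npu.
    """
    for name in ("torch", "torch_npu"):
        if importlib.util.find_spec(name) is None:
            return False
    try:
        import torch_npu  # registers the "npu" device with PyTorch
    except (ImportError, OSError):
        # Typically a missing or misconfigured CANN toolkit.
        return False
    return bool(torch_npu.npu.is_available())


if npu_ready():
    import torch

    # Same shape as the CUDA idiom, with .npu() in place of .cuda().
    x = torch.randn(2, 2).npu()
    y = torch.randn(2, 2).npu()
    print(x.mm(y))
else:
    print("No usable NPU stack found; skipping the demo.")
```

On a machine without Ascend hardware this prints the fallback message rather than failing at import time.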
The interesting bit
The version matrix is the real architecture here. Every PyTorch release maps to a specific CANN release, branch name, and post-release patch level—2.1.0.post17 for CANN 8.3.RC1, 2.1.0.post12 for CANN 8.1.RC1, and so on. It’s less a software project than a dependency coordination exercise across two corporate release trains.
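To make the coordination problem concrete, here is a rough sketch of the lookup a deployment script might perform. It contains only the two pairings quoted above; the table name, function names, and return strings are invented for illustration, and the real matrix has many more rows.

```python
from typing import Optional

# Only the two pairings mentioned in the text; everything else is omitted.
CANN_FOR_TORCH_NPU = {
    "2.1.0.post17": "8.3.RC1",
    "2.1.0.post12": "8.1.RC1",
}


def cann_for(torch_npu_version: str) -> Optional[str]:
    """Return the CANN release paired with a torch_npu version, if known."""
    return CANN_FOR_TORCH_NPU.get(torch_npu_version)


def check_pairing(torch_npu_version: str, installed_cann: str) -> str:
    """Classify a (torch_npu, CANN) pair as 'ok', 'mismatch', or 'unknown'."""
    expected = cann_for(torch_npu_version)
    if expected is None:
        return "unknown"
    return "ok" if expected == installed_cann else "mismatch"


print(check_pairing("2.1.0.post17", "8.3.RC1"))
print(check_pairing("2.1.0.post12", "8.3.RC1"))
```

A check like this would sit in front of any install step, since a mismatched pair is the failure the matrix exists to prevent.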
Key highlights
- Supports PyTorch 1.11 through 2.9 across Python 3.7–3.11
- Binary wheels available, but requires pre-installed CANN toolkit
- Source builds via Docker containers with gcc version constraints (10.2 for ARM, 9.3.1 for x86)
- .npu() tensor device API mirrors CUDA’s .cuda() pattern
- As of 2.5.1, explicit import torch_npu is no longer required
Caveats
- No performance benchmarks, profiling data, or comparison with CUDA/NVIDIA equivalents in the README
- CANN installation is mandatory and non-trivial; the README links to external Huawei documentation without summarizing steps
- x86 installs start from the CPU-only PyTorch wheel, presumably because the CUDA build is unnecessary when torch_npu supplies the device backend; the README does not explain the choice
Verdict
Worth bookmarking if you’re already committed to Ascend hardware or operating under procurement constraints that rule out NVIDIA. Everyone else can safely ignore it—this is infrastructure for a specific silicon island, not a generic portability layer.