PyTorch's official benchmark suite: real models, real noise, real numbers
A curated collection of popular workloads standardized so you can actually compare PyTorch performance across versions without writing your own harness.

What it does
TorchBench packages copies of real-world models (BERT, Stable Diffusion, ResNet, etc.) with a uniform API, miniature datasets, and dependency scripts. The goal is straightforward: run the same code against different PyTorch builds and get comparable numbers. It also includes utilities to reduce benchmark noise on specific AWS hardware.
The interesting bit
The project doesn’t just wrap models. It also enforces a strict “same build process” rule for torch, torchvision, and torchaudio (no mixing pip and conda), because binary extension mismatches have apparently bitten enough people to warrant a README warning. There’s also an automated machine-tuning script for AWS g4dn.metal instances, an acknowledgment that kernel interrupts and CPU frequency scaling can swamp your signal.
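The same-build-process rule lends itself to a quick pre-flight check. What follows is a minimal sketch, not something TorchBench ships: the helper names installer_of and find_installer_mismatch are hypothetical, and the check assumes each package's metadata carries an INSTALLER record (pip writes one; packages installed another way may report a different value or none at all, in which case the result is only a hint).

```python
from importlib import metadata

# The three packages the README says must come from the same build process.
PACKAGES = ("torch", "torchvision", "torchaudio")


def installer_of(package):
    """Return the installer recorded in the package metadata, or None if unknown."""
    try:
        text = metadata.distribution(package).read_text("INSTALLER")
    except metadata.PackageNotFoundError:
        return None
    return text.strip() if text else None


def find_installer_mismatch(installers):
    """Given {package: installer}, return the distinct installers if they disagree.

    Returns an empty list when every package reports the same value.
    A missing record (None) counts as its own value, so it is flagged too.
    """
    distinct = sorted({str(value) for value in installers.values()})
    return distinct if len(distinct) > 1 else []


if __name__ == "__main__":
    found = {name: installer_of(name) for name in PACKAGES}
    if find_installer_mismatch(found):
        print("Mixed or unknown installers:", found)
    else:
        print("Consistent installers:", found)
```

An empty result does not prove the binaries are compatible; it only rules out the most obvious mix of installers.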
Key highlights
- Models expose train/eval and cpu/cuda via a standard BenchmarkModel API
- Can install as a library (pip install git+https://...) and import models directly
- Multiple runners: test.py for sanity checks, pytest test_bench.py for statistics, run.py for quick profiling, and userbenchmark for custom suites
- Nightly CI publishes V0 and V1 performance scores against PyTorch nightlies (stable releases explicitly not tested)
- Includes machine-config tuning for Amazon Linux on AWS g4dn.metal; Ubuntu automation is noted as future work
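To illustrate why a uniform model interface matters, here is a toy sketch in plain Python. ToyModel, its constructor arguments, and time_model are hypothetical stand-ins and not TorchBench's actual BenchmarkModel API. The point is only that one harness can time any workload that exposes the same train/eval entry points, using warmup runs and a median to damp noise.

```python
import statistics
import time


class ToyModel:
    """Hypothetical stand-in for a benchmark model with a uniform interface."""

    def __init__(self, test, device):
        if test not in ("train", "eval"):
            raise ValueError(f"unknown test mode: {test}")
        if device not in ("cpu", "cuda"):
            raise ValueError(f"unknown device: {device}")
        self.test = test
        self.device = device  # recorded only; this toy never touches a GPU

    def train(self):
        # Placeholder work; a real model would run a training step here.
        return sum(i * i for i in range(20_000))

    def eval(self):
        # Placeholder work; a real model would run an inference pass here.
        return sum(range(20_000))

    def invoke(self):
        """Run whichever mode this instance was constructed for."""
        return self.train() if self.test == "train" else self.eval()


def time_model(model, warmup=2, repeats=5):
    """Return the median wall-clock seconds over `repeats` runs after `warmup` runs."""
    for _ in range(warmup):
        model.invoke()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        model.invoke()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    for mode in ("train", "eval"):
        seconds = time_model(ToyModel(test=mode, device="cpu"))
        print(f"{mode}: {seconds * 1e3:.3f} ms (median)")
```

Because the harness only calls invoke(), swapping in a different workload requires no changes to the timing code, which is the property that makes numbers comparable across models and builds.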
Caveats
- Stable PyTorch releases are explicitly not tested or maintained; nightlies are the expected target
- Automated low-noise setup only supports one AWS instance type with Amazon Linux; bring your own tuning elsewhere
- test_bench.py is slated for deprecation in favor of userbenchmark
Verdict
Worth cloning if you’re a PyTorch core developer, compiler engineer, or infrastructure maintainer who needs reproducible performance regression signals. Skip it if you just want to benchmark your own model: this is for testing PyTorch itself, not arbitrary user code.