pytorch/benchmark

PyTorch's official benchmark suite: real models, real noise, real numbers

A curated collection of popular workloads standardized so you can actually compare PyTorch performance across versions without writing your own harness.

1k stars · Python · ML Frameworks · LLMOps · Eval · benchmark
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

TorchBench packages copies of real-world models (BERT, Stable Diffusion, ResNet, and others) behind a uniform API, with miniature datasets and dependency install scripts. The goal is straightforward: run the same code against different PyTorch builds and get comparable numbers. It also includes utilities that reduce benchmark noise on AWS g4dn.metal instances.
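To make "comparable numbers" concrete, here is a minimal sketch of comparing latency samples from two builds. It is not TorchBench code: the function names, the 5% threshold, and the sample values are all assumptions made up for illustration. It compares medians rather than means so that a single noisy run does not flip the conclusion.

```python
import statistics


def relative_change(baseline_ms, candidate_ms):
    """Fractional change in median latency (positive means the candidate is slower)."""
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    return (cand - base) / base


def is_regression(baseline_ms, candidate_ms, threshold=0.05):
    """Flag a regression when the median slows down by more than `threshold`.

    The 5% default is an arbitrary illustrative choice, not a TorchBench setting.
    """
    return relative_change(baseline_ms, candidate_ms) > threshold


# Made-up latency samples in milliseconds; the old build has one noisy outlier.
old_build = [10.0, 10.2, 9.9, 10.1, 30.0]
new_build = [11.0, 11.1, 10.9, 11.2, 11.0]

print(f"median change: {relative_change(old_build, new_build):+.1%}")
print("regression:", is_regression(old_build, new_build))
```

With these samples the mean would suggest the new build is faster, because the outlier inflates the old average, while the median shows it is slower.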

The interesting bit

The project doesn’t just wrap models. It also enforces a strict “same build process” rule for torch, torchvision, and torchaudio (no mixing pip and conda), because binary extension mismatches have bitten enough people to warrant a README warning. There’s also an automated machine-tuning script for AWS g4dn.metal instances, an acknowledgment that kernel interrupts and CPU frequency scaling can swamp your signal.
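One rough way to spot a mixed pip/conda install is to read the `INSTALLER` record that Python packaging tools may write into each package's metadata. The sketch below is a heuristic under that assumption, not something the repository ships: the helper names are hypothetical, the `INSTALLER` file is optional and can be missing, and matching installers do not prove the binaries are compatible.

```python
from importlib import metadata


def installer_of(package):
    """Return the installer recorded for `package`, or None if unknown or not installed."""
    try:
        text = metadata.distribution(package).read_text("INSTALLER")
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the metadata file does not exist.
    return text.strip() if text else None


def same_installer(installers):
    """True when every package with a known installer reports the same one."""
    known = {value for value in installers.values() if value is not None}
    return len(known) <= 1


# The three packages the review says must share a build process.
report = {name: installer_of(name) for name in ("torch", "torchvision", "torchaudio")}
print(report, "consistent:", same_installer(report))
```

A `None` entry means the check could not tell, so treat a "consistent" result as a hint rather than a guarantee.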

Key highlights

  • Models expose train/eval and cpu/cuda via a standard BenchmarkModel API
  • Can install as a library (pip install git+https://...) and import models directly
  • Multiple runners: test.py for sanity checks, pytest test_bench.py for statistics, run.py for quick profiling, and userbenchmark for custom suites
  • Nightly CI publishes V0 and V1 performance scores against PyTorch nightlies
  • Includes machine-config tuning for Amazon Linux on AWS g4dn.metal; Ubuntu automation is noted as future work
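The value of a uniform model interface is that one runner can time any model the same way. The sketch below illustrates that idea with a toy stand-in: `ToyModel`, `run_once`, and `time_model` are hypothetical names invented here, not TorchBench's `BenchmarkModel` class or its runners, and the workload is plain Python arithmetic rather than a real network.

```python
import statistics
import time


class ToyModel:
    """Hypothetical stand-in for a benchmark model; not TorchBench's actual class."""

    def __init__(self, test="eval", device="cpu"):
        if test not in ("train", "eval"):
            raise ValueError(f"unknown test mode: {test!r}")
        if device not in ("cpu", "cuda"):
            raise ValueError(f"unknown device: {device!r}")
        self.test = test
        self.device = device

    def run_once(self):
        # Placeholder workload: "train" does twice the work of "eval",
        # loosely mimicking a forward plus backward pass.
        n = 20_000 if self.test == "train" else 10_000
        return sum(i * i for i in range(n))


def time_model(model, warmup=2, iterations=5):
    """Return (median_ms, samples_ms) after discarding warmup runs."""
    for _ in range(warmup):
        model.run_once()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        model.run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), samples


# The same runner works for every mode because the interface is uniform.
for mode in ("eval", "train"):
    median_ms, _ = time_model(ToyModel(test=mode, device="cpu"))
    print(f"{mode}: median {median_ms:.3f} ms")
```

Warmup runs and a median are generic noise-reduction habits; the repository's own runners may measure differently.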

Caveats

  • Stable PyTorch releases are explicitly not tested or maintained; nightlies are the expected target
  • Automated low-noise setup only supports one AWS instance type with Amazon Linux; bring your own tuning elsewhere
  • test_bench.py is slated for deprecation in favor of userbenchmark

Verdict

Worth cloning if you’re a PyTorch core developer, compiler engineer, or infrastructure maintainer who needs reproducible performance regression signals. Skip it if you just want to benchmark your own model: this is for testing PyTorch itself, not arbitrary user code.
