A 2016 time capsule: how fast your GPU actually trains ResNet
Before PyTorch existed, someone had to prove that cuDNN was worth the install and Pascal cards were worth the money.

What it does
This repo benchmarks ten classic CNN architectures, from AlexNet through ResNet-200, on CPUs and four NVIDIA GPUs, with and without cuDNN. All tests use a fixed batch size of 16 and 224×224 images, so the tables are directly comparable. Times are reported in milliseconds, with the forward and backward passes listed separately as well as combined.
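To make the measurement concrete, here is a minimal sketch of a timing harness in plain Python. It is not the repo's Torch code: `time_ms`, `fake_step`, and the warm-up and iteration counts are all assumptions chosen for illustration. The idea is simply to discard a few warm-up runs, then average wall-clock time over repeated calls.

```python
import time
from statistics import mean


def time_ms(step, warmup=3, iters=10):
    """Return the mean wall-clock time of step() in milliseconds.

    `step` is any zero-argument callable. For GPU work it must block
    until the device has finished, because CUDA kernels are launched
    asynchronously and would otherwise look faster than they are.
    """
    for _ in range(warmup):
        step()  # warm-up runs are discarded
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def fake_step():
    """Hypothetical stand-in for one forward-plus-backward pass."""
    sum(i * i for i in range(10_000))


elapsed = time_ms(fake_step)
print(f"{elapsed:.3f} ms per iteration")
```

In a real benchmark, `fake_step` would be replaced by one forward and backward pass over a batch of 16 images at 224×224, with a device synchronization before the timer stops.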
The interesting bit
The README doubles as a hardware buying guide from 2016. The author draws explicit conclusions: Pascal Titan X beats GTX 1080 by ~1.4×, GTX 1080 edges out Maxwell Titan X by ~1.1×, and cuDNN provides 2–3× speedups across the board. A Pascal Titan X with cuDNN is stated to be 49–74× faster than dual Xeon E5-2630 v3 CPUs. These claims are grounded in the tables, not hand-waving.
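Ranges like "2–3×" come from dividing one timing column by another, model by model, and reporting the smallest and largest ratio. The sketch below shows that arithmetic. The function names are hypothetical and the timings are placeholders, not values copied from the repo's tables.

```python
def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is (values above 1 mean faster)."""
    if baseline_ms <= 0 or candidate_ms <= 0:
        raise ValueError("timings must be positive")
    return baseline_ms / candidate_ms


def speedup_range(baseline, candidate):
    """Min and max speedup over the models present in both tables."""
    shared = baseline.keys() & candidate.keys()
    if not shared:
        raise ValueError("no models in common")
    ratios = [speedup(baseline[m], candidate[m]) for m in shared]
    return min(ratios), max(ratios)


# Placeholder timings in milliseconds, NOT taken from the README.
without_cudnn = {"model_a": 30.0, "model_b": 120.0}
with_cudnn = {"model_a": 15.0, "model_b": 40.0}

low, high = speedup_range(without_cudnn, with_cudnn)
print(f"{low:.1f}x to {high:.1f}x")  # prints "2.0x to 3.0x"
```

Feeding in the real forward-plus-backward totals for two hardware configurations should reproduce the kind of range the README quotes.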
Key highlights
- Covers AlexNet, Inception-V1, VGG-16/19, and ResNet-18/34/50/101/152/200
- Tests Pascal Titan X, GTX 1080, GTX 1080 Ti, Maxwell Titan X, and dual Xeon CPUs
- Separate forward and backward pass timings for each GPU/cuDNN combination
- Includes Top-1/Top-5 error rates for accuracy-speed tradeoff comparisons
- Model files and conversion scripts provided (2.1 GB download)
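One way to use the error rates and timings together is to keep only the models that no other model beats on both axes, i.e. a Pareto front. The sketch below is an assumption about how a reader might do that, not something the repo provides. `pareto_front` is a hypothetical helper and the numbers are placeholders. Because the VGG error rates are measured differently from the rest, a comparison on real data would not be strictly like-for-like.

```python
def pareto_front(models):
    """Return names of models not dominated on (error, time).

    `models` maps a name to (top1_error_percent, time_ms). A model is
    dominated if another is at least as good on both measures and
    strictly better on at least one.
    """
    front = []
    for name, (err, ms) in models.items():
        dominated = any(
            o_err <= err and o_ms <= ms and (o_err < err or o_ms < ms)
            for other, (o_err, o_ms) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    # Fastest first, so the list reads as "pay more time, get less error".
    return sorted(front, key=lambda n: models[n][1])


# Placeholder (error %, ms) pairs, NOT values from the README.
placeholder = {
    "model_a": (40.0, 10.0),
    "model_b": (30.0, 50.0),
    "model_c": (35.0, 60.0),  # slower and less accurate than model_b
    "model_d": (25.0, 100.0),
}

front = pareto_front(placeholder)
print(front)  # prints ['model_a', 'model_b', 'model_d']
```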
Caveats
- All benchmarks run in Torch, not modern PyTorch or TensorFlow
- CUDA 8.0 and cuDNN 5.0/5.1 are ancient; current speedups likely differ
- ResNet-200 failed on the 8 GB GTX 1080 due to memory limits
- VGG-16 and VGG-19 are evaluated with dense prediction (256×256) rather than a single crop, which gives them a slight accuracy advantage over the other models
Verdict
Worth a look if you’re researching historical hardware scaling trends or defending a 2016 GPU purchase. Skip it if you need current PyTorch benchmarks: this is a period piece, not a living benchmark suite.