gpleiss/temperature_scaling

Your neural network is lying about how sure it is

A dead-simple post-processing trick that learns a single scalar to stop softmax from being overconfident.

1.2k stars Python LLMOps · EvalML Frameworks

What it does

Temperature scaling is a one-parameter fix for a common pathology: neural networks output probabilities that sound more certain than they are. You learn a single scalar T on a held-out validation set and divide your logits by it before the softmax. After that, the predictions made with 80% confidence are correct roughly 80% of the time. The repo is one file (temperature_scaling.py) you copy into your project.
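To make "correct 80% of the time" measurable, one common summary is expected calibration error (ECE): bucket predictions by confidence and average the gap between confidence and accuracy in each bucket. The sketch below is a dependency-free illustration, not code from the repo. The function name, the equal-width binning and the toy data are assumptions made for the example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over equal-width bins.

    confidences: top-class probability for each prediction, in [0, 1]
    correct:     True where the top-class prediction matched the label
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


# Toy data (made up): the model claims 90% confidence but is right 6 times in 10.
confidences = [0.9] * 10
correct = [True] * 6 + [False] * 4
print(expected_calibration_error(confidences, correct))
```

On this toy data the gap is about 0.3. A well-calibrated model would score close to 0.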

The interesting bit

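The cleverness is in the restraint. No retraining, no architecture changes: just softmax(z)_i = e^(z_i/T) / sum_j e^(z_j/T), where T is fit by minimizing negative log-likelihood on the validation set. T = 1 leaves the probabilities unchanged and T > 1 softens them. Dividing every logit by the same positive T never changes which class scores highest, so accuracy is untouched. It’s post-hoc calibration: the model stays the same, only the thermometer changes.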
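The fitting step can be sketched without any framework. The code below is a minimal illustration under stated assumptions, not the repo's implementation: the repo is PyTorch, while this version uses plain Python and swaps in a golden-section search over log T. That search is safe here because the NLL is convex in 1/T, so it has a single minimum in T. The function names, search bounds and toy logits are all hypothetical.

```python
import math


def nll(logits, labels, T):
    """Mean negative log-likelihood of the labels under softmax(z / T)."""
    total = 0.0
    for z, y in zip(logits, labels):
        scaled = [v / T for v in z]
        m = max(scaled)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += log_sum - scaled[y]
    return total / len(labels)


def fit_temperature(logits, labels, lo=0.05, hi=20.0, iters=80):
    """Golden-section search for the T in [lo, hi] that minimizes the NLL."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log(lo), math.log(hi)  # search in log T so that T stays positive
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if nll(logits, labels, math.exp(c)) < nll(logits, labels, math.exp(d)):
            b = d
        else:
            a = c
    return math.exp((a + b) / 2)


# Toy validation set (made up): every prediction is about 99% confident in
# class 0, but only 8 of the 10 labels are actually class 0.
val_logits = [[5.0, 0.0]] * 10
val_labels = [0] * 8 + [1] * 2

T = fit_temperature(val_logits, val_labels)
print(T, nll(val_logits, val_labels, 1.0), nll(val_logits, val_labels, T))
```

For this toy set the fitted T comes out near 3.6, which pulls the class-0 probability down from about 0.99 to about 0.8, matching the observed accuracy.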

Key highlights

  • Single learned parameter, fit on validation logits without touching the model’s weights
  • Works with any trained PyTorch classifier (wraps your existing model)
  • Based on Guo et al.’s “On Calibration of Modern Neural Networks” (ICML 2017)
  • Includes before/after calibration plots for ResNet on CIFAR-100
  • Author explicitly recommends better-maintained alternatives like probmetrics for production use

Caveats

  • Repo is unmaintained — written for PyTorch 0.3, eight years stale
  • Requires careful validation-set hygiene: fit T on the same held-out validation split you used while training the network, never on the training data. A different split may overlap the training data and leak information.
  • Not a package; literally just a file to copy-paste

Verdict

Worth reading for the concept and the 50 lines of implementation, but don’t depend on it in production. Use it to understand temperature scaling, then switch to a maintained library.
