toshas/torch-fidelity

Finally, FID scores you can actually trust

A PyTorch-native toolkit that computes standard generative-model metrics with reference-implementation precision—no more wondering if your FID is off because of implementation drift.

1.2k stars · Python · LLMOps · EvalML Frameworks
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

torch-fidelity bundles the standard metrics for evaluating generative models—Inception Score, Fréchet Inception Distance, Kernel Inception Distance, Precision/Recall, Perceptual Path Length, and the newer Monge Inception Distance—into a single PyTorch package. It exposes both a command-line tool and a Python API, and ships with built-in dataset references like cifar10-train so you don’t have to wrangle baseline statistics yourself.
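
As a rough illustration of the Python API, here is a minimal sketch that scores a directory of generated images against the built-in cifar10-train reference. The helper names (build_metric_request, run_metrics) and the "samples/" path are hypothetical, and the keyword arguments are the ones I recall from the project's README, so check them against the version you install.

```python
def build_metric_request(generated_dir, reference="cifar10-train", use_gpu=True):
    """Assemble keyword arguments for torch_fidelity.calculate_metrics.

    Hypothetical helper: it only builds a dict, so it can be inspected
    without torch-fidelity installed.
    """
    return {
        "input1": generated_dir,  # directory of generated images
        "input2": reference,      # built-in dataset reference
        "cuda": use_gpu,
        "isc": True,              # Inception Score
        "fid": True,              # Frechet Inception Distance
        "kid": True,              # Kernel Inception Distance
        "verbose": False,
    }


def run_metrics(request):
    """Run the evaluation and return the dict of metric values."""
    # Imported lazily so the request builder works on machines without the package.
    import torch_fidelity

    return torch_fidelity.calculate_metrics(**request)
```

Calling run_metrics(build_metric_request("samples/")) should return a dictionary keyed by metric name (for example frechet_inception_distance), assuming the package and its model weights are available.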

The interesting bit

The selling point is numerical fidelity: the authors claim their outputs match reference implementations up to machine-precision floating-point error. In a field where tiny implementation differences have sent paper reviewers into spirals, that’s not nothing. The efficiency angle is practical too—shared feature computation and aggressive caching mean you can drop metric evaluation into a training loop without tanking your epoch time.
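
For context on where drift can creep in, this is the standard definition of FID, written out as a reminder rather than taken from the repository. The symbols are my own notation: the means and covariances are those of Inception features for real images (subscript r) and generated images (subscript g).

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Each ingredient is a place where two implementations can disagree: the feature extractor weights, the image preprocessing that feeds it, and the numerical matrix square root.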

Key highlights

  • Six metrics in one call: ISC, FID, KID, PRC, PPL, and MIND
  • CLI handles directories, ONNX generators, or built-in dataset references directly
  • Python API wraps generators with GenerativeModelModuleWrapper for drop-in training-loop use
  • Feature sharing + caching avoids recomputing Inception features across metrics
  • Modular enough to register custom feature extractors for video, audio, 3D, etc.
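
The training-loop use from the highlights could look roughly like the sketch below. The helper names should_evaluate and evaluate_generator are hypothetical, and the positional argument order for GenerativeModelModuleWrapper (module, latent size, latent type, number of classes) is recalled from the project's examples, so treat it as an assumption to verify.

```python
def should_evaluate(step, eval_every):
    """True on steps that are positive multiples of eval_every."""
    return step > 0 and step % eval_every == 0


def evaluate_generator(generator, z_size, num_samples=50000, reference="cifar10-train"):
    """Score an in-memory generator against a built-in dataset reference."""
    # Lazy import: torch-fidelity is only needed when an evaluation actually runs.
    import torch_fidelity

    # Assumed argument order: (module, z_size, z_type, num_classes).
    # "normal" latents and 0 classes describe an unconditional generator.
    wrapped = torch_fidelity.GenerativeModelModuleWrapper(generator, z_size, "normal", 0)
    return torch_fidelity.calculate_metrics(
        input1=wrapped,
        input1_model_num_samples=num_samples,  # how many samples to draw from the generator
        input2=reference,
        isc=True,
        fid=True,
        kid=True,
        verbose=False,
    )
```

Inside a loop you would guard the call with something like `if should_evaluate(step, 5000): metrics = evaluate_generator(G, 128)`, where G, the latent size of 128, and the interval are placeholders.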

Caveats

  • The “machine precision” claim is stated but not independently verified in the README; you’ll need to check their precision docs or run your own comparisons
  • The troubleshooting section notes a PATH issue with the fidelity CLI script on some installs
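
If you do run your own comparison, a small tolerance check is usually enough. This is a minimal sketch with a hypothetical helper, compare_metrics, that takes two dictionaries of metric values (one from torch-fidelity, one from whichever reference implementation you trust) and reports which shared metrics agree within a relative tolerance. The default tolerance is an arbitrary choice of mine, not a figure from the project.

```python
import math


def compare_metrics(ours, reference, rel_tol=1e-6):
    """Return {name: (ours, reference, agrees)} for metrics present in both dicts."""
    report = {}
    for name in sorted(set(ours) & set(reference)):
        a = float(ours[name])
        b = float(reference[name])
        # math.isclose applies rel_tol relative to the larger magnitude of a and b.
        report[name] = (a, b, math.isclose(a, b, rel_tol=rel_tol))
    return report
```

Metrics that appear in only one dictionary are skipped, so a missing key shows up as an absent row and not as a false match.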

Verdict

Worth a look if you’re training GANs or diffusion models in PyTorch and tired of stitching together half a dozen reference implementations. Less useful if you just need a one-off FID score and don’t care about reproducibility down to the last decimal.
