ML experiment tracking that lets you commit to Git after the fact
Keepsake versions your training runs on S3/GCS so you can stop spreadsheet-juggling and actually reproduce results later.

What it does
Keepsake is a Python library that snapshots your training code, hyperparameters, model weights, metrics, and even Python dependencies to Amazon S3 or Google Cloud Storage. Two lines in your training loop — keepsake.init() and experiment.checkpoint() — and it handles the rest. You query everything back via CLI or notebook.
The interesting bit
The “commit to Git after the fact” workflow is genuinely clever. You don’t need clean Git history during messy experimentation; Keepsake lets you checkout any checkpoint’s exact code and weights once you’ve found something worth keeping. It’s version control with the safety net finally on the right side of the tightrope.
Key highlights
- Stores everything as plain files on your own S3/GCS bucket — no server to run, no vendor lock-in
- CLI supports filtering experiments (
--filter "val_loss<0.2") and diffing checkpoints down to dependency versions - Notebook integration for retrieval, analysis, and plotting — described as a “programmable Tensorboard”
- Framework-agnostic: works with PyTorch, TensorFlow, scikit-learn, XGBoost, or anything that saves files
- Can load production models directly from stored experiments with full provenance
Caveats
- Not actively maintained — the README opens with a call for maintainers (issue #873)
- Cloud storage only (S3 or GCS); no local-first or other backend options are mentioned
- The “works with everything” claim is technically true but also means it’s just file-and-dict storage — you’re not getting framework-native integrations
Verdict
Worth a look if you’re currently duct-taping shell scripts to track ML experiments and want something open-source with your own storage. Skip it if you need active maintenance guarantees or deeply integrated experiment orchestration — this is a project that needs community help to survive.