Scikit-learn's missing distance-learning toolbox
A scikit-learn-compatible library that learns distance metrics from data rather than just applying them.

What it does
metric-learn implements ten supervised and weakly-supervised metric learning algorithms—LMNN, ITML, SDML, NCA, and others—that learn a distance function tailored to your data. It plugs into scikit-learn’s Pipeline, GridSearchCV, and friends via the familiar fit/transform API.
The interesting bit
The value isn’t in the algorithms themselves (most date from the 2000s) but in the integration: you get metric learning inside scikit-learn’s ecosystem without writing glue code. The library also handles the distinction between learning a metric from full class labels and learning from weaker constraints such as pairs or triplets.
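To make the idea of pair constraints concrete, the sketch below converts a fully labeled toy dataset into similar/dissimilar pairs using plain NumPy. The (n_pairs, 2, n_features) array layout, with +1 for similar and -1 for dissimilar pairs, is an assumption based on how metric-learn's pair-based learners are documented. The helper name make_pairs is hypothetical and not part of the library.

```python
import numpy as np


def make_pairs(X, y):
    """Build every unordered pair of points plus a +1/-1 similarity label.

    Hypothetical helper for illustration; not part of metric-learn.
    """
    # Indices (a, b) with a < b, so each unordered pair appears once.
    idx_a, idx_b = np.triu_indices(len(X), k=1)
    # Stack the two points of each pair: shape (n_pairs, 2, n_features).
    pairs = np.stack([X[idx_a], X[idx_b]], axis=1)
    # +1 when both points share a class label, -1 otherwise.
    pair_labels = np.where(y[idx_a] == y[idx_b], 1, -1)
    return pairs, pair_labels


# Four 2-D points in two classes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.1], [2.9, 3.0]])
y = np.array([0, 0, 1, 1])

pairs, pair_labels = make_pairs(X, y)
print(pairs.shape)           # (6, 2, 2)
print(pair_labels.tolist())  # [1, -1, -1, -1, -1, 1]
```

Weak supervision matters most when class labels are unavailable and only such similar/dissimilar judgments exist; the labels here serve only to generate example constraints.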
Key highlights
- 10 algorithms: LMNN, ITML, SDML, LSML, SCML, NCA, LFDA, RCA, MLKR, MMC
- Full scikit-learn compatibility: pipelining, model selection, cross-validation
- Published in JMLR (2020), so there is a peer-reviewed paper behind the library
- Available via conda-forge and PyPI; Python 3.6+
- Optional skggm dependency for SDML edge cases
Caveats
- SDML requires installing skggm from a specific GitHub commit for “problematic cases”—the README doesn’t define what those cases are
- No visual examples or screenshots in the README; documentation lives elsewhere
- “Efficient” is claimed but no benchmarks or complexity notes are provided
Verdict
Worth a look if you’re doing nearest-neighbor search, clustering, or kernel methods and suspect a learned distance could beat Euclidean. Skip it if you just need standard metrics or already have a custom metric-learning pipeline you’re happy with.