scikit-learn-contrib/metric-learn

Scikit-learn's missing distance-learning toolbox

A scikit-learn-compatible library that learns distance metrics from data rather than just applying fixed ones.

1.4k stars · Python · ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

metric-learn implements ten supervised and weakly supervised metric learning algorithms (LMNN, ITML, SDML, NCA, and others) that learn a distance function tailored to your data. Its estimators follow the familiar fit/transform API, so they plug into scikit-learn’s Pipeline, GridSearchCV, and related tools.
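
To make “learns a distance function” concrete: algorithms of this family are generally described as learning a Mahalanobis-type distance, which is the same as a Euclidean distance measured after a linear map. The NumPy sketch below does not call metric-learn. The matrix `L` is a hand-picked, hypothetical stand-in for whatever a fitted model would learn, and the helper `mahalanobis` is defined here for illustration only.

```python
import numpy as np

# Hypothetical linear map standing in for a learned transformation
# (an assumption for illustration, not output from metric-learn).
# It stretches feature 0 and shrinks feature 1.
L = np.array([[2.0, 0.0],
              [0.0, 0.5]])
M = L.T @ L  # positive semidefinite matrix that defines the metric


def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y))."""
    d = x - y
    return float(np.sqrt(d @ M @ d))


x = np.array([1.0, 2.0])
y = np.array([3.0, 0.0])

d_metric = mahalanobis(x, y, M)                        # distance under M
d_transformed = float(np.linalg.norm(L @ x - L @ y))   # Euclidean after mapping by L
d_euclidean = float(np.linalg.norm(x - y))             # plain Euclidean, for comparison

print(d_metric, d_transformed, d_euclidean)
```

The first two values agree, which is why a transform step followed by an ordinary Euclidean nearest-neighbor model can stand in for a custom distance.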

The interesting bit

The value isn’t the algorithms themselves (most date from the 2000s), but the integration: you get metric learning inside scikit-learn’s ecosystem without writing glue code. The library also covers both supervision regimes: learning a metric from full class labels, or from weaker constraints such as similar/dissimilar pairs or triplets of points.
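
The difference between full labels and weaker constraints is easiest to see in the shape of the training data. The sketch below builds pair and triplet constraints from a toy labeled set using NumPy only. The array layouts, `(n_pairs, 2, n_features)` with +1/-1 labels and `(n_triplets, 3, n_features)`, reflect my understanding of the convention the weakly supervised estimators expect; treat them as an assumption and check the library’s documentation before relying on them. In real weakly supervised settings the constraints would come from side information, not from class labels as here.

```python
import numpy as np
from itertools import combinations

# Toy labeled data: 4 points, 2 features, 2 classes (illustrative values).
X = np.array([[0.0, 0.0],
              [0.1, 0.2],
              [5.0, 5.0],
              [5.2, 4.9]])
y = np.array([0, 0, 1, 1])

# Pair constraints: every unordered pair of points,
# labeled +1 if both share a class (similar) and -1 otherwise (dissimilar).
idx_pairs = list(combinations(range(len(X)), 2))
pairs = np.array([[X[i], X[j]] for i, j in idx_pairs])  # (n_pairs, 2, n_features)
pair_labels = np.array([1 if y[i] == y[j] else -1 for i, j in idx_pairs])

# Triplet constraints: (anchor, positive, negative), read as
# "the anchor should be closer to the positive than to the negative".
n = len(X)
triplets = np.array([
    [X[a], X[p], X[q]]
    for a in range(n)
    for p in range(n)
    for q in range(n)
    if a != p and y[a] == y[p] and y[a] != y[q]
])  # (n_triplets, 3, n_features)

print(pairs.shape, pair_labels.tolist(), triplets.shape)
```

A supervised estimator would be fitted on `X` and `y` directly, while a pair- or triplet-based one would be fitted on arrays like `pairs` (with `pair_labels`) or `triplets`.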

Key highlights

  • 10 algorithms: LMNN, ITML, SDML, LSML, SCML, NCA, LFDA, RCA, MLKR, MMC
  • Full scikit-learn compatibility: pipelining, model selection, cross-validation
  • Described in a peer-reviewed JMLR paper (2020)
  • Available via conda-forge and PyPI; Python 3.6+
  • Optional skggm dependency for SDML edge cases

Caveats

  • SDML needs skggm, installed from a specific GitHub commit, to solve “problematic cases”; the README doesn’t define what those cases are
  • No visual examples or screenshots in the README; documentation lives elsewhere
  • “Efficient” is claimed but no benchmarks or complexity notes are provided

Verdict

Worth a look if you’re doing nearest-neighbor search, clustering, or kernel methods and suspect a learned distance could beat Euclidean. Skip it if you just need standard metrics or already have a custom metric-learning pipeline you’re happy with.
