One metric to rule four languages, mostly
A polyglot grab-bag of ML evaluation metrics that dares to ask: what if your AUC function had a Haskell accent?

What it does
Metrics is a reference implementation of common supervised-learning evaluation functions—AUC, log loss, RMSE, average precision at K, quadratic weighted kappa, and others—ported across Python, R, Haskell, and MATLAB/Octave. The pitch is consistency: the same named metric should behave the same way whether you’re in a Jupyter notebook or a GHCi session.
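For a sense of what "reference implementation" means here, two of the simpler metrics can be sketched in a few lines of plain Python. This is an illustration written for this review, not code from the project: the function names `rmse` and `log_loss`, the `eps` clipping parameter, and the error handling are all assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def log_loss(actual, predicted, eps=1e-15):
    """Binary log loss; `actual` holds 0/1 labels, `predicted` probabilities.

    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    The eps default is an assumption, not a value taken from the project.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        p = min(max(p, eps), 1 - eps)
        total += a * math.log(p) + (1 - a) * math.log(1 - p)
    return -total / len(actual)


if __name__ == "__main__":
    print(rmse([0, 0], [3, 4]))            # sqrt(12.5), about 3.5355
    print(log_loss([1, 0], [0.5, 0.5]))    # ln(2), about 0.6931
```

Details like the clipping bound are exactly where separate ports can quietly drift apart, which is why behaving the same way in every language is harder than it sounds.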
The interesting bit
The project treats language parity as a feature, not an afterthought. Most metric libraries are single-language; this one tries to keep a Rosetta Stone of implementations in sync, which is either admirably ambitious or a maintenance headache, depending on your worldview.
Key highlights
- Covers 20+ metrics, with the core set (MAE, MSE, AUC, log loss, etc.) implemented in nearly every supported language
- Packaged for each ecosystem: pip, CRAN, Cabal, and raw MATLAB path setup
- Includes less common metrics like Quadratic Weighted Kappa and MAP@K alongside the usual suspects
- Explicitly labels itself beta, which is either honesty or a warning—possibly both
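MAP@K is the sort of metric that gets re-derived slightly differently each time, so here is a minimal sketch of one common definition. It is not the project's code: the function names, the `k=10` default, the choice to ignore duplicate predictions, and the normalization by `min(len(actual), k)` are assumptions made for illustration. Items are assumed to be hashable.

```python
def average_precision_at_k(actual, predicted, k=10):
    """Average precision at k for one query.

    `actual` is the collection of relevant items; `predicted` is a ranked
    list, best first. Repeated predictions are counted only once.
    """
    relevant = set(actual)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant and item not in seen:
            hits += 1
            score += hits / rank  # precision at this rank
        seen.add(item)
    return score / min(len(relevant), k)


def mean_average_precision_at_k(actual_lists, predicted_lists, k=10):
    """Mean of average_precision_at_k over paired queries."""
    scores = [
        average_precision_at_k(a, p, k)
        for a, p in zip(actual_lists, predicted_lists)
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hits at ranks 1 and 3 out of three relevant items: (1/1 + 2/3) / 3
    print(average_precision_at_k([1, 2, 3], [1, 4, 2], k=3))
```

Whether empty relevance sets score 0.0 and how duplicates are treated are design choices rather than settled facts, so check them against whichever implementation you end up using.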
Caveats
- Coverage is uneven: F1 only exists in R, Gini only in MATLAB, and several metrics (precision/recall, cross-entropy, multiclass log loss) are still on the “to implement” list
- The Haskell and Python repos appear to be the testbed for the beta release; maturity varies by language
- No mention of vectorization, GPU support, or performance benchmarks—this is straightforward, probably loop-heavy reference code
Verdict
Worth a bookmark if you jump between R and Python and are tired of re-deriving MAP@K. Skip it if you need a single, battle-tested library with complete coverage; scikit-learn’s metrics module or MLmetrics in R are likely more polished for their respective ecosystems.