openml/OpenML

The machine learning commons that predated Kaggle datasets

A 2013-vintage open-science platform for sharing ML experiments, datasets, and results. This PHP implementation is being retired in favor of a FastAPI rewrite.

741 stars · PHP · Data Tooling
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

OpenML is a collaborative platform for sharing machine learning datasets, algorithms, and experimental results. Scientists upload experiments; others search, compare, and build on them without re-running everything from scratch. It plugs into Python (scikit-learn), R, Java, and WEKA.

The interesting bit

The README’s “frictionless, networked ecosystem” pitch is almost quaint now—this is essentially an academic preprint server crossed with a dataset registry, launched before Hugging Face made model sharing trivial. The real value was always the social contract: cite the data, compare your run to the state of the art automatically, get credit for reusable work whether it was published or not.

Key highlights

  • REST API + web frontend in PHP (this repo), with client libraries for Python, R, Java, and WEKA
  • Designed for reproducibility: experiments link to datasets, flows, and prior results in a queryable graph
  • Explicitly targets “citizen scientists” and students alongside researchers
  • BSD-3-Clause licensed
  • Maintenance-only mode: the PHP stack is being phased out for a FastAPI-based replacement
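
To make the "queryable graph" idea concrete, here is a minimal Python sketch of runs that link a dataset to a flow (OpenML's term for an algorithm or pipeline) and can be compared against prior results. Everything in it is an assumption for illustration: the `Run` and `Registry` names, their fields, and the toy accuracy values are hypothetical and are not OpenML's schema, REST API, or client library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical in-memory model. These names are illustrative only and
# do not mirror OpenML's actual schema or API.
@dataclass(frozen=True)
class Run:
    run_id: int
    dataset: str     # name of the dataset the experiment used
    flow: str        # the algorithm/pipeline that was run
    accuracy: float  # toy metric; values below are made up


@dataclass
class Registry:
    runs: List[Run] = field(default_factory=list)

    def add(self, run: Run) -> None:
        self.runs.append(run)

    def runs_on(self, dataset: str) -> List[Run]:
        # Follow the dataset -> runs link.
        return [r for r in self.runs if r.dataset == dataset]

    def best_on(self, dataset: str) -> Optional[Run]:
        # Best prior result on a dataset, or None if nobody has run on it.
        candidates = self.runs_on(dataset)
        return max(candidates, key=lambda r: r.accuracy) if candidates else None


registry = Registry()
registry.add(Run(1, "iris", "decision_tree", 0.93))
registry.add(Run(2, "iris", "random_forest", 0.96))
registry.add(Run(3, "wine", "random_forest", 0.91))

# Compare a new experiment against the best shared result
# without re-running the earlier experiments.
new_run = Run(4, "iris", "knn", 0.95)
best = registry.best_on("iris")
print(f"best prior flow on iris: {best.flow} ({best.accuracy:.2f})")
print(f"new run beats it: {new_run.accuracy > best.accuracy}")
```

The point of the sketch is the linkage: because each run records which dataset and flow produced it, "how does my result compare to what others got?" becomes a lookup rather than a re-run.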

Caveats

  • The “frictionless” claim is aspirational; the actual API ergonomics are unclear from the README
  • No usage stats, performance numbers, or active user counts provided
  • The PHP implementation’s rough edges are presumably why it’s being rewritten

Verdict

Worth a look if you’re researching open-science infrastructure or need historical ML benchmark data. Skip this repo if you want modern, actively maintained code: check the FastAPI successor instead, or just use Hugging Face Datasets.
