A backup plan for when UCI goes down
Curated machine learning datasets, normalized to one opinionated CSV format so tutorials don't break when third-party links rot.

What it does
This repo hosts copies of classic ML datasets (UCI staples like Iris and Boston Housing, plus time series and NLP sets) used in Jason Brownlee’s MachineLearningMastery.com tutorials. Every classification and regression CSV follows the same rigid convention: no header row, no whitespace, target in the last column, and missing values marked with a ? character. It’s a data janitor’s idea of a standard.
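To make the convention concrete, here is a minimal sketch of a reader for it, using only the Python standard library. The function name `load_convention_csv` and the two sample rows are illustrative assumptions, not something the repo ships; the repo itself contains no loader code.

```python
import csv
import io


def load_convention_csv(text):
    """Parse CSV text that follows the convention described above:
    no header row, target in the last column, '?' for missing values.

    Hypothetical helper for illustration; values are left as strings.
    """
    features, targets = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        # Map the '?' placeholder to None so missing cells are explicit.
        cells = [None if cell == "?" else cell for cell in row]
        features.append(cells[:-1])  # every column except the last
        targets.append(cells[-1])    # last column is the target
    return features, targets


# Illustrative rows in the Iris layout; the '?' is made up to show the mapping.
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,?,1.4,0.2,Iris-setosa\n"
X, y = load_convention_csv(sample)
```

Because every classification and regression file shares this layout, the same few lines should work across those files without per-dataset special cases, assuming each file really does follow the stated rules.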
The interesting bit
The README admits the real motive plainly: third parties are “unreliable,” and tutorials link directly to raw URLs here. So filenames are frozen forever once added. It’s less a dataset collection than a permalink infrastructure with CSVs attached.
Key highlights
- 14 binary classification, 6 multiclass, 6 regression, 12 univariate time series, 5 multivariate time series, and 4 NLP datasets
- Includes some harder-to-grab sets: credit card fraud (zipped), household power consumption, Flickr 8k captions
- ARFF versions bundled for those still using Weka
- All CSVs pre-cleaned to identical structural conventions—drop in and run
Caveats
- No code, no loaders, no documentation beyond the list itself; you’re on your own for train/test splits and feature names
- Frozen filenames mean no versioning or updates if source datasets improve
- Some entries (German Credit, Pima Indians Diabetes) use dated terminology
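Since the first caveat leaves splits and feature names to the reader, the following is a rough sketch of how one might fill both gaps with the standard library. The helpers `train_test_split_rows` and `name_columns`, the 25% holdout, the seed, and the column names in the usage lines are all assumptions for illustration; none of them come from the repo.

```python
import random


def train_test_split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and hold out a fraction for testing.

    Hypothetical helper; returns (train_rows, test_rows).
    """
    shuffled = list(rows)                  # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def name_columns(row, names):
    """Attach caller-supplied column names to one headerless row."""
    if len(row) != len(names):
        raise ValueError("expected %d columns, got %d" % (len(names), len(row)))
    return dict(zip(names, row))


# Usage with made-up data: eight row indices and assumed column names.
train, test = train_test_split_rows(list(range(8)))
record = name_columns(["5.1", "Iris-setosa"], ["sepal_length", "species"])
```

The column names have to come from the original dataset's documentation, since the files carry no headers.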
Verdict
Grab this if you’re following Brownlee’s tutorials or need frictionless, already-munged CSVs for quick experiments. Skip it if you want modern data loaders, up-to-date sources, or any context about what the columns actually mean.