jbrownlee/Datasets

A backup plan for when UCI goes down

Curated machine learning datasets, normalized to one opinionated CSV format so tutorials don't break when third-party links rot.

★1.2k stars Data Tooling

What it does

This repo hosts copies of classic ML datasets (UCI staples like Iris and Boston Housing, plus time series and NLP sets) used in Jason Brownlee’s MachineLearningMastery.com tutorials. Every classification and regression CSV follows the same rigid convention: no header row, no whitespace, target in the last column, and missing values marked with a question mark (?). It’s a data janitor’s idea of a standard.
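As a rough illustration of that convention, here is a minimal standard-library sketch that reads headerless CSV text, treats the last column as the target, and maps ? to None. The function name parse_rows and the sample rows are made up for this example; they are not part of the repo and not taken from any real file.

```python
import csv
import io


def parse_rows(text, missing="?"):
    """Split headerless CSV text into feature rows and last-column targets."""
    features, targets = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        # Map the missing-value marker to None; other cells stay as strings.
        cells = [None if cell == missing else cell for cell in row]
        features.append(cells[:-1])  # everything except the last column
        targets.append(cells[-1])    # the last column is the target
    return features, targets


# Made-up rows in the described layout (no header, target last, ? for missing).
sample = "5.1,3.5,?,0.2,class-a\n6.2,?,4.5,1.5,class-b\n"
X, y = parse_rows(sample)
print(X[0])
print(y)
```

Values are left as strings here; converting them to numbers is a separate step that depends on the dataset.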

The interesting bit

The README admits the real motive plainly: third parties are “unreliable,” and tutorials link directly to raw URLs here. So filenames are frozen forever once added. It’s less a dataset collection than a permalink infrastructure with CSVs attached.
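Linking to a raw URL might look like the sketch below. It assumes the files are served from GitHub's raw-content host on a branch named master and that a file called iris.csv exists; check both against the repo before relying on them. The helper names raw_url and fetch_text are hypothetical.

```python
from urllib.request import urlopen

# Assumption: GitHub's raw-content host and a `master` branch.
RAW_BASE = "https://raw.githubusercontent.com/jbrownlee/Datasets"


def raw_url(filename, branch="master"):
    """Build the raw-file URL for a dataset filename."""
    return f"{RAW_BASE}/{branch}/{filename}"


def fetch_text(filename, branch="master", timeout=10):
    """Download a dataset file and return its contents as text."""
    with urlopen(raw_url(filename, branch), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network access; fetch_text does.
print(raw_url("iris.csv"))
```

Because the filenames are frozen, a URL built this way should keep working for as long as the repo and branch stay in place.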

Key highlights

14 binary classification, 6 multiclass, 6 regression, 12 univariate time series, 5 multivariate time series, and 4 NLP datasets
Includes some harder-to-grab sets: credit card fraud (zipped), household power consumption, Flickr 8k captions
ARFF versions bundled for Weka holdouts
All CSVs pre-cleaned to identical structural conventions—drop in and run

Caveats

No code, no loaders, no documentation beyond the list itself; you’re on your own for train/test splits or feature names
Frozen filenames mean no versioning or updates if source datasets improve
Some entries (German Credit, Pima Indians Diabetes) use dated terminology
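
Since the files ship without feature names or splits, both have to be supplied by the reader. A minimal sketch of one way to do that with the standard library follows; the column names, the helper names, and the 25% test fraction are all placeholders chosen for this example, not anything the repo defines.

```python
import random


def name_columns(rows, names):
    """Attach caller-supplied column names to headerless rows."""
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row length does not match the number of names")
    return [dict(zip(names, row)) for row in rows]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up rows: two features followed by a target in the last column.
rows = [[str(i), str(i * 2), "yes" if i % 2 else "no"] for i in range(8)]
records = name_columns(rows, ["feature_1", "feature_2", "target"])
train, test = train_test_split(records)
print(len(train), len(test))
```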

Verdict

Grab this if you’re following Brownlee’s tutorials or need frictionless, already-munged CSVs for quick experiments. Skip it if you want modern data loaders, up-to-date sources, or any context about what the columns actually mean.

Frequently asked

What is jbrownlee/Datasets? Curated machine learning datasets, normalized to one opinionated CSV format so tutorials don't break when third-party links rot.
Is Datasets open source? Yes. jbrownlee/Datasets is an open-source project tracked on heatdrop.
How popular is Datasets? jbrownlee/Datasets has 1.2k stars on GitHub.
Where can I find Datasets? jbrownlee/Datasets is on GitHub at https://github.com/jbrownlee/Datasets.