The missing data is the point
O'Reilly's feature engineering book ships its code, but you're on your own for the datasets.

What it does
This repo holds the Jupyter notebooks for Alice Zheng and Amanda Casari’s 2018 O’Reilly book on feature engineering. It’s a companion, not a standalone course: you need the book to make sense of the code, and you’ll need to hunt down datasets yourself since licensing prevents redistribution.
The interesting bit
The authors were upfront about the data gap rather than pretending everything’s self-contained. That honesty is rarer than you’d think in publishing-adjacent repos, where half-finished “community editions” often rot in limbo.
Key highlights
- 1,497 stars suggest the book itself has legs, even if the repo is bare-bones
- Jupyter Notebook format — run the examples, break them, fix them
- Published 2018, so expect scikit-learn patterns from before neural transformer models became the default
- Explicit data download instructions via the book’s URLs
Caveats
- No data in repo; a dead external link would brick the affected notebooks, with nothing in the repo to warn you beforehand
- README is two sentences and a shrug — don’t expect issue-tracker support
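Since the data lives outside the repo, a quick pre-flight check can save you from finding the gap halfway through a notebook. Here is a minimal sketch, not something the repo provides: the `missing_datasets` helper, the `data` directory, and both file names are hypothetical placeholders for whatever the notebook you are running actually loads.

```python
from pathlib import Path


def missing_datasets(data_dir, expected_files):
    """Return the expected file names that are absent or empty under data_dir."""
    root = Path(data_dir)
    missing = []
    for name in expected_files:
        path = root / name
        # Treat a zero-byte file as missing: a failed download can leave one behind.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical file names; substitute the files your notebook reads.
    expected = ["example_dataset.csv", "another_dataset.json"]
    gaps = missing_datasets("data", expected)
    if gaps:
        print("Download these before running the notebook:", ", ".join(gaps))
    else:
        print("All expected data files are present.")
```

Run it from the directory that holds your downloads; it only checks that each file exists and is non-empty, not that the contents match what the book expects.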
Verdict
Grab it if you’re working through the book and want typed-out code to prod. Skip if you’re looking for a self-contained feature engineering tutorial; this is a reference implementation, not a curriculum.