alicezheng/feature-engineering-book

The missing data is the point

O'Reilly's feature engineering book ships its code, but you're on your own for the datasets.

1.5k stars · Jupyter Notebook · Learning · Data Tooling
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

This repo holds the Jupyter notebooks for Alice Zheng and Amanda Casari’s 2018 O’Reilly book on feature engineering. It’s a companion, not a standalone course: you need the book to make sense of the code, and you’ll have to hunt down the datasets yourself because licensing prevents redistribution.

The interesting bit

The authors were upfront about the data gap rather than pretending everything’s self-contained. That honesty is rarer than you’d think in publishing-adjacent repos, where half-finished “community editions” often rot in limbo.

Key highlights

  • 1,497 stars suggests the book itself has legs, even if the repo is bare-bones
  • Jupyter Notebook format — run the examples, break them, fix them
  • Published 2018, so expect scikit-learn code of that vintage; some calls may be deprecated or renamed in current releases
  • Dataset download instructions come from the book, which points to the source URLs

Caveats

  • No data in repo; if an external download link dies, the notebooks that depend on it break, and nothing in the repo will warn you ahead of time
  • README is two sentences and a shrug — don’t expect issue-tracker support
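Because the data lives outside the repo, it is easy to discover a missing file halfway through a notebook. A minimal preflight sketch, assuming you keep downloaded datasets under a local `data/` folder; the file names in `EXPECTED_FILES` are hypothetical placeholders, not the book’s actual datasets, so swap in whatever the book tells you to fetch.

```python
from pathlib import Path

# Hypothetical placeholders: replace with the dataset files the book's
# download instructions actually name. Paths are relative to the repo root.
EXPECTED_FILES = [
    "data/example_dataset_1.csv",
    "data/example_dataset_2.json",
]


def missing_datasets(root, expected=EXPECTED_FILES):
    """Return the expected dataset paths that are not present under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).is_file()]


if __name__ == "__main__":
    # Run from the repo root before opening the notebooks.
    missing = missing_datasets(".")
    if missing:
        print("Missing before running the notebooks:")
        for rel in missing:
            print("  -", rel)
    else:
        print("All expected datasets found.")
```

Checking for files up front, rather than letting a notebook cell fail on load, gives you one list of everything still to download.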

Verdict

Grab it if you’re working through the book and want typed-out code to prod. Skip it if you’re looking for a self-contained feature engineering tutorial; this is a reference implementation, not a curriculum.
