A census-taker's notebook: income prediction, step by step
A single Jupyter notebook that walks through the full sklearn pipeline on a classic dataset, with a Docker one-liner to get you running.

What it does
This repo is one long Jupyter notebook that predicts whether someone earns more than $50K/year using the UCI Census Income dataset. It runs through the standard sklearn hits: exploration, imputation, encoding, feature ranking, then trains models with both sklearn and TensorFlow. A companion mindmap (separate repo) maps out the broader data science workflow.
The interesting bit
The Docker setup is almost aggressively simple—one docker run command and you’re at localhost:8888 with TensorFlow and Jupyter ready. For a field where environment setup eats half a day, that’s not nothing.
Key highlights
- Covers the full pipeline: univariate/bivariate exploration, imputation, feature selection, encoding, PCA, and model comparison
- Includes ROC curves and metric calculations (accuracy, precision, recall, F1) for algorithm comparison
- Designed to run on the official jupyter/tensorflow-notebook Docker image
- Companion mindmap/cheatsheet at dformoso/machine-learning-mindmap
- ~700 stars suggests it has served as a reference for others learning the sklearn workflow
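The model-comparison step can be sketched along these lines. This is an illustrative sketch, not the repo's code. It uses synthetic data from make_classification as a stand-in for the encoded census features, with a roughly 75/25 class imbalance. The two candidates, logistic regression after PCA and a random forest, are assumptions chosen for illustration. Each model is scored on the same held-out split with accuracy, precision, recall, F1 and ROC AUC, and roc_curve returns the (FPR, TPR) points you would plot.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for encoded census features (assumption, not the UCI data).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical candidate models; PCA reduces 20 features to 10 components.
models = {
    "logreg_pca": make_pipeline(StandardScaler(), PCA(n_components=10),
                                LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results, roc_curves = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard 0/1 predictions
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    fpr, tpr, _ = roc_curve(y_test, y_score)     # points for plotting the ROC curve
    roc_curves[name] = (fpr, tpr)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
    }

for name, metrics in results.items():
    print(f"{name:14s} " + "  ".join(f"{k}={v:.3f}" for k, v in metrics.items()))
```

ROC AUC is computed from predict_proba scores rather than hard predictions because the curve sweeps the decision threshold and needs a continuous score. With an imbalanced target like >50K, precision, recall and F1 are usually more informative than accuracy alone.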
Caveats
- The README is a walkthrough, not a library—expect copy-paste learning, not import-and-go code
- No requirements.txt or pip install instructions; Docker is the only documented path
- TensorFlow usage is mentioned but not detailed in the README; it is unclear whether it's a full alternative pipeline or a brief add-on
Verdict
Good for someone who wants to see a complete, documented sklearn workflow in one place and prefers learning by running cells. Skip if you need modular, reusable code or are already comfortable building pipelines from scratch.