A 2015 data science course, still quietly useful
Kevin Markham's complete General Assembly curriculum, open-sourced with notebooks, homework, and the occasional Chipotle dataset.

What it does
This repo holds the full materials for General Assembly’s 2015 Data Science course in Washington, DC: 22 classes spanning Python basics, command line, Git, pandas, visualization, and a full tour of scikit-learn from KNN through ensembling. It includes slides, code, homework assignments, and a structured course project.
The interesting bit
The curriculum is built around real, slightly silly datasets (Chipotle orders, airline safety records), which turns out to be a solid pedagogical choice. The README also links to Markham’s broader Data School ecosystem (blog, newsletter, YouTube), making this feel less like a dead archive and more like a snapshot of an active teaching practice.
Key highlights
- Covers the pipeline soup to nuts: data cleaning, EDA, visualization, machine learning, NLP, web scraping, and regex
- Includes model comparison and evaluation procedure guides as standalone references
- Homework uses real-world data sources (FiveThirtyEight, NYT Upshot)
- Binder badge for running notebooks without local setup
- Python 2.7 era, with explicit Anaconda setup instructions
Caveats
- Materials date to 2015; Python 2.7 has since reached end-of-life, and some library APIs have changed
- Several sections of the README are commented out, suggesting incomplete migration or maintenance
- Some homework links appear truncated in the source README
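To make the first caveat concrete, here is a minimal sketch of the kind of edits a Python 2.7-era notebook typically needs before it runs on Python 3. The `orders` data and variable names are made up for illustration and are not taken from the course materials; which of these idioms the notebooks actually use is an assumption.

```python
# Python 3 versions of idioms common in Python 2.7-era notebooks.
# The data below is hypothetical; it is not from the course repo.

orders = {"burrito": 3, "bowl": 5, "tacos": 2}  # made-up item counts

# Python 2 allowed `print "total:", total`; in Python 3 print is a function.
total = sum(orders.values())
print("total:", total)

# Python 2: int / int floored the result. Python 3: / is true division
# and // is floor division, so pick the one the original code meant.
mean_per_item = total / len(orders)
floor_mean = total // len(orders)

# Python 2: dict.items() returned a list. Python 3 returns a view,
# so wrap it in sorted() or list() before indexing into it.
top_item = sorted(orders.items(), key=lambda kv: kv[1], reverse=True)[0]
```

On the library side, one change worth knowing about: scikit-learn moved `sklearn.cross_validation` to `sklearn.model_selection` in version 0.18, so 2015-era imports of `train_test_split` or `cross_val_score` may need updating if the notebooks use the old path.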
Verdict
Self-learners who want a structured, no-cost intro to the full data science workflow will find a coherent path here. Those already comfortable with pandas and scikit-learn should skip it: this is genuinely beginner material, not a hidden gem of advanced techniques.