justmarkham/DAT8

A 2015 data science course, still quietly useful

Kevin Markham's complete General Assembly curriculum, open-sourced with notebooks, homework, and the occasional Chipotle dataset.

1.6k stars · Jupyter Notebook · Learning
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

This repo holds the full materials for General Assembly’s 2015 Data Science course in Washington, DC: 22 classes spanning Python basics, the command line, Git, pandas, visualization, and a tour of scikit-learn from KNN through ensembling. It includes slides, code, homework assignments, and a structured course project.

The interesting bit

The curriculum is built around real, slightly silly datasets (Chipotle orders, airline safety records), which turns out to be a solid pedagogical choice. The README also links to Markham’s broader Data School ecosystem (blog, newsletter, YouTube), making this feel less like a dead archive and more like a snapshot of an active teaching practice.

Key highlights

  • Covers the pipeline soup to nuts: data cleaning, EDA, visualization, machine learning, NLP, web scraping, and regex
  • Includes standalone reference guides on model comparison and model evaluation procedures
  • Homework uses real-world data sources (FiveThirtyEight, NYT Upshot)
  • Binder badge for running notebooks without local setup
  • Python 2.7 era, with explicit Anaconda setup instructions

Caveats

  • Materials date to 2015; Python 2.7 reached end of life in January 2020, and some library APIs have changed since
  • Several sections of the README are commented out, suggesting an unfinished migration or lapsed maintenance
  • Some homework links appear truncated in the source README
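
For a sense of what the Python 2.7 caveat means in practice, here is a minimal sketch of the kind of changes a learner might make when running older code on Python 3. The snippets and variable names are illustrative assumptions, not lines taken from the course notebooks.

```python
# Illustrative only: hypothetical values, not code from the DAT8 repo.

# Python 2 style:  print "mean:", total / count
# Python 3 needs print() as a function, and "/" on ints is true division.
total, count = 7, 2

true_div = total / count    # true division in Python 3
floor_div = total // count  # floor division, the old Python 2 result for ints
print("mean:", true_div)

# Python 2 style:  orders.iteritems()
# Python 3 uses .items(), which returns a view rather than a list.
orders = {"burrito": 2, "chips": 1}
pairs = sorted(orders.items())

# Python 2's map() returned a list; in Python 3 it is lazy,
# so wrap it in list() when a list is actually needed.
doubled = list(map(lambda n: n * 2, orders.values()))
```

Most notebooks from this era likely need only small edits of this sort, though anything that leans on since-changed pandas or scikit-learn APIs may take more work.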

Verdict

Self-learners who want a structured, no-cost intro to the full data science workflow will find a coherent path here. Those already comfortable with pandas and scikit-learn should skip it: this is genuinely beginner material, not a hidden gem of advanced techniques.
