A summer camp for data scientists who want to do more than optimize ad clicks
Open curriculum from a fellowship that trains researchers to build ML/AI projects with government agencies and nonprofits.

What it does
This repo is the public handbook and curriculum for the Data Science for Social Good Fellowship, a project-based summer program run since 2013 (now through CMU and the DSSG Foundation). It collects tutorials, manuals, and workshop materials used to train fellows who work directly with public-sector partners on education, health, criminal justice, and energy projects. The content is split between a fellow manual (orientation, conduct, schedules) and a broad curriculum covering everything from web scraping and reproducible ETL to causal inference, ethics in ML for public policy, and presentation skills.
The interesting bit
The guide treats “data scientist for social good” as a distinct role requiring social science literacy, project scoping ability, and explicit ethics training—not just modeling chops. The curriculum is CC-BY licensed and explicitly welcomes outsiders, not just admitted fellows.
Key highlights
- Covers the full stack: Python/SQL, GIS, record linkage, network analysis, operations research, plus “living in the terminal” and dotfiles
- Includes unusual public-sector topics: data security primers, reproducible ETL, causal inference, and ethics/fairness/bias sessions
- Provides orientation schedules from 2016 and 2022, plus a high-level summer plan, for anyone replicating the program
- Wiki covers practical infrastructure: S3 access, Jupyter on EC2, SQL Server to Postgres, killing runaway queries
- Built with mkdocs and served via GitHub Pages; contribution workflow is documented
Caveats
- The mkdocs build currently requires a $(pwd) workaround due to a dependency bug
- Most curriculum content is linked as subdirectories or PDFs; depth and freshness of individual tutorials varies and isn't audited here
Verdict
Worth bookmarking if you work at the intersection of ML and public policy, or if you’re designing a similar fellowship. Purely commercial data scientists will find the ethics and scoping material more relevant than the technical tutorials, which overlap with standard bootcamp content.