A thousand-star recipe for the classic data-science portfolio
A curated collection of Jupyter notebooks and R Markdown files covering the standard ML curriculum, useful mostly as a reference for how to structure your own.

What it does
This repo is a personal portfolio of data science projects (machine learning, NLP, data analysis, and visualization) built in Jupyter notebooks and R Markdown. It covers the usual suspects: Boston housing prices, Titanic survival, MNIST digit recognition, sentiment analysis, and a smattering of Kaggle-style exploratory work. The author also maintains a separate site for prettier browsing.
The interesting bit
The portfolio is deliberately bilingual, splitting work between Python (scikit-learn, Keras, Pandas) and R (published via RPubs). That alone makes it a decent reference for anyone straddling both ecosystems. The “Disaster Message Classifier” stands out as a fuller-stack project, combining an ETL pipeline, an ML pipeline, and a Flask web app with Plotly visualizations; most other entries are narrower notebook demos.
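To make the ETL-then-ML shape of a project like the “Disaster Message Classifier” concrete, here is a minimal standard-library sketch. It is not the repo's code: the CSV columns, the single-label setup, and the hand-rolled naive Bayes model are all assumptions chosen to keep the example self-contained. The Flask and Plotly layer is left out; `predict` stands in for the function a web route would call.

```python
import csv
import io
import math
import re
from collections import Counter, defaultdict

# Hypothetical raw data (made up for illustration): one label per message.
RAW_CSV = """id,message,category
1,We need water and food!!,aid
2,Need clean WATER urgently,aid
3,The bridge collapsed near the river,infrastructure
4,Road blocked and bridge damaged,infrastructure
4,Road blocked and bridge damaged,infrastructure
"""

def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def etl(raw_csv):
    """ETL stage: parse rows, drop duplicate ids, return (tokens, label) pairs."""
    seen, rows = set(), []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        rows.append((tokenize(row["message"]), row["category"]))
    return rows

def train(rows):
    """ML stage: collect the counts a multinomial naive Bayes model needs."""
    label_counts = Counter(label for _, label in rows)
    word_counts = defaultdict(Counter)
    for tokens, label in rows:
        word_counts[label].update(tokens)
    vocab = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocab

def predict(model, message):
    """Serving stage: return the most probable label for a new message."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokenize(message):
            if tok in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log(
                    (word_counts[label][tok] + 1) / (n_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

rows = etl(RAW_CSV)
model = train(rows)
```

Keeping the three stages as separate functions mirrors the split the review describes: the cleaning step can be rerun on new data without touching the model, and the web layer only needs `predict`.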
Key highlights
- Covers supervised, unsupervised, reinforcement, and deep learning in one repo
- Includes a cross-language information retrieval system (German queries, English documents)
- R work is published externally at RPubs, not buried in the repo
- “Micro Projects” section isolates single-algorithm walkthroughs (logistic regression, KNN, random forests)
- A requirements.txt file is provided for local setup; the data is flagged as demonstration-only
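For a sense of scale, a single-algorithm walkthrough in the “Micro Projects” spirit can be very small. The sketch below is a from-scratch k-nearest-neighbours classifier in plain Python; it is an illustration only, not taken from the repo, and the function name `knn_predict` and the toy data are made up.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up toy data: two well-separated clusters.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
```

A notebook version would typically swap this for a library implementation and add a train/test split, but the core idea fits in a dozen lines.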
Caveats
- Several projects are explicitly labeled “very simple analysis” or “micro”—depth varies sharply
- No visible tests, CI, or reproducibility infrastructure beyond a requirements file
- Some R content lives entirely outside the repo; you’ll need to chase links
Verdict
Good for early-career data scientists figuring out how to organize a portfolio, or hiring managers wanting a quick scan of a candidate’s range. Skip it if you’re after production code or novel methods: this is show-and-tell, not a framework.