sajal2692/data-science-portfolio

A thousand-star recipe for the classic data-science portfolio

A curated collection of Jupyter notebooks and R Markdown files covering the standard ML curriculum, useful mostly as a reference for how to structure your own.

1.2k stars · Jupyter Notebook · Learning · ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

This repo is a personal portfolio of data science projects (machine learning, NLP, data analysis, and visualization) built in Jupyter notebooks and R Markdown. It covers the usual suspects: Boston housing prices, Titanic survival, MNIST digit recognition, sentiment analysis, and a smattering of Kaggle-style exploratory work. The author also maintains a separate site for prettier browsing.

The interesting bit

The portfolio is deliberately bilingual, splitting work between Python (scikit-learn, Keras, Pandas) and R (published via RPubs). That alone makes it a decent reference for anyone straddling both ecosystems. The “Disaster Message Classifier” stands out as a fuller-stack project, with an ETL pipeline, an ML pipeline, and a Flask web app with Plotly visualizations; most other entries are narrower notebook demos.
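For readers unfamiliar with the term, the ETL stage of a project like this can be sketched in a few lines. The following is a minimal, standard-library-only illustration, not the repo's actual code: the sample data, the column names (`id`, `message`, `labels`), the table layout, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the real project's files and columns may differ.
RAW_CSV = """id,message,labels
1,We need water and food,water;food
2,  Road is blocked near the bridge ,infrastructure
3,We need water and food,water;food
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, split labels, drop duplicate messages."""
    seen = set()
    cleaned = []
    for row in rows:
        message = row["message"].strip()
        if message in seen:
            continue  # keep only the first copy of a repeated message
        seen.add(message)
        cleaned.append((int(row["id"]), message, row["labels"].split(";")))
    return cleaned


def load(records, conn):
    """Load: write messages and their labels into two SQLite tables."""
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT)")
    conn.execute("CREATE TABLE labels (message_id INTEGER, label TEXT)")
    for msg_id, message, labels in records:
        conn.execute("INSERT INTO messages VALUES (?, ?)", (msg_id, message))
        conn.executemany(
            "INSERT INTO labels VALUES (?, ?)",
            [(msg_id, label) for label in labels],
        )
    conn.commit()


def run_etl(text):
    """Run all three stages against an in-memory database and return it."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn
```

Calling `run_etl(RAW_CSV)` returns a connection whose tables a downstream ML pipeline or web app could query. Keeping the three stages as separate functions is one common way to make each step testable on its own.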

Key highlights

  • Covers supervised, unsupervised, reinforcement, and deep learning in one repo
  • Includes a cross-language information retrieval system (German queries, English documents)
  • R work is published externally at RPubs, not buried in the repo
  • “Micro Projects” section isolates single-algorithm walkthroughs (logistic regression, KNN, random forests)
  • A requirements.txt file is provided for local setup; data is flagged as demonstration-only
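As an illustration of what a single-algorithm “micro project” walkthrough might contain, here is a from-scratch k-nearest-neighbours classifier. This is a sketch under stated assumptions rather than the repo's code: the function name `knn_predict` and the toy data are hypothetical, and the repo's own notebooks may well use a library implementation instead.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters in 2D.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
```

With this data, a query near either cluster, such as `knn_predict(points, labels, (1.1, 1.0))`, is assigned that cluster's label.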

Caveats

  • Several projects are explicitly labeled “very simple analysis” or “micro”—depth varies sharply
  • No visible tests, CI, or reproducibility infrastructure beyond a requirements file
  • Some R content lives entirely outside the repo; you’ll need to chase links

Verdict

Good for early-career data scientists figuring out how to organize a portfolio, or hiring managers wanting a quick scan of a candidate’s range. Skip it if you’re after production code or novel methods: this is show-and-tell, not a framework.
