ptyadana/Data-Science-and-Machine-Learning-Projects-Dojo

A syllabus masquerading as a repo: 591 stars of ML practice

One developer's accumulated coursework and weekend projects, catalogued with unusual thoroughness.

592 stars · Jupyter Notebook · Learning · Velocity (7d): +0.3 ★/day · Trend: steady

What it does

This is a personal archive of data science and machine learning exercises: dozens of Jupyter notebooks spanning classification, regression, clustering, NLP, and deep learning. The author works through standard datasets (Wisconsin breast cancer, UCI heart disease, Kaggle bulldozer prices) with the predictable toolkit: pandas, scikit-learn, and TensorFlow/Keras, plus matplotlib, seaborn, and Plotly for visualization. A few projects get Streamlit or Flask wrappers to become “apps.”

The interesting bit

The value isn’t novelty; it’s curation. The README catalogs every project with its source course (Jose Portilla’s masterclass, Zero to Mastery, etc.) and the exact algorithm used, down to “PCA Manual Implementation” versus “PCA with sklearn.” For someone drowning in identical-looking ML tutorials, this is a surprisingly usable index of what each one actually teaches.
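To make the “manual” versus “sklearn” distinction concrete, here is a minimal sketch of what a hand-rolled PCA step might involve. It is not taken from the repo: the function name `first_principal_component` and the toy data are hypothetical, and it uses power iteration in plain Python to recover only the top component. The library route would be `sklearn.decomposition.PCA`.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) for the top principal component.

    Illustrative only: assumes the data has non-zero variance and that the
    starting vector is not orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])

    # Step 1: centre every feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Step 2: build the d x d sample covariance matrix.
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Step 3: power iteration converges to the eigenvector with the
    # largest eigenvalue, which is the first principal direction.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # The Rayleigh quotient v^T C v gives the matching eigenvalue,
    # i.e. the variance explained by this component.
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return v, eigenvalue


# Toy data lying exactly on the line y = 2x.
direction, variance = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print(direction, variance)
```

Writing the steps out this way exposes the centring and covariance stages that a library call hides, which is presumably the point of keeping both versions in a study archive.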

Key highlights

  • Covers the full textbook spread: SVM, random forest, XGBoost, AdaBoost, gradient boosting, KNN, K-Means, DBSCAN, hierarchical clustering, naive Bayes, logistic/linear/multiple regression, ANNs, transfer learning
  • Includes a few non-standard touches: dlib face recognition, GeoPandas, Apache Spark/Databricks, manual PCA math
  • One external project linked: a separate Streamlit app for random forest regression
  • Datasets are all public (UCI, Kaggle, FiveThirtyEight), so the notebooks can be rerun without hunting for data

Caveats

  • Most projects are explicitly from paid courses; this is practice work, not original research
  • README is a flat list with no running code or unified environment; you’ll be installing dependencies per notebook
  • Some notebook links appear to be relative paths that may drift as the repo structure changes
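Since dependencies have to be worked out notebook by notebook, one way to soften that caveat is to read the imports straight out of an `.ipynb` file, which is plain JSON. The sketch below is an assumption about how a reader might do this, not something the repo provides; `notebook_imports` is a hypothetical helper that uses only the standard library.

```python
import ast
import json


def notebook_imports(notebook_json):
    """Collect the top-level module names imported by a notebook's code cells."""
    nb = json.loads(notebook_json)
    modules = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # the notebook format allows a list of lines
            source = "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python.
        lines = [
            ln for ln in source.splitlines()
            if not ln.lstrip().startswith(("%", "!"))
        ]
        try:
            tree = ast.parse("\n".join(lines))
        except SyntaxError:
            continue  # skip cells that still do not parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


# Tiny in-memory notebook standing in for a real file.
example = json.dumps({
    "cells": [
        {"cell_type": "code",
         "source": ["%matplotlib inline\n", "import pandas as pd\n",
                    "from sklearn.svm import SVC\n"]},
        {"cell_type": "markdown", "source": ["import not_real"]},
    ]
})
print(sorted(notebook_imports(example)))
```

Import names do not always match install names (`sklearn` is installed as `scikit-learn`, for example), so the result is a starting list to check by hand rather than a ready-made requirements file.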

Verdict

Good if you’re self-teaching ML and want to see what “done” looks like across a curriculum, especially for comparing algorithmic approaches to the same problem types. Skip if you need production code or novel techniques; this is a study hall, not a lab.
