A thousand-star study notebook nobody asked for
A grab-bag of Kaggle entries, scrapers, and half-finished experiments that accidentally became a popular reference.

What it does
This repo collects one developer’s solutions to HackerRank algorithm puzzles and Kaggle machine-learning competitions, plus a few side projects: bike-share demand forecasting, NYC taxi trip duration prediction, Amazon deforestation tracking from satellite imagery, IMDB rating prediction, a Selenium-based Twitter scraper, and a dlib-powered face recognizer that needs only 20 photos per person.
The interesting bit
The Twitter parser is the most opinionated piece: it bypasses the official API’s two-week limit by driving PhantomJS with Selenium, then emails the owner when new machine-learning flash cards drop. That’s less data science than personal automation with extra steps, and it’s the only project here with a clear user.
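The scrape-then-notify loop described above can be sketched in a few lines. This is a minimal illustration, not the repo's code: the function names, the ID strings, and the email addresses are all hypothetical, and the scraping step is replaced by a hard-coded list so the example runs with the standard library alone.

```python
from email.message import EmailMessage


def find_new_items(fetched_ids, seen_ids):
    """Return IDs from fetched_ids that were not seen before, in order."""
    seen = set(seen_ids)
    new_items = []
    for item_id in fetched_ids:
        if item_id not in seen:
            new_items.append(item_id)
            seen.add(item_id)  # also drops duplicates within one scrape
    return new_items


def build_notification(new_ids, sender, recipient):
    """Build (but do not send) an email listing the new item IDs."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(new_ids)} new flash card(s) found"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(new_ids))
    return msg


# Hypothetical data standing in for scraped results.
seen = ["101", "102"]
fetched = ["103", "102", "104", "103"]

new_ids = find_new_items(fetched, seen)
message = build_notification(new_ids, "bot@example.com", "owner@example.com")
```

Actually delivering the message would be a separate step, for example passing `message` to `smtplib.SMTP(...).send_message(...)` with whatever mail server the owner uses.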
Key highlights
- Kaggle entries span regression (bike sharing, taxi trips), computer vision (Amazon satellite labeling with Keras), and NLP-adjacent scraping (movie ratings)
- Face recognition leans on the dlib C++ library with HOG features, not a from-scratch neural net
- One project includes a French-language explanatory PDF, suggesting these were actual coursework or self-study artifacts
- Soccer data project is explicitly aimless: “I don’t know currently what’s the aim”
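To make the "20 photos per person" idea concrete, here is one way a small gallery could be matched against a new face. It is a sketch under stated assumptions, not the repo's method: it assumes each photo has already been reduced to a fixed-length numeric descriptor by some upstream step, and the nearest-neighbour rule, the `identify` function, the toy two-dimensional vectors, and the threshold value are all illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(descriptor, gallery, threshold):
    """Return the name of the nearest enrolled descriptor's owner.

    Returns None when even the closest match is farther than threshold.
    """
    best_name, best_dist = None, float("inf")
    for name, descriptors in gallery.items():
        for known in descriptors:
            dist = euclidean(descriptor, known)
            if dist < best_dist:
                best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


# Toy gallery: a real one would hold about 20 descriptors per person.
gallery = {
    "alice": [[0.0, 0.0], [0.1, 0.0]],
    "bob": [[1.0, 1.0], [0.9, 1.1]],
}

match_a = identify([0.05, 0.05], gallery, threshold=0.5)
match_b = identify([0.95, 1.0], gallery, threshold=0.5)
unknown = identify([5.0, 5.0], gallery, threshold=0.5)
```

With so few enrolled samples, a plain nearest-neighbour lookup like this stays fast, and the threshold is what lets the system say "unknown" instead of forcing a match.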
Caveats
- Several projects are described as incomplete or exploratory; the soccer data project, for one, has no stated goal
- PhantomJS is deprecated; the Twitter scraper may need headless Chrome or similar to function today
- No tests, no requirements files visible, no reproducibility infrastructure mentioned
Verdict
Worth a quick browse if you’re stuck on a classic Kaggle starter competition and want to see one person’s notebook style. Skip it if you need production-ready code or novel techniques: this is a learning diary, not a framework.