YouTube RL tutorials, now with actual code that trains
A grab-bag of reinforcement learning and CNN implementations with honest notes about what works, what doesn't, and how long it takes.

What it does
This is the companion repo for the “Machine Learning with Phil” YouTube channel — a collection of Python implementations covering CNNs, Deep Q-Learning, SARSA, and Monte Carlo methods. Each folder maps to a video, with code you can run and numbers you can verify (or fail to reproduce, which is also documented).
The interesting bit
The README includes the kind of frank asides most repos bury: the Deep Q-Learning model “takes quite some time even on my 1080Ti / i7 7820k @ 4.4 GHz” and hasn’t been fully trained yet. The Venus volcano CNN beats the naive baseline by ~4 percentage points on brutally imbalanced data. It’s refreshing transparency in a field that usually only reports the wins.
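To see why a ~4 point margin over a naive baseline can still matter on imbalanced data, here is a minimal sketch. It assumes "naive baseline" means always predicting the majority class, and the label counts are hypothetical, not taken from the repo's dataset.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical label distribution (an assumption for illustration only):
# 86 negatives and 14 positives.
labels = [0] * 86 + [1] * 14
baseline = majority_baseline_accuracy(labels)
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

With a split like this, a model that never predicts the rare class already scores high on accuracy, so the room left above the baseline is small and every point is hard-won.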
Key highlights
- Deep Q-Learning for Space Invaders in PyTorch (training status: “I’ll get to it”)
- CNN hitting 98% on MNIST in 10 epochs — TensorFlow 1.5 vintage
- Monte Carlo blackjack: 42% win rate on-policy, 29% off-policy (a 13-point gap between the two methods)
- Q-Learning, Double Q-Learning, and SARSA all tested on CartPole
- Every script linked to a specific YouTube video and often a blog post
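For readers comparing the three CartPole methods listed above, the difference comes down to how each one builds its bootstrap target. The sketch below is not the repo's code: it assumes tabular Q-values stored as NumPy arrays indexed by integer state and action (CartPole observations are continuous, so this presumes some discretization), and every function name is made up for illustration.

```python
import numpy as np


def sarsa_target(q, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action the agent actually takes next.
    return reward + gamma * q[next_state, next_action]


def q_learning_target(q, reward, next_state, gamma):
    # Off-policy: bootstrap from the greedy action in the next state.
    return reward + gamma * np.max(q[next_state])


def double_q_target(q_a, q_b, reward, next_state, gamma):
    # Double Q-Learning: pick the action with one table, evaluate it with the other.
    best_action = int(np.argmax(q_a[next_state]))
    return reward + gamma * q_b[next_state, best_action]


def td_update(q, state, action, target, alpha):
    # Shared update step: move the estimate a fraction alpha toward the target.
    q[state, action] += alpha * (target - q[state, action])


# Toy tables (hypothetical values): 2 states, 2 actions.
q_a = np.array([[0.0, 0.0], [1.0, 3.0]])
q_b = np.array([[0.0, 0.0], [2.0, 0.5]])

s_target = sarsa_target(q_a, reward=1.0, next_state=1, next_action=0, gamma=0.9)
q_target = q_learning_target(q_a, reward=1.0, next_state=1, gamma=0.9)
d_target = double_q_target(q_a, q_b, reward=1.0, next_state=1, gamma=0.9)

td_update(q_a, state=0, action=0, target=q_target, alpha=0.5)
```

SARSA uses the next action actually chosen, Q-Learning takes the max, and Double Q-Learning decouples action selection from evaluation to reduce the overestimation that the max introduces.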
Caveats
- TensorFlow 1.5 code is present; it predates TensorFlow 2 and will likely need changes to run on current releases
- Several projects are explicitly unfinished or under-trained
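- The blackjack off-policy result (29%) is well below the on-policy one (42%); off-policy Monte Carlo relies on importance sampling, which is known for high variance and slow learning, so check whether the gap reflects under-training before copying the code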
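One plausible reason off-policy Monte Carlo lags (an assumption here, not something the repo confirms) is that importance sampling throws away much of the data. The sketch below computes the ordinary importance sampling ratio for an episode, assuming a deterministic greedy target policy and a uniformly random behavior policy over two actions; the states, actions, and helper names are hypothetical.

```python
def importance_ratio(episode, target_policy, behavior_prob):
    """Product of pi(a|s) / b(a|s) over an episode.

    episode: list of (state, action) pairs generated by the behavior policy.
    target_policy: dict mapping state -> the single action the greedy target takes.
    behavior_prob: function (state, action) -> probability under the behavior policy.
    """
    ratio = 1.0
    for state, action in episode:
        if action != target_policy[state]:
            # The deterministic target gives this action probability 0,
            # so the whole episode contributes nothing.
            return 0.0
        ratio /= behavior_prob(state, action)
    return ratio


def uniform_behavior(state, action):
    # Hypothetical behavior policy: hit or stick with equal probability.
    return 0.5


# Hypothetical states keyed by player sum; 0 = stick, 1 = hit.
target = {12: 1, 16: 1, 20: 0}

matching = [(12, 1), (16, 1), (20, 0)]   # agrees with the target at every step
deviating = [(12, 1), (16, 0)]           # deviates at the second step

print(importance_ratio(matching, target, uniform_behavior))
print(importance_ratio(deviating, target, uniform_behavior))
```

Episodes that deviate from the target policy get zero weight, and the ones that agree get weights that grow with episode length, so the estimates are both data-hungry and noisy.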
Verdict
Good for someone learning RL who wants code tied to explanations and doesn’t mind filling in some gaps. Skip if you need production-ready, benchmark-topping implementations; the value here is pedagogical honesty, not SOTA results.