philtabor/Youtube-Code-Repository

YouTube RL tutorials, now with actual code that trains

A grab-bag of reinforcement learning implementations with honest notes about what works, what doesn't, and how long it takes.

932 stars · Python · Learning · ML Frameworks · Agents
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

This is the companion repo for the “Machine Learning with Phil” YouTube channel: a collection of Python implementations covering CNNs, Deep Q-Learning, SARSA, and Monte Carlo methods. Each folder maps to a video, with code you can run and reported numbers you can check against your own runs. Where a result is incomplete or under-trained, the README says so.

The interesting bit

The README includes the kind of frank asides most repos bury: the Deep Q-Learning model “takes quite some time even on my 1080Ti / i7 7820k @ 4.4 GHz” and hasn’t been fully trained yet. The Venus volcano CNN beats the naive baseline by ~4 percentage points on brutally imbalanced data. It’s refreshing transparency in a field that usually only reports the wins.
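
For readers unsure what "beats the naive baseline" means on imbalanced data, here is a minimal sketch. It assumes the naive baseline is "always predict the most common class"; the source does not say which baseline the README uses. The label counts and the model accuracy below are made-up placeholders, not the Venus volcano numbers.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical split: 90 negatives, 10 positives (NOT the real dataset).
labels = [0] * 90 + [1] * 10
baseline = majority_baseline_accuracy(labels)

# Placeholder model accuracy, chosen only to illustrate a ~4 point gain.
model_accuracy = 0.94
gain_points = (model_accuracy - baseline) * 100
print(f"baseline={baseline:.2f}, gain={gain_points:.1f} percentage points")
```

The point of the comparison: when one class dominates, raw accuracy looks high even for a model that learns nothing, so a gain of a few points over that floor carries more information than the headline number alone.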

Key highlights

  • Deep Q-Learning for Space Invaders in PyTorch (training status: “I’ll get to it”)
  • CNN reaching 98% accuracy on MNIST in 10 epochs, written against TensorFlow 1.5
  • Monte Carlo blackjack: 42% win rate on-policy, 29% off-policy (a 13-point gap)
  • Q-Learning, Double Q-Learning, and SARSA all tested on CartPole
  • Every script linked to a specific YouTube video and often a blog post
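
The three tabular methods in the list above differ mainly in the target each one bootstraps from. The sketch below shows the textbook update rules, not the repo's code. It assumes states are already discretized into hashable values (CartPole observations are continuous), and the `ALPHA` and `GAMMA` values and all function names are illustrative choices.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99  # assumed learning rate and discount factor


def q_learning_update(Q, s, a, r, s_next, actions, done):
    """Off-policy target: bootstrap from the greedy action in s_next."""
    target = r if done else r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, done):
    """On-policy target: bootstrap from the action actually taken next."""
    target = r if done else r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


def double_q_update(QA, QB, s, a, r, s_next, actions, done, rng=random):
    """Double Q-learning: one table picks the action, the other scores it."""
    if rng.random() < 0.5:
        QA, QB = QB, QA  # update the other table about half the time
    best = max(actions, key=lambda b: QA[(s_next, b)])
    target = r if done else r + GAMMA * QB[(s_next, best)]
    QA[(s, a)] += ALPHA * (target - QA[(s, a)])


# Usage: one Q-learning step from an all-zero table.
Q = defaultdict(float)
q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1], done=False)
print(Q[(0, 1)])
```

Splitting selection and evaluation across two tables is what lets Double Q-Learning reduce the overestimation that the `max` in plain Q-Learning tends to introduce.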

Caveats

  • TensorFlow 1.5 code is present; some of this has aged into archaeology
  • Several projects are explicitly unfinished or under-trained
  • The off-policy blackjack result (29%) is notably worse than the on-policy one (42%); work out why before copying the code
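
As a starting point for that last caveat, here is a toy sketch of how off-policy Monte Carlo evaluation reweights returns with importance sampling. It is not the repo's blackjack code and does not diagnose the 29% figure. The one-step "episodes", the two policies, and the returns are all invented for illustration.

```python
import random


def off_policy_estimates(episodes, target_prob, behavior_prob):
    """Return (ordinary, weighted) importance-sampling value estimates.

    episodes: list of (action, return) pairs sampled under the behavior policy.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for action, ret in episodes:
        rho = target_prob[action] / behavior_prob[action]  # importance ratio
        weighted_sum += rho * ret
        weight_total += rho
    ordinary = weighted_sum / len(episodes)
    weighted = weighted_sum / weight_total if weight_total else 0.0
    return ordinary, weighted


rng = random.Random(0)
behavior = {"hit": 0.5, "stick": 0.5}      # exploratory policy generating data
target = {"hit": 0.0, "stick": 1.0}        # greedy policy being evaluated
made_up_return = {"hit": -1.0, "stick": 1.0}  # invented deterministic returns

episodes = []
for _ in range(1000):
    action = rng.choice(["hit", "stick"])
    episodes.append((action, made_up_return[action]))

ordinary, weighted = off_policy_estimates(episodes, target, behavior)
print(f"ordinary={ordinary:.3f}, weighted={weighted:.3f}")
```

Two things to notice: episodes where the behavior policy picks an action the target policy would never take get weight zero, so roughly half the data is discarded here, and the ordinary estimator is noisier than the weighted one. Whether either effect accounts for the gap in the repo is something to check against the actual code.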

Verdict

Good for someone learning RL who wants code tied to explanations and doesn’t mind filling in some gaps. Skip if you need production-ready, benchmark-topping implementations; the value here is pedagogical honesty, not SOTA results.
