mpatacchiola/dissecting-reinforcement-learning

Reinforcement learning, dissected like a frog in biology class

A blog-turned-repo that walks from Markov chains to policy gradients with nothing but NumPy and patience.

624 stars · Python · Learning · ML Frameworks
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

This is the companion code for an eight-part blog series by Massimiliano Patacchiola. It covers a broad reinforcement learning curriculum (Markov chains, Q-learning, SARSA, actor-critic, genetic algorithms, function approximation, neural networks, and policy gradients), implemented in plain Python with only NumPy and Matplotlib as dependencies. Each post has matching code in src/, a printable A3 PDF in pdf/, and raw SVG assets in images/.
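To give a flavor of the material, here is a minimal sketch of the standard tabular Q-learning update, one of the algorithms the series covers. It is not taken from the repository: the function name `q_learning_update`, the list-of-lists table, and the default `alpha` and `gamma` values are assumptions made for illustration, and terminal-state handling is left out.

```python
def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step, updating q_table in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    best_next = max(q_table[next_state])  # greedy value of the next state
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


# Hypothetical toy problem: two states, two actions, estimates start at zero.
q_table = [[0.0, 0.0], [0.0, 0.0]]
new_value = q_learning_update(q_table, state=0, action=1,
                              reward=1.0, next_state=1)
```

With every estimate starting at zero, a reward of 1.0 moves the chosen entry a fraction `alpha` of the way toward the target, so `new_value` ends up at about 0.1.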

The interesting bit

The environments are deliberately self-contained: single Python files you copy into your project, no OpenAI Gym installation required. They follow Gym’s reset()/step()/render() convention anyway, which is a quiet bit of API diplomacy that lowers the friction for beginners who will inevitably graduate to the real thing.
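The shape of such an environment can be sketched in a few lines. The class below is a hypothetical example, not one of the repository's files: the name `TinyCorridor`, the reward values, and the three-value return of `step()` are assumptions (classic Gym also returns an info dictionary, and its `render()` draws rather than returning a string). It uses only the standard library.

```python
import random


class TinyCorridor:
    """Hypothetical 1-D corridor: the agent starts at cell 0, the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right).

        Returns (observation, reward, done), a simplified form of the
        tuple that Gym-style environments return.
        """
        move = 1 if action == 1 else -1
        # Clamp to the corridor so walking into a wall leaves the agent in place.
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.04  # arbitrary illustrative values
        return self.position, reward, done

    def render(self):
        """Return a one-line text picture: A = agent, G = goal."""
        cells = ["."] * self.length
        cells[-1] = "G"
        cells[self.position] = "A"
        return "".join(cells)


def run_random_episode(env, max_steps=100, seed=0):
    """Drive the environment with a uniformly random policy."""
    rng = random.Random(seed)
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        observation, reward, done = env.step(rng.choice([0, 1]))
        total_reward += reward
        if done:
            break
    return observation, total_reward
```

Because the driver loop only touches `reset()` and `step()`, swapping in a different environment with the same methods should need no changes to the loop, which is the practical payoff of following the convention.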

Key highlights

  • Eight progressive posts from 2016–2018, covering theory through neural policy gradients
  • Standalone environments: grid world, multi-armed bandit, inverted pendulum, mountain car, drone landing
  • Renders episodes to GIF/MP4 via Matplotlib, no external simulators needed
  • Runs on Raspberry Pi, BeagleBone, Intel Edison—anything that runs NumPy
  • Curated bibliography including Sutton & Barto (both editions), Watkins’s Q-learning dissertation, and classic papers

Caveats

  • The code is explicitly educational, not optimized for speed or modern deep RL scale
  • Some resource links are dated (OpenAI Universe is defunct, the Sutton & Barto second edition link is marked “[TODO]”)
  • No automated tests or CI visible in the repository structure

Verdict

Worth bookmarking if you’re teaching yourself RL from first principles or need lecture materials with working, readable code. Skip it if you want production-grade implementations or the latest PyTorch/JAX abstractions.
