PyTorch notebooks that actually explain RL papers
A personal learning project that grew into a readable reference for fourteen major deep reinforcement learning algorithms.

What it does
Fourteen Jupyter notebooks, each implementing one landmark deep RL paper in PyTorch. The author built them to understand the research, so the code favors readability over speed. Every notebook pairs the implementation with Markdown commentary explaining what each block of code does.
The interesting bit
This is essentially a curated reading list with executable homework attached. The progression is deliberate: vanilla DQN → multi-step learning → double Q-learning → dueling networks → noisy nets → prioritized replay → categorical DQN → Rainbow → quantile regression → recurrent Q-learning → A2C → GAE → PPO. You watch the field accrete improvements notebook by notebook.
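To make one step of that progression concrete, here is a minimal sketch of how the bootstrap target changes between vanilla DQN and double Q-learning. It is written in dependency-free Python rather than PyTorch so it runs anywhere. The function names and the toy Q-values are hypothetical and are not taken from the notebooks.

```python
def dqn_target(reward, next_q_target, gamma, done):
    """Vanilla DQN: the target network both picks and scores the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Toy Q-values for the next state (illustrative numbers only).
next_q_online = [1.0, 2.0, 0.5]   # online network prefers action 1
next_q_target = [3.0, 1.5, 0.0]   # target network overrates action 0

# gamma=0.5 is chosen only to keep the arithmetic easy to follow.
print(dqn_target(1.0, next_q_target, gamma=0.5, done=False))                        # 2.5
print(double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.5, done=False))  # 1.75
```

Decoupling action selection from action evaluation is the whole change: the double Q-learning target is lower here because it no longer takes the maximum over the target network's own, possibly overestimated, values.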
Key highlights
- Covers both value-based (DQN variants) and policy-gradient (A2C, PPO) methods
- Explicitly credits borrowed code from OpenAI Baselines, Higgsfield, Kaixhin, and Kostrikov
- Each notebook links directly to its source paper
- Targets PyTorch 0.4.0 and Python 3.6 (period piece, by now)
- OpenAI Gym environments throughout
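On the policy-gradient side, the series ends at PPO. As a rough illustration of what that notebook has to implement, the sketch below computes PPO's clipped surrogate term for a single sample in plain Python. The function name is hypothetical, and the default `clip_eps=0.2` is a commonly used value assumed here, not one confirmed from the notebooks.

```python
import math


def ppo_clipped_term(new_logp, old_logp, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate objective (a quantity to maximize)."""
    # Probability ratio between the updated policy and the policy that collected the data.
    ratio = math.exp(new_logp - old_logp)
    # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped estimates.
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage, ratio 1.5: the gain is capped at (1 + 0.2) * 1.0.
print(ppo_clipped_term(math.log(1.5), 0.0, advantage=1.0))   # 1.2
# Negative advantage, ratio 0.5: the penalty is held at (1 - 0.2) * -1.0.
print(ppo_clipped_term(math.log(0.5), 0.0, advantage=-1.0))  # -0.8
```

In a real training loop this term would be averaged over a batch and negated to form a loss. The advantages would typically come from an estimator such as GAE, the notebook that precedes PPO in the series.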
Caveats
- PyTorch 0.4.0 dates from April 2018; expect friction running this on modern setups
- Author warns some choices sacrifice efficiency for clarity
- Acknowledgements suggest significant code is adapted rather than original
Verdict
Good for someone who has read the RL papers but never seen them implemented. Skip it if you need production-ready code or code that runs unmodified on a current PyTorch release.