One researcher's messy notebook of RL algorithms
A personal collection of 15+ reinforcement learning implementations, kept deliberately unpolished to show multiple versions and trade-offs.

What it does
This repo collects PyTorch implementations of mainstream model-free RL algorithms—SAC, TD3, PPO, DDPG, A2C, DQN, QMIX, and more—plus oddities like Soft Decision Trees and Probabilistic Mixture-of-Experts. It runs on OpenAI Gym and a custom Reacher environment. The author explicitly calls it a personal research notebook, not a library.
The interesting bit
Rather than one canonical implementation per algorithm, the repo keeps multiple versions side-by-side—two SAC variants, two PPO continuous versions, LSTM and GRU recurrent policies—so you can compare what actually changes. The README also surfaces “undervalued tricks”: reward normalization, advantage clipping, and the moving-average vs. batch normalization distinction that papers often skip.
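To make the normalization distinction concrete, here is a minimal sketch in plain Python. It is not taken from the repo: the names batch_normalize and RunningNormalizer are hypothetical, and "moving-average" is interpreted here as running statistics accumulated over every reward seen so far (Welford's algorithm), which is only one possible reading of the README's term.

```python
import math
from typing import List


def batch_normalize(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards using the mean and std of this batch only."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]


class RunningNormalizer:
    """Normalize rewards with statistics accumulated across all updates.

    Assumption: cumulative running mean/variance (Welford's algorithm),
    standing in for whatever moving-average scheme the repo actually uses.
    """

    def __init__(self, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
        self.eps = eps

    def update(self, reward: float) -> None:
        """Fold one new reward into the running statistics."""
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward: float) -> float:
        """Scale a reward by the statistics gathered so far."""
        std = math.sqrt(self.m2 / self.count) if self.count > 0 else 0.0
        return (reward - self.mean) / (std + self.eps)
```

The batch version rescales each batch independently, so the same raw reward can map to different values depending on what else was sampled with it. The running version keeps one scale that drifts slowly as more rewards arrive.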
Key highlights
- Covers discrete and continuous action spaces, on-policy and off-policy, single-agent and multi-agent (QMIX on PettingZoo)
- Includes less-common hybrids: PointNet/Transporter for image-based RL, Soft Decision Trees for “explainable RL,” PMOE for multi-modal policies
- Multiprocessing versions provided with honest notes about lock-free gradient sharing being “potentially unsafe”
- PPO implementation details summarized in a separate Google Doc; tricks chapter from the authors’ book linked
- python script.py --train/--test interface: no packaging, no abstractions
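For readers who have not seen this style of script, a --train/--test switch can be sketched with argparse as below. This is an illustration, not the repo's code: parse_mode is a hypothetical helper, and making the two flags mutually exclusive and required is an assumption of this sketch.

```python
import argparse
from typing import List, Optional


def parse_mode(argv: Optional[List[str]] = None) -> str:
    """Return "train" or "test" depending on which flag was passed.

    Hypothetical helper: the flag names follow the --train/--test
    convention, but the exclusivity rule is this sketch's own choice.
    """
    parser = argparse.ArgumentParser(description="Single-file RL script (sketch)")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--train", action="store_true", help="run the training loop")
    group.add_argument("--test", action="store_true", help="evaluate a saved policy")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
    return "train" if args.train else "test"
```

Calling parse_mode(["--train"]) returns "train". A script built this way would branch on the result to either run its training loop or load saved weights and evaluate them.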
Caveats
- Code is explicitly “not cleaned or structured”; multiple versions per algorithm mean clutter, not curation
- Several listed algorithms (PPG, MPO, AWR) are marked “todo”—paper links only, no implementation
- Gym version pinned to 0.7 or 0.10; 0.14 breaks with “Not implemented Error”
- Author recommends their own RLzoo or TensorLayer tutorials for “official library” use instead
Verdict
Good if you’re implementing RL from scratch and want to see how SAC v1 differs from v2, or why reward normalization matters for Pendulum. Skip if you need pip-installable, maintained code—this is a reference desk, not a product.