quantumiracle/Popular-RL-Algorithms

One researcher's messy notebook of RL algorithms

A personal collection of 15+ reinforcement learning implementations, kept deliberately unpolished to show multiple versions and trade-offs.

1.3k stars · Jupyter Notebook · ML Frameworks · Agents
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

This repo collects PyTorch implementations of mainstream model-free RL algorithms (SAC, TD3, PPO, DDPG, A2C, DQN, QMIX, and more), plus oddities like Soft Decision Trees and Probabilistic Mixture-of-Experts. The examples run on OpenAI Gym and a custom Reacher environment. The author explicitly calls it a personal research notebook, not a library.

The interesting bit

Rather than one canonical implementation per algorithm, the repo keeps multiple versions side by side (two SAC variants, two continuous-action PPO versions, LSTM and GRU recurrent policies), so you can compare what actually changes between them. The README also surfaces “undervalued tricks” that papers often skip: reward normalization, advantage clipping, and the distinction between moving-average and batch normalization.
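To make the normalization distinction concrete, here is a minimal sketch in plain Python. It is not taken from the repo: the names `batch_normalize`, `MovingAverageNormalizer`, and `clip_advantages`, the `momentum` value, and the clipping `limit` are all hypothetical, and an exponential moving average is only one reading of “moving-average” normalization.

```python
import math
from statistics import fmean, pstdev


def batch_normalize(values, eps=1e-8):
    """Normalize with the mean and std of the current batch only."""
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / (std + eps) for v in values]


class MovingAverageNormalizer:
    """Normalize with statistics smoothed across batches.

    Assumption: an exponential moving average with a hypothetical
    `momentum`; the repo may track its statistics differently.
    """

    def __init__(self, momentum=0.99, eps=1e-8):
        self.momentum = momentum
        self.eps = eps
        self.mean = None
        self.var = None

    def __call__(self, batch):
        b_mean = fmean(batch)
        b_var = pstdev(batch) ** 2
        if self.mean is None:
            # First batch: initialize the running statistics directly.
            self.mean, self.var = b_mean, b_var
        else:
            m = self.momentum
            self.mean = m * self.mean + (1 - m) * b_mean
            self.var = m * self.var + (1 - m) * b_var
        std = math.sqrt(self.var)
        return [(r - self.mean) / (std + self.eps) for r in batch]


def clip_advantages(advantages, limit=5.0):
    """Clamp each advantage to [-limit, limit]; `limit` is illustrative."""
    return [max(-limit, min(limit, a)) for a in advantages]
```

With batch normalization, every batch is re-centered on zero, so a batch of uniformly high rewards looks the same as a batch of uniformly low ones. The moving-average version keeps a memory of earlier batches, so a later batch with higher rewards still comes out positive after normalization.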

Key highlights

  • Covers discrete and continuous action spaces, on-policy and off-policy, single-agent and multi-agent (QMIX on PettingZoo)
  • Includes less-common hybrids: PointNet/Transporter for image-based RL, Soft Decision Trees for “explainable RL,” PMOE for multi-modal policies
  • Multiprocessing versions provided with honest notes about lock-free gradient sharing being “potentially unsafe”
  • PPO implementation details summarized in a separate Google Doc; tricks chapter from the authors’ book linked
  • Scripts run as python script.py --train or python script.py --test; no packaging, no abstractions
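The train/test switch in the last bullet can be sketched with the standard library's argparse. This is an illustration of the pattern, not the repo's code: `parse_args`, `main`, and the returned mode strings are hypothetical placeholders for real training and evaluation loops.

```python
import argparse


def parse_args(argv=None):
    """Parse the two mode flags; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Train or test an RL agent.")
    parser.add_argument("--train", action="store_true", help="run training")
    parser.add_argument("--test", action="store_true", help="run evaluation")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    if args.train:
        # Placeholder: a real script would build the agent and train here.
        return "train"
    if args.test:
        # Placeholder: a real script would load saved weights and evaluate.
        return "test"
    return "noop"  # neither flag given
```

Calling `main(["--train"])` selects the training branch and `main(["--test"])` the evaluation branch; a script entry point would call `main()` with no arguments so the flags are read from the command line.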

Caveats

  • Code is explicitly “not cleaned or structured”; keeping multiple versions per algorithm means clutter, not curation
  • Several listed algorithms (PPG, MPO, AWR) are marked “todo”—paper links only, no implementation
  • Gym is pinned to version 0.7 or 0.10; version 0.14 breaks with a “Not implemented Error”
  • Author recommends their own RLzoo or TensorLayer tutorials for “official library” use instead

Verdict

Good if you’re implementing RL from scratch and want to see how SAC v1 differs from v2, or why reward normalization matters for Pendulum. Skip if you need pip-installable, maintained code: this is a reference desk, not a product.
