A research-grade zoo of RL algorithms, from DQN to obscure tree planners
A Python collection that pairs classic deep RL with a half-dozen niche planning algorithms you've probably never implemented.

What it does
This repo houses a modular set of reinforcement learning agents—both value-based (DQN variants, Fitted-Q) and planning-based (MCTS, value iteration, cross-entropy method). Everything plugs into standard OpenAI Gym environments via a clean act/record interface, with JSON configs filling in whatever hyperparameters you omit.
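To make the act/record loop and the config fallback concrete, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `DEFAULT_CONFIG`, `load_config`, `CorridorEnv`, and `TabularQAgent` are hypothetical names, the toy environment only mimics the classic Gym `reset`/`step` shape, and the exact `record` signature in the repo may differ.

```python
import json
import random

# Hypothetical defaults; a real agent would define its own.
DEFAULT_CONFIG = {"gamma": 0.9, "epsilon": 0.1, "learning_rate": 0.5}


def load_config(text):
    """Overlay a partial JSON config on top of the defaults."""
    config = dict(DEFAULT_CONFIG)
    config.update(json.loads(text))
    return config


class CorridorEnv:
    """Toy stand-in for a Gym environment: walk from cell 0 to the last cell."""

    def __init__(self, length=4):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, any other action moves left (clipped at 0).
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


class TabularQAgent:
    """Illustrative agent exposing only act() and record()."""

    def __init__(self, n_actions, config, seed=0):
        self.n_actions = n_actions
        self.config = config
        self.q = {}
        self.rng = random.Random(seed)

    def values(self, state):
        return self.q.setdefault(state, [0.0] * self.n_actions)

    def act(self, state):
        # Epsilon-greedy choice with random tie-breaking.
        if self.rng.random() < self.config["epsilon"]:
            return self.rng.randrange(self.n_actions)
        values = self.values(state)
        best = max(values)
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def record(self, state, action, reward, next_state, done):
        # One-step Q-learning update on the observed transition.
        target = reward
        if not done:
            target += self.config["gamma"] * max(self.values(next_state))
        values = self.values(state)
        values[action] += self.config["learning_rate"] * (target - values[action])


# Only epsilon is specified; gamma and learning_rate come from the defaults.
config = load_config('{"epsilon": 0.2}')
env = CorridorEnv()
agent = TabularQAgent(n_actions=2, config=config)

for episode in range(200):
    state, done = env.reset(), False
    while not done:
        action = agent.act(state)
        next_state, reward, done, info = env.step(action)
        agent.record(state, action, reward, next_state, done)
        state = next_state
```

The point of the two-method split is that the training loop never needs to know whether the agent learns from the transition (as here) or replans from a model at each step.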
The interesting bit
The real depth is in the tree-search and safe planning sections. You’ll find implementations of optimistic planners (OPD, OLOP, Trailblazer, PlaTγPOOS) and robust variants that handle model uncertainty—stuff that rarely shows up in production RL frameworks but has solid academic lineage. The author, Edouard Leurent, has clearly been feeding his own papers into the codebase.
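For readers who have not met the optimistic planners before, the following is a simplified sketch of the idea behind OPD (optimistic planning for deterministic systems) as it is usually described: keep a tree of action sequences, always expand the leaf with the highest optimistic bound, and finally return the first action of the best sequence found. It is not taken from the repo. The function name `opd_plan`, the `step(state, action)` model signature, and the assumption that rewards lie in [0, 1] are all choices made for this illustration.

```python
import heapq
import itertools


def opd_plan(state, step, n_actions, gamma=0.9, budget=50):
    """Return the first action of the best action sequence found.

    step(state, action) -> (next_state, reward) is a deterministic model,
    with rewards assumed to lie in [0, 1].
    """
    counter = itertools.count()  # tie-breaker so the heap never compares states
    # Heap entries: (-b_value, tie, u_value, depth, state, first_action).
    leaves = [(-1.0 / (1.0 - gamma), next(counter), 0.0, 0, state, None)]
    best_u, best_action = -1.0, 0
    for _ in range(budget):
        # Expand the leaf with the largest optimistic bound (b-value).
        _, _, u, depth, node_state, first = heapq.heappop(leaves)
        for action in range(n_actions):
            next_state, reward = step(node_state, action)
            child_u = u + gamma ** depth * reward  # discounted return so far
            child_first = action if first is None else first
            # Optimistic bound: assume reward 1 at every later step.
            child_b = child_u + gamma ** (depth + 1) / (1.0 - gamma)
            heapq.heappush(
                leaves,
                (-child_b, next(counter), child_u, depth + 1, next_state, child_first),
            )
            if child_u > best_u:
                best_u, best_action = child_u, child_first
    return best_action


def chain_step(state, action):
    """Hypothetical model: action 1 moves right on a 4-cell chain, reward at the end."""
    next_state = min(state + 1, 3) if action == 1 else state
    return next_state, (1.0 if next_state == 3 else 0.0)


first_action = opd_plan(0, chain_step, n_actions=2)
```

On this toy chain the planner settles on moving right, because every sequence that starts by staying put has a strictly lower optimistic bound once the rewarding branch has been discovered.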
Key highlights
- DQN with Double, Dueling, and N-step bells and whistles
- Six MCTS/planning algorithms including lesser-known optimistic planners
- “Safe planning” agents: robust value iteration, interval-based robust planning for uncertain dynamics
- Benchmark runner that parallelizes experiments across processes
- TensorBoard, Gym Monitor, and metadata logging baked in for reproducibility
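As a rough illustration of what "robust value iteration" means in the safe-planning bullet, here is a small sketch of a max-min Bellman backup over a finite set of candidate transition models. It is not the repo's implementation: the function name, the nested-list data layout, and the choice to take the worst case independently for each state-action pair are assumptions made for this example.

```python
def robust_value_iteration(models, rewards, gamma=0.9, iterations=200):
    """Worst-case value iteration over a finite set of transition models.

    models[m][s][a] is a list of next-state probabilities under model m,
    rewards[s][a] is the immediate reward.
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    values = [0.0] * n_states

    def worst_case_q(s, a):
        # Adversary picks the model that minimises the backed-up value.
        return min(
            rewards[s][a] + gamma * sum(p * v for p, v in zip(model[s][a], values))
            for model in models
        )

    for _ in range(iterations):
        values = [
            max(worst_case_q(s, a) for a in range(n_actions))
            for s in range(n_states)
        ]
    policy = [
        max(range(n_actions), key=lambda a: worst_case_q(s, a))
        for s in range(n_states)
    ]
    return values, policy


# State 0: action 0 is safe (reward 0.5, stay put); action 1 pays 1.0 but,
# under model_b, falls into the absorbing zero-reward state 1.
rewards = [[0.5, 1.0], [0.0, 0.0]]
model_a = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
model_b = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]

robust_values, robust_policy = robust_value_iteration([model_a, model_b], rewards)
nominal_values, nominal_policy = robust_value_iteration([model_a], rewards)
```

With both models in play the planner prefers the safe action in state 0, while planning against `model_a` alone picks the risky one. That gap between nominal and worst-case behaviour is what the robust agents are meant to expose.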
Caveats
- Several planning agents only work with finite-mdp environments or require env.to_finite_mdp() conversion; the README doesn’t clarify how broadly this limits applicability
- Installation is pip install from GitHub, with no PyPI release, which suggests casual maintenance
- The Fitted-Q reference link is truncated in the README, and some agent descriptions are just a name and a paper citation with no usage guidance
Verdict
Grab this if you’re implementing or comparing against specific planning algorithms from the 2010s robust/optimistic literature. Skip it if you need a maintained, batteries-included RL framework for new projects—this is a research reference implementation, not a product.