eleurent/rl-agents

A research-grade zoo of RL algorithms, from DQN to obscure tree planners

A Python collection that pairs classic deep RL with a half-dozen niche planning algorithms you've probably never implemented.

724 stars · Python · Agents · ML Frameworks
Velocity (7d): +0.2 ★ / day · Trend: steady

What it does

This repo houses a modular set of reinforcement learning agents, both value-based (DQN variants, Fitted-Q) and planning-based (MCTS, value iteration, cross-entropy method). Everything plugs into standard OpenAI Gym environments via a clean act/record interface. Agents are configured through JSON files, and any hyperparameter you omit falls back to a default.
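To make the act/record idea concrete, here is a minimal self-contained sketch. It does not import the repo: `RandomAgent`, `CountdownEnv`, `DEFAULT_CONFIG`, `run_episode`, and the exact `record` argument list are all assumptions made for illustration, and the environment follows the classic Gym 4-tuple `step` convention.

```python
import json
import random

# Hypothetical defaults; the real agents define their own.
DEFAULT_CONFIG = {"gamma": 0.99, "epsilon": 0.1}


class RandomAgent:
    """Toy agent exposing an act/record pair (not a class from the repo)."""

    def __init__(self, n_actions, config=None):
        # Keys supplied by the user override defaults; omitted keys keep them.
        self.config = {**DEFAULT_CONFIG, **(config or {})}
        self.n_actions = n_actions
        self.transitions = []

    def act(self, state):
        # Pick an action for the current state (uniformly at random here).
        return random.randrange(self.n_actions)

    def record(self, state, action, reward, next_state, done, info):
        # Store the transition; a learning agent would update itself here.
        self.transitions.append((state, action, reward, next_state, done))


class CountdownEnv:
    """Stand-in environment that ends after a fixed number of steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done, {}


def run_episode(env, agent):
    """Drive one episode through the act/record loop."""
    state, done = env.reset(), False
    while not done:
        action = agent.act(state)
        next_state, reward, done, info = env.step(action)
        agent.record(state, action, reward, next_state, done, info)
        state = next_state
    return agent


# A JSON config that only sets gamma; epsilon falls back to the default.
agent = run_episode(CountdownEnv(horizon=3),
                    RandomAgent(2, json.loads('{"gamma": 0.9}')))
print(agent.config, len(agent.transitions))
```

The point of the split is that the training loop never needs to know whether the agent learns from data or plans from a model: it only calls `act` and `record`.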

The interesting bit

The real depth is in the tree-search and safe planning sections. You’ll find implementations of optimistic planners (OPD, OLOP, Trailblazer, PlaTγPOOS) and robust variants that handle model uncertainty—stuff that rarely shows up in production RL frameworks but has solid academic lineage. The author, Edouard Leurent, has clearly been feeding his own papers into the codebase.
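For readers who have not met optimistic planners, the sketch below shows the general idea in the spirit of optimistic planning for deterministic systems. It is not the repo's implementation. It assumes a deterministic model `step(state, action) -> (next_state, reward)` with rewards in [0, 1], and every name in it (`optimistic_plan`, `chain_step`) is hypothetical. Each leaf gets an upper bound b = u + γ^depth / (1 − γ), where u is the discounted reward collected so far, and the planner always expands the leaf with the largest bound.

```python
import heapq
import itertools


def optimistic_plan(state, step, actions, gamma=0.9, budget=50):
    """Return the first action of the best action sequence found so far."""
    tie = itertools.count()  # tie-breaker so the heap never compares states
    # Leaf layout: (-b_value, tie, u_value, depth, state, first_action)
    leaves = [(-1.0 / (1.0 - gamma), next(tie), 0.0, 0, state, None)]
    best_u, best_action = -1.0, None
    for _ in range(budget):
        # Pop the leaf with the highest optimistic bound.
        _, _, u, depth, s, first = heapq.heappop(leaves)
        for a in actions:
            s2, r = step(s, a)
            u2 = u + (gamma ** depth) * r
            # Optimistic bound: assume reward 1 at every step after this one.
            b2 = u2 + gamma ** (depth + 1) / (1.0 - gamma)
            root_action = a if first is None else first
            if u2 > best_u:
                best_u, best_action = u2, root_action
            heapq.heappush(leaves, (-b2, next(tie), u2, depth + 1, s2, root_action))
    return best_action


def chain_step(s, a):
    """Toy model: action 1 moves right, action 0 stays; reward 1 once s >= 2."""
    s2 = s + 1 if a == 1 else s
    return s2, (1.0 if s2 >= 2 else 0.0)


first_action = optimistic_plan(0, chain_step, actions=[0, 1])
print(first_action)
```

On this toy chain the planner spends almost all of its budget along the branch that moves right immediately, because every other branch has a strictly lower upper bound.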

Key highlights

  • DQN with Double, Dueling, and N-step bells and whistles
  • Six MCTS/planning algorithms including lesser-known optimistic planners
  • “Safe planning” agents: robust value iteration, interval-based robust planning for uncertain dynamics
  • Benchmark runner that parallelizes experiments across processes
  • TensorBoard, Gym Monitor, and metadata logging baked in for reproducibility
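As a quick reminder of what the Double and N-step options compute, here is a dependency-free sketch of the two targets. It assumes Q-values arrive as plain Python lists, and the function names are hypothetical rather than taken from the repo.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of n rewards plus a bootstrapped value after step n."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma):
    """Select the next action with the online net, evaluate it with the target net."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
print(double_dqn_target(1.0, False, [0.2, 0.8], [5.0, 3.0], gamma=0.5))
```

In the second call the online values pick action 1, so the target bootstraps from 3.0 rather than from the target network's own maximum of 5.0. That decoupling is what curbs overestimation.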

Caveats

  • Several planning agents only work with finite-mdp environments or require env.to_finite_mdp() conversion; the README doesn’t clarify how broadly this limits applicability
  • Installation is via pip install from GitHub; there is no PyPI release, which may indicate casual maintenance
  • The Fitted-Q reference link is truncated in the README, and some agent descriptions are just a name and a paper citation with no usage guidance
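The finite-MDP restriction in the first caveat is easier to see with an example: value iteration needs the whole model as tables it can sweep over. The sketch below assumes a deterministic MDP stored as nested lists, `transition[s][a]` for the next state and `reward[s][a]` for the reward. That layout is an illustration only and is not claimed to match what `env.to_finite_mdp()` returns.

```python
def value_iteration(transition, reward, gamma=0.9, iterations=500):
    """Tabular value iteration for a deterministic finite MDP."""
    n_states = len(transition)
    values = [0.0] * n_states
    for _ in range(iterations):
        # Bellman optimality backup applied to every state at once.
        values = [
            max(reward[s][a] + gamma * values[transition[s][a]]
                for a in range(len(transition[s])))
            for s in range(n_states)
        ]
    # Greedy policy with respect to the final value estimates.
    policy = [
        max(range(len(transition[s])),
            key=lambda a, s=s: reward[s][a] + gamma * values[transition[s][a]])
        for s in range(n_states)
    ]
    return values, policy


# Two states: staying in state 1 pays 1; everything else pays 0.
transition = [[0, 1], [1, 0]]
reward = [[0.0, 0.0], [1.0, 0.0]]
values, policy = value_iteration(transition, reward)
print(values, policy)
```

An environment with continuous observations has no such tables, which is why agents of this kind need either a natively finite environment or a conversion step first.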

Verdict

Grab this if you’re implementing or comparing against specific planning algorithms from the 2010s robust/optimistic literature. Skip it if you need a maintained, batteries-included RL framework for new projects—this is a research reference implementation, not a product.
