A museum of 120+ trained RL agents, now closed for renovation
Pre-trained Stable Baselines agents with tuned hyperparameters, though the zoo has moved to a new location.

What it does
This repository houses over 120 pre-trained reinforcement learning agents across Atari, classic control, Box2D, PyBullet, and MiniGrid environments. Each agent comes with tuned hyperparameters stored in YAML files, plus scripts to train new agents, watch existing ones (enjoy.py), or optimize hyperparameters via Optuna.
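A rough idea of how one YAML file per algorithm might map onto a training run can be sketched in Python. This is a minimal sketch under several assumptions. Each algorithm is assumed to have its own file (for example a hypothetical hyperparams/ppo2.yml) keyed by environment ID, with a shared atari entry as the fallback for Atari games. A plain dict stands in for the parsed YAML so the example needs only the standard library. The function name lookup_hyperparams and every value in the table are hypothetical placeholders, not the zoo's tuned settings.

```python
from typing import Any, Dict

# Hypothetical stand-in for a parsed per-algorithm YAML file.
# Keys are environment IDs; "atari" is an assumed shared fallback entry.
# All values are illustrative placeholders, NOT the zoo's tuned hyperparameters.
PPO2_HYPERPARAMS: Dict[str, Dict[str, Any]] = {
    "atari": {"policy": "CnnPolicy", "n_envs": 8},
    "BipedalWalkerHardcore-v2": {"policy": "MlpPolicy", "n_envs": 16},
}


def lookup_hyperparams(
    table: Dict[str, Dict[str, Any]], env_id: str, is_atari: bool
) -> Dict[str, Any]:
    """Return the entry for env_id, falling back to the shared 'atari' entry."""
    if env_id in table:
        return dict(table[env_id])  # copy so callers cannot mutate the table
    if is_atari and "atari" in table:
        return dict(table["atari"])
    raise ValueError(f"No hyperparameters found for {env_id}")


# Usage: an explicit entry wins; Atari games without one share the fallback.
print(lookup_hyperparams(PPO2_HYPERPARAMS, "BipedalWalkerHardcore-v2", is_atari=False))
print(lookup_hyperparams(PPO2_HYPERPARAMS, "PongNoFrameskip-v4", is_atari=True))
```

Keeping the settings in data rather than code is what lets a single training script serve every algorithm-environment pair; a missing key fails loudly instead of silently training with defaults.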
The interesting bit
The real value isn’t the agents themselves; it’s the hyperparameter matrix. The README documents which algorithm-environment pairs have a trained agent (marked with ✓) and which ones nobody has trained yet. It’s a pragmatic map of where Stable Baselines is known to work, and of the blank cells where nobody has checked.
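One way to read the matrix is as a set of trained (algorithm, environment) pairs, with the blank cells as its complement. The minimal Python sketch below assumes exactly that representation. The algorithm list, environment IDs, and trained set are a hypothetical, heavily simplified table that loosely echoes the gaps listed under Caveats; they are not the zoo's real coverage data.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple

# Hypothetical, simplified coverage table; NOT the zoo's real matrix.
ALGOS = ["ppo2", "a2c", "trpo"]
ENVS = ["PongNoFrameskip-v4", "MiniGrid-DoorKey-5x5-v0", "CartPole-v1"]
TRAINED: Set[Tuple[str, str]] = {
    ("ppo2", "PongNoFrameskip-v4"),
    ("a2c", "PongNoFrameskip-v4"),
    ("ppo2", "MiniGrid-DoorKey-5x5-v0"),
    ("ppo2", "CartPole-v1"),
    ("a2c", "CartPole-v1"),
    ("trpo", "CartPole-v1"),
}


def untrained_pairs(
    algos: Sequence[str], envs: Sequence[str], trained: Set[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """List cells with no trained agent (untested, not necessarily failures)."""
    return [pair for pair in product(algos, envs) if pair not in trained]


def render_matrix(
    algos: Sequence[str], envs: Sequence[str], trained: Set[Tuple[str, str]]
) -> str:
    """Render a README-style grid: a check mark for trained cells, blank otherwise."""
    width = max(len(env) for env in envs)
    rows = [" " * width + " | " + " | ".join(algos)]
    for env in envs:
        cells = [("✓" if (algo, env) in trained else " ").center(len(algo)) for algo in algos]
        rows.append(env.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(rows)


print(render_matrix(ALGOS, ENVS, TRAINED))
print(untrained_pairs(ALGOS, ENVS, TRAINED))
```

Listing the gaps explicitly makes the distinction clear: a blank cell records missing work, not a measured failure.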
Key highlights
- 120+ trained agents with benchmarked scores in benchmark.md
- Hyperparameter search via Optuna (though not for ACER or DQN)
- Support for environment wrappers and custom kwargs without code changes
- Docker images and a Colab notebook for immediate experimentation
- Video recording utility for agent demos
Caveats
- Explicitly unmaintained: the README banner redirects users to RL-Baselines3 Zoo, which is built on Stable-Baselines3 rather than the original Stable Baselines
- Some algorithm-environment combinations are simply empty (TRPO on all Atari, most MiniGrid with non-PPO algorithms)
- PyBullet environments are noted as “much harder than the MuJoCo version” because they derive from Roboschool
Verdict
Worth browsing if you’re stuck debugging why your PPO won’t learn on BipedalWalkerHardcore, or if you need historical baselines for comparison. Otherwise, follow the README’s own advice and head to the newer RL-Baselines3 Zoo.