Stable-Baselines3's messy garage: where experimental RL lives
A holding pen for reinforcement learning algorithms too fresh or too niche for the main library.

What it does
SB3-Contrib is the companion contrib package for Stable-Baselines3. It houses RL algorithms and tools that aren’t ready for the polished core library or don’t fit there. Think of it as a researcher’s workshop with the same API conventions but looser admission standards.
The interesting bit
The project explicitly embraces mess. The maintainers admit these utilities are “too niche” or “too difficult to integrate well” into the main codebase, yet they still enforce documentation and code style. It’s a rare admission that not everything needs to be production-grade to be worth sharing.
Key highlights
- Seven RL algorithms: ARS, QR-DQN, MaskablePPO, RecurrentPPO, TQC, TRPO, and CrossQ
- One Gym wrapper: Time Feature Wrapper
- Same API patterns as Stable-Baselines3, so swapping between main and contrib is mostly painless
- Requires the master version of the main SB3 library, not just the PyPI release
- Active CI and black code formatting, despite the “experimental” label
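To make the shared-API point concrete, here is a minimal sketch of swapping between the two packages. It assumes `stable_baselines3`, `sb3_contrib`, and Gymnasium's `Pendulum-v1` environment are installed; `ALGOS`, `load_algo`, and `train_briefly` are hypothetical helper names invented for this illustration and are not part of either library.

```python
import importlib

# Package that each algorithm class is imported from.
ALGOS = {
    "PPO": "stable_baselines3",
    "SAC": "stable_baselines3",
    "TQC": "sb3_contrib",
    "TRPO": "sb3_contrib",
}


def load_algo(name: str):
    """Import and return the algorithm class called `name` (hypothetical helper)."""
    module = importlib.import_module(ALGOS[name])
    return getattr(module, name)


def train_briefly(name: str, env_id: str = "Pendulum-v1", steps: int = 1_000):
    """Build and train an agent; the calls are the same for both packages."""
    algo_cls = load_algo(name)
    model = algo_cls("MlpPolicy", env_id, verbose=0)
    model.learn(total_timesteps=steps)
    return model
```

With both packages installed, `train_briefly("SAC")` and `train_briefly("TQC")` should differ only in where the class is imported from; algorithm-specific hyperparameters are left at their defaults here.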
Caveats
- “Almost everything remotely useful goes”: the bar is deliberately low, so quality varies
- Experimental status means APIs may shift without the stability guarantees of the main library
Verdict
Worth a look if you’re reproducing a recent RL paper or need something off the beaten path, such as PPO with invalid-action masking (MaskablePPO). Skip it if you want battle-tested defaults; the main Stable-Baselines3 repo already covers PPO, SAC, DQN, and friends.
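For readers unfamiliar with invalid-action masking, the general idea can be sketched in a few lines of plain Python. This illustrates the technique only and is not taken from SB3-Contrib's MaskablePPO implementation; `masked_softmax` is a hypothetical helper name.

```python
import math


def masked_softmax(logits, action_mask):
    """Softmax over `logits` where actions with a False mask get probability 0."""
    if not any(action_mask):
        raise ValueError("at least one action must be valid")
    # Push invalid actions to -inf so that exp() maps them to exactly 0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, action_mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# The second action is declared invalid, so it can never be sampled.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
```

The remaining probability mass is renormalized over the valid actions, so the policy never wastes samples on moves the environment would reject.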