Teaching traffic lights to think with reinforcement learning
A Gymnasium/PettingZoo wrapper that turns SUMO traffic simulations into RL environments for signal control.

What it does
SUMO-RL wraps the SUMO traffic simulator so you can train RL agents to control traffic lights. It exposes single-intersection problems as standard Gymnasium environments and multi-light grids as PettingZoo multi-agent environments. The heavy lifting—vehicle physics, lane logic, phase transitions—is still SUMO’s; this library handles the translation layer.
The interesting bit
The reward function is deliberately simple: change in cumulative vehicle delay from one step to the next. That means your agent gets positive feedback for reducing total waiting time, negative for making it worse. You can swap in custom observation functions or rewards by inheriting from ObservationFunction or passing a Python callable to reward_fn—no need to fork the library.
Key highlights
- Single-agent mode via
gym.make('sumo-rl-v0', ...); multi-agent viaparallel_env()with PettingZoo’s parallel API - Ships with RESCO benchmark networks (grid4x4, etc.) and example scripts for stable-baselines3 DQN, RLlib PPO, and Q-learning
- Default observation encodes phase state, lane density, and queue lengths as normalized vectors
- Optional ~8x speedup via Libsumo (
LIBSUMO_AS_TRACI=1), though you lose the GUI and parallel simulations - Published enough that there’s a citation bibtex and a dozen+ papers using it
Caveats
- Requires separate SUMO installation (PPA on Ubuntu, or equivalent) plus
SUMO_HOMEenvironment variable - Libsumo fast path is mutually exclusive with
sumo-guiand multi-simulation parallelism - The README notes stable-baselines3 needs a pre-release (
>=2.0.0a9) for Gymnasium compatibility
Verdict
Worth a look if you’re doing RL research on urban traffic control and want standard APIs without writing TraCI boilerplate. Skip it if you need real-time production signal optimization—this is a research sandbox, not a city operations tool.