OpenAI's archived multi-agent RL code still sees use, with caveats
Reference implementation of MADDPG, a centralized-training-decentralized-execution algorithm for mixed cooperative-competitive environments.

What it does
This is OpenAI’s official code for MADDPG (Multi-Agent Deep Deterministic Policy Gradient), an actor-critic reinforcement learning method where multiple agents learn simultaneously in environments that can be cooperative, competitive, or both. It is built specifically to pair with the Multi-Agent Particle Environments (MPE), a set of simple 2D physics-based scenarios for testing multi-agent behavior.
The interesting bit
The README is unusually honest: the codebase was restructured after publication, and results may differ from the original 2017 NIPS paper. The original policy ensemble and estimation code lives in a Dropbox zip, not the repo — a small archaeological dig if you need exact reproducibility.
Key highlights
- Centralized training with decentralized execution: each agent’s critic sees all observations and actions, while each actor sees only its own
- Supports mixing MADDPG and vanilla DDPG agents in the same environment (e.g., “good” agents vs. adversaries)
- Command-line interface covers training, checkpointing, evaluation, and benchmarking without extra scaffolding
- Core algorithm is ~4 files: maddpg.py, replay_buffer.py, plus TensorFlow utilities
- 1,976 stars suggests it remains a common baseline, despite archive status
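The information split described in the first two highlights can be sketched without any deep-learning library. This is a minimal illustration and not the repository's code: the function names, the toy linear policy, and the dimensions are hypothetical. It shows what each actor and each critic takes as input, including the vanilla DDPG case where the critic is local to a single agent.

```python
import random


def actor_input(observations, agent_idx):
    """Decentralized execution: actor i sees only its own observation."""
    return list(observations[agent_idx])


def critic_input(observations, actions, agent_idx, centralized=True):
    """Build the critic's input vector for one agent.

    centralized=True  -> MADDPG-style: every agent's observation and action.
    centralized=False -> vanilla DDPG: only this agent's observation and action.
    """
    if not centralized:
        return list(observations[agent_idx]) + list(actions[agent_idx])
    joint = []
    for obs in observations:
        joint.extend(obs)
    for act in actions:
        joint.extend(act)
    return joint


def linear_policy(weights, obs):
    """Toy deterministic actor: a single linear layer standing in for a network."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


# Hypothetical sizes: three agents with different observation widths.
obs_dims = [4, 4, 6]
act_dim = 2
rng = random.Random(0)

observations = [[rng.uniform(-1, 1) for _ in range(d)] for d in obs_dims]
weights = [
    [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(act_dim)]
    for d in obs_dims
]

# Each actor acts from its own observation only.
actions = [
    linear_policy(weights[i], actor_input(observations, i))
    for i in range(len(obs_dims))
]

# Agent 0 trained MADDPG-style, agent 2 trained as vanilla DDPG.
maddpg_critic_in = critic_input(observations, actions, 0, centralized=True)
ddpg_critic_in = critic_input(observations, actions, 2, centralized=False)

print(len(maddpg_critic_in))  # sum(obs_dims) + 3 * act_dim
print(len(ddpg_critic_in))    # obs_dims[2] + act_dim
```

The centralized critic's input grows with the number of agents, while each actor's input stays fixed, which is why the critics can be discarded at execution time.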
Caveats
- Frozen in time: Python 3.5.4, TensorFlow 1.8.0, OpenAI Gym 0.10.5 — a dependency stack that will fight modern environments
- No maintenance: Explicitly archived; “no updates expected”
- Reproducibility gap: Restructured code + original ensemble implementation kept outside the repo = paper numbers may not materialize
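Before chasing paper numbers, it can help to confirm that an environment actually matches the pinned stack. The helper below is a hypothetical sketch, not something shipped with the repo: `find_mismatches` and the sample `modern_env` dictionary are invented for illustration, and only the three version pins come from the caveat above. It avoids f-strings so it would also run on Python 3.5.

```python
# Pins quoted in the caveats above.
PINNED = {"python": "3.5.4", "tensorflow": "1.8.0", "gym": "0.10.5"}


def find_mismatches(installed, pinned=PINNED):
    """Return (name, wanted, found) for every pin the environment does not match.

    `installed` maps a name to a version string; a missing name is reported
    with found=None.
    """
    mismatches = []
    for name in sorted(pinned):
        found = installed.get(name)
        if found != pinned[name]:
            mismatches.append((name, pinned[name], found))
    return mismatches


# Hypothetical modern environment, for illustration only.
modern_env = {"python": "3.11.4", "tensorflow": "2.15.0"}

for name, wanted, found in find_mismatches(modern_env):
    print("{}: want {}, found {}".format(name, wanted, found))
```

In a real environment, the `installed` dictionary could be filled from `platform.python_version()` and each package's `__version__` attribute.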
Verdict
Worth a look if you need a readable, cited baseline for multi-agent RL research or are comparing against MADDPG specifically. Skip it if you want production-ready code or a plug-and-play modern framework — this is a 2017 time capsule with sharp edges.