Abandoned RL platform trained armies by the million
A 2018 research environment for many-agent reinforcement learning, designed to scale from hundreds to millions of agents, now left for a community fork to maintain.

What it does
MAgent is a gridworld-based research platform built to push reinforcement learning past the usual handful of agents. It runs pursuit, gathering, and battle scenarios in which swarms of hundreds to millions of agents learn collective behavior in a shared environment. The project shipped with TensorFlow and MXNet baselines (parameter-sharing DQN, DRQN, A2C) and an interactive “general” mode where you command trained soldiers against the AI.
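For readers unfamiliar with the term, parameter sharing means every agent acts through the same set of weights, so the model size stays fixed however large the swarm gets. The sketch below illustrates that idea only and is not MAgent's baseline code: `SharedLinearQ` and `select_actions` are hypothetical names, a linear Q-function stands in for the deep network, and no training step is included.

```python
import random


class SharedLinearQ:
    """A single linear Q-function whose weights are used by every agent."""

    def __init__(self, obs_dim, n_actions, seed=0):
        rng = random.Random(seed)
        # weights[a][i] is the weight of observation feature i for action a.
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(obs_dim)]
                        for _ in range(n_actions)]

    def q_values(self, obs):
        # One Q-value per action for a single agent's observation.
        return [sum(w * x for w, x in zip(row, obs)) for row in self.weights]

    def parameter_count(self):
        return sum(len(row) for row in self.weights)


def select_actions(q_fn, observations, epsilon, rng):
    """Epsilon-greedy action for each agent, all using the same q_fn."""
    actions = []
    for obs in observations:
        if rng.random() < epsilon:
            # Explore: pick a uniformly random action.
            actions.append(rng.randrange(len(q_fn.weights)))
        else:
            # Exploit: pick the action with the highest shared Q-value.
            q = q_fn.q_values(obs)
            actions.append(max(range(len(q)), key=q.__getitem__))
    return actions


rng = random.Random(1)
q_fn = SharedLinearQ(obs_dim=8, n_actions=5)
# 1000 agents, each with its own local observation (random placeholders here).
observations = [[rng.random() for _ in range(8)] for _ in range(1000)]
actions = select_actions(q_fn, observations, epsilon=0.1, rng=rng)
print(len(actions), q_fn.parameter_count())  # 1000 40
```

Going from 1,000 to 1,000,000 agents in this sketch changes the length of `observations` but not the 40 parameters, which is why the approach suits very large populations.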
The interesting bit
The scale ambition is the hook. Most RL platforms of the era topped out at a few agents; MAgent treated “many-agent” as the core research question, not an afterthought. The battle demo GIFs show emergent swarm tactics that look convincingly organic — the kind of thing that makes you briefly forget you’re watching gradients at work.
Key highlights
- Targets “hundreds to millions” of simultaneous agents in shared gridworld environments
- Three built-in scenarios: pursuit, resource gathering, and large-scale battle
- Baseline implementations in both TensorFlow and MXNet; DQN reportedly performed best in the authors' settings
- Includes interactive play mode where a human commands agents against trained policies
- Published as an AAAI 2018 demo paper with accompanying video
Caveats
- Explicitly unmaintained: the README banner directs users to MAgent2 for continued development
- Build process is manual and dated: requires compiling C++ dependencies (Boost, websocketpp, jsoncpp) and manually setting PYTHONPATH
- macOS install involves a janky Homebrew tap workaround for websocketpp (issue #17)
- Python 2.7 support and GTX 1080 Ti-era training times (~1 day) place it firmly in the archaeological record
Verdict
Worth a look if you’re researching historical multi-agent RL approaches or need the original paper’s exact environment for reproducibility. Everyone else should head straight to the Farama-maintained fork, which actually installs with pip.