geek-ai/MAgent

Abandoned RL platform trained armies by the million

A 2018 research environment for many-agent reinforcement learning that scaled to millions of agents, now left for a community fork to maintain.

1.8k stars · Python · Agents · ML Frameworks
Velocity (7d): +0.6 ★/day · Trend: steady

What it does

MAgent is a gridworld-based research platform built to push reinforcement learning past the usual handful of agents. It runs pursuit, gathering, and battle scenarios where swarms of agents learn collective behavior — think hundreds to millions of entities in shared environments. The project shipped with TensorFlow and MXNet baselines (parameter-sharing DQN, DRQN, A2C) and an interactive “general” mode where you command trained soldiers against the AI.
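Parameter sharing is what makes those agent counts tractable: every agent queries the same set of weights, so the model size stays fixed however large the swarm gets. Here is a minimal sketch of the idea in plain Python. It is not MAgent's API, and it swaps the deep Q-network for a tiny linear Q-function to stay dependency-free; `SharedLinearQ`, `select_actions`, and all parameters are hypothetical names chosen for illustration.

```python
import random


class SharedLinearQ:
    """One set of weights used by every agent (parameter sharing).

    Illustrative stand-in for a shared DQN: a linear Q-function, not a deep network.
    """

    def __init__(self, obs_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.n_actions = n_actions
        # weights[a][i]: contribution of observation feature i to action a's Q-value
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
            for _ in range(n_actions)
        ]

    def q_values(self, obs):
        """Return one Q-value per action for a single agent's observation."""
        return [sum(w * x for w, x in zip(row, obs)) for row in self.weights]

    def act(self, obs, epsilon, rng):
        """Epsilon-greedy action for one agent."""
        if rng.random() < epsilon:
            return rng.randrange(self.n_actions)
        q = self.q_values(obs)
        return max(range(self.n_actions), key=q.__getitem__)

    def update(self, obs, action, target, lr=0.01):
        """One gradient step on the squared error between Q(obs, action) and target."""
        error = target - self.q_values(obs)[action]
        row = self.weights[action]
        for i, x in enumerate(obs):
            row[i] += lr * error * x


def select_actions(policy, observations, epsilon=0.1, seed=0):
    """Every agent picks its action from the same shared policy."""
    rng = random.Random(seed)
    return [policy.act(obs, epsilon, rng) for obs in observations]


# Hypothetical swarm: 1000 agents, each with a 4-feature observation.
obs_rng = random.Random(42)
swarm_obs = [[obs_rng.random() for _ in range(4)] for _ in range(1000)]

policy = SharedLinearQ(obs_dim=4, n_actions=3)
actions = select_actions(policy, swarm_obs)
```

Experience from any agent can feed the same `update` call, so a thousand agents produce a thousand training samples per step for a single model. The trade-off is that all agents of a type behave identically given the same observation.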

The interesting bit

The scale ambition is the hook. Most RL platforms of the era topped out at a few agents; MAgent treated “many-agent” as the core research question, not an afterthought. The battle demo GIFs show emergent swarm tactics that look convincingly organic — the kind of thing that makes you briefly forget you’re watching gradients at work.

Key highlights

  • Targets “hundreds to millions” of simultaneous agents in shared gridworld environments
  • Three built-in scenarios: pursuit, resource gathering, and large-scale battle
  • Baseline implementations in both TensorFlow and MXNet; DQN reportedly performed best in their settings
  • Includes interactive play mode where a human commands agents against trained policies
  • Published as an AAAI 2018 demo paper with accompanying video

Caveats

  • Explicitly unmaintained: the README banner directs users to MAgent2 for continued development
  • Build process is manual and dated — requires compiling C++ dependencies (Boost, websocketpp, jsoncpp) and manually setting PYTHONPATH
  • macOS install involves a janky Homebrew tap workaround for websocketpp (issue #17)
  • Python 2.7 support and GTX 1080 Ti-era training times (~1 day) place it firmly in the archaeological record
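Because the install depends on a hand-set PYTHONPATH, a quick check that the interpreter can find the package saves some head-scratching. The sketch below uses only the standard library; the package name `magent` is an assumption about what the build exposes, and `is_importable` is a hypothetical helper.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    name = "magent"  # assumed package name; adjust if your build differs
    if is_importable(name):
        print(f"{name} found on sys.path")
    else:
        print(f"{name} not found; check that PYTHONPATH includes the repo's Python directory")
```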

Verdict

Worth a look if you’re researching historical multi-agent RL approaches or need the original paper’s exact environment for reproducibility. Everyone else should head straight to MAgent2, the Farama-maintained fork, which actually installs with pip.
