inoryy/reaver

A dead RL framework that actually beat DeepMind at StarCraft minigames

Reaver squeezed a roughly 1.5x sampling speedup out of single-machine setups by ditching MPI for lock-free shared memory, then the author walked away.

561 stars · Python · Agents · Domain Apps
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

Reaver trains deep reinforcement learning agents on StarCraft II minigames, Atari, MuJoCo, and OpenAI Gym. It ships A2C and PPO implementations with GAE, reward clipping, and gradient norm clipping, all wired through gin-config. The pitch: four lines of Python or a one-liner CLI to spin up parallel environments.
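
To make the three training stabilizers named above concrete, here is a minimal NumPy sketch of reward clipping, GAE, and gradient clipping by global norm. It is an illustration, not Reaver's code: the function names and the defaults (limit, gamma, lam, max_norm) are assumptions chosen for the example, and Reaver itself does this work in TensorFlow.

```python
import numpy as np


def clip_rewards(rewards, limit=1.0):
    """Clamp each reward into [-limit, limit] (limit is an assumed default)."""
    return np.clip(np.asarray(rewards, dtype=np.float64), -limit, limit)


def gae_advantages(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a single rollout.

    rewards, values, dones: sequences of length T
    next_value: value estimate for the state after the last step
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    advantages = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the step after it.
    for t in reversed(range(len(rewards))):
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 1.0 - dones[t]  # cut bootstrapping at episode boundaries
        delta = rewards[t] + gamma * v_next * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm."""
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total
```

With gamma and lam both set to 1 and a zero value baseline, the advantages reduce to plain reward-to-go, which is a handy sanity check when wiring this into a training loop.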

The interesting bit

The author skipped the usual multiprocessing pipe/MPI dance and built lock-free shared-memory parallelism instead. For single-machine setups, the kind most researchers actually have, this reportedly yields about 1.5x the StarCraft II sampling rate of message passing, and up to 100x in the “general case” (the README’s phrasing, not mine). The results table shows Reaver’s A2C matching or beating DeepMind’s SC2LE baseline on MoveToBeacon and CollectMineralShards, though it lags on harder maps.
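
The general idea can be sketched with Python's standard library: workers write observations straight into a shared buffer instead of pickling them through a pipe. This is a toy under stated assumptions, not Reaver's implementation. The names (make_buffer, view, worker, sample), the fake "environment" that just writes a number, and OBS_DIM are all hypothetical. No lock is needed here only because each worker owns a disjoint row; a real sampler would also need per-step signalling between the agent and the workers, which is omitted.

```python
import multiprocessing as mp

import numpy as np

OBS_DIM = 4  # hypothetical observation size


def make_buffer(n_envs, obs_dim=OBS_DIM):
    """Allocate one zero-initialized float32 block shared by all workers."""
    return mp.RawArray("f", n_envs * obs_dim)


def view(raw, n_envs, obs_dim=OBS_DIM):
    """Wrap the shared block as an (n_envs, obs_dim) NumPy array without copying."""
    return np.frombuffer(raw, dtype=np.float32).reshape(n_envs, obs_dim)


def worker(raw, n_envs, idx, n_steps):
    """Stand-in for an environment loop: write each 'observation' into row idx."""
    obs = view(raw, n_envs)
    for step in range(n_steps):
        # Each worker touches only its own row, so writes never collide.
        obs[idx, :] = idx * 100 + step


def sample(n_envs=4, n_steps=5):
    """Run one worker process per environment and return the final observations."""
    raw = make_buffer(n_envs)
    procs = [
        mp.Process(target=worker, args=(raw, n_envs, i, n_steps))
        for i in range(n_envs)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return view(raw, n_envs).copy()


if __name__ == "__main__":
    print(sample())
```

The parent reads results through the same memory the workers wrote to, so nothing is serialized on the way back; that is the cost a pipe- or MPI-based sampler pays on every step.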

Key highlights

  • Lock-free shared-memory parallel env sampling, bottlenecked “almost exclusively by GPU I/O”
  • Modular env/agent/model split; drop-in replacements for Gym, Atari, MuJoCo
  • Pre-tuned hyperparameters per environment via gin-config files
  • Validated against reference PPO paper results; companion Google Colab notebook available
  • PySC2 ≥3.0, TensorFlow ≥2.0, TensorFlow Probability ≥0.9

Caveats

  • No longer maintained: the author explicitly states they are “no longer able to further develop or provide support”
  • Windows setup exists, but the author nudges toward Linux for “performance and stability”
  • Harder SC2 maps (DefeatRoaches, BuildMarines) show gaps versus DeepMind’s later ReDRL work or human expert scores

Verdict

Worth a look if you’re studying single-machine RL infrastructure or need a clean, modular TF2 codebase to fork. Skip it if you want active maintenance, PyTorch, or distributed training at DeepMind scale.
