miyosuda/async_deep_reinforce

A3C in 588 lines of patience: reproducing DeepMind's async RL

A straightforward TensorFlow implementation of A3C that trains Pong agents for 26 hours and actually shows its work.

588 stars · Python · Agents · ML Frameworks

What it does

Implements both A3C-FF and A3C-LSTM from DeepMind’s 2016 paper, specifically for Atari Pong. The repo includes code for training and for displaying a trained agent, plus measured speed benchmarks comparing a GTX 980Ti against a Core i7 6700, which is rare honesty for a 2017-era RL project.

The interesting bit

The author patched the Arcade Learning Environment itself to support multi-threading rather than wrapping around it, which is the kind of yak-shaving that tells you async RL was still rough terrain in 2017. Also notable: the reported scores are deliberately not averaged using the global network, which the author flags as a departure from the paper.
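To make the threading point concrete, here is a minimal sketch of the asynchronous pattern that A3C-style training relies on: several worker threads, each with its own environment instance, pushing gradients into one set of shared parameters. It is not code from the repository. `ToyEnv`, `GlobalParams`, `worker`, and `train` are hypothetical names, the "environment" is a noisy quadratic loss rather than an emulator, and the lock is an assumption made for clarity rather than a claim about how the repo synchronizes updates.

```python
import random
import threading


class ToyEnv:
    """Hypothetical stand-in for an emulator; every worker owns one instance."""

    def __init__(self, seed):
        self.rng = random.Random(seed)

    def sample_gradient(self, weight):
        # Noisy gradient of the toy loss (weight - 3)^2.
        return 2.0 * (weight - 3.0) + self.rng.gauss(0.0, 0.1)


class GlobalParams:
    """Shared parameters that all workers read from and write to."""

    def __init__(self):
        self.weight = 0.0
        self.updates = 0
        self.lock = threading.Lock()  # assumption: locked updates for clarity

    def snapshot(self):
        with self.lock:
            return self.weight

    def apply_gradient(self, grad, lr):
        with self.lock:
            self.weight -= lr * grad
            self.updates += 1


def worker(worker_id, shared, steps, lr):
    env = ToyEnv(seed=worker_id)  # per-thread environment, never shared
    for _ in range(steps):
        local_weight = shared.snapshot()          # sync local copy from global
        grad = env.sample_gradient(local_weight)  # compute gradient locally
        shared.apply_gradient(grad, lr)           # push it to the global params


def train(num_workers=4, steps=200, lr=0.05):
    shared = GlobalParams()
    threads = [
        threading.Thread(target=worker, args=(i, shared, steps, lr))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    result = train()
    print(result.updates, round(result.weight, 2))
```

Each worker may compute its gradient from a slightly stale copy of the weights, and the scheme still converges on this toy problem. The per-thread `ToyEnv` is the part that corresponds to the ALE patch: if the emulator cannot safely run as multiple instances in separate threads, this layout does not work.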

Key highlights

  • Both feed-forward and LSTM variants implemented
  • GPU vs CPU speed comparison included (GPU wins, but not by the margin you might expect)
  • Requires a custom fork of ALE, not the standard pip install
  • TensorFlow r1.0 era — expect archaeology if you try to run it now
  • 26-hour training video provided as proof of life

Caveats

  • Written against TensorFlow r1.0; it will likely break on modern TF
  • Only validated on Pong, not the full Atari suite
  • Custom ALE build step is a genuine friction point

Verdict

Worth studying if you’re tracing the evolution of A3C implementations or need a minimal reference before building your own. Skip it if you want something that runs out of the box in 2024: this is a period piece, not a framework.
