A3C in 588 lines of patience: reproducing DeepMind's async RL
A straightforward TensorFlow implementation of A3C that trains Pong agents for 26 hours and actually shows its work.

What it does
Implements both A3C-FF and A3C-LSTM from DeepMind’s 2016 paper, targeted specifically at Atari Pong. The repo includes training and display code, plus actual benchmark numbers comparing a GTX 980 Ti against a Core i7 6700. That is rare honesty for a 2017-era RL project.
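For readers who haven't seen A3C's update rule, each worker collects a short rollout and then computes bootstrapped n-step returns, advantages, and a policy-plus-value loss. The sketch below is a minimal, framework-free illustration of those quantities. It is not the repo's TensorFlow code. The function names, the 0.5 value-loss coefficient and the default entropy weight are assumptions chosen for illustration. Because it is plain Python with no autograd, it computes loss values only, not gradients.

```python
import math
from typing import List, Sequence, Tuple


def n_step_returns(rewards: Sequence[float], bootstrap_value: float,
                   gamma: float = 0.99) -> List[float]:
    """Discounted n-step returns, accumulated backwards from the bootstrap value.

    bootstrap_value is 0.0 if the rollout ended in a terminal state,
    otherwise the critic's estimate V(s_{t+n}) of the state after the rollout.
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a3c_losses(action_probs: Sequence[Sequence[float]], actions: Sequence[int],
               values: Sequence[float], rewards: Sequence[float],
               bootstrap_value: float, gamma: float = 0.99,
               entropy_beta: float = 0.01) -> Tuple[float, float]:
    """Policy and value losses for one rollout (assumed coefficients, illustrative only)."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    policy_loss = 0.0
    value_loss = 0.0
    for probs, a, v, R in zip(action_probs, actions, values, returns):
        # In a real implementation the advantage is a stop-gradient constant
        # for the policy term; here we only compute the scalar value.
        advantage = R - v
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        # Entropy bonus discourages premature collapse to a deterministic policy.
        policy_loss += -math.log(probs[a]) * advantage - entropy_beta * entropy
        value_loss += 0.5 * (R - v) ** 2
    return policy_loss, value_loss
```

For example, a three-step rollout ending in a terminal state with a single reward at the end, `n_step_returns([0.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.5)`, gives `[0.25, 0.5, 1.0]`. The reward propagates backwards, halving at each step.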
The interesting bit
The author patched the Arcade Learning Environment itself for multi-threading rather than wrapping it. That kind of yak-shaving tells you async RL was still rough terrain in 2017. Also notable: the reported scores are deliberately not averaged using the global network, an explicit departure from the paper's evaluation method.
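The "async" in A3C refers to several worker threads that each sync a local copy of shared (global) parameters, compute a gradient on their own experience, and apply it to the shared parameters without locking. The toy sketch below shows only that update pattern, using a one-parameter quadratic in place of an RL objective. `TARGET`, `toy_gradient`, `worker` and `train_async` are hypothetical names, and none of this is the repo's code. Note that CPython's GIL serializes these threads, so the sketch demonstrates the lock-free, stale-gradient update scheme rather than a parallel speedup.

```python
import threading
from typing import List

TARGET = 3.0  # hypothetical toy optimum standing in for a real RL objective


def toy_gradient(w: float) -> float:
    # Gradient of the toy loss (w - TARGET)^2.
    return 2.0 * (w - TARGET)


def worker(shared: List[float], steps: int, lr: float) -> None:
    for _ in range(steps):
        local_w = shared[0]            # sync a local copy of the global parameter
        grad = toy_gradient(local_w)   # gradient on the (possibly stale) local copy
        shared[0] -= lr * grad         # apply to the global parameter with no lock (Hogwild-style)


def train_async(num_workers: int = 4, steps: int = 200, lr: float = 0.05) -> float:
    shared = [0.0]  # the "global network": a single shared parameter
    threads = [threading.Thread(target=worker, args=(shared, steps, lr))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0]
```

Calling `train_async()` should return a value very close to `3.0`. Occasional stale reads or lost updates between threads slow convergence slightly but do not prevent it, which is the bet asynchronous methods like A3C make.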
Key highlights
- Both feed-forward and LSTM variants implemented
- GPU vs CPU speed comparison included (GPU wins, but not by the margin you might expect)
- Requires a custom fork of ALE, not the standard pip install
- Written for TensorFlow r1.0; expect archaeology if you try to run it now
- 26-hour training video provided as proof of life
Caveats
- Hard-locked to TensorFlow r1.0; modern TF will likely break
- Only validated on Pong, not the full Atari suite
- Custom ALE build step is a genuine friction point
Verdict
Worth studying if you’re tracing the evolution of A3C implementations or need a minimal reference before building your own. Skip it if you want something that runs out of the box in 2024. This is a period piece, not a framework.