DeepMind's async RL paper, reassembled from spare Keras parts
A readable, low-RAM implementation of A3C's simpler sibling that runs on a 4 GB MacBook.

What it does Implements asynchronous 1-step Q-learning from DeepMind’s 2016 paper using Keras for the network, TensorFlow for optimization, and OpenAI Gym for Atari environments. Multiple actor-learner threads replace the usual experience replay buffer, which keeps memory usage low enough for modest hardware.
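To make the "threads instead of a replay buffer" idea concrete, here is a minimal sketch of asynchronous 1-step Q-learning. It is not the repository's code. It assumes a tabular Q array in place of the Keras network, a made-up five-state chain environment in place of Atari, and fixed per-thread exploration rates; the names `step`, `Shared`, `actor_learner`, and `train` are all hypothetical.

```python
import threading
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9  # toy chain; the last state is terminal

def step(state, action):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

class Shared:
    """State shared by every actor-learner thread."""
    def __init__(self):
        self.q = np.zeros((N_STATES, N_ACTIONS))  # online values (stand-in for network weights)
        self.q_target = self.q.copy()             # periodically synced target values
        self.T = 0                                # global step counter
        self.lock = threading.Lock()

def actor_learner(shared, seed, epsilon, steps, lr=0.05, async_update=5, target_every=50):
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(shared.q)  # thread-local accumulated TD errors
    state = 0
    for t in range(1, steps + 1):
        # Epsilon-greedy action from the shared online values.
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(shared.q[state]))
        nxt, reward, done = step(state, action)
        # 1-step target bootstraps from the target values, not a replay sample.
        target = reward if done else reward + GAMMA * shared.q_target[nxt].max()
        grad[state, action] += target - shared.q[state, action]
        state = 0 if done else nxt
        with shared.lock:
            shared.T += 1
            if shared.T % target_every == 0:
                shared.q_target[:] = shared.q  # sync target values
        # Apply the accumulated update every few steps or at episode end.
        if t % async_update == 0 or done:
            with shared.lock:
                shared.q += lr * grad
            grad[:] = 0.0

def train(n_threads=4, steps=10000):
    shared = Shared()
    epsilons = (0.5, 0.3, 0.2, 0.1)  # different exploration per thread decorrelates experience
    threads = [
        threading.Thread(target=actor_learner,
                         args=(shared, i, epsilons[i % len(epsilons)], steps))
        for i in range(n_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared.q
```

Calling `train()` returns the learned table. Each thread keeps only its current transition and a small accumulator, so memory stays flat no matter how long training runs, which is the property the review credits for the low RAM footprint.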
The interesting bit The author built this to learn TensorFlow, not to win benchmarks — and openly admits it. That honesty is refreshing: he notes the original paper averaged “the best 5 models from 50 experiments,” a detail he initially missed and which explains why single runs can look like failures. It’s a practical warning dressed as a README footnote.
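A small sketch shows why that reporting detail matters. The scores below are synthetic draws from a made-up distribution, not results from the paper or the repository, and `top_k_mean` is a hypothetical helper; the point is only that a best-5-of-50 average sits well above what a typical single run produces.

```python
import random
import statistics

def top_k_mean(scores, k=5):
    """Average of the k highest scores, mirroring a best-5-of-50 style report."""
    return statistics.mean(sorted(scores, reverse=True)[:k])

rng = random.Random(0)
# Synthetic final scores for 50 hypothetical training runs (assumed distribution).
scores = [rng.gauss(100, 50) for _ in range(50)]

print("mean of all runs:", statistics.mean(scores))
print("mean of best 5:  ", top_k_mean(scores))
```

Under these assumptions the second number is always at least as large as the first, and the gap widens as run-to-run variance grows.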
Key highlights
- Runs on a MacBook with 4 GB RAM by skipping experience replay entirely
- Keras model definition is cleanly separated in model.py
- Includes TensorBoard logging for episode rewards and max Q values
- Evaluation mode produces Gym-compatible uploads
- Partial A3C implementation exists in a3c.py as a next-step stub
Caveats
- The author warns of high variance run-to-run; you may need multiple seeds
- Built against TensorFlow r0.9 and old Gym APIs, so expect bit-rot
- a3c.py is explicitly marked work-in-progress
Verdict Worth a look if you’re teaching yourself async RL and want readable, commented code before diving into production frameworks. Skip it if you need battle-tested implementations or modern PyTorch equivalents.