IMPALA: DeepMind's distributed RL architecture, open-sourced
A reference implementation of the actor-learner framework that decouples acting from learning to scale reinforcement learning across hundreds of parallel environments.

What it does
This is DeepMind’s TensorFlow implementation of IMPALA (Importance Weighted Actor-Learner Architectures), a distributed deep reinforcement learning system. Actors run environments and generate experience, while a separate learner consumes that experience in batches, so no GPU is needed on the actor side. The repo includes a dynamic batching module and targets DeepMind Lab, though the paper also reports results on Atari.
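The actor/learner split can be illustrated with a toy producer-consumer sketch. This is not the repo's code: it uses plain Python threads and a bounded queue, and every name here (actor, learner, run_toy_training, the fake transition tuples) is made up for illustration. The real system runs TensorFlow processes, possibly across machines.

```python
import queue
import threading


def actor(actor_id, out_queue, num_unrolls, unroll_length):
    """Toy actor: emits fixed-length trajectories of fake transitions."""
    for unroll in range(num_unrolls):
        # A real actor would step an environment with its (possibly stale) policy.
        trajectory = [(actor_id, unroll, step) for step in range(unroll_length)]
        out_queue.put(trajectory)  # blocks only if the queue is full


def learner(in_queue, batch_size, num_batches):
    """Toy learner: dequeues trajectories and groups them into batches."""
    batches = []
    for _ in range(num_batches):
        batch = [in_queue.get() for _ in range(batch_size)]
        # A real learner would run one gradient step on this batch here.
        batches.append(batch)
    return batches


def run_toy_training(num_actors=4, num_unrolls=6, unroll_length=5, batch_size=8):
    """Runs the actors in background threads while the learner consumes batches."""
    shared_queue = queue.Queue(maxsize=16)
    threads = [
        threading.Thread(
            target=actor,
            args=(i, shared_queue, num_unrolls, unroll_length),
            daemon=True,
        )
        for i in range(num_actors)
    ]
    for thread in threads:
        thread.start()
    num_batches = (num_actors * num_unrolls) // batch_size
    batches = learner(shared_queue, batch_size, num_batches)
    for thread in threads:
        thread.join()
    return batches
```

The point of the sketch is that actors never wait for a gradient step to finish; they only block when the queue is full, which is what lets the actor count grow independently of the learner.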
The interesting bit
The trick is the importance weighting: because actors and learners run asynchronously, the policy generating experience lags behind the policy being updated. IMPALA corrects for this staleness with V-trace, an off-policy correction that reweights each step by a truncated importance ratio between the learner’s policy and the actor’s policy. That lets hundreds of actors keep generating experience without synchronizing with the learner.
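To make the correction concrete, here is a minimal NumPy sketch of the V-trace target recursion as described in the IMPALA paper. It is not the repo's TensorFlow code: the function name vtrace_targets, its argument names, the scalar discount, and the single unbatched trajectory layout are all assumptions made for illustration.

```python
import numpy as np


def vtrace_targets(log_rhos, rewards, values, bootstrap_value,
                   discount=0.99, clip_rho=1.0, clip_c=1.0):
    """Computes V-trace value targets v_s for one trajectory of length T.

    log_rhos:        log(pi(a_t|x_t) / mu(a_t|x_t)), learner policy over actor policy.
    rewards:         r_t for t = 0..T-1.
    values:          learner's value estimates V(x_t) for t = 0..T-1.
    bootstrap_value: V(x_T), the value estimate after the last step.
    """
    log_rhos = np.asarray(log_rhos, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)

    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho, rhos)  # truncated weights on the TD error
    cs = np.minimum(clip_c, rhos)              # truncated "trace" coefficients

    # Temporal-difference errors, weighted by the truncated importance ratios.
    values_next = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + discount * values_next - values)

    # Backward recursion: v_s - V(x_s) = delta_s + discount * c_s * (v_{s+1} - V(x_{s+1})).
    vs_minus_values = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(values))):
        acc = deltas[t] + discount * cs[t] * acc
        vs_minus_values[t] = acc
    return values + vs_minus_values
```

A handy sanity check: when all log-ratios are zero (the on-policy case), the targets reduce to ordinary n-step bootstrapped returns, and when the ratios are close to zero the targets collapse back to the learner's own value estimates.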
Key highlights
- Single-machine mode: 48 actors, ~200-250 average episode return after 1B frames on a small DeepMind Lab level
- Distributed mode: 150 actors across DMLab-30, achieving 45-50 capped human normalized score (training), ~2% lower at test time
- Ships with a Dockerfile because the dependency stack (TensorFlow >=1.9.0-dev20180530, DeepMind Lab, Sonnet) is finicky and dated
- The core experiment.py script switches between learner, actor, and test modes via command-line flags
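The "capped human normalized score" in the distributed-mode highlight is worth unpacking. As the paper describes it, each level's score is rescaled so that a random agent maps to 0 and a human to 100, capped at 100, and then averaged across levels. The sketch below follows that description; the function name and the sample numbers are hypothetical, not taken from the repo or the paper.

```python
def capped_human_normalized_score(agent, random, human, cap=100.0):
    """Mean over levels of min(cap, 100 * (agent - random) / (human - random))."""
    per_level = [
        min(cap, 100.0 * (a - r) / (h - r))
        for a, r, h in zip(agent, random, human)
    ]
    return sum(per_level) / len(per_level)


# Made-up returns for three levels, purely for illustration.
agent_returns = [10.0, 50.0, 30.0]
random_returns = [0.0, 10.0, 10.0]
human_returns = [20.0, 30.0, 50.0]

# Per-level normalized scores are 50, 200, and 50; the cap turns 200 into 100.
score = capped_human_normalized_score(agent_returns, random_returns, human_returns)
```

The cap matters because it stops one superhuman level from masking weak performance elsewhere: without it, the made-up example above would average to 100 instead of roughly 66.7.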
Caveats
- TensorFlow 1.x-era code; the specific dev build requirement suggests this is research archaeology at this point
- README is sparse on the dynamic batching module: it mentions that the module exists but doesn’t explain how to use or modify it
- Not an officially supported Google product, so expect the level of maintenance typical of code released alongside a 2018 ICML paper
Verdict
Worth studying if you’re implementing distributed RL from scratch or need a V-trace reference. Skip it if you want something that runs on modern PyTorch/JAX stacks without archaeology; the ideas migrated to newer frameworks long ago.