google-deepmind/scalable_agent

IMPALA: DeepMind's distributed RL architecture, open-sourced

A reference implementation of the actor-learner framework that decouples acting from learning to scale reinforcement learning across hundreds of parallel environments.

1k stars · Python · ML Frameworks, Agents
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

This is DeepMind’s TensorFlow implementation of IMPALA (Importance Weighted Actor-Learner Architectures), a distributed deep reinforcement learning system. Actors run environments and generate experience, while a separate learner consumes that experience in batches, so the actors need no GPU. The repo includes a dynamic batching module and targets DeepMind Lab, though the paper also reports results on Atari.
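To make the actor/learner split concrete, here is a minimal sketch in plain Python using threads and a queue. It is not the repo's TensorFlow code: the function names (`actor`, `learner`, `run`), the fake random "environment", and all parameter values are assumptions chosen for illustration.

```python
import queue
import random
import threading


def actor(actor_id, out_queue, num_trajectories, unroll_length=5):
    """Hypothetical actor: emits fixed-length trajectories of fake rewards."""
    rng = random.Random(actor_id)
    for _ in range(num_trajectories):
        # Each step is (actor_id, reward); a real actor would step an environment.
        trajectory = [(actor_id, rng.random()) for _ in range(unroll_length)]
        out_queue.put(trajectory)  # blocks if the learner falls far behind


def learner(in_queue, num_batches, batch_size):
    """Hypothetical learner: groups whatever trajectories arrive into batches."""
    batches = []
    for _ in range(num_batches):
        batch = [in_queue.get() for _ in range(batch_size)]
        batches.append(batch)  # a real learner would run a gradient step here
    return batches


def run(num_actors=4, trajectories_per_actor=6, batch_size=8):
    # Bounded queue: actors run ahead of the learner, but only by a limited amount.
    q = queue.Queue(maxsize=2 * batch_size)
    threads = [
        threading.Thread(target=actor, args=(i, q, trajectories_per_actor))
        for i in range(num_actors)
    ]
    for t in threads:
        t.start()
    num_batches = num_actors * trajectories_per_actor // batch_size
    batches = learner(q, num_batches, batch_size)
    for t in threads:
        t.join()
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(run()):
        print(f"batch {i}: {len(batch)} trajectories")
```

The learner never waits for any particular actor; it batches trajectories in whatever order they arrive, which is why the experience it trains on can come from a slightly stale policy.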

The interesting bit

The trick is the importance weighting: because actors and learners run asynchronously, the policy generating experience lags behind the policy being updated. IMPALA corrects for this staleness with V-trace, an off-policy correction method, letting you scale to hundreds of actors without waiting for the learner to catch up.
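For a feel of what the correction computes, here is a NumPy sketch of V-trace value targets for a single trajectory, following the recursive form in the IMPALA paper. It is an illustration, not the repo's TensorFlow implementation: the function name `vtrace_targets`, its argument names, and the default values are assumptions.

```python
import numpy as np


def vtrace_targets(log_rhos, rewards, values, bootstrap_value,
                   discount=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace targets v_s for one trajectory of length T.

    log_rhos:        log(pi(a_t|x_t) / mu(a_t|x_t)), learner policy over actor policy
    rewards, values: r_t and V(x_t) for t = 0..T-1
    bootstrap_value: V(x_T), the value estimate after the last step
    """
    log_rhos = np.asarray(log_rhos, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)

    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(rho_bar, rhos)  # truncated weights on the TD errors
    cs = np.minimum(c_bar, rhos)              # truncated "trace" coefficients

    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + discount * values_tp1 - values)

    # Backward recursion: v_s - V(x_s) = delta_s + discount * c_s * (v_{s+1} - V(x_{s+1}))
    corrections = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + discount * cs[t] * acc
        corrections[t] = acc
    return values + corrections


if __name__ == "__main__":
    targets = vtrace_targets(
        log_rhos=[0.0, 0.0, 0.0],  # on-policy: importance ratios are all 1
        rewards=[1.0, 1.0, 1.0],
        values=[0.5, 0.2, 0.1],
        bootstrap_value=0.0,
        discount=0.9,
    )
    print(targets)
```

With all ratios equal to 1 (the on-policy case), the targets reduce to ordinary n-step returns. As the ratios shrink toward 0, the targets fall back toward the learner's current value estimates, so experience from a badly mismatched policy moves the value function less.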

Key highlights

  • Single-machine mode: 48 actors, ~200-250 average episode return after 1B frames on a small DeepMind Lab level
  • Distributed mode: 150 actors across DMLab-30, achieving 45-50 capped human normalized score (training), ~2% lower at test time
  • Ships with a Dockerfile because the dependency stack (TensorFlow >=1.9.0-dev20180530, DeepMind Lab, Sonnet) is finicky and dated
  • The core experiment.py script switches between learner, actor, and test modes via command-line flags

Caveats

  • TensorFlow 1.x-era code; the specific dev build requirement suggests this is research archaeology at this point
  • README is sparse on the dynamic batching module: it mentions that the module exists but doesn’t explain how to use or modify it
  • Not an officially supported Google product, so expect the level of maintenance typical of code released with a 2018 ICML paper

Verdict

Worth studying if you’re implementing distributed RL from scratch or need a V-trace reference. Skip it if you want something that runs on modern PyTorch/JAX stacks without archaeology; the ideas migrated to newer frameworks long ago.
