← all repositories
MorvanZhou/pytorch-A3C

A3C in 200 lines: when PyTorch's multiprocessing beats TensorFlow

A stripped-down reference implementation showing why PyTorch's multiprocessing compatibility made it the pragmatic choice for asynchronous RL in 2018.

656 stars Python ML FrameworksAgents
pytorch-A3C
Velocity · 7d
+0.2
★ / day
Trend
steady
star history

What it does

Trains an A3C agent to balance a pole and swing up a pendulum using Python’s multiprocessing module, not threading. The repo contains two scripts—one for discrete actions (CartPole), one for continuous (Pendulum)—plus a shared Adam optimizer that synchronizes parameters across workers.

The interesting bit

The author explicitly built this because every other PyTorch A3C implementation was “too complicated to dig into.” The real insight is architectural: TensorFlow’s multiprocessing story was so broken that distributed TF was slower than threading on a single machine, while PyTorch’s torch.multiprocessing let the author skip the complexity entirely. It’s a case study in framework ergonomics masquerading as a toy RL example.

Key highlights

  • Under 200 lines total, with separate discrete and continuous action implementations
  • Uses actual multiprocessing (not threading), with a custom SharedAdam optimizer for parameter synchronization
  • Targets PyTorch ≥ 0.4.0—frozen in 2018, so expect API drift
  • Includes result plots for both environments
  • Author’s explicit goal: “simplest toy implementation you can find” for learning the algorithm

Caveats

  • PyTorch 0.4.0 is ancient; you’ll need to modernize tensor operations and possibly multiprocessing calls
  • Only tested on two classic control environments—don’t expect it to scale to Atari or MuJoCo without substantial work
  • The “distributed TensorFlow” comparison is a single anecdote, not a benchmark

Verdict

Grab this if you’re teaching A3C or need to explain why PyTorch won the early RL ergonomics war. Skip it if you want production-ready distributed training or modern PyTorch patterns.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.