ikostrikov/pytorch-a3c

The author recommends you skip this repo

A clean PyTorch A3C implementation whose own creator suggests using A2C or PPO instead.

1.3k stars · Python · ML Frameworks · Agents
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

Implements Asynchronous Advantage Actor-Critic (A3C), the 2016 DeepMind algorithm that trains reinforcement-learning agents across multiple parallel processes without a GPU. You get 16 workers playing Pong simultaneously, all updating one shared model through an optimizer whose running statistics are shared across processes.
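For readers new to the "advantage" part of the name, here is a minimal sketch of the n-step return and advantage calculation that actor-critic workers perform on each rollout segment. It is an illustration in plain Python, not code from the repo; the function name `n_step_advantages` and its parameters are assumptions made for this example.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Return (returns, advantages) for one rollout segment.

    rewards[t] and values[t] are the reward and the critic's estimate at
    step t. bootstrap_value is the critic's estimate for the state that
    follows the last step (use 0.0 if the episode ended there).
    """
    returns = []
    running = bootstrap_value
    # Walk the segment backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage: how much better the outcome was than the critic predicted.
    advantages = [ret - value for ret, value in zip(returns, values)]
    return returns, advantages
```

A positive advantage pushes the policy toward the action that was taken, and a negative one pushes it away; the critic is trained to shrink the gap between `values` and `returns`.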

The interesting bit

The author is refreshingly honest: in his experience A2C works better than A3C, ACKTR beats both, and PPO is a great choice for continuous control. This repo exists mainly for historical completeness and paper reproduction, a rarity in a field where everyone pretends their fork is state-of-the-art.

Key highlights

  • Python 3 only; 16 processes converge Pong in ~15 minutes
  • Uses shared-statistics optimizer per the original paper (not the OpenAI starter agent approach)
  • Evaluation runs in a separate thread alongside training workers
  • Breakout takes “more than several hours” — the README doesn’t sugarcoat it
  • Author actively redirects users to his newer pytorch-a2c-ppo-acktr repo
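The shared-statistics idea from the highlights can be sketched with the standard library alone. This is a rough illustration, assuming an RMSProp-style update: both the weights and the running average of squared gradients live in shared memory, so every worker process reads and updates the same statistics. All names here (`make_shared_state`, `shared_rmsprop_step`, `worker`, `run_workers`) are hypothetical, and the repo's own optimizer may differ in detail.

```python
import multiprocessing as mp


def make_shared_state(initial_params):
    """Allocate weights and RMSProp statistics in shared memory."""
    params = mp.Array("d", initial_params)                   # shared weights
    square_avg = mp.Array("d", [0.0] * len(initial_params))  # shared mean of grad^2
    return params, square_avg


def shared_rmsprop_step(params, square_avg, grads, lr=0.01, alpha=0.99, eps=1e-8):
    """Apply one RMSProp update in place on the shared arrays."""
    # Locks are taken here only to keep the sketch deterministic; the A3C
    # paper describes its asynchronous updates as lock-free.
    with params.get_lock(), square_avg.get_lock():
        for i, g in enumerate(grads):
            square_avg[i] = alpha * square_avg[i] + (1.0 - alpha) * g * g
            params[i] -= lr * g / (square_avg[i] ** 0.5 + eps)


def worker(params, square_avg, target, steps):
    """Toy worker: minimise sum((p - target)^2) on the shared weights."""
    for _ in range(steps):
        grads = [2.0 * (p - target) for p in params[:]]
        shared_rmsprop_step(params, square_avg, grads)


def run_workers(num_workers=4, steps=200):
    """Launch several processes that all update the same shared state."""
    params, square_avg = make_shared_state([5.0, -3.0])
    procs = [
        mp.Process(target=worker, args=(params, square_avg, 1.0, steps))
        for _ in range(num_workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return params[:]
```

Calling `run_workers()` from under an `if __name__ == "__main__":` guard starts the processes. The contrast is with per-worker optimizers, where each process would keep its own private `square_avg`.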

Caveats

  • No GPU support mentioned; this is CPU-multiprocessing territory
  • Sparse documentation beyond basic usage and one benchmark curve
  • No concrete Breakout benchmark; the README offers only “more than several hours”

Verdict

Grab this if you need a readable A3C baseline for a course, paper comparison, or masochism. Everyone else should follow the author’s own advice and head to his A2C/PPO/ACKTR repo instead.
