Reinforcement learning's greatest hits, wired to PyTorch and a live dashboard
A 2017-era training ground for DQN, A3C, and ACER that treats Visdom like a gym buddy who never skips leg day.

What it does
This repo implements five classic deep-RL algorithms—DQN, Double DQN, Dueling DQN, A3C (discrete and continuous), and ACER—on top of PyTorch and OpenAI Gym. You pick your agent, environment, and model in a single config file (utils/options.py), hit python main.py, and watch live training curves stream into a Visdom browser tab.
The interesting bit
The whole thing is held together by a factory pattern (utils/factory.py) that keeps main.py completely untouched no matter what you swap in. The authors also enforce a strict variable-naming convention (*_vb for Variables, *_ts for Tensors) so you always know what PyTorch type you’re holding. It’s the kind of obsessive tidiness that makes the codebase feel like a well-organized toolbox rather than a research scrapbook.
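
To make the factory idea concrete, here is a minimal sketch of how string keys in a config could map to classes so the entry point never changes. Every name in it (Options, AGENTS, ENVS, MODELS, build_agent, and the stub classes) is hypothetical and chosen for illustration; it is not the contents of utils/factory.py or utils/options.py.

```python
# Illustrative stubs only: these classes stand in for real agents, envs, and models.
class GymEnv:
    def __init__(self, game):
        self.game = game


class MlpModel:
    pass


class DQNAgent:
    def __init__(self, env, model):
        self.env = env
        self.model = model


class A3CAgent(DQNAgent):
    pass


# Registries (hypothetical): config strings on the left, classes on the right.
AGENTS = {"dqn": DQNAgent, "a3c": A3CAgent}
ENVS = {"gym": GymEnv}
MODELS = {"mlp": MlpModel}


class Options:
    """Stands in for the single config file a user would edit."""
    agent_type = "dqn"
    env_type = "gym"
    game = "CartPole-v0"
    model_type = "mlp"


def build_agent(opt):
    """Look up each component by name; the caller never names a class directly."""
    env = ENVS[opt.env_type](opt.game)
    model = MODELS[opt.model_type]()
    return AGENTS[opt.agent_type](env, model)


if __name__ == "__main__":
    agent = build_agent(Options())
    print(type(agent).__name__)  # prints "DQNAgent"
```

Under this layout, switching algorithms means editing one string in the config, which is the property the repo's factory is described as providing.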
Key highlights
- Live Visdom plotting and per-step logging out of the box; no manual instrumentation required.
- Discrete and continuous action spaces supported for A3C, with MuJoCo environments optional.
- Bonus shell scripts (plot.sh, plot_compare.sh) for post-hoc log analysis with color-coded comparisons.
- Explicitly mirrors the code structure of the authors’ pytorch-dnc repo for easy cross-pollination.
- Cites and builds on reference implementations from keras-rl, pytorch-dqn, and pytorch-a3c.
Caveats
- Written for Python 2.7 and PyTorch ≥0.2.0, back when Variables and Tensors were still separate types; this is a 2017 codebase, so expect archaeology if you’re on modern stacks.
- ACER is marked “work in progress” with only discrete action support and truncated importance sampling; DDPG and NAF are still on the future-plans shelf.
- Continuous A3C needs a separate MuJoCo install, which is its own circle of dependency hell.
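
For readers unfamiliar with the term in the ACER caveat: truncated importance sampling caps the off-policy ratio rho = pi(a|s) / mu(a|s) at a constant c to bound variance, and the ACER paper pairs that cap with a bias-correction weight max(0, 1 - c / rho). The sketch below shows only that arithmetic in plain Python. The function names and the default c = 10 are assumptions for illustration, and none of it is taken from this repo's ACER code.

```python
def importance_ratio(pi_prob, mu_prob):
    """Ratio of target-policy to behavior-policy probability for one action."""
    if mu_prob <= 0.0:
        raise ValueError("behavior policy probability must be positive")
    return pi_prob / mu_prob


def truncated_weight(pi_prob, mu_prob, c=10.0):
    """Importance weight capped at c, which bounds the variance of the update."""
    return min(c, importance_ratio(pi_prob, mu_prob))


def bias_correction_coef(pi_prob, mu_prob, c=10.0):
    """Weight on the correction term; it is zero whenever the ratio is at most c."""
    rho = importance_ratio(pi_prob, mu_prob)
    return max(0.0, 1.0 - c / rho)


if __name__ == "__main__":
    # Ratio of 2 is under the cap: weight is untouched and no correction is applied.
    print(truncated_weight(0.5, 0.25), bias_correction_coef(0.5, 0.25))
    # Ratio near 90 exceeds the cap: weight is clipped to c and the correction switches on.
    print(truncated_weight(0.9, 0.01), bias_correction_coef(0.9, 0.01))
```

The cap only bites when the current policy assigns far more probability to an action than the policy that generated the replayed data did.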
Verdict
Worth a look if you’re teaching or reproducing foundational deep-RL papers and want a clean, uniform harness. Skip it if you need production-grade async training or modern PyTorch features—this is a museum piece with working demos, not a framework.