jingweiz/pytorch-rl

Reinforcement learning's greatest hits, wired to PyTorch and a live dashboard

A 2017-era training ground for DQN, A3C, and ACER that treats Visdom like a gym buddy who never skips leg day.

803 stars · Python · ML Frameworks · Agents · Velocity (7d): +0.2 ★/day · Trend: steady

What it does

This repo implements five classic deep-RL algorithms (DQN, Double DQN, Dueling DQN, A3C in discrete and continuous variants, and ACER) on top of PyTorch and OpenAI Gym. You pick your agent, environment, and model in a single config file (utils/options.py), run python main.py, and watch live training curves stream into a Visdom browser tab.
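
To make the single-config-file idea concrete, here is a minimal sketch of how one options module might select the agent, environment, and model together. Every name in it (Config, CONFIGS, Options, config_index, visualize, and the example game strings) is a hypothetical stand-in chosen for illustration, not the actual contents of utils/options.py.

```python
from collections import namedtuple

# Hypothetical record tying one experiment's choices together.
Config = namedtuple("Config", ["agent_type", "env_type", "game", "model_type"])

# Hypothetical table of presets; switching experiments means picking a row.
CONFIGS = [
    Config("dqn", "gym", "CartPole-v0", "mlp"),
    Config("a3c", "atari", "PongDeterministic-v4", "cnn"),
]


class Options:
    """Bundle every user-facing choice in one place (illustrative only)."""

    def __init__(self, config_index=0, visualize=True):
        self.config = CONFIGS[config_index]  # which preset row to run
        self.visualize = visualize           # whether to stream plots


opt = Options(config_index=1)
print(opt.config.agent_type, opt.config.game)
```

The point of the design is that an entry script only ever reads an Options object, so changing the experiment is a one-line edit in the config module.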

The interesting bit

The whole thing is held together by a factory pattern (utils/factory.py), so main.py does not need editing when you swap agents, environments, or models. The authors also enforce a strict variable-naming convention (*_vb for Variables, *_ts for Tensors) so you always know what PyTorch type you’re holding, which mattered in pre-0.4 PyTorch, where Variable and Tensor were still separate types. It’s the kind of obsessive tidiness that makes the codebase feel like a well-organized toolbox rather than a research scrapbook.
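
The factory pattern described above can be sketched as a set of lookup dictionaries that map string keys to classes. This is a simplified illustration under assumed names: the classes, dictionary names, keys, and the build_agent helper are all hypothetical and are not copied from utils/factory.py.

```python
# Hypothetical stand-ins for the real environment, model, and agent classes.
class GymEnv:
    def __init__(self, game):
        self.game = game


class MlpModel:
    pass


class CnnModel:
    pass


class DQNAgent:
    def __init__(self, env, model):
        self.env, self.model = env, model


class A3CAgent:
    def __init__(self, env, model):
        self.env, self.model = env, model


# Registries: adding a new component means adding one dictionary entry.
ENV_DICT = {"gym": GymEnv}
MODEL_DICT = {"mlp": MlpModel, "cnn": CnnModel}
AGENT_DICT = {"dqn": DQNAgent, "a3c": A3CAgent}


def build_agent(agent_type, env_type, model_type, game):
    """Look up each class by key and wire the pieces together."""
    env = ENV_DICT[env_type](game)
    model = MODEL_DICT[model_type]()
    return AGENT_DICT[agent_type](env, model)


agent = build_agent("dqn", "gym", "mlp", "CartPole-v0")
print(type(agent).__name__)  # DQNAgent
```

Because the entry point only calls the lookup helper, swapping DQN for A3C changes a string in the config rather than any code in the entry script.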

Key highlights

  • Live Visdom plotting and per-step logging out of the box; no manual instrumentation required.
  • Discrete and continuous action spaces supported for A3C, with MuJoCo environments optional.
  • Bonus shell scripts (plot.sh, plot_compare.sh) for post-hoc log analysis with color-coded comparisons.
  • Explicitly mirrors the code structure of the authors’ pytorch-dnc repo for easy cross-pollination.
  • Cites and builds on reference implementations from keras-rl, pytorch-dqn, and pytorch-a3c.

Caveats

  • Written for Python 2.7 and PyTorch ≥0.2.0; this is a 2017 codebase, so expect archaeology if you’re on modern stacks.
  • ACER is marked “work in progress” with only discrete action support and truncated importance sampling; DDPG and NAF are still on the future-plans shelf.
  • Continuous A3C needs a separate MuJoCo install, which is its own circle of dependency hell.
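
For readers unfamiliar with the truncated importance sampling mentioned in the ACER caveat, the core idea is to cap the ratio between the current policy's probability of an action and the behaviour policy's probability of it, which bounds the variance of off-policy updates. The sketch below shows only that capping step in plain Python; the function name, argument names, and the clip value of 10.0 are assumptions for illustration, and this is not the repository's implementation.

```python
def truncated_importance_weights(pi_probs, mu_probs, clip=10.0):
    """Return min(clip, pi / mu) for each taken action.

    pi_probs: probabilities of the taken actions under the current policy.
    mu_probs: probabilities of the same actions under the behaviour policy
              that generated the replayed data.
    clip:     upper bound on the ratio (assumed value, tune as needed).
    """
    weights = []
    for pi_p, mu_p in zip(pi_probs, mu_probs):
        if mu_p <= 0.0:
            raise ValueError("behaviour policy probability must be positive")
        weights.append(min(clip, pi_p / mu_p))
    return weights


pi_probs = [0.9, 0.5, 0.2]
mu_probs = [0.05, 0.5, 0.4]
print(truncated_importance_weights(pi_probs, mu_probs))  # [10.0, 1.0, 0.5]
```

The first ratio (about 18) is capped at the clip value, while the other two pass through unchanged. Full ACER pairs this truncation with a bias-correction term, which is omitted here.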

Verdict

Worth a look if you’re teaching or reproducing foundational deep-RL papers and want a clean, uniform harness. Skip it if you need production-grade async training or modern PyTorch features—this is a museum piece with working demos, not a framework.
