ChenglongChen/pytorch-DRL

A PyTorch zoo for RL algorithms that mostly stays in the CartPole gym

Modular implementations of A2C through PPO, with multi-agent ambitions but single-agent receipts so far.

617 stars · Python · Agents · ML Frameworks
Velocity (7d): +0.2 ★/day · Trend: steady

What it does This repo collects PyTorch rewrites of standard deep reinforcement learning algorithms—A2C, ACKTR, DQN, DDPG, PPO—behind a shared agent interface. Each algorithm gets the same six methods (interact, train, explore, act, value, evaluate), so swapping one for another is mostly a constructor change. The author also flags multi-agent as a target, though the current code and results are strictly single-agent.
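As a rough illustration of what a shared six-method interface buys you, here is a minimal sketch in plain Python. The method names follow the summary above; the `Agent` base class, `ToyEnv`, and `AlwaysOneAgent` are hypothetical stand-ins invented for this example and are not taken from the repository's source.

```python
from abc import ABC, abstractmethod
import random


class Agent(ABC):
    """Hypothetical shared interface; names follow the summary, not the repo's code."""

    @abstractmethod
    def interact(self):
        """Collect experience from the environment."""

    @abstractmethod
    def train(self):
        """Update the agent from collected experience."""

    @abstractmethod
    def explore(self, state):
        """Choose an action with exploration noise (training time)."""

    @abstractmethod
    def act(self, state):
        """Choose an action without exploration (evaluation time)."""

    @abstractmethod
    def value(self, state, action):
        """Estimate the value of a state-action pair."""

    @abstractmethod
    def evaluate(self, env, episodes):
        """Return the mean episode return over a number of episodes."""


class ToyEnv:
    """Hypothetical fixed-horizon environment: reward 1.0 whenever the action is 1."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        return self.t, reward, self.t >= self.horizon


class AlwaysOneAgent(Agent):
    """Toy agent with a fixed greedy policy, used only to exercise the interface."""

    def __init__(self, env, epsilon=0.1, seed=0):
        self.env = env
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.buffer = []

    def interact(self):
        state, done = self.env.reset(), False
        while not done:
            action = self.explore(state)
            next_state, reward, done = self.env.step(action)
            self.buffer.append((state, action, reward, next_state, done))
            state = next_state

    def train(self):
        # Placeholder: a real agent would run a gradient step here.
        self.buffer.clear()

    def explore(self, state):
        # Epsilon-greedy: occasionally pick a random action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice([0, 1])
        return self.act(state)

    def act(self, state):
        return 1

    def value(self, state, action):
        return float(action)

    def evaluate(self, env, episodes=3):
        total = 0.0
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                state, reward, done = env.step(self.act(state))
                total += reward
        return total / episodes
```

Under this kind of design, a training loop only ever calls `interact`, `train`, and `evaluate`, so trying a different algorithm means constructing a different `Agent` subclass and leaving the loop untouched.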

The interesting bit The modular design is the actual contribution here. Most RL repos are algorithm-specific forks; this one tries to factor out the common scaffolding (experience collection, n-step returns, action noise) so the algorithms share one code path. Whether that abstraction holds up in environments harder than CartPole is not something the repo demonstrates.
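To make the "n-step returns" piece of that scaffolding concrete, here is a minimal sketch of the standard discounted-return recursion, G_t = r_t + γ·G_{t+1}, seeded with a bootstrap value for the state after the last collected step. The function name `n_step_returns` and its parameters are assumptions chosen for this example, not the repository's actual helper.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted return for each step of an n-step rollout.

    rewards: rewards r_0 .. r_{n-1} collected during the rollout.
    bootstrap_value: value estimate for the state after the last step
        (use 0.0 if the episode terminated there).
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns
```

A 1-step rollout is simply the case where `rewards` has a single element, which is why one helper of this shape can serve both collection modes.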

Key highlights

  • Unified agent API across all five algorithms
  • Supports both 1-step and n-step experience collection
  • Includes KFAC optimizer borrowed from Kostrikov’s reference implementation
  • All demonstrated results are on classic control tasks (CartPole, Pendulum)
  • MIT licensed, minimal dependencies (PyTorch, gym, Python 3.6)

Caveats

  • Several checklist items in the README remain unchecked, including the multi-agent code and TRPO/LOLA additions
  • The author openly warns that RL reproduction is fragile; expect your results to vary with seeds and hyperparameters
  • No benchmark comparisons against reference implementations or baselines

Verdict Useful if you’re teaching RL or need a clean, hackable PyTorch skeleton to compare algorithm variants side-by-side. Skip it if you want battle-tested, production-grade implementations or anything beyond toy environments.
