Khrylx/PyTorch-RL

PyTorch 0.4-era RL toolkit with a fast Fisher trick

A compact reference implementation of TRPO, PPO, A2C, and GAIL from the pre-PyTorch 1.0 era, with a notable optimization for Fisher vector products.

1.3k stars · Python · Agents · ML Frameworks
Velocity (7d): +0.4 ★/day · Trend: steady

What it does

This repo bundles four influential reinforcement learning algorithms into runnable PyTorch code: TRPO, PPO, A2C, and GAIL for imitation learning. It targets OpenAI Gym environments (including MuJoCo) and handles both discrete and continuous action spaces. The code is structured as straightforward scripts rather than a framework — you run examples/ppo_gym.py, not an import-heavy pipeline.

The interesting bit

The standout is the “fast Fisher vector product calculation” for TRPO. TRPO's natural-gradient step solves a linear system in the Fisher information matrix with conjugate gradient, which needs only matrix-vector products. Rather than forming the matrix, whose size is quadratic in the number of policy parameters, the implementation computes those products directly, a significant speedup for a notoriously expensive step. There’s even a linked blog post walking through the math, which is rarer than it should be in RL repos.
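
To make the matrix-free idea concrete, here is a minimal sketch for the simplest possible case: a single categorical (softmax) distribution, where the Fisher matrix with respect to the logits is diag(p) − p pᵀ. This is a toy illustration in plain Python, not the repo's method, which works with respect to the network parameters; all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector_product(logits, v):
    """Return F v for F = diag(p) - p p^T without building F.

    Uses the identity (F v)_i = p_i * (v_i - p . v), which costs O(n)
    time and memory instead of O(n^2) for the explicit matrix.
    """
    p = softmax(logits)
    p_dot_v = sum(pi * vi for pi, vi in zip(p, v))
    return [pi * (vi - p_dot_v) for pi, vi in zip(p, v)]


def explicit_fisher(logits):
    """Reference only: materialize the full n x n Fisher matrix."""
    p = softmax(logits)
    n = len(p)
    return [
        [(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
        for i in range(n)
    ]


def matvec(matrix, v):
    """Dense matrix-vector product, used only to check the fast path."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]
```

Both paths should agree up to floating-point error; the difference is that `fisher_vector_product` never allocates the n × n matrix, which is the property that matters once n is the parameter count of a policy network.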

Key highlights

  • Policy gradient trio: TRPO, PPO, A2C with direct paper-to-code examples
  • GAIL support: save expert trajectories, then train an imitator via adversarial learning
  • Multiprocessing sample collection: claims ~8× speedup over single-threaded rollout
  • Discrete and continuous action spaces supported
  • Direct lineage from ikostrikov/pytorch-trpo and OpenAI Baselines
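
For readers who want a feel for what separates PPO from the other two methods in the list above, the clipped surrogate objective from the PPO paper fits in a few lines. This is a plain-Python sketch rather than code from the repo; the function name, argument names, and the default `clip_eps=0.2` are assumptions made for illustration.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Keep the probability ratio within [1 - eps, 1 + eps].
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum makes the objective a pessimistic bound:
        # the policy gains nothing from pushing the ratio past the clip range.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

With a positive advantage, a ratio of 1.5 is credited only as 1.2; with a negative advantage, the unclipped (worse) term is kept, so large harmful moves are still penalized in full.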

Caveats

  • Locked to PyTorch 0.4 (0.3 on a branch); this predates the 1.0 API, so expect friction with modern versions
  • MuJoCo dependency means you’ll wrestle with mujoco-py licensing and installation
  • GPU users must manually set OMP_NUM_THREADS=1 to avoid PyTorch’s threading fighting with multiprocessing — the README notes this can make Linux multiprocessing “even slower than a single thread” if ignored
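
One way to apply the OMP_NUM_THREADS setting from the last caveat is at the top of the launching script instead of in the shell. The snippet below is a sketch under the assumption that the variable generally needs to be in place before PyTorch is first imported, since the OpenMP runtime reads it at startup; the commented import is a placeholder and not part of the repo's scripts.

```python
import os

# Set this before the first `import torch` anywhere in the process.
# Worker processes started afterwards inherit the environment.
os.environ["OMP_NUM_THREADS"] = "1"

# import torch  # placeholder: import PyTorch only after the variable is set
```

Exporting the variable in the shell before launching the script has the same effect and avoids touching the code.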

Verdict

Worth a look if you’re studying the classics and want readable, self-contained implementations — especially that Fisher vector product. Skip it if you need production-grade, maintained code; this is a 2018-era snapshot with the dependencies to match.
