PyTorch 0.4-era RL toolkit with a fast Fisher trick
A compact reference implementation of TRPO, PPO, A2C, and GAIL from the pre-PyTorch 1.0 era, with a notable optimization for Fisher vector products.

What it does
This repo bundles four influential reinforcement learning algorithms into runnable PyTorch code: the policy gradient methods TRPO, PPO, and A2C, plus GAIL for imitation learning. It targets OpenAI Gym environments (including MuJoCo) and handles both discrete and continuous action spaces. The code is structured as straightforward scripts rather than a framework: you run examples/ppo_gym.py directly instead of wiring up an import-heavy pipeline.
The interesting bit
The standout is the “fast Fisher vector product calculation” for TRPO. TRPO's conjugate gradient step needs only products of the Fisher information matrix with a vector, and the implementation computes these with a trick that avoids materializing the full matrix, a significant speedup for a notoriously expensive step. There’s even a linked blog post walking through the math, which is rarer than it should be in RL repos.
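The general idea of a matrix-free Fisher vector product can be sketched in a toy setting. This is not the repository's code, and not necessarily its fast variant. It assumes a single-state softmax policy whose parameters are the logits themselves, for which the Fisher matrix is diag(p) - p pᵀ, so the product with a vector reduces to a few elementwise operations. The function names, the gradient values, and the damping constant are all hypothetical, and plain standard-library Python is used so the sketch runs without PyTorch.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fisher_vector_product(probs, v, damping=0.0):
    """Return (F + damping * I) v for F = diag(p) - p p^T, without building F."""
    pv = dot(probs, v)
    return [p * (x - pv) + damping * x for p, x in zip(probs, v)]

def conjugate_gradient(matvec, b, iters=10, tol=1e-12):
    """Solve A x = b using only matrix-vector products A v."""
    x = [0.0] * len(b)
    r = list(b)   # residual b - A x, with x = 0 initially
    p = list(b)   # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        new_rr = dot(r, r)
        p = [ri + (new_rr / rr) * pi for ri, pi in zip(r, p)]
        rr = new_rr
    return x

probs = softmax([0.5, -1.0, 2.0])
g = [0.3, -0.2, 0.1]   # hypothetical policy gradient
damping = 0.1          # assumed value; keeps the system positive definite

# Natural-gradient direction: solve (F + damping * I) x = g.
step_dir = conjugate_gradient(
    lambda v: fisher_vector_product(probs, v, damping), g
)
```

The solver only ever calls the product function, so memory stays linear in the number of parameters. For a neural network policy the same solver applies, but the product has to be computed through automatic differentiation rather than a closed form.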
Key highlights
- Policy gradient trio: TRPO, PPO, A2C with direct paper-to-code examples
- GAIL support: save expert trajectories, then train an imitator via adversarial learning
- Multiprocessing sample collection: claims ~8× speedup over single-threaded rollout
- Discrete and continuous action spaces supported
- Direct lineage from ikostrikov/pytorch-trpo and OpenAI Baselines
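The multiprocessing sample collection mentioned above follows a common pattern: split the step budget across worker processes, let each roll out its own environment copy, and merge the batches. The sketch below illustrates that pattern only. It is not the repository's implementation, and the random-walk "environment", the function names, and the seeding scheme are all assumptions made for illustration.

```python
import multiprocessing as mp
import random

def collect_worker(args):
    """Roll out a toy random-walk environment and return its transitions."""
    worker_id, num_steps, seed = args
    rng = random.Random(seed + worker_id)  # distinct stream per worker
    state, batch = 0, []
    for _ in range(num_steps):
        action = rng.choice([-1, 1])       # stand-in for sampling from a policy
        next_state = state + action
        reward = -abs(next_state)          # toy reward: stay near the origin
        batch.append((state, action, reward, next_state))
        state = next_state
    return batch

def collect_samples(total_steps, num_workers, seed=0):
    """Split total_steps evenly across workers and merge their batches."""
    per_worker = total_steps // num_workers
    jobs = [(i, per_worker, seed) for i in range(num_workers)]
    if num_workers == 1:
        results = [collect_worker(jobs[0])]  # no process overhead needed
    else:
        with mp.Pool(num_workers) as pool:
            results = pool.map(collect_worker, jobs)
    return [t for batch in results for t in batch]
```

To run it in parallel, call something like collect_samples(total_steps=2048, num_workers=4) under an `if __name__ == "__main__":` guard, which the spawn start method requires. Actual speedup depends on how expensive each environment step is relative to process startup and pickling costs.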
Caveats
- Locked to PyTorch 0.4 (0.3 on a branch); both predate PyTorch 1.0, so expect friction with modern versions
- MuJoCo dependency means you’ll wrestle with mujoco-py licensing and installation
- GPU users must manually set OMP_NUM_THREADS=1 to avoid PyTorch’s threading fighting with multiprocessing; the README notes this can make Linux multiprocessing “even slower than a single thread” if ignored
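One way to apply the OMP_NUM_THREADS setting from inside a script, rather than in the shell, is sketched below. It assumes the variable is set before PyTorch is imported, since the OpenMP runtime typically reads it once at initialization. The helper name is hypothetical and not part of the repository.

```python
import os

def limit_openmp_threads(num_threads=1):
    """Set OMP_NUM_THREADS for this process and any children it starts."""
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    return os.environ["OMP_NUM_THREADS"]

# Call this at the very top of the entry script, before the torch import.
limit_openmp_threads(1)
# import torch  # would follow here in a real training script
```

Child processes inherit the parent's environment, so rollout workers started afterwards see the same limit. Exporting the variable in the shell before launching the script achieves the same effect without touching the code.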
Verdict
Worth a look if you’re studying the classics and want readable, self-contained implementations — especially that Fisher vector product. Skip it if you need production-grade, maintained code; this is a 2018-era snapshot with the dependencies to match.