ikostrikov/pytorch-trpo
A PyTorch implementation of Trust Region Policy Optimization, a deep reinforcement learning algorithm for continuous robotic control tasks.

This repository provides a PyTorch implementation of TRPO (Trust Region Policy Optimization), a policy gradient method for training reinforcement learning agents in continuous control environments. The implementation computes natural gradient updates with exact Hessian-vector products, which are more precise than finite-difference approximations. It is designed to work with MuJoCo physics simulation environments for training robotic control policies, with configurable hyperparameters for tasks such as Reacher, Hopper, Walker2d, and Humanoid.
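
To illustrate the difference between an exact Hessian-vector product and a finite-difference approximation, here is a minimal sketch using double backpropagation in PyTorch. It is not the repository's code: the function names `hessian_vector_product` and `finite_difference_hvp`, the toy objective `toy_objective`, and the step size `eps` are assumptions chosen for illustration. In TRPO the differentiated function would typically be the mean KL divergence between the old and new policies rather than this toy cubic.

```python
import torch


def hessian_vector_product(f, params, v):
    """Exact H v via double backpropagation (hypothetical helper)."""
    # First pass: gradient of f, keeping the graph so it can be differentiated again.
    (grad,) = torch.autograd.grad(f(params), params, create_graph=True)
    # Second pass: d/dparams (grad . v) = H v, because v is treated as a constant.
    (hv,) = torch.autograd.grad(torch.dot(grad, v), params)
    return hv


def finite_difference_hvp(f, params, v, eps=1e-3):
    """Approximate H v as (g(params + eps * v) - g(params)) / eps."""
    def grad_at(p):
        p = p.detach().requires_grad_(True)
        (g,) = torch.autograd.grad(f(p), p)
        return g

    return (grad_at(params + eps * v) - grad_at(params)) / eps


def toy_objective(x):
    # Assumed toy function: its Hessian is diag(6 * x), so H v is easy to check by hand.
    return (x ** 3).sum()


x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64, requires_grad=True)
v = torch.ones(3, dtype=torch.float64)

exact = hessian_vector_product(toy_objective, x, v)
approx = finite_difference_hvp(toy_objective, x, v)
fd_error = (approx - exact).abs().max().item()
```

For this objective the exact product equals `6 * x * v`, while the forward-difference estimate carries an error proportional to `eps`. Shrinking `eps` reduces that truncation error but amplifies floating-point round-off, which is the trade-off the exact product avoids.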