
ikostrikov/pytorch-trpo

A PyTorch implementation of Trust Region Policy Optimization, a deep reinforcement learning algorithm for continuous robotic control tasks.

448 stars · Python · Agents · ML Frameworks
Velocity (7d): +0.1 ★ / day · Trend: steady

This repository provides a PyTorch implementation of Trust Region Policy Optimization (TRPO), a policy gradient method for training reinforcement learning agents in continuous control environments. The implementation computes natural gradient updates with exact Hessian-vector products, which are more precise than finite-difference approximations. It is designed for MuJoCo physics simulation environments and exposes configurable hyperparameters for robotic control tasks such as Reacher, Hopper, Walker2d, and Humanoid.
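To make the distinction between exact and finite-difference Hessian-vector products concrete, here is a minimal sketch in plain Python. It is not code from the repository. The toy objective, the function names (`grad`, `hvp_exact`, `hvp_finite_diff`, `conjugate_gradient`), and the step size `eps` are all assumptions chosen for illustration. In TRPO the matrix is usually the Hessian of the mean KL divergence between the old and new policies, and in PyTorch an exact product would typically come from differentiating a gradient-vector dot product with autograd. Here a hand-derived diagonal Hessian stands in for that, so the example runs without any dependencies.

```python
# Toy objective (an assumption for illustration): f(x) = sum(x_i**4)/4 + sum(x_i**2)/2.

def grad(x):
    # Gradient of the toy objective: x_i**3 + x_i.
    return [xi ** 3 + xi for xi in x]

def hvp_exact(x, v):
    # The toy Hessian is diagonal with entries 3*x_i**2 + 1, so H v is elementwise.
    return [(3 * xi ** 2 + 1) * vi for xi, vi in zip(x, v)]

def hvp_finite_diff(x, v, eps=1e-4):
    # Forward difference of the gradient along v: (g(x + eps*v) - g(x)) / eps.
    g0 = grad(x)
    g1 = grad([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def conjugate_gradient(hvp, b, iters=10, tol=1e-10):
    # Solve H s = b using only Hessian-vector products, never forming H.
    s = [0.0] * len(b)
    r = list(b)   # residual b - H s, with s = 0 initially
    p = list(b)   # search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        hp = hvp(p)
        alpha = rr / dot(p, hp)
        s = [si + alpha * pi for si, pi in zip(s, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        new_rr = dot(r, r)
        p = [ri + (new_rr / rr) * pi for ri, pi in zip(r, p)]
        rr = new_rr
    return s

x = [1.0, -2.0, 0.5]
v = [0.3, -0.1, 0.7]

exact = hvp_exact(x, v)
approx = hvp_finite_diff(x, v)
max_fd_error = max(abs(a - e) for a, e in zip(approx, exact))

# Natural-gradient-style step direction: solve H s = g with the exact product.
g = grad(x)
step = conjugate_gradient(lambda p: hvp_exact(x, p), g)
residual = max(abs(hi - gi) for hi, gi in zip(hvp_exact(x, step), g))

print("finite-difference error:", max_fd_error)
print("CG residual:", residual)
```

The finite-difference product carries a truncation error that scales with `eps`, while the exact product does not, and conjugate gradient only ever needs the product rather than the full matrix. That is the usual reason this combination is used for large policy networks.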
