
Fork of https://github.com/ikostrikov/pytorch-trpo with modifications for the paper "The Mirage of Action-Dependent Baselines in Reinforcement Learning".

PyTorch implementation of TRPO

Try my implementation of PPO (a newer and better variant of TRPO) instead, unless you need to use TRPO for some specific reason.

This is a PyTorch implementation of "Trust Region Policy Optimization (TRPO)".

This code is mostly ported from the original implementation by John Schulman. In contrast to another PyTorch implementation of TRPO, this implementation uses an exact Hessian-vector product instead of a finite-differences approximation.
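
As a reference, here is a minimal sketch of how an exact Hessian-vector product can be computed with double backprop in PyTorch. The names (hessian_vector_product, loss, params, vector) are illustrative only and are not the actual signatures used in this repository:

    import torch

    def hessian_vector_product(loss, params, vector):
        # First backward pass: gradient of the loss w.r.t. the parameters,
        # keeping the graph so it can be differentiated a second time.
        grads = torch.autograd.grad(loss, params, create_graph=True)
        flat_grad = torch.cat([g.view(-1) for g in grads])
        # Differentiating the scalar (grad . vector) w.r.t. the parameters
        # yields H v exactly, with no finite-differences step size to tune.
        grad_dot_v = (flat_grad * vector).sum()
        hvp = torch.autograd.grad(grad_dot_v, params)
        return torch.cat([g.contiguous().view(-1) for g in hvp])

In TRPO this product is typically taken against the Hessian of the KL divergence between the old and new policies, inside a conjugate-gradient solver for the natural gradient direction.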

Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.

Usage

python main.py --env-name "Reacher-v1"

Recommended hyperparameters

InvertedPendulum-v1: 5000

Reacher-v1, InvertedDoublePendulum-v1: 15000

HalfCheetah-v1, Hopper-v1, Swimmer-v1, Walker2d-v1: 25000

Ant-v1, Humanoid-v1: 50000
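
These values appear to be the per-update batch sizes (environment steps collected per TRPO iteration). Assuming the upstream repository's --batch-size flag, a hypothetical invocation would be:

    python main.py --env-name "HalfCheetah-v1" --batch-size 25000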

Results

More or less similar to the original code; plots coming soon.

Todo

  • Plots.
  • Collect data in multiple threads.
