
Exploring Imitation Learning (DAGGER), RL (Policy Gradients and Soft Actor-Critic) and Imitation-Seeded RL for training MuJoCo Environments in OpenAI's Gym


Panjete/mujocoagents


Dimensions

  • Hopper: observation dim 11, action dim 3
  • Half Cheetah: observation dim 17, action dim 6
  • Ant: observation dim 27, action dim 8
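These dimensions fix the input/output shapes of the policy networks. A minimal sketch of a linear policy sized from this table, in NumPy (the `DIMS` dict and `make_linear_policy` helper are illustrative names, not part of the repo):

```python
import numpy as np

# (observation dim, action dim) per environment, from the table above.
DIMS = {"Hopper": (11, 3), "HalfCheetah": (17, 6), "Ant": (27, 8)}

def make_linear_policy(env_name, rng=None):
    """Return (W, b) for a linear policy a = W @ obs + b (hypothetical helper)."""
    rng = rng or np.random.default_rng(0)
    obs_dim, act_dim = DIMS[env_name]
    W = rng.normal(scale=0.01, size=(act_dim, obs_dim))
    b = np.zeros(act_dim)
    return W, b

W, b = make_linear_policy("Hopper")
obs = np.zeros(11)
action = W @ obs + b  # shape (3,), one action per Hopper actuator
```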

Imitation learning

Hopper v4

  • n_trajs: 50, max_len_trajs: 100, iters: 50, loss: MSE, NN: linear, beta: 0.5 → reward: 742

  • n_trajs: 50, max_len_trajs: 100, iters: 50, loss: MSE, NN: linear, beta: 1/(1 + sqrt(timesteps/1000)) → reward: ~720

  • More training iterations are needed: at 50 iterations the reward is still in a steady-increase phase.

  • Reward peaks at around iteration 75 and then falls off; it went as high as 1500 with a maximum trajectory length of 100.

  • n_trajs: 50, max_len_trajs: 100, iters: 75, loss: MSE, NN: linear, beta: 1/(1 + timesteps/1000) → reward: ~2200

  • n_trajs: 50, max_len_trajs: 400, iters: 75, loss: MSE, NN: linear, beta: 1/(1 + timesteps/1000) → reward: ~2400, and more stable around iteration 75
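The beta values above are the DAgger expert-mixing coefficient: with probability beta the expert's action is taken while collecting data, otherwise the learner's. A minimal sketch of the decaying schedule beta = 1/(1 + timesteps/1000) and the mixing step (function names are illustrative, not the repo's API):

```python
import numpy as np

def beta_schedule(t):
    """Expert-mixing coefficient beta_t = 1 / (1 + t/1000), decaying over timesteps."""
    return 1.0 / (1.0 + t / 1000.0)

def dagger_action(expert_action, policy_action, t, rng):
    """Act with the expert with probability beta_t, else with the learner
    (one common stochastic-mixing variant of DAgger)."""
    return expert_action if rng.random() < beta_schedule(t) else policy_action
```

At t = 0 the expert is always followed (beta = 1); by t = 1000 the learner acts half the time (beta = 0.5). The collected states are still relabeled with expert actions for the supervised update, regardless of who acted.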

Half Cheetah v4

  • n_trajs: 50, max_len_trajs: 400, iters: 75, loss: MSE, NN: linear, beta: 1/(1 + timesteps/1000) → reward: ~2400, though this level was reached fairly early in training

  • n_trajs: 50, max_len_trajs: 400, iters: 75, loss: MSE, NN: linear, beta: 1/(1 + timesteps/1000), optimiser: Adam → reward: ~2400, again reached fairly early
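The supervised step in all runs above is an MSE regression of policy actions onto expert actions. A self-contained NumPy sketch of that loss/gradient for a linear policy, plus a hand-rolled Adam update matching the last configuration (all function names are illustrative; the repo presumably uses PyTorch's built-in optimiser instead):

```python
import numpy as np

def mse_loss_and_grad(W, obs, expert_actions):
    """MSE between linear-policy actions (obs @ W.T) and expert actions,
    and its gradient with respect to W."""
    err = obs @ W.T - expert_actions        # (N, act_dim)
    loss = np.mean(err ** 2)
    grad_W = 2.0 * err.T @ obs / err.size   # (act_dim, obs_dim)
    return loss, grad_W

def adam_step(W, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: biased first/second moment estimates, then bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return W - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Fit a linear policy to synthetic "expert" data (Half Cheetah shapes: 17 -> 6).
rng = np.random.default_rng(0)
obs = rng.normal(size=(64, 17))
W_true = rng.normal(size=(6, 17))
expert_actions = obs @ W_true.T

W = np.zeros((6, 17))
m, v = np.zeros_like(W), np.zeros_like(W)
loss_start, _ = mse_loss_and_grad(W, obs, expert_actions)
for t in range(1, 201):
    _, grad = mse_loss_and_grad(W, obs, expert_actions)
    W, m, v = adam_step(W, grad, m, v, t)
loss_end, _ = mse_loss_and_grad(W, obs, expert_actions)
```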
