
Project for Reinforcement Learning Course 2018 - MSc Artificial Intelligence @ UvA

gabriele-bani/rl-demonstrations


Evaluating Demonstrations (the Good, the Bad and the Worse)

Description

Poster and code for the project in the Reinforcement Learning course of the MSc in Artificial Intelligence at the University of Amsterdam. A joint project by Gabriele Bani, Andrii Skliar, Gabriele Cesa and Davide Belli.

Main Idea

Learning from a single human demonstration has been shown to outperform humans and beat state-of-the-art models on hard exploration problems [Learning Montezuma's Revenge from a Single Demonstration].

However, it takes an experienced professional to provide a good demonstration to the model, which might be impossible in real problems, and optimal demonstrations may be difficult to obtain at all. Can we still learn optimal policies from sub-optimal demonstrations?

Approach

Basic idea: divide the demonstration trajectory into n splits. Train starting from the last split until convergence, then move the starting point back to the previous split. Repeat until the first split is reached, so that the agent learns from increasingly difficult exploration problems.
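
The backward schedule described above can be sketched as follows. This is a minimal illustration, not the code in this repository: the names `backward_start_indices`, `backward_curriculum` and `train_until_converged` are hypothetical, the splits are assumed to be of roughly equal length, and the environment is assumed to support being reset to a state taken from the demonstration.

```python
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")


def backward_start_indices(trajectory_len: int, n_splits: int) -> List[int]:
    """Return the start index of each split, ordered from the last split to the first."""
    if not 1 <= n_splits <= trajectory_len:
        raise ValueError("n_splits must be between 1 and trajectory_len")
    # Start of split i when the trajectory is cut into n roughly equal parts.
    starts = [(i * trajectory_len) // n_splits for i in range(n_splits)]
    return starts[::-1]


def backward_curriculum(
    demo_states: Sequence[State],
    n_splits: int,
    train_until_converged: Callable[[State], None],
) -> List[int]:
    """Train from the start of each split, beginning with the last split.

    `train_until_converged` is a hypothetical callback that resets the
    environment to the given demonstration state and trains the agent
    from there until some convergence criterion is met.
    """
    visited = []
    for start in backward_start_indices(len(demo_states), n_splits):
        train_until_converged(demo_states[start])
        visited.append(start)
    return visited
```

For a demonstration of 10 states and 5 splits, the agent would start training from state 8, then 6, 4, 2 and finally 0, so each stage only has to explore a little further back than the previous one.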

Results


Figure: Returns over episodes in Maze (left), MountainCar (middle) and LunarLander (right).

  • Non-optimal demonstrations can lead to optimal results, but better demonstrations lead to better learning and give more reliable
  • In Maze, using bad demonstrations rather than suboptimal ones results in a better final policy because of a higher degree of exploration.
  • In more complex environments, we expect training with demonstrations to be much faster than training from scratch.
  • The current implementation is very sensitive to hyperparameter choices; there is a need for a more automatic and reliable version of the backward algorithm to overcome this issue.

Copyright

Copyright © 2018 Gabriele Bani.

This project is distributed under the MIT license. It was developed as part of the Reinforcement Learning course taught by Herke van Hoof at the University of Amsterdam. If you are a student, please follow the UvA regulations governing Fraud and Plagiarism.
