Code to reproduce results on toy tasks and companion blog for the paper.

vihangp/align-rudder

Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

Vihang P. Patil¹, Markus Hofmarcher¹, Markus-Constantin Dinu¹, Matthias Dorfer³, Patrick M. Blies³, Johannes Brandstetter¹, Jose A. Arjona-Medina¹, Sepp Hochreiter¹,²

¹ ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
² Institute of Advanced Research in Artificial Intelligence (IARAI)
³ enliteAI, Vienna, Austria


A detailed blog post on this paper is available at this link, and a video showcasing the Minecraft agent at this link.

The full paper is available at https://arxiv.org/abs/2009.14108

Implementation of Align-RUDDER

This package contains an implementation of Align-RUDDER together with code to reproduce the results for artificial tasks I and II reported in the paper. To keep run time short, the default settings use only 10 seeds per experiment instead of the 100 used for the results in the paper.

Dependencies

To reproduce all results, we provide an environment.yml file that sets up a conda environment with the required packages. Run the following commands to create and activate the environment and install the package:

conda env create --file environment.yml
conda activate align-rudder
pip install -e .

Usage

To recreate the results from the paper, run the included scripts. There is one script per method for each of the FourRooms and EightRooms environments.

Align-RUDDER

python align_rudder/run_four_alignrudder.py
python align_rudder/run_eight_alignrudder.py

Behavioral Cloning + Q-Learning

python align_rudder/run_four_bc.py
python align_rudder/run_eight_bc.py

DQFD (Deep Q-Learning from Demonstrations)

python align_rudder/run_four_dqfd.py
python align_rudder/run_eight_dqfd.py

RUDDER (LSTM)

python align_rudder/run_four_rudder_lstm.py
python align_rudder/run_eight_rudder_lstm.py
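
To run every experiment listed above in one go, a small driver can loop over the eight run scripts. The following is a minimal sketch and not part of the repository: the helper names `experiment_scripts` and `run_all` are hypothetical, and it assumes the commands are launched from the repository root with the align-rudder conda environment active.

```python
import subprocess
import sys

# Script name fragments taken from the commands listed in this README.
ENVS = ("four", "eight")
METHODS = ("alignrudder", "bc", "dqfd", "rudder_lstm")


def experiment_scripts(envs=ENVS, methods=METHODS):
    """Return the run-script paths, grouped by method (hypothetical helper)."""
    return [
        f"align_rudder/run_{env}_{method}.py"
        for method in methods
        for env in envs
    ]


def run_all(scripts, dry_run=False):
    """Run each script with the current interpreter, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    for script in scripts:
        cmd = [sys.executable, script]
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


# Example (runs all eight experiments sequentially):
# run_all(experiment_scripts())
```

Passing `dry_run=True` prints the commands without starting any experiment, which is a cheap way to check the paths before committing to a long run.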

Results

Once you have run the experiments you are interested in, run the following script to get a summary of the results. By default, plots are generated for all available environments.

python align_rudder/plot_results.py [--env "FourRooms"|"EightRooms"|"all"]
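
The `--env` argument is optional and takes one of the three values shown above. As an illustration, the helper below builds the command line for a chosen environment and rejects any other value. It is a sketch under stated assumptions: `plot_command` is a hypothetical name, not part of the repository, and it relies only on the flag and values given in this README.

```python
import sys

# The three values documented for --env in this README.
VALID_ENVS = ("FourRooms", "EightRooms", "all")


def plot_command(env=None):
    """Build the argument list for plot_results.py (hypothetical helper).

    env=None omits the flag, so the script falls back to its default.
    """
    cmd = [sys.executable, "align_rudder/plot_results.py"]
    if env is None:
        return cmd
    if env not in VALID_ENVS:
        raise ValueError(f"env must be one of {VALID_ENVS}, got {env!r}")
    return cmd + ["--env", env]
```

The returned list can be passed directly to `subprocess.run(..., check=True)` from the repository root.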

License

MIT License
