Supersonic

Multiworker Deep RL Agent for OpenAI Gym and Gym Retro

Supersonic is an implementation of Proximal Policy Optimization (PPO) with Random Network Distillation (RND), written in eager-execution TensorFlow, with support for multiple workers on high-CPU machines or clusters.

To train an agent on the Sonic level 'Green Hill Zone Act1':

mpiexec -n 32 python train.py --lvl GreenHillZone.Act1

Setup and Installation

1. Clone this repository with git clone -r https://github.com/jakegrigsby/supersonic.git
2. Install the package:

    cd supersonic
    pip install -e .

3. Install other dependencies:

    pip install -r requirements.txt

If you want to train on Sonic, you'll need to buy the ROMs and install them on your system. See the retro contest details for more instructions. After installing the ROMs, you can run the included ./set_up_correct_reward_funcs.sh to switch out the default reward function for the correct one used by the Retro Contest.

Adding New Environments

Supersonic can run in any OpenAI Gym or Gym Retro environment (assuming you have the ROMs) out of the box. However, it's common practice to use 'wrappers' around the environment, which do things like clip the reward, reshape the observation, or convert the frames to grayscale. Defaults are included for all of the v0 pixel-only Atari 2600 environments (84x84 grayscaling, normalizing, frameskipping, and 'sticky' actions) as well as all of the Sonic levels (see data/sonic-train.csv and data/sonic-val.csv for a list of those levels). Support for consecutive Sonic levels is included (see environment.Gauntlet and environment.greenhillzonecomplete). Defaults for gym-super-mario-bros are also included.
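To make the wrapper idea concrete, here is a minimal sketch of a reward-clipping wrapper. It is written in plain Python against a toy environment so that it runs without Gym installed. The names ToyEnv, ClipRewardWrapper and run_episode are hypothetical and are not part of Supersonic or Gym; a real wrapper would more likely subclass gym.Wrapper.

```python
class ToyEnv:
    """Hypothetical stand-in for a Gym environment (4-tuple step API)."""

    def __init__(self, rewards):
        self._rewards = list(rewards)
        self._t = 0

    def reset(self):
        self._t = 0
        return 0  # dummy observation

    def step(self, action):
        reward = self._rewards[self._t]
        self._t += 1
        done = self._t >= len(self._rewards)
        return self._t, reward, done, {}


class ClipRewardWrapper:
    """Wraps an environment and clips every reward to [low, high]."""

    def __init__(self, env, low=-1.0, high=1.0):
        self.env = env
        self.low = low
        self.high = high

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        clipped = max(self.low, min(self.high, reward))
        return obs, clipped, done, info


def run_episode(env):
    """Steps through one episode with a fixed action and collects rewards."""
    env.reset()
    rewards, done = [], False
    while not done:
        _, reward, done, _ = env.step(0)
        rewards.append(reward)
    return rewards
```

Because the wrapper exposes the same reset and step methods as the environment it wraps, wrappers can be stacked in any order, which is the pattern the builder functions in environment.py rely on.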

To add your own custom wrappers, write a function in environment.py that returns the wrapped environment, using any of the wrappers included in that file (or added by you). Then decorate that function with env_builder, passing the key for that environment. This key is what you enter on the command line to train on that wrapped environment. Here's an example:

@env_builder('VeryCustomEnvironment-v100')
def build_myenv(lvl):
    env = base_env(lvl)
    env = WarpFrame(env)
    env = MaxAndSkipEnv(env, skip=4)
    env = RewardScaler(env)
    env = StickyActionEnv(env)
    env = FrameStackWrapper(env)
    return env

You should then be able to train on your environment by running mpiexec -n 4 python train.py --lvl VeryCustomEnvironment-v100
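For readers curious how a decorator like env_builder can map a command-line key to a builder function, the following is a rough sketch of one common approach, a dictionary registry. This is an assumption for illustration only; Supersonic's actual env_builder may be implemented differently, and ENV_BUILDERS and build_env are hypothetical names.

```python
# Hypothetical registry; the real env_builder in environment.py may differ.
ENV_BUILDERS = {}


def env_builder(key):
    """Returns a decorator that registers a builder function under `key`."""

    def decorator(func):
        ENV_BUILDERS[key] = func
        return func

    return decorator


@env_builder('VeryCustomEnvironment-v100')
def build_myenv(lvl):
    # Placeholder body: a real builder would return a wrapped environment.
    return 'wrapped({})'.format(lvl)


def build_env(lvl):
    """Looks up the builder registered for `lvl` and calls it."""
    if lvl not in ENV_BUILDERS:
        raise KeyError('no builder registered for {!r}'.format(lvl))
    return ENV_BUILDERS[lvl](lvl)
```

With a registry like this, the training script only needs the string passed to --lvl in order to find and call the matching builder.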

Training Agents

Training is launched from the command line using the command:

mpiexec -n *num of workers* python train.py --lvl *env name* --logdir *path to write logs* --rollouts *num of rollouts*

Weights are saved in the weights directory under a folder with the same name as the --logdir you specify.

An additional flag, --render, can be added if you want to watch training live. It takes an int that determines how many of the parallel environments are rendered. For example, mpiexec -n 128 python train.py --render 1 trains with 128 workers but renders only one of them.

At this time, Supersonic can only run multiple workers using the CPU version of TensorFlow. It distributes computation across workers with synchronous gradient descent to speed up training.
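As a rough illustration of synchronous gradient descent, the sketch below simulates several workers in a single process: each worker computes a gradient on its own data shard, the gradients are averaged, and one shared update is applied. It is a toy one-parameter regression, not Supersonic's training loop; the function names are hypothetical, and the in-process average only stands in for the all-reduce that an MPI-based setup would perform.

```python
def local_gradient(w, shard):
    """Gradient of the mean of 0.5 * (w*x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def synchronous_step(w, shards, lr):
    """One synchronous update: all worker gradients are averaged before a
    single shared update, so every worker keeps identical weights."""
    grads = [local_gradient(w, shard) for shard in shards]  # one per worker
    avg_grad = sum(grads) / len(grads)  # stands in for an all-reduce
    return w - lr * avg_grad


def train(shards, steps=200, lr=0.1):
    """Runs repeated synchronous updates starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        w = synchronous_step(w, shards, lr)
    return w


# Two simulated workers, each holding samples of y = 3 * x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (0.5, 1.5)]]
w = train(shards)
```

Because every worker waits for the averaged gradient before updating, the result matches what a single worker would compute on the combined batch, at the cost of waiting for the slowest worker each step.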

Testing Agents

python test.py --lvl *env name* --weights *path to correct weights dir* --episodes *num of episodes*

An example would be python test.py --lvl GreenHillZone.Act1 --weights GreenHillZoneAct1/checkpoint_9500. The additional flags --record (a bool) and --record_path (a str) let you record gameplay footage and save it to the specified directory.

References

Papers:

Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).

Burda, Yuri, et al. "Exploration by random network distillation." arXiv preprint arXiv:1810.12894 (2018).

Chen, Jianmin, et al. "Revisiting distributed synchronous SGD." arXiv preprint arXiv:1604.00981 (2016).

Repositories:

openai/random-network-distillation

jcwleo/random-network-distillation-pytorch

openai/spinningup

openai/baselines

Developed by students at the University of Virginia, 2019.

UVA Data Science Institute

UVA Advanced Research Computing Services
