Obstacle Tower Challenge


The goal of this project is to create an autonomous AI agent that can play the Obstacle Tower Challenge game and climb to the highest level possible.

Setup Instructions

  1. Install Python 3.8.0 on your machine using pyenv.
  2. Fork the repository from here.
  3. Clone the repository from your GitHub profile:
git clone<YOUR_USERNAME>/obstacle-tower-challenge.git
  4. Run the following commands:
cd obstacle-tower-challenge/

# Set python version for the local folder
pyenv local 3.8.0

# Install pyenv-virtualenv
git clone ~/.pyenv/plugins/pyenv-virtualenv
source ~/.bashrc
mkdir venv
cd venv/
pyenv virtualenv 3.8.0 venv
cd ..

# activate virtual environment
pyenv activate venv

# confirm python version
python -V

# Install dependencies
python3 -m pip install --upgrade pip
pip install -r requirements.txt
  5. Set up Jupyter to work with the virtual environment.
  6. By default, the binary is downloaded automatically when the Obstacle Tower gym is first instantiated. The following line in the Jupyter notebook instantiates the environment:
env = ObstacleTowerEnv(retro=False, realtime_mode=False)
  7. The binaries for each platform can also be downloaded separately at the following links. Using these binaries you can play the game.
Platform Download Link
Linux (x86_64)
Mac OS X

Quick Setup - Docker

You can use Docker to perform a quick setup on a virtual machine. The base image is Docker's Ubuntu Image. The following libraries and packages are installed on the machine as part of Docker quickstart:

  • GCC compiler toolset
  • Python 3.8 and PIP
  • Git
  • All other dependencies for this game here
Note: The image builds successfully, but it runs into display driver problems when we attempt to train the agent. We will continue to work on this item in the future.

Game details

The environment provided has a MultiDiscrete action space (list of valid actions), MultiDiscrete([3 3 2 3]), where the 4 dimensions are:

  0. Movement (No-Op/Forward/Back)
  1. Camera Rotation (No-Op/Counter-Clockwise/Clockwise)
  2. Jump (No-Op/Jump)
  3. Movement (No-Op/Right/Left)
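
As an illustration of the action layout above, the following sketch enumerates every combination of the four sub-actions and converts between a flat index and a 4-element action vector. It uses only the Python standard library and does not touch the environment itself. The names `ACTION_DIMS`, `ALL_ACTIONS`, `flat_to_multi`, and `multi_to_flat` are hypothetical helpers introduced here, not part of the Obstacle Tower API.

```python
from itertools import product

# Sizes of the four sub-action dimensions, as listed above:
# movement (forward/back), camera rotation, jump, movement (right/left).
ACTION_DIMS = (3, 3, 2, 3)

# Every valid action vector in lexicographic order (3 * 3 * 2 * 3 = 54 entries).
ALL_ACTIONS = [list(a) for a in product(*(range(n) for n in ACTION_DIMS))]


def flat_to_multi(index):
    """Map a flat index in [0, 54) to a 4-element action vector."""
    return list(ALL_ACTIONS[index])


def multi_to_flat(action):
    """Map a 4-element action vector back to its flat index."""
    index = 0
    for value, size in zip(action, ACTION_DIMS):
        if not 0 <= value < size:
            raise ValueError(f"sub-action {value} out of range for size {size}")
        index = index * size + value
    return index


if __name__ == "__main__":
    print(len(ALL_ACTIONS))    # number of distinct actions
    print(flat_to_multi(18))   # action vector for flat index 18
```

A flat index like this can be handy when an algorithm expects a single discrete action rather than a vector; whether the agents in this repository flatten the action space this way is not stated here.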

The observation space provided includes a 168x168 image (the camera from the simulation) as well as the number of keys held by the agent (0-5) and the amount of time remaining.
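
The non-image parts of the observation can be turned into a small feature vector before being fed to a model. The sketch below is one possible encoding, not the preprocessing used by the agents in this repository: `encode_aux` is a hypothetical helper, and `MAX_TIME` is an assumed normalisation constant rather than a value taken from the environment.

```python
MAX_KEYS = 5        # keys held ranges over 0-5, per the description above
MAX_TIME = 10000.0  # hypothetical normalisation constant, not taken from the environment


def encode_aux(keys, time_remaining):
    """Build a feature vector from the non-image parts of an observation.

    Returns a one-hot encoding of the key count followed by the time
    remaining, clipped and scaled into [0, 1] using the assumed MAX_TIME.
    """
    if not 0 <= keys <= MAX_KEYS:
        raise ValueError(f"keys must be in 0..{MAX_KEYS}, got {keys}")
    one_hot = [1.0 if i == keys else 0.0 for i in range(MAX_KEYS + 1)]
    scaled_time = min(max(time_remaining / MAX_TIME, 0.0), 1.0)
    return one_hot + [scaled_time]


if __name__ == "__main__":
    # A stand-in observation; the image is omitted (None) to keep the sketch small.
    image, keys, time_remaining = None, 2, 2500.0
    print(encode_aux(keys, time_remaining))
```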

Models and their usage

  1. Random Agent
usage: random [-h] [--max-eps MAX_EPS] [--save-dir SAVE_DIR]

optional arguments:
  -h, --help           show this help message and exit
  --max-eps MAX_EPS    Maximum number of episodes (games) to run.
  --save-dir SAVE_DIR  Directory in which you desire to save the model.
  2. A3C Agent
usage: a3c [-h] [--lr LR] [--max-eps MAX_EPS] [--update-freq UPDATE_FREQ] [--gamma GAMMA] [--num-workers NUM_WORKERS] [--save-dir SAVE_DIR]

optional arguments:
  -h, --help            show this help message and exit
  --lr LR               Learning rate for the shared optimizer.
  --max-eps MAX_EPS     Maximum number of episodes (games) to run.
  --update-freq UPDATE_FREQ
                        How often to update the global model.
  --gamma GAMMA         Discount factor of rewards.
  --num-workers NUM_WORKERS
                        Number of workers for asynchronous learning.
  --save-dir SAVE_DIR   Directory in which you desire to save the model.
  3. PPO Agent
usage: ppo [-h] [--lr LR] [--max-eps MAX_EPS]
                    [--update-freq UPDATE_FREQ] [--timesteps TIMESTEPS]
                    [--batch-size BATCH_SIZE] [--gamma GAMMA]
                    [--num-workers NUM_WORKERS] [--save-dir SAVE_DIR]
                    [--plot PLOT]

optional arguments:
  -h, --help            show this help message and exit
  --lr LR               Learning rate for the shared optimizer.
  --max-eps MAX_EPS     Maximum number of episodes (games) to run.
  --update-freq UPDATE_FREQ
                        How often to update the global model.
  --timesteps TIMESTEPS
                        Maximum number of episodes (games) to run.
  --batch-size BATCH_SIZE
                        How often to update the global model.
  --gamma GAMMA         Discount factor of rewards.
  --num-workers NUM_WORKERS
                        Number of workers for asynchronous learning.
  --save-dir SAVE_DIR   Directory in which you desire to save the model.
  --plot PLOT           Plot model results (rewards, loss, etc)
  4. Curiosity Agent
usage: curiosity [-h] [--lr LR] [--timesteps TIMESTEPS] [--batch-size BATCH_SIZE] [--gamma GAMMA] [--save-dir SAVE_DIR]

optional arguments:
  -h, --help            show this help message and exit
  --lr LR               Learning rate for the shared optimizer.
  --timesteps TIMESTEPS
                        Maximum number of episodes (games) to run.
  --batch-size BATCH_SIZE
                        How often to update the global model.
  --gamma GAMMA         Discount factor of rewards.
  --save-dir SAVE_DIR   Directory in which you desire to save the model.
  5. Stable A2C Agent
usage: stable_a2c [-h] [--timesteps TIMESTEPS] [--policy-name POLICY_NAME] [--save-dir SAVE_DIR] [--continue-training]

optional arguments:
  -h, --help            show this help message and exit
  --timesteps TIMESTEPS
                        Number of timesteps to train the PPO agent for.
  --policy-name POLICY_NAME
                        Policy to train for the PPO agent.
  --save-dir SAVE_DIR   Directory in which you desire to save the model.
  --continue-training   Continue training the previously trained model.
  6. Stable PPO Agent
usage: stable_ppo [-h] [--timesteps TIMESTEPS] [--policy-name POLICY_NAME] [--save-dir SAVE_DIR] [--continue-training] [--reduced-action]

optional arguments:
  -h, --help            show this help message and exit
  --timesteps TIMESTEPS
                        Number of timesteps to train the PPO agent for.
  --policy-name POLICY_NAME
                        Policy to train for the PPO agent.
  --save-dir SAVE_DIR   Directory in which you desire to save the model.
  --continue-training   Continue training the previously trained model.
  --reduced-action      Use a reduced set of actions for training
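
Several of the agents above accept a `--gamma` flag, described as the discount factor of rewards. As a minimal sketch of what a discount factor does (a generic textbook computation, not code taken from this repository), the hypothetical helper `discounted_returns` below computes the discounted return for each step of one episode.

```python
def discounted_returns(rewards, gamma):
    """Return the discounted return G_t for every step of one episode.

    Uses the recurrence G_t = r_t + gamma * G_{t+1}, scanning the rewards
    from the last step back to the first.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # A reward of 1.0 arrives only on the final step.
    print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```

A gamma close to 1 makes the agent value distant rewards almost as much as immediate ones, while a smaller gamma favours short-term rewards.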

Distributed TensorFlow

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

We used tf.distribute.MirroredStrategy to explore TensorFlow's distributed training library and found that it is only useful when a cluster of GPUs is available. Our future work will focus on cloud training, along with experimenting with other distribution strategies.


To train the agent:

python src/ --env <PATH_TO_OTC_GAME> <AGENT_NAME> [<ARGS>]
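
The command shape above (a global `--env` option followed by an agent name and its own arguments) can be modelled with `argparse` sub-commands. The following is a sketch under stated assumptions, not the repository's actual entry point: the flag names are copied from the usage text for the random and A3C agents, while `build_parser`, every default value, and the example binary path are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with a global --env option and one sub-command per agent."""
    parser = argparse.ArgumentParser(prog="src")
    parser.add_argument("--env", required=True,
                        help="Path to the Obstacle Tower game binary.")
    agents = parser.add_subparsers(dest="agent", required=True)

    # Flags mirror the 'random' usage text; default values are assumptions.
    random_agent = agents.add_parser("random")
    random_agent.add_argument("--max-eps", type=int, default=10,
                              help="Maximum number of episodes (games) to run.")
    random_agent.add_argument("--save-dir", default="./saved_models",
                              help="Directory in which you desire to save the model.")

    # Flags mirror the 'a3c' usage text; default values are assumptions.
    a3c = agents.add_parser("a3c")
    a3c.add_argument("--lr", type=float, default=1e-4,
                     help="Learning rate for the shared optimizer.")
    a3c.add_argument("--max-eps", type=int, default=10,
                     help="Maximum number of episodes (games) to run.")
    a3c.add_argument("--update-freq", type=int, default=20,
                     help="How often to update the global model.")
    a3c.add_argument("--gamma", type=float, default=0.99,
                     help="Discount factor of rewards.")
    a3c.add_argument("--num-workers", type=int, default=4,
                     help="Number of workers for asynchronous learning.")
    a3c.add_argument("--save-dir", default="./saved_models",
                     help="Directory in which you desire to save the model.")
    return parser


if __name__ == "__main__":
    # The binary path below is a placeholder for <PATH_TO_OTC_GAME>.
    args = build_parser().parse_args(
        ["--env", "./ObstacleTower/obstacletower", "a3c", "--lr", "0.001"]
    )
    print(args.agent, args.lr, args.gamma)
```

With this layout, `--env` must appear before the agent name, and each agent only sees the flags defined on its own sub-parser.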

View training logs on Tensorboard:

# to view graphs in tensorboard
tensorboard --logdir logs/

To play a game with a trained agent:

# play an episode of the game using a given policy (random or a3c)
python --env <PATH_TO_OTC_GAME> --algorithm random

# evaluate a given agent
python --env <PATH_TO_OTC_GAME> --algorithm random --evaluate


CSCI 527: ML For Games Project






