```sh
docker pull markub3327/rl-toolkit:2.0.2

# Training container (learner)
docker run -it --rm markub3327/rl-toolkit:2.0.2 python3 training.py [-h] -alg sac -env ENV_NAME -s PATH_TO_MODEL_FOLDER [--wandb]

# Simulation container (agent)
docker run -it --rm markub3327/rl-toolkit:2.0.2 python3 testing.py [-h] -alg sac -env ENV_NAME -f PATH_TO_MODEL_FOLDER [--wandb]
```
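The two usage lines differ only in the script (`training.py` vs `testing.py`) and the model-folder flag (`-s` to save vs `-f` to load). The sketch below builds either command as an argument list. `docker_command` and its `host_dir` option are hypothetical helpers, not part of rl-toolkit. Because `--rm` deletes the container on exit, `host_dir` assumes you want to bind-mount an absolute host path over the model folder so the saved model survives.

```python
import shlex

IMAGE = "markub3327/rl-toolkit:2.0.2"


def docker_command(mode, env_name, model_dir, wandb=False, host_dir=None):
    """Build the docker argv for the learner ("train") or agent ("test") container.

    Hypothetical helper: flags are copied from the README usage lines.
    host_dir (assumption) is an absolute host path bind-mounted at model_dir.
    """
    if mode == "train":
        script, path_flag = "training.py", "-s"   # -s PATH_TO_MODEL_FOLDER
    elif mode == "test":
        script, path_flag = "testing.py", "-f"    # -f PATH_TO_MODEL_FOLDER
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    argv = ["docker", "run", "-it", "--rm"]
    if host_dir is not None:
        # Keep the model folder on the host after --rm removes the container.
        argv += ["-v", f"{host_dir}:{model_dir}"]
    argv += [IMAGE, "python3", script, "-alg", "sac", "-env", env_name,
             path_flag, model_dir]
    if wandb:
        argv.append("--wandb")
    return argv


if __name__ == "__main__":
    cmd = docker_command("train", "Walker2DBulletEnv-v0", "/tmp/models", wandb=True)
    print(shlex.join(cmd))  # paste into a terminal; -it needs an interactive TTY
```

Printing rather than launching the command keeps the `-it` flag usable, since an interactive TTY is usually unavailable when docker is started from a subprocess.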
Environment | Observation space | Observation bounds | Action space | Action bounds |
---|---|---|---|---|
BipedalWalkerHardcore-v3 | (24,) | [-inf, inf] | (4,) | [-1.0, 1.0] |
Walker2DBulletEnv-v0 | (22,) | [-inf, inf] | (6,) | [-1.0, 1.0] |
AntBulletEnv-v0 | (28,) | [-inf, inf] | (8,) | [-1.0, 1.0] |
HalfCheetahBulletEnv-v0 | (26,) | [-inf, inf] | (6,) | [-1.0, 1.0] |
HopperBulletEnv-v0 | (15,) | [-inf, inf] | (3,) | [-1.0, 1.0] |
HumanoidBulletEnv-v0 | (44,) | [-inf, inf] | (17,) | [-1.0, 1.0] |
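Every environment above uses action bounds of [-1.0, 1.0]. SAC policies commonly keep actions inside such bounds by passing an unbounded Gaussian sample through `tanh`. The sketch below shows that squashing plus a linear rescale to a general [low, high] range. It illustrates the standard technique under that assumption; the function names are hypothetical and this is not rl-toolkit's own code. The action sizes are taken from the table.

```python
import math

# Action dimensions from the environment table above.
ACTION_DIMS = {
    "BipedalWalkerHardcore-v3": 4,
    "Walker2DBulletEnv-v0": 6,
    "AntBulletEnv-v0": 8,
    "HalfCheetahBulletEnv-v0": 6,
    "HopperBulletEnv-v0": 3,
    "HumanoidBulletEnv-v0": 17,
}


def squash_and_rescale(u, low=-1.0, high=1.0):
    """Map an unbounded pre-activation u into [low, high] via tanh squashing."""
    y = math.tanh(u)                             # y lies in [-1, 1]
    return low + 0.5 * (y + 1.0) * (high - low)  # linear map [-1, 1] -> [low, high]


def squash_action(pre_activations, low=-1.0, high=1.0):
    """Apply squash_and_rescale element-wise to one action vector."""
    return [squash_and_rescale(u, low, high) for u in pre_activations]


if __name__ == "__main__":
    n = ACTION_DIMS["HopperBulletEnv-v0"]
    print(squash_action([-10.0, 0.0, 10.0][:n]))
```

With the [-1.0, 1.0] bounds listed for all six environments, the rescale step is the identity and only `tanh` matters.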
Summary
Episode return (higher is better)
Environment | gSDE | gSDE + Huber loss |
---|---|---|
BipedalWalkerHardcore-v3(2) | 13 ± 18 | - |
Walker2DBulletEnv-v0(1) | 2270 ± 28 | 2732 ± 96 |
AntBulletEnv-v0(1) | 3106 ± 61 | 3460 ± 119 |
HalfCheetahBulletEnv-v0(1) | 2945 ± 95 | 3003 ± 226 |
HopperBulletEnv-v0(1) | 2515 ± 50 | 2555 ± 405 |
HumanoidBulletEnv-v0 | - | - |
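The second results column adds a Huber loss. This README does not say which loss term it replaces; applying it to the critic's TD error is a common choice and is only an assumption here. The Huber loss equals ½x² for |x| ≤ δ and δ(|x| − ½δ) otherwise, so large errors are penalized linearly rather than quadratically, which limits the effect of outlier targets. The sketch assumes δ = 1.0; in TensorFlow the same function is available as `tf.keras.losses.Huber(delta=1.0)`.

```python
def huber(x, delta=1.0):
    """Huber loss of a single error x: quadratic near zero, linear in the tails."""
    a = abs(x)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def mean_huber(predictions, targets, delta=1.0):
    """Average Huber loss over paired predictions and targets (e.g. Q-values vs TD targets)."""
    errors = [p - t for p, t in zip(predictions, targets)]
    return sum(huber(e, delta) for e in errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical numbers, not results from the table.
    print(mean_huber([0.0, 3.0], [0.5, 0.0]))
```

The two branches meet at |x| = δ (both give ½δ²), so the loss and its gradient stay continuous while the gradient magnitude is capped at δ.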
Framework: TensorFlow 2.4.1
Languages: Python 3.8.5, Shell
Author: Martin Kubovčík