In this assignment, a modified version of deep Q-learning from DeepMind's paper is implemented. In the environment, the player controls a paddle that moves horizontally and earns rewards by bouncing a ball into bricks to break them. We use MinAtar ([7]), a miniaturized version of the original Atari games: instead of the original 210 × 160 RGB frames, MinAtar uses a 10 × 10 grid of boolean channels, which makes it possible to use a significantly smaller model while still achieving good performance.
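As a quick illustration of the environment interface, here is a minimal sketch of stepping through MinAtar Breakout. It assumes the `minatar` Python package; the random action is only for illustration and is not part of the assignment code.

```python
# Minimal MinAtar Breakout interaction sketch (assumes `pip install minatar`).
import numpy as np
from minatar import Environment

env = Environment("breakout")
env.reset()

state = env.state()          # boolean grid: 10 x 10 x n_channels
print(state.shape)           # (10, 10, 4) for Breakout

# Take one random action and observe the reward and terminal flag.
action = np.random.randint(env.num_actions())
reward, terminal = env.act(action)
print(reward, terminal)
```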
The network architecture is as follows:
- One convolution layer with 16 output channels, a kernel size of 3, stride 1, and no padding.
- A ReLU activation.
- A dense layer with 128 hidden units.
- Another ReLU activation.
- The final output layer, with one output per action.

The corresponding code is in q4_nature_torch.py and q5_nature_torch.py; a minimal sketch of this architecture follows.
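The sketch below is a hedged PyTorch illustration of the architecture above. The class name, the 4 input channels (MinAtar Breakout's object channels), and the 6 actions are assumptions for illustration, not the exact code in q4_nature_torch.py / q5_nature_torch.py.

```python
# Hedged sketch of the small nature-style Q-network described above.
import torch
import torch.nn as nn

class NatureQN(nn.Module):
    def __init__(self, in_channels: int = 4, num_actions: int = 6):
        super().__init__()
        # 10x10 input, 3x3 kernel, stride 1, no padding -> 8x8 feature maps
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * 8 * 8, 128),   # dense layer with 128 hidden units
            nn.ReLU(),
            nn.Linear(128, num_actions),  # final output layer: one Q-value per action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, 10, 10); MinAtar states are HWC booleans,
        # so they must be cast to float and permuted to CHW first.
        return self.net(x)

q_net = NatureQN()
print(q_net(torch.zeros(1, 4, 10, 10)).shape)  # torch.Size([1, 6])
```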
The result of the linear approximation (code: q6_train_atari_linear.py):
The result of the neural network approximation (code: q6_train_atari_nature.py):
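For reference, the linear baseline amounts to a single fully connected layer over the flattened state. Here is a minimal sketch under the same assumptions as above (class name, channel and action counts are illustrative, not the exact code in q6_train_atari_linear.py):

```python
# Hedged sketch of the linear Q-function baseline.
import torch
import torch.nn as nn

class LinearQN(nn.Module):
    def __init__(self, in_channels: int = 4, num_actions: int = 6):
        super().__init__()
        # A single dense layer over the flattened 10 x 10 x C state.
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_channels * 10 * 10, num_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)

print(LinearQN()(torch.zeros(1, 4, 10, 10)).shape)  # torch.Size([1, 6])
```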
As the results show, the average reward of the neural network approximation is higher than that of the linear approximation, but its standard deviation is also higher, which points to the less stable behavior of the neural network approximation.
- Test different hyperparameters for training.
- Implement different model structures for the neural network approximation.