Referred material
- Book by Sutton & Barto - Reinforcement Learning: An Introduction
- Lectures by David Silver - Introduction to reinforcement learning
A basic tic-tac-toe game. It keeps a probability (value) estimate for each game state and uses it to pick moves. WIP: needs better prediction.
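The per-state value idea comes from Chapter 1 of Sutton & Barto. As a rough illustration (in Python, not the repo's own Scala code; all names here are hypothetical), each board state gets an estimated win probability, and after a move the previous state's estimate is nudged toward the next state's:

```python
# Hypothetical sketch of the tabular state-value idea from Sutton & Barto ch. 1.
# `values` maps a board state (a tuple of 9 cells) to an estimated win probability.
values = {}

def value(state, default=0.5):
    # Unseen states start at 0.5 (no information either way).
    return values.get(state, default)

def td_update(state, next_state, alpha=0.1):
    # Move value(state) a fraction alpha toward value(next_state).
    values[state] = value(state) + alpha * (value(next_state) - value(state))

# Example: back up a known winning position (value 1.0) into the state before it.
start = ('X', 'O', 'X', 'O', 'X', 'O', ' ', ' ', ' ')
win   = ('X', 'O', 'X', 'O', 'X', 'O', 'X', ' ', ' ')
values[win] = 1.0
td_update(start, win)
print(round(value(start), 3))  # 0.55
```

Repeated over many games, estimates of states that lead to wins drift toward 1 and losing states toward 0, which is what the "probability matrix" above is approximating.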
A bandit from Chapter 2, using the incremental implementation.
There are 10 levers with reward probabilities [0.05, 0.10, 0.20, 0.25, 0.30, 0.50, 0.60, 0.65, 0.80, 0.90], in that order.
The last lever has the highest probability (0.90) and therefore ends up being pulled most often.
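The incremental implementation refers to the sample-average update Q(n+1) = Q(n) + (1/n)(R(n) - Q(n)) from Section 2.4 of Sutton & Barto. A minimal epsilon-greedy sketch (Python for illustration; the lever probabilities mirror the list above, with the first entry assumed to be 0.05):

```python
import random

# Incremental sample-average update from Sutton & Barto ch. 2:
#   Q_{n+1} = Q_n + (1/n) * (R_n - Q_n)
# Success probability per lever (first entry assumed to be 0.05).
probs = [0.05, 0.10, 0.20, 0.25, 0.30, 0.50, 0.60, 0.65, 0.80, 0.90]

def run(steps=20000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [0.0] * len(probs)   # estimated value per lever
    n = [0] * len(probs)     # pull count per lever
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(probs))                   # explore
        else:
            a = max(range(len(probs)), key=q.__getitem__)   # exploit
        r = 1.0 if rng.random() < probs[a] else 0.0         # Bernoulli reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                           # incremental mean
    return q

q = run()
print(q.index(max(q)))  # index of the estimated-best lever
```

With enough steps the estimates converge on the true probabilities, so the agent settles on the 0.90 lever.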
Implements the Student MDP from David Silver's Lecture 2 (at the 24:56 timestamp).
Tests in StudentSpec verify that no other state achieves the same optimal value as the optimal state under the Bellman equation.
- Value: -2.25, Sample: List(Class1, Class2, Class3, Pass, Sleep)
- Value: -3.125, Sample: List(Class1, Facebook, Facebook, Class1, Class2, Sleep)
- Value: -3.65625, Sample: List(Class1, Class2, Class3, Pub, Class2, Class3, Pass, Sleep)
- Value: -2.21875, Sample: List(Facebook, Facebook, Facebook, Class1, Class2, Class3, Pub, Class2, Sleep)
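Each value above is a discounted return G = R1 + γR2 + γ²R3 + ... for one sampled episode, with γ = 1/2. A sketch (Python for illustration; the per-state rewards are assumed from Silver's lecture slides, so samples that avoid Pub reproduce exactly, while the repo's reward convention for Pub may differ):

```python
# Discounted return G = R1 + gamma*R2 + gamma^2*R3 + ... for the Student MRP.
# gamma and the per-state rewards are assumed from Silver's lecture 2 slides.
GAMMA = 0.5
REWARD = {
    "Class1": -2, "Class2": -2, "Class3": -2,
    "Facebook": -1, "Pub": 1, "Pass": 10, "Sleep": 0,
}

def sample_return(states):
    # Each visited state's reward is discounted by its step index.
    return sum(REWARD[s] * GAMMA ** i for i, s in enumerate(states))

print(sample_return(["Class1", "Class2", "Class3", "Pass", "Sleep"]))  # -2.25
```

For the first sample: -2 + ½(-2) + ¼(-2) + ⅛(10) = -2.25, matching the value listed above.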
Implements the Bellman equation (via value iteration) to find the quickest path to targets within a grid.
The following shows the results for an 11x11 grid with 3 goal targets, ⌂ (circled green). The arrows indicate the optimal direction to take from each cell to reach the nearest target.
The value function shown was produced after 100 iterations of value iteration.
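The idea can be sketched as repeated Bellman optimality backups: each cell's value becomes the best neighbouring value plus a step cost of -1, so after enough sweeps a cell's value is minus its distance to the nearest goal, and the arrows point toward the best neighbour. A minimal Python illustration (the goal positions here are assumptions, not the repo's actual layout):

```python
# Minimal value-iteration sketch for shortest paths on a grid.
# Grid size matches the README; the goal positions are assumed for illustration.
SIZE = 11
GOALS = {(0, 0), (5, 5), (10, 10)}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def value_iteration(sweeps=100, step_reward=-1.0):
    v = {(r, c): 0.0 for r in range(SIZE) for c in range(SIZE)}
    for _ in range(sweeps):
        nv = {}
        for (r, c) in v:
            if (r, c) in GOALS:
                nv[(r, c)] = 0.0  # goals are terminal
                continue
            # Bellman optimality backup: best neighbour value plus step cost.
            # Moves off the edge are clamped back onto the grid.
            nv[(r, c)] = max(
                step_reward + v[(min(max(r + dr, 0), SIZE - 1),
                                 min(max(c + dc, 0), SIZE - 1))]
                for dr, dc in MOVES
            )
        v = nv
    return v

v = value_iteration()
# A cell's value is minus the number of steps to the nearest goal.
print(v[(1, 1)])  # -2.0: two steps to the goal at (0, 0)
```

100 sweeps is more than enough here, since values stop changing once the sweep count exceeds the longest shortest-path on the grid.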