
# Reinforcement learning

Referred material

## Tic-tac-toe

A basic tic-tac-toe game. It uses a probability matrix over game states to make decisions. WIP - needs better prediction.
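A minimal sketch of the per-state probability-table idea: keep a win-probability estimate for every board state and move greedily to the afterstate with the highest estimate. The `Board` type and all names here are hypothetical stand-ins, not the repo's actual code:

```scala
import scala.collection.mutable

// Sketch: a win-probability estimate per board state, with greedy
// move selection over afterstates. All names are hypothetical.
object TicTacToeSketch {
  type Board = Vector[Char] // 9 cells: 'X', 'O' or ' '

  // Win-probability estimate per state; unseen states default to 0.5.
  val values: mutable.Map[Board, Double] =
    mutable.Map.empty[Board, Double].withDefaultValue(0.5)

  def legalMoves(board: Board): Seq[Int] =
    board.indices.filter(board(_) == ' ')

  // Greedy policy: choose the move whose resulting board has the
  // highest estimated win probability.
  def bestMove(board: Board, player: Char): Int =
    legalMoves(board).maxBy(i => values(board.updated(i, player)))

  def main(args: Array[String]): Unit = {
    val empty: Board = Vector.fill(9)(' ')
    println(s"X plays cell ${bestMove(empty, 'X')}") // all ties at 0.5, picks first
  }
}
```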

## Bandit

A multi-armed bandit from chapter 2. Uses the incremental implementation of the action-value update.

10 levers with reward probabilities [0.5, 0.10, 0.20, 0.25, 0.30, 0.50, 0.60, 0.65, 0.80, 0.90], one per lever in that order.


The last lever has the highest reward probability (0.90) and is therefore pulled most often.
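The incremental implementation updates each action-value estimate in constant memory, Q(n+1) = Q(n) + (1/n)(R(n) - Q(n)), instead of storing and averaging all past rewards. A minimal sketch under that update; the epsilon-greedy action selection and all names here are illustrative, not necessarily the repo's code:

```scala
import scala.util.Random

// Sketch: epsilon-greedy bandit with the incremental sample-average
// update Q(a) <- Q(a) + (1/N(a)) * (reward - Q(a)).
object BanditSketch {
  val probabilities = Array(0.5, 0.10, 0.20, 0.25, 0.30, 0.50, 0.60, 0.65, 0.80, 0.90)
  val estimates = Array.fill(probabilities.length)(0.0) // Q(a)
  val counts = Array.fill(probabilities.length)(0)      // N(a)
  val epsilon = 0.1

  // Bernoulli reward: 1.0 with the lever's probability, else 0.0.
  def pull(lever: Int): Double =
    if (Random.nextDouble() < probabilities(lever)) 1.0 else 0.0

  def step(): Unit = {
    val lever =
      if (Random.nextDouble() < epsilon) Random.nextInt(probabilities.length)
      else estimates.indices.maxBy(estimates(_)) // greedy choice
    val reward = pull(lever)
    counts(lever) += 1
    // Incremental update: no need to keep the history of rewards.
    estimates(lever) += (reward - estimates(lever)) / counts(lever)
  }

  def main(args: Array[String]): Unit = {
    (1 to 10000).foreach(_ => step())
    println(s"Most pulled lever: ${counts.indices.maxBy(counts(_))}") // typically 9
  }
}
```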

## Student MDP

Implements the Student MDP from David Silver's lecture 2 (at the 24:56 timestamp). There are tests in StudentSpec verifying, via the Bellman equation, that no other state can return the same optimal value as the optimal state.

```
Value: -2.25      Sample: List(Class1, Class2, Class3, Pass, Sleep)
Value: -3.125     Sample: List(Class1, Facebook, Facebook, Class1, Class2, Sleep)
Value: -3.65625   Sample: List(Class1, Class2, Class3, Pub, Class2, Class3, Pass, Sleep)
Value: -2.21875   Sample: List(Facebook, Facebook, Facebook, Class1, Class2, Class3, Pub, Class2, Sleep)
```
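Each value above is the discounted return G = R1 + γ·R2 + γ²·R3 + … of its sampled episode with γ = 0.5, as in the lecture. A minimal sketch that reproduces the first value; the per-state rewards are inferred from the samples above (Pub works out to -1 here, whereas the lecture gives it +1), and the names are illustrative, not the repo's API:

```scala
// Sketch: discounted return of a sampled episode with gamma = 0.5.
// Rewards are inferred from the sample values listed above.
object StudentReturnSketch {
  val reward = Map(
    "Class1" -> -2.0, "Class2" -> -2.0, "Class3" -> -2.0,
    "Facebook" -> -1.0, "Pub" -> -1.0, "Pass" -> 10.0, "Sleep" -> 0.0
  )
  val gamma = 0.5

  // G = sum over steps k of gamma^k * R(k+1)
  def discountedReturn(episode: List[String]): Double =
    episode.zipWithIndex.map { case (state, k) =>
      math.pow(gamma, k) * reward(state)
    }.sum

  def main(args: Array[String]): Unit = {
    val sample = List("Class1", "Class2", "Class3", "Pass", "Sleep")
    println(s"Value: ${discountedReturn(sample)}  Sample: $sample") // Value: -2.25
  }
}
```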

## Grid World

Implements the Bellman equation to find the quickest path to targets within a grid.

The following shows the results for an 11×11 grid with 3 goal targets - ⌂ (circled green). The arrows indicate the optimal direction to take from each cell to reach the nearest target.

[figure: optimal direction arrows]

Value function after 100 value-iteration sweeps.
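A minimal sketch of the value-iteration backup on such a grid, assuming deterministic moves, a reward of -1 per step, and goals as zero-value terminal cells; the goal coordinates and all names here are hypothetical, not the repo's code:

```scala
// Sketch: value iteration on an 11x11 grid with terminal goal cells.
// Bellman optimality backup: V(s) <- max over moves of (-1 + V(s')).
object GridWorldSketch {
  val size = 11
  val goals = Set((0, 0), (5, 5), (10, 10)) // hypothetical goal cells
  val moves = Seq((-1, 0), (1, 0), (0, -1), (0, 1))

  def neighbours(r: Int, c: Int): Seq[(Int, Int)] =
    moves.map { case (dr, dc) => (r + dr, c + dc) }
      .filter { case (nr, nc) => nr >= 0 && nr < size && nc >= 0 && nc < size }

  // One full sweep of the Bellman optimality backup over all cells.
  def sweep(v: Map[(Int, Int), Double]): Map[(Int, Int), Double] =
    v.map {
      case (cell, _) if goals(cell) => cell -> 0.0 // terminal: value stays 0
      case ((r, c), _) =>
        (r, c) -> neighbours(r, c).map(n => -1.0 + v(n)).max
    }

  def main(args: Array[String]): Unit = {
    val cells = for (r <- 0 until size; c <- 0 until size) yield (r, c)
    var v = cells.map(_ -> 0.0).toMap
    (1 to 100).foreach(_ => v = sweep(v)) // 100 value-iteration sweeps
    println(f"V(5, 4) = ${v((5, 4))}%.1f") // one step from a goal: -1.0
  }
}
```

The converged values are the negated step counts to the nearest goal, so following the best neighbour at each cell yields the direction arrows shown above.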

[figure: value function]