Grid world

source("./grid_world4x4.R")

Figure 4.1: Convergence of iterative policy evaluation on a small gridworld

plot_fig4.1()
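For readers without `grid_world4x4.R` at hand, the computation behind the figure can be sketched as follows. This is a minimal sketch, not the sourced script's implementation: the function name `evaluate_random_policy` and its arguments are hypothetical. It assumes the usual textbook setup, which is not stated in this document: terminal states in the top-left and bottom-right corners, a reward of -1 per step, no discounting, an equiprobable random policy, and moves off the grid leaving the state unchanged.

```r
# Hypothetical helper, not part of grid_world4x4.R.
# Iterative policy evaluation for the equiprobable random policy
# on an n x n gridworld with terminal corners (assumed setup).
evaluate_random_policy <- function(n = 4, theta = 1e-6, gamma = 1) {
  V <- matrix(0, n, n)
  is_terminal <- function(i, j) (i == 1 && j == 1) || (i == n && j == n)
  moves <- list(U = c(-1, 0), D = c(1, 0), L = c(0, -1), R = c(0, 1))
  repeat {
    V_new <- V                       # synchronous (two-array) sweep
    for (i in 1:n) for (j in 1:n) {
      if (is_terminal(i, j)) next
      total <- 0
      for (m in moves) {
        # Clamp to the grid: an off-grid move leaves the state unchanged
        ni <- min(max(i + m[1], 1), n)
        nj <- min(max(j + m[2], 1), n)
        total <- total + 0.25 * (-1 + gamma * V[ni, nj])
      }
      V_new[i, j] <- total
    }
    delta <- max(abs(V_new - V))
    V <- V_new
    if (delta < theta) break         # stop once a sweep changes nothing much
  }
  V
}

round(evaluate_random_policy())
```

Under these assumptions the rounded result should correspond to the converged panel of the figure; lowering `theta` tightens the stopping rule at the cost of more sweeps.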

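Printing the optimal policy ("T" marks a terminal state; "U", "D", "L", "R" stand for up, down, left, right; a slash separates equally good actions)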

compute_V_star(return_best_actions = TRUE)$best_actions
##      [,1]  [,2]      [,3]      [,4] 
## [1,] "T"   "L"       "L"       "D/L"
## [2,] "U"   "U/L"     "U/D/L/R" "D"  
## [3,] "U"   "U/D/L/R" "D/R"     "D"  
## [4,] "U/R" "R"       "R"       "T"
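A table like the one above can be obtained by running value iteration and then listing, for each state, every action whose one-step lookahead value ties for the maximum. The sketch below is an assumption about how this could be done, not the body of `compute_V_star`; the name `greedy_actions` is hypothetical, and the setup (reward -1 per step, no discounting, terminal corners, off-grid moves leaving the state unchanged) is the standard textbook one rather than something stated here.

```r
# Hypothetical helper, not part of grid_world4x4.R.
greedy_actions <- function(n = 4, gamma = 1, theta = 1e-9) {
  moves <- list(U = c(-1, 0), D = c(1, 0), L = c(0, -1), R = c(0, 1))
  is_terminal <- function(i, j) (i == 1 && j == 1) || (i == n && j == n)

  # One-step lookahead value of each action from state (i, j)
  q_values <- function(V, i, j) {
    sapply(moves, function(m) {
      ni <- min(max(i + m[1], 1), n)
      nj <- min(max(j + m[2], 1), n)
      -1 + gamma * V[ni, nj]
    })
  }

  # Value iteration: back up the best action value until nothing changes
  V <- matrix(0, n, n)
  repeat {
    V_new <- V
    for (i in 1:n) for (j in 1:n) {
      if (!is_terminal(i, j)) V_new[i, j] <- max(q_values(V, i, j))
    }
    delta <- max(abs(V_new - V))
    V <- V_new
    if (delta < theta) break
  }

  # Collect every action that ties for the maximum, joined by "/"
  best <- matrix("T", n, n)
  for (i in 1:n) for (j in 1:n) {
    if (is_terminal(i, j)) next
    q <- q_values(V, i, j)
    best[i, j] <- paste(names(q)[abs(q - max(q)) < 1e-9], collapse = "/")
  }
  best
}

greedy_actions()
```

Ties are compared with a small tolerance rather than exact equality, which matters once discounting or non-integer rewards introduce floating-point noise.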

Jack’s car rental

source("./jacks.R")

Figure 4.2: The sequence of policies found by policy iteration on Jack’s car rental problem, and the final state-value function

plot_fig4.2()
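One ingredient of evaluating a policy in this problem is the expected rental income at a location, given how many cars are available there. The sketch below illustrates that single step only. The name `expected_rental_reward` is hypothetical, and the defaults (a $10 credit per rental, Poisson-distributed requests) follow the textbook statement of the problem rather than anything shown in this document; `jacks.R` may organise the computation differently.

```r
# Hypothetical helper, not part of jacks.R.
# Expected rental income at one location when `cars` are available and
# requests are Poisson(lambda). Requests beyond the cars on hand are lost.
expected_rental_reward <- function(cars, lambda, price = 10, max_requests = 50) {
  k <- 0:max_requests                # truncate the Poisson tail (assumption)
  sum(dpois(k, lambda) * price * pmin(k, cars))
}

# Income grows with the cars on hand and flattens near price * lambda
sapply(c(0, 1, 5, 20), expected_rental_reward, lambda = 3)
```

Truncating the sum at `max_requests` is a convenience; for the small request rates in this problem the discarded tail probability is negligible.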

Gambler’s problem

source("./gambler.R")

Figure 4.3: The solution to the gambler’s problem

Value function

plot_fig4.3.1()

Final policy

plot_fig4.3.2()
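The two plots come from value iteration over the gambler's capital. A minimal sketch is given below, assuming the textbook setup that this document does not spell out: a goal of 100, a probability of heads `p_h = 0.4`, and a reward of +1 only on reaching the goal (encoded here by fixing the value of the goal state at 1). The name `gambler_value_iteration` and the tie-breaking rule (smallest stake among near-ties) are assumptions, not necessarily what `gambler.R` does.

```r
# Hypothetical helper, not part of gambler.R.
gambler_value_iteration <- function(p_h = 0.4, goal = 100, theta = 1e-10) {
  V <- numeric(goal + 1)             # V[s + 1] is the value of capital s
  V[goal + 1] <- 1                   # reaching the goal is worth 1

  # Value of each admissible stake 1..min(s, goal - s) at capital s
  action_values <- function(s, V) {
    stakes <- 1:min(s, goal - s)
    p_h * V[s + stakes + 1] + (1 - p_h) * V[s - stakes + 1]
  }

  repeat {
    delta <- 0
    for (s in 1:(goal - 1)) {        # in-place sweep over non-terminal states
      v_new <- max(action_values(s, V))
      delta <- max(delta, abs(v_new - V[s + 1]))
      V[s + 1] <- v_new
    }
    if (delta < theta) break
  }

  # Greedy policy: smallest stake whose value is within tolerance of the best
  policy <- sapply(1:(goal - 1), function(s) {
    q <- action_values(s, V)
    which(q >= max(q) - 1e-9)[1]
  })
  list(V = V, policy = policy)
}

res <- gambler_value_iteration()
plot(0:100, res$V, type = "l", xlab = "Capital", ylab = "Value estimate")
plot(1:99, res$policy, type = "s", xlab = "Capital", ylab = "Stake")
```

Many stakes tie at most capital levels, so the shape of the policy plot depends heavily on the tie-breaking rule and tolerance; the value function does not.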