Pole-balancing example

source("./pole.R")
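pole.R itself is not reproduced here. To show what such a simulator typically integrates, here is a minimal sketch of one Euler step of the classic cart-pole dynamics (Barto, Sutton & Anderson, 1983). The function names `cart_pole_step()` and `has_failed()` are hypothetical. The physical constants and failure limits (cart within ±2.4 m, pole within ±12°) are assumptions taken from that standard formulation. They are not read from pole.R.

```r
# Hypothetical sketch: one explicit-Euler step of the standard cart-pole model.
# state = c(x, x_dot, theta, theta_dot); x in metres, theta in radians.
# Default constants are assumed (classic formulation), not taken from pole.R.
cart_pole_step <- function(state, force,
                           g = 9.8, m_cart = 1.0, m_pole = 0.1,
                           half_len = 0.5, tau = 0.02) {
  x <- state[1]; x_dot <- state[2]; theta <- state[3]; theta_dot <- state[4]
  total_mass <- m_cart + m_pole
  pole_ml <- m_pole * half_len
  cos_t <- cos(theta); sin_t <- sin(theta)
  temp <- (force + pole_ml * theta_dot^2 * sin_t) / total_mass
  theta_acc <- (g * sin_t - cos_t * temp) /
    (half_len * (4 / 3 - m_pole * cos_t^2 / total_mass))
  x_acc <- temp - pole_ml * theta_acc * cos_t / total_mass
  # Euler update with time step tau (seconds)
  c(x + tau * x_dot, x_dot + tau * x_acc,
    theta + tau * theta_dot, theta_dot + tau * theta_acc)
}

# Assumed failure test: cart off the track or pole tilted past 12 degrees
has_failed <- function(state) {
  abs(state[1]) > 2.4 || abs(state[3]) > 12 * pi / 180
}

# Pushing a resting cart to the right: the cart accelerates right while the
# pole starts rotating the other way (negative theta_dot in this convention)
s1 <- cart_pole_step(c(0, 0, 0, 0), force = 10)
print(s1)
```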

Calling main() prints the number of steps each trial lasted:

main()
## Trial  1  was  8  steps.
## Trial  2  was  11  steps.
## Trial  3  was  25  steps.
## Trial  4  was  9  steps.
## Trial  5  was  13  steps.
## Trial  6  was  74  steps.
## Trial  7  was  40  steps.
## Trial  8  was  41  steps.
## Trial  9  was  19  steps.
## Trial  10  was  14  steps.
## Trial  11  was  9  steps.
## Trial  12  was  25  steps.
## Trial  13  was  53  steps.
## Trial  14  was  13  steps.
## Trial  15  was  9  steps.
## Trial  16  was  1083  steps.
## Trial  17  was  108  steps.
## Trial  18  was  74  steps.
## Trial  19  was  2001  steps.
## Trial  20  was  2764  steps.
## Trial  21  was  2677  steps.
## Trial  22  was  2878  steps.
## Trial  23  was  2861  steps.
## Trial  24  was  29  steps.
## Trial  25  was  2917  steps.
## Trial  26  was  2877  steps.
## Trial  27  was  595  steps.
## Trial  28  was  2876  steps.
## Trial  29  was  950  steps.
## Trial  30  was  919  steps.
## Trial  31  was  887  steps.
## Trial  32  was  2942  steps.
## Trial  33  was  2984  steps.
## Trial  34  was  2972  steps.
## Trial  35  was  2958  steps.
## Trial  36  was  2927  steps.
## Trial  37  was  3099  steps.
## Trial  38  was  2761  steps.
## Trial  39  was  2747  steps.
## Trial  40  was  2776  steps.
## Trial  41  was  2777  steps.
## Trial  42  was  2744  steps.
## Trial  43  was  2775  steps.
## Trial  44  was  2750  steps.
## Trial  45  was  2822  steps.
## Trial  46  was  2888  steps.
## Trial  47  was  2805  steps.
## Trial  48  was  2752  steps.
## Trial  49  was  306  steps.
## Trial  50  was  486  steps.
## Trial  51  was  148  steps.
## Trial  52  was  345  steps.
## Trial  53  was  631  steps.
## Trial  54  was  2786  steps.
## Trial  55  was  2950  steps.
## Trial  56  was  3152  steps.
## Trial  57  was  2895  steps.
## Trial  58  was  2760  steps.
## Trial  59  was  2923  steps.
## Trial  60  was  2994  steps.
## Trial  61  was  3077  steps.
## Trial  62  was  3247  steps.
## Trial  63  was  3178  steps.
## Trial  64  was  2111  steps.
## Trial  65  was  8863  steps.
## Trial  66  was  3529  steps.
## Trial  67  was  3976  steps.
## Trial  68  was  3988  steps.
## Trial  69  was  3947  steps.
## Trial  70  was  3949  steps.
## Trial  71  was  1084  steps.
## Trial  72  was  3218  steps.
## Trial  73  was  3940  steps.
## Trial  74  was  2697  steps.
## Trial  75  was  2760  steps.
## Trial  76  was  2952  steps.
## Trial  77  was  2171  steps.
## Trial  78  was  3566  steps.
## Trial  79  was  345  steps.
## Trial  80  was  3345  steps.
## Trial  81  was  3402  steps.
## Trial  82  was  3626  steps.
## Trial  83  was  5518  steps.
## Trial  84  was  3416  steps.
## Trial  85  was  4515  steps.
## Trial  86  was  4579  steps.
## Trial  87  was  4889  steps.
## Trial  88  was  4723  steps.
## Trial  89  was  3977  steps.
## Trial  90  was  3977  steps.
## Trial  91  was  3963  steps.
## Trial  92  was  3952  steps.
## Trial  93  was  3380  steps.
## Trial  94  was  4815  steps.
## Trial  95  was  3959  steps.
## Trial  96  was  3476  steps.
## Trial  97  was  3941  steps.
## Trial  98  was  3858  steps.
## Trial  99  was  3924  steps.
## Trial  100  was  3381  steps.
## Pole not balanced. Stopping after  100  failures.
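The log is easier to read as a learning curve. Below is a minimal sketch of parsing such lines into a data frame, tested on a few lines copied from the output above. `parse_trials()` is a hypothetical helper. Collecting the full log with `capture.output(main())` assumes main() writes to standard output (for example via cat()), which is an assumption about pole.R.

```r
# Hypothetical helper: extract trial numbers and step counts from the trial log.
parse_trials <- function(lines) {
  pat <- paste0("^Trial[[:space:]]+([0-9]+)[[:space:]]+was[[:space:]]+",
                "([0-9]+)[[:space:]]+steps\\.[[:space:]]*$")
  keep <- grepl(pat, lines)  # drops the closing "Pole not balanced..." line
  data.frame(trial = as.integer(sub(pat, "\\1", lines[keep])),
             steps = as.integer(sub(pat, "\\2", lines[keep])))
}

# A few lines copied from the log above
log_lines <- c("Trial  1  was  8  steps.",
               "Trial  2  was  11  steps.",
               "Trial  65  was  8863  steps.",
               "Trial  100  was  3381  steps.",
               "Pole not balanced. Stopping after  100  failures.")
trials <- parse_trials(log_lines)
print(trials)

# Assuming main() prints via cat(), the whole run could be parsed and plotted:
# trials <- parse_trials(capture.output(main()))
# plot(trials$trial, trials$steps, type = "b", log = "y",
#      xlab = "Trial", ylab = "Steps before failure")
```

A log-scale y axis suits this run, since the step counts range from 8 to 8863.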

Gridworld example

source("./grid_world5x5.R")

Figure 3.2: State values under the equiprobable random policy

plot_fig3.2()
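plot_fig3.2() comes from grid_world5x5.R, which is not shown. As an independent cross-check, the sketch below computes the random-policy value function by solving the linear Bellman equations (I - γP)v = r directly. It assumes the gridworld of Sutton & Barto's Example 3.5:

- An off-grid move leaves the state unchanged and earns reward -1.
- Every move from A (row 1, column 2) goes to A' (row 5, column 2) with reward +10.
- Every move from B (row 1, column 4) goes to B' (row 3, column 4) with reward +5.
- All other moves earn 0, and γ = 0.9.

The helper names are hypothetical.

```r
# Hypothetical sketch, assuming the Sutton & Barto Example 3.5 gridworld.
n <- 5
discount <- 0.9
moves <- list(U = c(-1, 0), D = c(1, 0), L = c(0, -1), R = c(0, 1))

# One transition: returns c(next_row, next_col, reward)
grid_step <- function(i, j, a) {
  if (i == 1 && j == 2) return(c(5, 2, 10))  # A -> A'
  if (i == 1 && j == 4) return(c(3, 4, 5))   # B -> B'
  ni <- i + moves[[a]][1]
  nj <- j + moves[[a]][2]
  if (ni < 1 || ni > n || nj < 1 || nj > n) return(c(i, j, -1))  # hit the edge
  c(ni, nj, 0)
}

# Column-major state index, so matrix(v, n, n) maps v back onto the grid
state_index <- function(i, j) (j - 1) * n + i

evaluate_random_policy <- function() {
  P <- matrix(0, n * n, n * n)  # transition matrix under the random policy
  r <- numeric(n * n)           # expected one-step reward per state
  for (i in 1:n) for (j in 1:n) {
    s <- state_index(i, j)
    for (a in names(moves)) {   # each action is taken with probability 1/4
      nxt <- grid_step(i, j, a)
      s_next <- state_index(nxt[1], nxt[2])
      P[s, s_next] <- P[s, s_next] + 0.25
      r[s] <- r[s] + 0.25 * nxt[3]
    }
  }
  # Solve (I - discount * P) v = r and reshape v into the 5x5 grid
  matrix(solve(diag(n * n) - discount * P, r), n, n)
}

V_random <- evaluate_random_policy()
print(round(V_random, 1))
```

Solving the 25 × 25 system directly is cheap at this size. Iterative policy evaluation would converge to the same values.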

Figure 3.5: Optimal solutions to the gridworld example

plot_fig3.5()

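Printing the optimal policy (U, D, L and R stand for up, down, left and right; cells with several equally good actions list them all, separated by "/")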

result_V_star <- compute_V_star(return_best_actions = TRUE)
result_V_star$best_actions
##      [,1]  [,2]      [,3]  [,4]      [,5] 
## [1,] "R"   "U/D/L/R" "L"   "U/D/L/R" "L"  
## [2,] "U/R" "U"       "U/L" "L"       "L"  
## [3,] "U/R" "U"       "U/L" "U/L"     "U/L"
## [4,] "U/R" "U"       "U/L" "U/L"     "U/L"
## [5,] "U/R" "U"       "U/L" "U/L"     "U/L"
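compute_V_star() is defined in grid_world5x5.R and is not shown. Below is a minimal, hypothetical re-implementation under the Example 3.5 assumptions. Value iteration repeats the Bellman optimality backup until the values stop changing. Each cell then lists every action whose one-step lookahead value is within a small tolerance of the best, in U/D/L/R order. The tolerance matters because ties, such as the four equivalent moves out of A, hold only up to floating-point error. Under these assumptions the labels should match the matrix printed above.

```r
# Hypothetical re-implementation, assuming the Sutton & Barto Example 3.5 gridworld.
n <- 5
discount <- 0.9
moves <- list(U = c(-1, 0), D = c(1, 0), L = c(0, -1), R = c(0, 1))

# One transition: returns c(next_row, next_col, reward)
grid_step <- function(i, j, a) {
  if (i == 1 && j == 2) return(c(5, 2, 10))  # A -> A'
  if (i == 1 && j == 4) return(c(3, 4, 5))   # B -> B'
  ni <- i + moves[[a]][1]
  nj <- j + moves[[a]][2]
  if (ni < 1 || ni > n || nj < 1 || nj > n) return(c(i, j, -1))  # hit the edge
  c(ni, nj, 0)
}

# One-step lookahead value of each action in cell (i, j), named U/D/L/R
action_values <- function(V, i, j) {
  sapply(names(moves), function(a) {
    nxt <- grid_step(i, j, a)
    nxt[3] + discount * V[nxt[1], nxt[2]]
  })
}

# Value iteration: synchronous Bellman optimality backups until convergence
value_iteration <- function(tol = 1e-10) {
  V <- matrix(0, n, n)
  repeat {
    V_new <- V
    for (i in 1:n) for (j in 1:n) V_new[i, j] <- max(action_values(V, i, j))
    if (max(abs(V_new - V)) < tol) return(V_new)
    V <- V_new
  }
}

# Label each cell with all near-maximal actions, e.g. "U/R" for a tie
best_action_labels <- function(V, tol = 1e-6) {
  labels <- matrix("", n, n)
  for (i in 1:n) for (j in 1:n) {
    q <- action_values(V, i, j)
    labels[i, j] <- paste(names(q)[q >= max(q) - tol], collapse = "/")
  }
  labels
}

V_star <- value_iteration()
print(round(V_star, 1))
print(best_action_labels(V_star))
```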