Short corridor with switched actions

source("./short_corridor.R")

Value of the first state as a function of the probability of choosing the right action

example13.1()
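
The curve can be cross-checked against a closed form. This is a minimal sketch, assuming the usual dynamics of this example: a reward of -1 per step, no discounting, three non-terminal states, and left/right swapped in the second state. Solving the Bellman equations under those assumptions gives v(p) = (2p - 4) / (p(1 - p)), where p is the probability of choosing right. The name `start_state_value` is hypothetical and is not taken from `short_corridor.R`.

```r
# Closed-form value of the first state when "right" is chosen with probability p.
# Assumes: reward -1 per step, gamma = 1, actions swapped in the second state.
start_state_value <- function(p) {
  (2 * p - 4) / (p * (1 - p))
}

# Search the open interval only: at p = 0 or p = 1 the goal is never reached.
opt <- optimize(start_state_value, interval = c(0.01, 0.99), maximum = TRUE)

opt$maximum    # best probability of choosing right
opt$objective  # value of the first state at that probability
```

Setting the derivative to zero gives p = 2 - sqrt(2), roughly 0.59, so the best stochastic policy should beat the value of -12 obtained at p = 0.5.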

Figure 13.1: REINFORCE

plot_fig13.1()
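
For reference, the REINFORCE update behind this figure can be sketched as follows. This is an illustration, not the code in `short_corridor.R`: it assumes a softmax policy over two action preferences with the same features in every state, and the names `x`, `policy`, `grad_log_pi` and `reinforce_update` are hypothetical.

```r
# Action features shared by every state (assumed parameterisation):
# column 1 = left, column 2 = right.
x <- cbind(left = c(1, 0), right = c(0, 1))

# Softmax policy over the linear action preferences t(x) %*% theta.
policy <- function(theta) {
  h <- as.vector(t(x) %*% theta)
  e <- exp(h - max(h))  # subtract the max for numerical stability
  e / sum(e)
}

# Gradient of ln pi(a | theta) for a softmax-linear policy:
# x(a) - sum_b pi(b) x(b).
grad_log_pi <- function(theta, a) {
  x[, a] - as.vector(x %*% policy(theta))
}

# One REINFORCE pass over a finished episode.
# actions: 1 = left, 2 = right; rewards[t] is the reward that followed actions[t].
reinforce_update <- function(theta, actions, rewards, alpha, gamma = 1) {
  n <- length(rewards)
  G <- numeric(n)
  running <- 0
  # Returns computed backwards: G_t = R_{t+1} + gamma * G_{t+1}.
  for (t in rev(seq_len(n))) {
    running <- rewards[t] + gamma * running
    G[t] <- running
  }
  for (t in seq_len(n)) {
    theta <- theta + alpha * gamma^(t - 1) * G[t] * grad_log_pi(theta, actions[t])
  }
  theta
}
```

Because every return in this task is negative, each step lowers the preference for the action that was taken, and longer episodes lower it more.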

Figure 13.2: Adding a baseline to REINFORCE can make it learn much faster

plot_fig13.2()
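
The baseline variant replaces the return G in the policy update with G minus a learned state-value estimate. Below is a minimal sketch of a single time step, assuming the baseline is one scalar weight `w` shared by all states. That is an assumption made for illustration and may differ from what `short_corridor.R` does. The function name `baseline_step` and its arguments are hypothetical; `grad` is the gradient of ln pi for the action taken, supplied by the caller.

```r
# One REINFORCE-with-baseline step for a single time step of a finished episode.
# Assumes a scalar baseline w, so the gradient of the value estimate w.r.t. w is 1.
baseline_step <- function(theta, w, G, grad, alpha_theta, alpha_w, discount = 1) {
  delta <- G - w                  # return minus baseline
  w <- w + alpha_w * delta        # move the baseline towards the observed return
  theta <- theta + alpha_theta * discount * delta * grad
  list(theta = theta, w = w)
}

# Usage with made-up numbers: a return of -10 and a zero baseline.
res <- baseline_step(theta = c(0, 0), w = 0, G = -10, grad = c(-0.5, 0.5),
                     alpha_theta = 0.1, alpha_w = 0.01)
```

Subtracting the baseline leaves the expected update unchanged but can reduce its variance, which is consistent with the faster learning reported in the figure.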