Random walk

source("./random_walk.R")

Example 6.2 Random Walk

Values learned after various numbers of episodes on a single run of TD(0)

example6.2.1()
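
For reference, the kind of update that produces these value estimates can be sketched as below. This is a minimal illustration, not the code in random_walk.R: the function name `td0_random_walk` and its arguments are hypothetical, and the setup (five nonterminal states, start in the centre, initial values 0.5, reward 1 only on the right exit, no discounting) is assumed from the textbook example.

```r
# Hypothetical helper (not from random_walk.R): tabular TD(0) on the
# 5-state random walk. States 1..5 are nonterminal; 0 and 6 are terminal.
td0_random_walk <- function(n_episodes = 100, alpha = 0.1, seed = 1) {
  set.seed(seed)
  # Index i + 1 holds the value of state i; terminal values stay at 0.
  v <- c(0, rep(0.5, 5), 0)
  for (episode in seq_len(n_episodes)) {
    s <- 3  # assumed: every episode starts in the centre state
    while (s != 0 && s != 6) {
      s_next <- s + sample(c(-1, 1), 1)  # step left or right with equal probability
      r <- if (s_next == 6) 1 else 0     # reward only on right termination
      # TD(0) update with gamma = 1
      v[s + 1] <- v[s + 1] + alpha * (r + v[s_next + 1] - v[s + 1])
      s <- s_next
    }
  }
  v[2:6]  # estimates for the nonterminal states only
}

v_hat <- td0_random_walk()
v_true <- (1:5) / 6  # true state values for this walk
print(round(rbind(estimate = v_hat, true = v_true), 3))
```

Changing `n_episodes` (for example 1, 10, 100) gives a rough feel for how the estimates drift toward the true values over a single run.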

Learning curves for TD(0) and constant-alpha MC for various values of alpha

example6.2.2()
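
A learning curve of this kind tracks the RMS error between the estimated and true state values after each episode. The sketch below shows one way to compute it for constant-alpha MC on a single run. The function name `mc_random_walk_rms` and its defaults are hypothetical and not taken from random_walk.R; the walk setup is assumed from the textbook example, and a real curve would average many runs.

```r
# Hypothetical helper (not from random_walk.R): constant-alpha every-visit MC
# on the 5-state random walk, recording the RMS error after each episode.
mc_random_walk_rms <- function(n_episodes = 100, alpha = 0.02, seed = 1) {
  set.seed(seed)
  v <- rep(0.5, 5)       # estimates for nonterminal states 1..5
  v_true <- (1:5) / 6    # true values of the random walk
  rms <- numeric(n_episodes)
  for (episode in seq_len(n_episodes)) {
    s <- 3               # assumed: start in the centre state
    visited <- integer(0)
    while (s >= 1 && s <= 5) {
      visited <- c(visited, s)
      s <- s + sample(c(-1, 1), 1)
    }
    g <- if (s == 6) 1 else 0  # undiscounted return equals the terminal reward
    # Move each visited state's estimate toward the return, once per visit.
    for (state in visited) {
      v[state] <- v[state] + alpha * (g - v[state])
    }
    rms[episode] <- sqrt(mean((v - v_true)^2))
  }
  rms
}

rms <- mc_random_walk_rms()
plot(rms, type = "l", xlab = "Episode", ylab = "RMS error (single run)")
```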

Figure 6.2: Performance of TD(0) and constant-alpha MC under batch training

plot_fig6.2()

Maximization bias example

source("./maximization_bias.R")

Figure 6.5: Comparison of Q-learning and Double Q-learning on a simple episodic MDP

plot_fig6.5()
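
The difference between the two methods comes down to how the bootstrap target is formed. The sketch below is a rough illustration under stated assumptions, not the code in maximization_bias.R: `run_max_bias` and all of its arguments are hypothetical names, the number of actions in state B (`n_b = 10`) is an arbitrary choice, and the MDP layout (A goes left to B or right to termination with reward 0; every action in B terminates with reward drawn from N(-0.1, 1)) is assumed from the textbook example.

```r
# Hypothetical sketch (not the code in maximization_bias.R).
# States: 1 = A, 2 = B, 0 = terminal.
# In A, action 1 = left (to B) and action 2 = right (terminal); both give reward 0.
# In B, each of n_b actions terminates with a reward drawn from N(-0.1, 1).
run_max_bias <- function(double = FALSE, n_episodes = 300, n_b = 10,
                         alpha = 0.1, epsilon = 0.1, seed = 1) {
  set.seed(seed)
  n_actions <- c(2, n_b)
  new_q <- function() list(numeric(2), numeric(n_b))
  q1 <- new_q()
  q2 <- new_q()  # only used when double = TRUE

  # Break ties at random so that no action is favoured initially.
  argmax <- function(x) {
    best <- which(x == max(x))
    best[sample.int(length(best), 1)]
  }
  # Epsilon-greedy; Double Q-learning acts on the sum of both tables.
  choose <- function(s) {
    if (runif(1) < epsilon) return(sample.int(n_actions[s], 1))
    q <- if (double) q1[[s]] + q2[[s]] else q1[[s]]
    argmax(q)
  }

  went_left <- logical(n_episodes)
  for (ep in seq_len(n_episodes)) {
    s <- 1
    while (s != 0) {
      a <- choose(s)
      if (s == 1) {
        went_left[ep] <- (a == 1)
        s_next <- if (a == 1) 2 else 0
        r <- 0
      } else {
        s_next <- 0
        r <- rnorm(1, mean = -0.1, sd = 1)
      }
      if (!double) {
        # Q-learning: the same table selects and evaluates the next action.
        target <- if (s_next == 0) r else r + max(q1[[s_next]])
        q1[[s]][a] <- q1[[s]][a] + alpha * (target - q1[[s]][a])
      } else if (runif(1) < 0.5) {
        # Double Q-learning: q1 selects the next action, q2 evaluates it.
        target <- if (s_next == 0) r else r + q2[[s_next]][argmax(q1[[s_next]])]
        q1[[s]][a] <- q1[[s]][a] + alpha * (target - q1[[s]][a])
      } else {
        # Roles swapped: q2 selects, q1 evaluates.
        target <- if (s_next == 0) r else r + q1[[s_next]][argmax(q2[[s_next]])]
        q2[[s]][a] <- q2[[s]][a] + alpha * (target - q2[[s]][a])
      }
      s <- s_next
    }
  }
  went_left
}

left_q  <- run_max_bias(double = FALSE)
left_dq <- run_max_bias(double = TRUE)
cat("Fraction of left actions from A, Q-learning:       ", mean(left_q), "\n")
cat("Fraction of left actions from A, Double Q-learning:", mean(left_dq), "\n")
```

A single seeded run like this is noisy; averaging the per-episode `went_left` vectors over many seeds would be needed to get curves comparable to the figure.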