The 10-armed testbed

source("./ten_armed_testbed.R")

Figure 2.1: Reward distribution of each action

plot_fig2.1()
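
The contents of ten_armed_testbed.R are not shown here, so the following is only a minimal sketch of the standard 10-armed testbed setup, not the sourced implementation. It assumes each true action value is drawn once from a unit normal and each reward is drawn from a unit-variance normal centred on that value. The names make_testbed and pull_arm are hypothetical.

```r
# Hypothetical helpers; not taken from ten_armed_testbed.R.

# Draw the true action values q*(a) for a k-armed problem from N(0, 1).
make_testbed <- function(k = 10) {
  rnorm(k, mean = 0, sd = 1)
}

# Reward for pulling arm `a`: one draw from N(q*(a), 1).
pull_arm <- function(q_star, a) {
  rnorm(1, mean = q_star[a], sd = 1)
}

set.seed(1)
q_star <- make_testbed()

# Empirical mean reward of each arm over 2000 pulls; these should sit
# close to the true values in q_star.
estimates <- vapply(
  seq_along(q_star),
  function(a) mean(replicate(2000, pull_arm(q_star, a))),
  numeric(1)
)
```

Comparing estimates with q_star gives a quick check that each arm's reward distribution is centred where it should be.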

Figure 2.2: Average performance of epsilon-greedy action-value methods

Average reward

plot_fig2.2.1()

Percentage of optimal action

plot_fig2.2.2()
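
For readers who want to see what the two curves above measure, here is a rough sketch of a single epsilon-greedy run with sample-average value estimates. It is written under the assumption of unit-variance normal rewards and is not the code behind plot_fig2.2.1() or plot_fig2.2.2(). The function name run_eps_greedy and its arguments are hypothetical.

```r
# Hypothetical sketch of one epsilon-greedy run; not from ten_armed_testbed.R.
run_eps_greedy <- function(q_star, epsilon = 0.1, steps = 1000) {
  k <- length(q_star)
  Q <- numeric(k)            # action-value estimates, initialised to 0
  N <- integer(k)            # number of times each arm was pulled
  rewards <- numeric(steps)
  optimal <- logical(steps)  # TRUE where the best arm was chosen
  best <- which.max(q_star)

  for (t in seq_len(steps)) {
    if (runif(1) < epsilon) {
      a <- sample.int(k, 1)                   # explore: uniform random arm
    } else {
      ties <- which(Q == max(Q))              # exploit: greedy arm,
      a <- ties[sample.int(length(ties), 1)]  # ties broken at random
    }
    r <- rnorm(1, mean = q_star[a], sd = 1)   # assumed reward model
    N[a] <- N[a] + 1L
    Q[a] <- Q[a] + (r - Q[a]) / N[a]          # incremental sample average
    rewards[t] <- r
    optimal[t] <- (a == best)
  }
  list(rewards = rewards, optimal = optimal, Q = Q, N = N)
}
```

Averaging rewards and optimal step by step over many independently generated problems would give curves of the kind shown in the two panels.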

Figure 2.3: The effect of optimistic initial action-value estimates

plot_fig2.3()

Figure 2.4: Average performance of UCB action selection

plot_fig2.4()
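
As an illustration of the selection rule being evaluated, the sketch below picks the arm that maximises Q(a) + c * sqrt(log(t) / N(a)) and treats any arm that has not yet been tried as maximising. It is an independent sketch under the same assumed reward model, not the implementation used by plot_fig2.4(). The name run_ucb and the argument c_explore are hypothetical.

```r
# Hypothetical sketch of one UCB run; not from ten_armed_testbed.R.
run_ucb <- function(q_star, c_explore = 2, steps = 1000) {
  k <- length(q_star)
  Q <- numeric(k)   # sample-average value estimates
  N <- integer(k)   # pull counts
  rewards <- numeric(steps)

  for (t in seq_len(steps)) {
    untried <- which(N == 0L)
    if (length(untried) > 0) {
      a <- untried[1]                           # untried arms go first
    } else {
      bound <- Q + c_explore * sqrt(log(t) / N) # upper confidence bound
      a <- which.max(bound)
    }
    r <- rnorm(1, mean = q_star[a], sd = 1)     # assumed reward model
    N[a] <- N[a] + 1L
    Q[a] <- Q[a] + (r - Q[a]) / N[a]
    rewards[t] <- r
  }
  list(rewards = rewards, Q = Q, N = N)
}
```

Larger values of c_explore widen the bonus term and so spend more pulls on arms with few observations.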

Figure 2.5: Average performance of the gradient bandit algorithm

plot_fig2.5()
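
The gradient bandit algorithm keeps a numerical preference per arm and samples arms from a softmax over those preferences. The sketch below is one possible reading of it and is not the code behind plot_fig2.5(). It assumes the baseline is the running mean of earlier rewards (0 before any reward is seen); other baseline conventions exist. The names softmax, run_gradient_bandit and use_baseline are hypothetical.

```r
# Hypothetical sketch of one gradient bandit run; not from ten_armed_testbed.R.

# Numerically stable softmax over a preference vector.
softmax <- function(h) {
  z <- exp(h - max(h))
  z / sum(z)
}

run_gradient_bandit <- function(q_star, alpha = 0.1, steps = 1000,
                                use_baseline = TRUE) {
  k <- length(q_star)
  H <- numeric(k)   # action preferences, initialised to 0
  r_bar <- 0        # baseline: mean of rewards seen before this step
  rewards <- numeric(steps)

  for (t in seq_len(steps)) {
    p <- softmax(H)
    a <- sample.int(k, 1, prob = p)             # sample arm from the policy
    r <- rnorm(1, mean = q_star[a], sd = 1)     # assumed reward model

    # Raise the chosen arm's preference and lower the others when the
    # reward beats the baseline; do the opposite when it falls short.
    one_hot <- as.numeric(seq_len(k) == a)
    H <- H + alpha * (r - r_bar) * (one_hot - p)

    # Update the baseline after it has been used, so it excludes `r`.
    if (use_baseline) r_bar <- r_bar + (r - r_bar) / t
    rewards[t] <- r
  }
  list(rewards = rewards, H = H, policy = softmax(H))
}
```

Setting use_baseline = FALSE keeps the baseline at 0, which gives the no-baseline variant for comparison.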

Figure 2.6: A parameter study of the various bandit algorithms

plot_fig2.6()