1.0 - Tutorial 8

MCTS - Selection

MCTS - Expansion

MCTS - Simulation

MCTS - Backpropagation

\[
R_t = r(s_t) + \gamma^1 r(s_{t+1}) + \cdots + \gamma^n r(s_{t+n}) + \gamma^{n+1} V
\]
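As a rough illustration of the return formula above, the sketch below accumulates the discounted rewards from the end of the rollout backwards, seeding the accumulator with the bootstrap value V. The function name `discounted_return` and its parameters are hypothetical and not part of the tutorial code; `rewards` is assumed to hold r(s_t), ..., r(s_{t+n}) in order.

```python
def discounted_return(rewards, gamma, bootstrap_value=0.0):
    """Compute R_t = sum_{k=0}^{n} gamma^k * rewards[k] + gamma^(n+1) * V.

    rewards: list [r(s_t), ..., r(s_{t+n})] (assumed layout, hypothetical helper)
    gamma: discount factor
    bootstrap_value: the value estimate V applied after the last reward
    """
    ret = bootstrap_value
    # Work backwards: each step multiplies everything after it by gamma once.
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


example = discounted_return([1.0, 2.0, 3.0], gamma=0.5, bootstrap_value=4.0)
# 1 + 0.5*2 + 0.25*3 + 0.125*4 = 3.25
```

Working backwards avoids computing explicit powers of the discount factor and matches the order in which values are usually passed up the tree during backpropagation.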

\begin{align*}
Q(s_t, a_t) &\leftarrow \frac{N(s_t)\, Q(s_t, a_t) + R_t}{N(s_t) + 1} \\
N(s_t, a_t) &\leftarrow N(s_t, a_t) + 1 \\
N(s_t) &\leftarrow N(s_t) + 1
\end{align*}
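The update rules above might be sketched as follows. This is a minimal transcription of the formulas as written (with N(s_t) as the weight in the Q update), not the tutorial's reference implementation; the class name `SearchStats`, the method `backup`, and the dictionary-based storage are all assumptions made for illustration, and states and actions are assumed to be hashable.

```python
from collections import defaultdict


class SearchStats:
    """Hypothetical container for the Q and N tables used in backpropagation."""

    def __init__(self):
        self.q = defaultdict(float)    # Q(s, a), keyed by (state, action)
        self.n_sa = defaultdict(int)   # N(s, a), keyed by (state, action)
        self.n_s = defaultdict(int)    # N(s), keyed by state

    def backup(self, state, action, ret):
        """Apply one backpropagation step for (state, action) with return ret."""
        n = self.n_s[state]
        q_old = self.q[(state, action)]
        # Q(s,a) <- (N(s) * Q(s,a) + R_t) / (N(s) + 1), as in the formula above.
        self.q[(state, action)] = (n * q_old + ret) / (n + 1)
        # Increment both visit counts after the Q update.
        self.n_sa[(state, action)] += 1
        self.n_s[state] += 1


stats = SearchStats()
stats.backup("s0", "left", 2.0)
stats.backup("s0", "left", 4.0)
```

After the two calls in the usage example, `stats.q[("s0", "left")]` holds the average of the two returns and both counts equal 2.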

MCTS Implementation - Iterative

MCTS Implementation - Recursive

For both approaches, the mcts_select_action should: