Lecture 10 - Learning to Act II - SARSA and VFA for Reinforcement Learning

1.0 - Recap of Reinforcement Learning

1.1 - Monte Carlo Reinforcement Learning

1.1.1 - Monte Carlo Update

1.1.2 - Monte Carlo Reinforcement Learning

1.1.3 - Monte Carlo Value Updating

1.2 - Temporal Difference Learning

1.3 - Monte Carlo vs Temporal Difference Updating

  1. Incremental Every-Visit MC

  2. Simplest Temporal-Difference Learning Algorithm, TD(0) (both update rules are written out below)

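For reference, the two value updates being compared are, writing $G_t$ for the return and $\alpha$ for the step size:

$$V(S_t) \leftarrow V(S_t) + \alpha\,\bigl(G_t - V(S_t)\bigr) \qquad \text{(incremental every-visit MC)}$$

$$V(S_t) \leftarrow V(S_t) + \alpha\,\bigl(R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\bigr) \qquad \text{(TD(0))}$$
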
1.3.1 - Sutton & Barto - Monte Carlo vs Temporal Difference Learning

The following was taken from p. 146 of Sutton & Barto, Reinforcement Learning: An Introduction.

Left: changes recommended by Monte Carlo methods (based on actual outcomes). Right: changes recommended by Temporal Difference methods (based on observations of successive states).

1.3.2 - Multi-Step TD Learning (TD(λ))

1.4 - Q-Learning

1.4.1 - TD Learning

🧠 Q-Learning is a type of TD-Learning

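For reference, the tabular Q-Learning update bootstraps from the greedy action in the next state:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\,\bigl(R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a') - Q(S_t, A_t)\bigr)$$

The $\max_{a'}$ in the target does not depend on which action the behaviour policy actually takes next, which is why Q-Learning is off-policy.
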
1.4.2 - Properties of Q-Learning

1.4.3 - Problems with Q-Learning

1.5 - SARSA

🧠 SARSA incorporates the exploration strategy into the learning update itself

1.5.1 - On-Policy Learning

1.5.2 - SARSA Pseudocode

In this pseudocode, a multi-armed bandit (MAB) strategy or an ε-greedy policy is used to balance exploration and exploitation, as sketched below.

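A minimal Python sketch of this loop, assuming a hypothetical environment whose `reset()` returns a state and whose `step(a)` returns `(next_state, reward, done)`; the ε-greedy helper and all hyperparameters are illustrative:

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon explore; otherwise act greedily w.r.t. Q."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(float)  # Q(s, a), initialised to 0 for unseen pairs
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(Q, s, actions, epsilon)  # behaviour policy picks A
        done = False
        while not done:
            s_next, r, done = env.step(a)
            # On-policy: the next action A' is chosen by the same ε-greedy
            # behaviour policy, and that same A' is used in the TD target.
            a_next = epsilon_greedy(Q, s_next, actions, epsilon)
            target = r if done else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s_next, a_next
    return Q
```
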
1.6 - Q-Learning vs SARSA

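The essential difference between the two is the TD target:

$$\text{Q-Learning: } R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a') \qquad\qquad \text{SARSA: } R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1})$$

Q-Learning evaluates the greedy policy regardless of how it behaves (off-policy); SARSA evaluates the same ε-greedy policy it uses to act (on-policy), so the exploration strategy is reflected in the learned values.
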
2.0 - Value Function Approximation (VFA) for RL

2.1 - Large-Scale Reinforcement Learning

2.3 - Q-Learning

2.3.1 - Pacman Example

2.4 - Linear VFA

2.4.1 - SARSA with Linear VFA

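A sketch of one SARSA weight update under a linear approximator, assuming a hypothetical `features(s, a)` function that returns the feature vector x(S, A) as a NumPy array:

```python
import numpy as np

def q_hat(w, s, a, features):
    """Linear action-value approximation: q(s, a; w) = x(s, a) . w"""
    return features(s, a) @ w

def sarsa_vfa_update(w, s, a, r, s_next, a_next, features,
                     alpha=0.01, gamma=0.99, done=False):
    """One SARSA step on the weights; returns the updated weight vector."""
    target = r if done else r + gamma * q_hat(w, s_next, a_next, features)
    td_error = target - q_hat(w, s, a, features)
    # For a linear approximator, the gradient of q w.r.t. w is just x(s, a).
    return w + alpha * td_error * features(s, a)
```
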
2.4.2 - Advantages and Disadvantages of VFA

Advantages

Disadvantages

2.5 - Function Approximation

2.5.1 - Types of Function Approximators

2.5.2 - Gradient Descent

Visualisation of Gradient Descent

2.5.3 - Value Function Approximation by SGD

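Writing the objective as the mean-squared error between the true value function $v_\pi(S)$ and the approximation $\hat{v}(S,\mathbf{w})$, stochastic gradient descent samples this gradient at each visited state:

$$J(\mathbf{w}) = \mathbb{E}_\pi\!\left[\bigl(v_\pi(S) - \hat{v}(S,\mathbf{w})\bigr)^2\right], \qquad \Delta\mathbf{w} = \alpha\,\bigl(v_\pi(S) - \hat{v}(S,\mathbf{w})\bigr)\,\nabla_{\mathbf{w}}\hat{v}(S,\mathbf{w})$$

Since $v_\pi(S)$ is not actually known, the later sections substitute a target for it: the return $G_t$ for Monte Carlo, or a bootstrapped estimate for TD.
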
2.6 - Representing States using Feature Vectors

2.6.1 - Table Lookup Features

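Table lookup is the special case where each state has its own indicator feature, so linear VFA reduces to the tabular method (the weight of the single active feature is exactly the table entry for that state):

$$\mathbf{x}^{\text{table}}(S)=\begin{pmatrix}\mathbf{1}(S=s_1)\\\vdots\\\mathbf{1}(S=s_n)\end{pmatrix}, \qquad \hat{v}(S,\mathbf{w}) = \mathbf{x}^{\text{table}}(S)^\top \mathbf{w}$$
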
2.7 - Incremental Prediction Algorithms

2.8 - Monte Carlo with Value Function Approximation

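For Monte Carlo, the unknown $v_\pi(S_t)$ is replaced by the sampled return $G_t$, so the training pairs and weight update are:

$$\langle S_1, G_1\rangle,\ \langle S_2, G_2\rangle,\ \ldots \qquad\quad \Delta\mathbf{w} = \alpha\,\bigl(G_t - \hat{v}(S_t,\mathbf{w})\bigr)\,\nabla_{\mathbf{w}}\hat{v}(S_t,\mathbf{w})$$
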
2.9 - TD Learning with Value Function Approximation

$$\langle S_1,\ R_2+\gamma\hat{v}(S_2, \mathbf{w})\rangle,\ \langle S_2,\ R_3+\gamma\hat{v}(S_3, \mathbf{w})\rangle,\ \ldots,\ \langle S_{T-1},\ R_T\rangle$$

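The corresponding (semi-)gradient TD(0) weight update replaces $v_\pi(S_t)$ with the bootstrapped target:

$$\Delta\mathbf{w} = \alpha\,\bigl(R_{t+1} + \gamma\hat{v}(S_{t+1},\mathbf{w}) - \hat{v}(S_t,\mathbf{w})\bigr)\,\nabla_{\mathbf{w}}\hat{v}(S_t,\mathbf{w})$$
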
2.10 - Control with VFA

2.11 - Q-Value (Action-Value) Function Approximation

2.12 - Linear Q-Value (Action-Value) Function Approximation

$$\mathbf{x}(S,A)=\begin{pmatrix}x_1(S,A)\\\vdots\\x_n(S,A)\end{pmatrix}$$
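
With these features, the linear action-value approximation and its SGD update are:

$$\hat{q}(S,A,\mathbf{w}) = \mathbf{x}(S,A)^\top\mathbf{w} = \sum_{j=1}^{n} x_j(S,A)\,w_j, \qquad \Delta\mathbf{w} = \alpha\,\bigl(q_\pi(S,A) - \hat{q}(S,A,\mathbf{w})\bigr)\,\mathbf{x}(S,A)$$

where $q_\pi(S,A)$ is replaced in practice by a Monte Carlo or TD target, as in the SARSA with Linear VFA sketch above.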