Predictable, stochastic state transitions (an underlying probability distribution governs how the agent transitions from one state to another)
Applications of Markov Chains include activity models, target tracking, regression models with mode-switching or regime-switching, simulating wind and weather patterns, and finance.
In Markov Chains, states are drawn as circles (nodes), transitions are edges, and edge weights are probabilities.
No decision-making, purely probabilistic chances of ending up in another state.
The probability of moving from state s to state s′ is given by P(s′∣s)
Edges represent transitions with >0 probability
Transitions depend on current state only, not on the full history - this is called the Markovian property. (System is memoryless)
State at time t only depends on state at time t−1
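As a small illustrative sketch of this memorylessness (the weather-style states and probabilities here are made up, not from the notes), sampling a trajectory only ever looks at the current state:

```python
import random

# Hypothetical two-state chain: next-state probabilities depend
# only on the current state, never on the earlier history.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sample_trajectory(start: str, steps: int) -> list:
    state, trajectory = start, [start]
    for _ in range(steps):
        next_states = list(transitions[state])
        weights = [transitions[state][s] for s in next_states]
        state = random.choices(next_states, weights=weights)[0]
        trajectory.append(state)
    return trajectory

print(sample_trajectory("sunny", 10))
```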
State distribution $x_t$ → probability distribution indicating the likelihood of being in each state at time t
State distribution can be represented as a row vector; when the state is known (e.g. at t=0) this vector is one-hot (one element is 1, the rest are 0)
The probability is 1 for the state the agent is known to be in, and 0 elsewhere.
This could be a fully-connected graph, where you can go from any state to any other state in a single step
To keep the graph readable, we avoid drawing zero-probability edges
All transitions (including zero-probability ones) are still required when we create a matrix to represent the Markov chain
Markov Chains are ergodic if:
Each state can be reached from every other state via some path, i.e. the graph is strongly connected (irreducible)
No periodic cycles - e.g. not a bipartite graph (aperiodic)
Ergodic → there exists some T such that for all t>T, every state has probability >0 of being occupied
The graph shown is not ergodic as it violates the irreducible property.
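Jumping ahead to the matrix representation introduced in the next section: a finite chain is ergodic exactly when some power of its transition matrix has all entries strictly positive, and by Wielandt's theorem it suffices to check the power $(n-1)^2 + 1$. A minimal numerical check along those lines (the example matrices are assumed, not from the notes):

```python
import numpy as np

def is_ergodic(P: np.ndarray) -> bool:
    """Check whether a row-stochastic transition matrix is ergodic
    (irreducible + aperiodic) by testing whether the power
    (n - 1)**2 + 1 of P is entrywise positive (Wielandt's bound)."""
    n = P.shape[0]
    return bool(np.all(np.linalg.matrix_power(P, (n - 1) ** 2 + 1) > 0))

# A 2-state chain that always flips state is periodic (period 2), hence not ergodic;
# adding self-loops makes it ergodic.
print(is_ergodic(np.array([[0.0, 1.0], [1.0, 0.0]])))  # False
print(is_ergodic(np.array([[0.5, 0.5], [0.5, 0.5]])))  # True
```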
Markov Chains as Matrices
State transition probabilities can be represented by a matrix
$$P = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.2 & 0.5 & 0.3 \\ 0 & 0.5 & 0.5 \end{bmatrix}$$
The size of the matrix is ∣S∣×∣S∣ where ∣S∣ is the number of states in the state space.
The probability highlighted in red (row 1, column 2) is the probability of transitioning from state 1 to state 2.
The state distribution at time step k is given by:
$$x_k = x_0 P^k$$
Probability of being in a given state at time step k is given by multiplying our one-hot vector with the matrix raised to the $k$th power.
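A quick sketch of this computation with NumPy, using the 3-state example matrix above and assuming the agent is known to start in state 1:

```python
import numpy as np

# Transition matrix from the example above (rows = current state, columns = next state).
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.5, 0.5]])

x0 = np.array([1.0, 0.0, 0.0])  # one-hot: known to start in state 1

k = 3
xk = x0 @ np.linalg.matrix_power(P, k)  # x_k = x_0 * P^k
print(xk)
```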
Stationary distribution → a distribution $x_s$ such that $x_{s+1} = x_s$, i.e. $x_s = x_s P$
That is, enough time has passed (i.e. sufficiently large value of s) such that the probability distributions stabilise.
P is the transition dynamics matrix defined earlier.
Over time (as t increases), $x_t$ tends towards $x_s$
Can approximate the stationary distribution by computing $x_k$ for increasing values of k until $|x_{k+1} - x_k| < \epsilon$
Obtain the stationary distribution by continuing to iterate until the values stabilise
Stabilise → no discernible difference between $x_{k+1}$ and $x_k$ (that is what $\epsilon$ represents).
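A minimal iterative sketch of this approximation (function name, tolerance, and iteration cap are illustrative), reusing the example matrix from above:

```python
import numpy as np

def stationary_distribution(P: np.ndarray, x0: np.ndarray,
                            eps: float = 1e-9, max_iters: int = 100_000) -> np.ndarray:
    """Iterate x_{k+1} = x_k P until successive distributions differ by less than eps.
    Convergence assumes the chain is ergodic."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        x_next = x @ P
        if np.abs(x_next - x).max() < eps:
            return x_next
        x = x_next
    return x  # did not converge within max_iters (e.g. a non-ergodic chain)

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.5, 0.5]])
print(stationary_distribution(P, np.array([1.0, 0.0, 0.0])))
```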
This will be foundational for our techniques for solving stochastic problems.
Exercise 6.1 - Restless Robot
Robot has no AI. At each time step, it either moves left or right (each with a 50% chance)
Moving into an edge (e.g. moving left at state 0) causes the robot to stay in the same state. The robot starts in state 2. This system can be modelled as a Markov Chain.
A grid world with a single axis of movement - LEFT and RIGHT.
Robot has no AI → no decision-making capability, but transitions between states using a simple model.
**a)** What are the dimensions of the transition matrix? **b)** Construct the transition matrix in code
The size of the transition matrix is determined by the number of states: with 6 states (0–5), it is 6 × 6.
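A possible construction for part b) (a sketch, assuming states 0–5 are ordered left to right, matching the 6-element state vector used in part c)):

```python
import numpy as np

# Transition matrix: rows = current state, columns = next state.
# Interior states move left or right with probability 0.5 each;
# at the edges (states 0 and 5), the blocked move keeps the robot in place.
p = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
])
```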
**c)** Give the initial state vector for starting at state 2, and compute the state distribution at t=1
```python
# Initial state distribution at time t=0. This is our one-hot vector.
x0 = np.array([0, 0, 1, 0, 0, 0])

# Matrix multiplication (not element-wise multiplication) with the transition matrix p.
x1 = np.matmul(x0, p)
# Result: x1 = [0.0, 0.5, 0.0, 0.5, 0.0, 0.0]
```
**d)** Compute the state distribution for t=2, t=4, t=10, t=20
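One way to compute these, reusing x0 and p from the code above:

```python
# State distribution at each requested time step: x_t = x_0 * p^t
for t in [2, 4, 10, 20]:
    xt = np.matmul(x0, np.linalg.matrix_power(p, t))
    print(f"t={t}: {np.round(xt, 4)}")
```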
From this, we can see that the state distribution x converges as t grows.
The difference from t=10 to t=20 is significantly less than the difference from t=1 to t=2 or t=2 to t=4
Exercise 6.2 - Navigation Agent
Round UP to closest 15-minute interval.
The solution is found when the root can be labelled as "solved", i.e. the "solved" label is propagated up to the root.
Parts of the AND-OR Tree
Situations where a choice is made (e.g. choosing which road to follow at an intersection) are represented by OR nodes.
Path taken chosen by the user / agent
To solve, there must be at least one path that can be chosen and is solved (guaranteed to reach the goal within the given duration)
Random outcomes (e.g. different amounts of time taken to reach the end of a road due to traffic on that road) are represented by AND nodes
Path taken chosen by world dynamics / environment.
To solve, every path (including the worst possible outcome/worst possible traffic) must be solved
Solving the tree provides a worst-case guarantee.
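A minimal sketch of this labelling rule (the Node structure and field names are illustrative, not from the course code):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                       # "OR" (agent chooses), "AND" (environment chooses), or "LEAF"
    solved: bool = False            # for leaves: is the goal reached within the time budget?
    children: List["Node"] = field(default_factory=list)

def is_solved(node: Node) -> bool:
    """Label a node 'solved' by recursing over the AND-OR tree."""
    if node.kind == "LEAF":
        return node.solved
    if node.kind == "OR":
        # The agent needs at least one choice that is solved.
        return any(is_solved(child) for child in node.children)
    # AND node: every possible outcome (including the worst case) must be solved.
    return all(is_solved(child) for child in node.children)
```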
Another possible way of characterising the tree is its average time to solution: take the weighted average of the solution times, weighted by the probability of going down each branch.
If traffic is better than the worst case, it may still be possible to reach the goal within the time duration
Not guaranteed to make it to the goal in time
Important consideration for safety-critical systems
Exercise 6.3 - Car Rental
Choose the most rational (i.e. highest expected utility) option out of the following:
Buy x5 GenCar for $40,000 each. Guaranteed rental with $175 income, $25 in expenses per day per car, for 330 days of the year. Not rented for 35 days of the year, with $30 expenses per day
Buy x2 Tesla for $120,000 each. 75% chance of rental with $500 income, $10 in expenses per day per car, and 25% chance of not being rented, with $5 in expenses per day per car, for 330 days of the year. Not rented for 35 days of the year, with $30 expenses per day
Buy x2 upgraded Tesla for $140,000 each with $600 in rental income, otherwise the same as the base Tesla.
Solved by constructing a rational decision tree
Related to the AND-OR tree
Want to find the option with the highest expected utility
Found by summing probability × utility over each possible outcome
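A sketch of that calculation for the three options above. It assumes a one-year horizon in which the full purchase cost is paid up front; whether the cars should instead be amortised over several years is not stated in the exercise, so the numbers here only compare the options under that assumption (function names are illustrative):

```python
# Assumption: one-year horizon, purchase price subtracted in full.

def gencar_option() -> float:
    # 5 GenCars: guaranteed rental for 330 days, idle for 35 days.
    per_car = 330 * (175 - 25) + 35 * (-30)
    return 5 * per_car - 5 * 40_000

def tesla_option(price: int, income: int) -> float:
    # 2 Teslas: on each of the 330 rental days, 75% chance of being rented
    # (income minus $10 expenses) and 25% chance of sitting idle ($5 expenses).
    expected_daily = 0.75 * (income - 10) + 0.25 * (-5)
    per_car = 330 * expected_daily + 35 * (-30)
    return 2 * per_car - 2 * price

options = {
    "GenCar x5": gencar_option(),
    "Tesla x2": tesla_option(120_000, 500),
    "Upgraded Tesla x2": tesla_option(140_000, 600),
}
for name, utility in options.items():
    print(f"{name}: {utility:,.2f}")
print("Highest expected utility:", max(options, key=options.get))
```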