1.0 - Causes of Uncertainty: System Noise and Errors
Where does uncertainty come from?
Control error or disturbances from external forces → Effect of performing an action is non-deterministic
Errors in sensing and processing of sensing data → Imperfect observation about the world (partially observable systems)
Strategic uncertainty and interaction with other agents → Game Theory (Module 5)
Too complex to model
Lazy: Rolling a die in a casino depends on the wind from the air conditioning, the number of people around the table, etc.
Deliberate: To reduce computational complexity. We want to eliminate variables that will not affect the solution significantly.
Accidental error → Lack of understanding about the problem
Abstraction error → The actual possible states are often too large. We simplify them so it's solvable by current computing power
One approach to simplification is to cluster several actual states together and assume all actual states in the same cluster are the same
Meaning: A state in our model corresponds to a set of actual states that are not differentiable by the program.
Similarly with the action space
Another approach is to use function approximations of the state or action policy, e.g. using basis functions or machine learning methods
In both cases, the effect of performing an action becomes non-deterministic
Usually we deal with bounded, quantifiable uncertainty.
2.0 - Assumptions on Environment in Module 3
Does the agent know the state of the world / itself exactly?
Fully observable vs partially observable
Does an action map one state to a single other state?
Deterministic vs non-deterministic
Can the world change while the agent is "thinking"?
Static vs dynamic
Are the actions and percepts discrete?
Discrete vs continuous
3.0 - Review of Probability
3.1 - Applied Probability and Statistics
In this course, we are only interested in applied probability and statistics.
We will not cover the mathematics of probability theory and stochastic processes, statistics (derivations and proofs) or the design of experiments - this is not a statistics course.
R&N Chapter 12.1 - 12.5
P&M Chapter 8.1
3.2 - Probability Terminology
Experiment: An occurrence with an uncertain outcome that we can observe, e.g. rolling a die.
Outcome: The result of an experiment; one particular state of the world.
Sample Space: The set of all possible outcomes for the experiment, for example {1,2,3,4,5,6}.
Event: A subset of possible outcomes that together have some property we are interested in. For example, the event "even die roll" is the set of outcomes {2,4,6}.
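As a quick sketch (our own example, assuming a fair die with equally likely outcomes), an event's probability can be computed as |event| / |sample space|:

from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
event = {outcome for outcome in sample_space if outcome % 2 == 0}  # "even die roll"
print(Fraction(len(event), len(sample_space)))  # 1/2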
3.3 - What is a Probability Distribution?
What do you think of when an event is described as "random"?
An unexpected event?
A uniform number between 0 and 1?
A normally distributed random value?
A random variable, denoted X, has an element of chance associated with its value
The level of chance associated with any particular value (or range of values), X=x, is called its probability P(X=x). This is a value between 0 and 1 (inclusive).
The collection of probabilities over values that a variable may take is called a distribution, with the property that the sum of probabilities of all mutually exclusive, collectively exhaustive events is 1.
The value of any function of a random variable is also a random variable.
For example, the sum of n random variables is itself a random variable.
Very loosely [1], just about anything that you can count or measure, that has non-negative values, and that sums to one over all outcomes is a probability distribution
Fundamentally, both discrete and continuous variables, X, are represented by a cumulative distribution function (cdf), denoted F(x)
The cdf is the probability that the realised value of X is less than or equal to x
F(x)=P(X≤x)
[1] Take it on trust that there is a serious branch of mathematics behind this, regarding topological spaces and measure theory.
3.4 - Probability Mass and Density Functions
The terms used for probability that X takes a particular value are different for discrete and continuous variables.
For discrete variables, a probability mass function (pmf), P(X=x) describes the chance of a particular event occurring
For finite discrete-value variables, this is easy to understand as a finite vector of non-negative values that sum to one.
For example, the distribution of a coin toss, a die roll, a poker hand, etc.
For countably infinite discrete variables, the probability distribution is a series of numbers over an infinite set of distinct elements.
For continuous variables, a probability density function (pdf), f(x), is a function that integrates from below to give the cumulative distribution function:
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(y) dy
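To make the pmf/cdf relationship concrete, here is a minimal sketch (our own example for a fair six-sided die) that accumulates the pmf into the cdf:

from fractions import Fraction
from itertools import accumulate

pmf = {x: Fraction(1, 6) for x in range(1, 7)}    # P(X = x) for a fair die
cdf = dict(zip(pmf, accumulate(pmf.values())))    # F(x) = P(X <= x)
print(cdf[3])                # 1/2
print(sum(pmf.values()))     # 1: the probabilities sum to one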
Python will very effectively help you to handle probabilities.
Many distributions have functions for probabilities, random number generation etc.
The methods that you will learn about are available through the Python random module, which is part of the standard library.
3.4.1 - Sampling Random Values
For example, a normal distribution can be sampled using:
>>> import random as r
>>> mu, sigma = 2.0, 4.0
>>> r.gauss(mu, sigma)
1.3927394833370967
We can also sample a Weibull distribution, given by:
f(x∣a,b) = (b/a)(x/a)^(b−1) e^(−(x/a)^b)
>>> a, b = 1.0, 1.5
>>> r.weibullvariate(a, b)
1.9157188803236334
In this course, you don't have to remember functional forms, etc.
Just focus on understanding a distribution's use and parameters.
We can generate an integer from 0 to 100 inclusive:
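>>> r.randint(0, 100)   # randint's bounds are both inclusive
42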
3.5 - Conditional Probability
Conditional probability is a measure of the probability of an event given that another event has already occurred.
If the event of interest is A and the event B is known to have occurred, then the corresponding conditional probability of A given B is denoted P(A∣B)
If two events A and B are independent, then the probability of both occurring is:
P(A∩B)=P(A)P(B)
Otherwise, if the events are dependent
P(A∩B) = P(B)P(A∣B) = P(A)P(B∣A)
In both cases, the probability of events A or B occurring is:
P(A∪B)=P(A)+P(B)−P(A∩B)
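For example, for one roll of a fair die, let A = "even roll" = {2,4,6} and B = "roll at least 5" = {5,6}. Then A∩B = {6}, so P(A∪B) = 3/6 + 2/6 − 1/6 = 4/6 = 2/3.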
Bayes' rule rearranges the conditional probability relationships to describe the probability of an event, given prior knowledge of related events.
P(A∣B) = P(B∣A)P(A) / P(B)
For example, estimating the probability of symptoms given a disease is often easier than directly estimating the probability of the disease given the symptoms; Bayes' rule lets us convert one into the other.
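A minimal numeric sketch of this (all numbers below are made up for illustration): suppose a disease has 1% prevalence, the symptom appears in 90% of those with the disease, and in 5% of those without it.

# Hypothetical numbers, chosen only for illustration.
p_disease = 0.01                    # P(A): prior probability of the disease
p_symptom_given_disease = 0.90      # P(B|A)
p_symptom_given_healthy = 0.05      # P(B|not A)

# Total probability of the symptom: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B)
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom
print(round(p_disease_given_symptom, 3))   # 0.154: still unlikely despite the symptom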
4.0 - Search Under Uncertainty - AND-OR Trees
We want to find a plan that works regardless of what outcomes actually occur
Can no longer rely on a sequence of actions
Need a conditional plan: the action to perform depends on the outcome of the previous action
Need a different type of tree data structure
4.1 - AND-OR Search Tree
A tree with interleaving AND and OR levels
At each node of an OR level, branching is introduced by the agent's own choice
At each node of an AND level, branching is introduced by the environment
4.1.1 - Slippery Robot Vacuum
States: Conjunctions of the following state factors
Robot Positions: {in R1,in R2}
R1 state: {clean, dirty}
R2 state: {clean, dirty}
Action: {Left, Right, Suck(R1),Suck(R2)}
World Dynamics: Non-deterministic; after performing an action, the robot may end up in one of several possible states
Successors of (Robot in R1,Right)={Robot in R1,Robot in R2}
Successors of (Robot in R2,Left)={Robot in R1,Robot in R2}
(The rooms are slippery, so the robot might not move into the other room.)
Initial State: (Robot in R1) ∧ (R1 is clean) ∧ (R2 is dirty)
Goal State: (R1 is clean) ∧ (R2 is clean)
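As a concrete sketch (our own encoding, not from the notes): states can be written as (robot room, R1 status, R2 status) tuples, with a successor function returning the set of possible next states. For brevity we treat Suck(R1)/Suck(R2) as a single Suck performed in the robot's current room.

def results(state, action):
    """Return the set of possible successor states (non-deterministic)."""
    room, r1, r2 = state                      # e.g. ("R1", "clean", "dirty")
    if action == "Suck":
        # Sucking reliably cleans the room the robot is currently in.
        return {("R1", "clean", r2)} if room == "R1" else {("R2", r1, "clean")}
    if action == "Right":
        # Slippery: from R1 the robot may move or stay put; from R2 it stays.
        return {("R1", r1, r2), ("R2", r1, r2)} if room == "R1" else {state}
    if action == "Left":
        # Likewise: from R2 the robot may move or stay put; from R1 it stays.
        return {("R1", r1, r2), ("R2", r1, r2)} if room == "R2" else {state}
    raise ValueError(f"unknown action: {action}")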
4.1.2 - AND-OR Tree of the Slippery Vacuum Robot
Note that the arc across the outcomes (highlighted in green in the figure) indicates that the robot may land in any of the various states.
The non-determinism of the environment is modelled in the AND layer.
The choice of action at an OR node is deterministic: it is under the agent's control.
4.1.3 - AND-OR Search Tree
A solution in an AND-OR tree is a sub-tree that:
Has a goal node at every leaf
Specifies one action at each node of an OR level
Includes every outcome branch at each node of an AND level
When do we have a solution?
4.1.4 - Labelling an AND-OR Tree
Notice that an action node is closed if at least one of its children is closed.
This is because the environment, not the agent, determines which outcome occurs, so we might end up in the failed state.
We can propagate solved labels up to the parent nodes.
Both leaf nodes (all children of the AND node) must be solved for the solution to propagate upward.
In this subtree, it doesn't matter that the AND node isn't closed: we can choose not to take that branch, since the choice at an OR node is the agent's own.
Keep labelling up to the root to determine a solution path.
What happens when a node is the same as an ancestor node, i.e. a loop in the AND-OR tree?
We can keep performing the action again and, thanks to the environment's non-determinism, may eventually reach another state.
4.2 - Searching an AND-OR Tree
Start from a state node (at the OR level)
Fringe nodes are the state nodes
Using any of the search algorithms we have studied (a code sketch follows after this list):
Select a fringe node to expand
Select an action to use
Insert the corresponding action node
Insert all possible outcomes of the action as children of the action node
Back up to re-label the ancestor nodes
Cost/reward calculation at the AND level:
Weighted sum (when uncertainty is quantified using probabilities, this is the expectation)
Taking the maximum cost / minimum reward (conservative)
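Below is a minimal depth-first sketch of this procedure (following the standard AND-OR search scheme; the actions/results/is_goal interface is our own assumption, and loops simply fail here rather than yielding the cyclic retry plans discussed above). It could be run against the vacuum model sketched earlier.

def or_search(state, actions, results, is_goal, path=()):
    # OR level: the agent picks one action whose subtree is solvable.
    if is_goal(state):
        return []                              # empty conditional plan
    if state in path:
        return None                            # repeated state: fail this branch
    for action in actions(state):
        subplan = and_search(results(state, action), actions, results,
                             is_goal, path + (state,))
        if subplan is not None:
            return [action, subplan]           # "do action, then branch on outcome"
    return None

def and_search(states, actions, results, is_goal, path):
    # AND level: every possible outcome of the action must be solvable.
    plans = {}
    for s in states:
        subplan = or_search(s, actions, results, is_goal, path)
        if subplan is None:
            return None
        plans[s] = subplan
    return plans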
5.0 - Decision Theory
Modelling decision making under uncertainty.
5.1 - Preferences
Actions result in outcomes
Agents have preferences over outcomes
A rational agent will do the action that has the best outcome for them
Sometimes agents don't know the outcomes of the actions, but they still need to compare actions
Agents have to act (Doing nothing is often an action)
5.2 - Preferences over Outcomes
Some notation:
The preference relation ≻ means "is preferred to" (succeeds in a preference order)
≺ means "is preferred less than" (precedes in a preference order)
∼ means indifference
If o1 and o2 are outcomes
o1⪰o2 means that o1 is at least as desirable as o2
o1∼o2 means o1⪯o2 and o1⪰o2
o1≻o2 means o1⪰o2 and not o2⪰o1
5.3 - Lotteries
An agent may not know the outcomes of its actions, but may only have a probability distribution over the outcomes.
A lottery is a probability distribution over the outcomes:
R&N denote this [p1,o1; p2,o2; …; pk,ok]
P&M denote this [p1:o1, p2:o2, …, pk:ok]
where the oi are outcomes and pi≥0 such that ∑i pi = 1
The lottery specifies that outcome oi occurs with probability pi
Alternatively, an agent may choose to select an action using a lottery
When we talk about outcomes, we include lotteries over "pure" outcomes (where pure outcomes are fully defined and either happen or not)
5.4 - Axioms of Rational Preferences
Idea: Preferences of a rational agent must obey certain rules
Rational preferences imply behaviour describable as maximisation of expected utility
Monotonicity: An agent prefers a larger chance of getting a better outcome to a smaller chance. Given two pure outcomes o1 and o2, construct two lotteries; if o1≻o2 and p>q then:
[p:o1, 1−p:o2] ≻ [q:o1, 1−q:o2]
The probability of outcome o1 occurring in the first lottery is greater than in the second lottery (and likewise, the probability of o2 occurring is greater in the second lottery than in the first).
Therefore, we prefer lottery 1 over lottery 2, to maximise the chance of the better outcome o1 occurring.
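A tiny numeric illustration (the utilities u(o1)=10 and u(o2)=2 are made up; expected utility itself is only justified by these axioms, as noted above):

def expected_utility(lottery):
    """lottery: a list of (probability, utility) pairs."""
    return sum(p * u for p, u in lottery)

u1, u2 = 10.0, 2.0                                 # hypothetical u(o1) > u(o2)
p, q = 0.75, 0.25                                  # p > q
print(expected_utility([(p, u1), (1 - p, u2)]))    # 8.0: lottery [p: o1, 1-p: o2]
print(expected_utility([(q, u1), (1 - q, u2)]))    # 4.0: lottery [q: o1, 1-q: o2]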
Continuity: o1≻o2≻o3 ⇒ ∃p∈[0,1] such that [p:o1, 1−p:o3] ∼ o2
That is, as p varies there is some point at which the agent is indifferent between the lottery and the middle outcome o2.
Suppose o1≻o2 and o2≻o3. Consider a situation where the agent is trying to choose between o2 and the lottery [p:o1, 1−p:o3] for values of p∈[0,1].
For the lottery, P(o1)=p and P(o3)=1−p.
When p=0, we are effectively comparing o2 to o3, and we know that o2≻o3, so we prefer o2.
When p=1, the chance of getting o1 in the lottery is 100%, so we prefer the lottery.
By continuity, somewhere in between there is a value of p at which the agent is indifferent between o2 and the lottery.
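If numeric utilities exist (they are only constructed from these axioms; the values here are made up), the indifference point solves p·u(o1) + (1−p)·u(o3) = u(o2), i.e. p = (u(o2) − u(o3)) / (u(o1) − u(o3)). With u(o1)=10, u(o2)=2, u(o3)=0, this gives p = 0.2.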
Substitutability: If o1∼o2 then the agent is indifferent between lotteries that differ only in o1 and o2.
We are indifferent between o1 and o2, and any other outcome appearing in both lotteries is irrelevant to the property.
Alternative Axiom for Substitutability
If o1⪰o2 then the agent weakly prefers lotteries that contain o1 instead of o2, everything else being equal.
That is, for any probability p∈[0,1] and outcome o3:
[p:o1,(1−p):o3]⪰[p:o2,(1−p):o3]
Decomposability ("no fun in gambling"): An agent is indifferent between lotteries that have the same probabilities over the same outcomes, even when one of them is a compound (nested) lottery.
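For example, the compound lottery [0.5: o1, 0.5: [0.4: o2, 0.6: o3]] is treated as equivalent to the flat lottery [0.5: o1, 0.2: o2, 0.3: o3], since 0.5 × 0.4 = 0.2 and 0.5 × 0.6 = 0.3.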