1.0 - Theory

The following section is a summary of the content presented in the tutorial.

1.1 - Agent Design Problem Definitions

Action Space (A) The set of all possible actions the agent can perform (sometimes called the action set in the discrete case). An action is denoted $a \in A$.
Percept Space (P) The set of all possible things an agent can perceive in a time step, e.g. the set of all possible 100 x 100 RGB images. In a fully observable system, Percept Space = State Space.
State Space (S) The set of all possible configurations of the world the agent is operating in (sometimes called the set of states in discrete state systems). A state is denoted $s \in S$.
World Dynamics / Transfer Function ($T: S \times A \rightarrow S$) A function that specifies how the world changes when the agent performs actions in it; a system model. It maps a (state, action) pair to a single resulting state. We sometimes write $T(s,a) = s'$.
Percept Function ($Z: S \rightarrow P$) A function that maps a world state to a percept. Multiple states may map to the same percept (the function need not be injective, so there can be ambiguity). For fully observable problems, the percept function is the identity function $I(x) = x$.
Utility Function ($U: S \rightarrow \R$) A function that maps a state (or sequence of states) to a real number, indicating how desirable it is for an agent to occupy that state / sequence of states. We sometimes write $U(s) =$ some cost or reward; it can be framed in either positive (reward) or negative (cost) terms.

1.1.1 - Why the Agent Design Problem?

We want to avoid having to solve the same problems over and over again
We want algorithms which can be applied to broad classes of problems rather than just a single problem
We want to describe a problem once and allow the use of multiple solving algorithms (e.g. to evaluate algorithm performance - which algorithm is more accurate / faster?)
Need a standard interface between the set of problems and set of solving algorithms
Establish assumptions about the problem and make them explicit.
- Solutions to AI problems are highly dependent on assumptions - we very rarely get all of the assumptions right at the start of the project, but this helps.
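The idea of a standard interface between problems and solving algorithms can be sketched in code. The following is a minimal illustration, not a prescribed design: the names `AgentProblem`, `greedy_step` and `line_world` are hypothetical, and the toy problem (an integer position on a line with a goal at position 3) is an assumption made purely for the example.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class AgentProblem:
    """Bundles the design components so that any solver can consume them."""
    actions: Callable[[State], Iterable[Action]]   # valid subset of A in state s
    transition: Callable[[State, Action], State]   # T(s, a) = s'
    percept: Callable[[State], Hashable]           # Z(s)
    utility: Callable[[State], float]              # U(s)


def greedy_step(problem: AgentProblem, state: State) -> Action:
    """A deliberately simple solver: pick the action whose successor has the highest utility."""
    return max(problem.actions(state),
               key=lambda a: problem.utility(problem.transition(state, a)))


# Toy problem (assumed for illustration): walk along the integers towards position 3.
line_world = AgentProblem(
    actions=lambda s: (-1, +1),
    transition=lambda s, a: s + a,
    percept=lambda s: s,              # fully observable: identity function
    utility=lambda s: -abs(3 - s),    # closer to 3 is better
)

print(greedy_step(line_world, 0))  # moves right, towards the goal
```

Because the solver only talks to the problem through these four functions, a different solver could be swapped in without touching the problem description.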

1.2 - Search Problem

Given that the Agent Design Problem is defined, we just need to define the Initial State and the Goal State to fully define the Search Problem.

1.3 - State Graph Representation

A State Graph Representation is a way to represent a search problem concretely in a program
Another way of thinking about the problem
A State Graph is used in problems with continuous or very large state spaces to compactly represent the state space
Formally, a state graph $G=(V,E)$ comprises:
- Vertices (V) representing states, and
- Edges (E) representing world dynamics
Each edge $\overline {ss'} \in E$ is labelled with the cost to move from $s$ to $s'$ .
It may also be labelled by the action to move from state $s$ to $s'$ .
- The initial and goal states are mapped to the initial and goal vertices of the graph.
A solution is a path from the initial to goal vertices in the state graph.
Cost: the sum of the costs associated with each edge of the path.
- The optimal solution is the shortest (lowest cost) path through the state graph.
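As a rough sketch of how a state graph and its optimal solution might look in a program, the example below stores the graph as an adjacency dictionary and finds the lowest-cost path with uniform-cost search. The function name, the dictionary layout and the four-vertex graph are assumptions made for illustration; other search algorithms could be used on the same representation.

```python
import heapq


def uniform_cost_search(edges, start, goal):
    """Return (cost, path) for the lowest-cost path, or None if the goal is unreachable.

    edges maps each vertex to a list of (neighbour, edge_cost) pairs.
    """
    frontier = [(0, start, [start])]   # priority queue ordered by path cost so far
    expanded = set()                   # vertices whose outgoing edges were already explored
    while frontier:
        cost, vertex, path = heapq.heappop(frontier)
        if vertex == goal:
            return cost, path
        if vertex in expanded:
            continue
        expanded.add(vertex)
        for neighbour, edge_cost in edges.get(vertex, []):
            if neighbour not in expanded:
                heapq.heappush(frontier, (cost + edge_cost, neighbour, path + [neighbour]))
    return None


# Hypothetical state graph: vertices are states, edges carry the cost of moving between them.
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
result = uniform_cost_search(graph, "A", "D")
print(result)  # (4, ['A', 'B', 'C', 'D'])
```

The direct-looking route A → C → D costs 5, so the solution with more edges but lower total cost is returned.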

2.0 - Exercises

2.1 - Exercise 1.1

Design a tic-tac-toe or noughts-and-crosses playing agent, using the design components listed above. Assume that a single time step includes a single move by the agent and the immediate move by the opponent. The goal is to win with as few steps as possible.

Given Assumptions

A single time step includes a single move by the agent and the immediate move by the opponent. (Since we need to be able to reframe this as a single-agent decision making problem, we represent the behaviour of the opponent in the world dynamics.)
The goal is to win with as few steps as possible.

Other Assumptions

Does the agent always make the first move or the second move?
Does the agent play as X or O? (This has implications on the action space)
How does the opponent behave? Perfectly rational opponent?
Is a draw preferable to a loss? Is losing in a greater number of steps preferable to losing in fewer steps?

Agent Design Components

State Space The State Space must allow for all possible states to be denoted.

$S = \{(t_1, t_2, t_3, t_4, t_5, t_6, t_7, t_8, t_9) \space | \space t_i \in \{X, O, \_\}\}$

Note that not all combinations that can be represented in the state space can result from valid gameplay. For example, in this representation, the grid can be filled with all X or all O, which is not possible in a game. A more compact representation (with fewer possibilities and more strongly enforced constraints) is possible, but this representation has good readability.

We don't need to enumerate the entire state space (that is, determine every possible combination of tile values - there are $3^9=19683$ possibilities). We can build it up as we need.

Action Space The set of all possible actions that the player can perform.

Placing the player's symbol in any of the available squares (tiles with the _ character).

$A=\{p_i \space | \space i \in \{1, 2, ..., 9\} \}$ where $p_i$ is the action to place the player's symbol in tile $t_i$

Not every action is valid in every state - can't place in an already occupied tile. This means that the action space is limited by which tiles have already been populated.
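A small sketch of this state-dependent action set, assuming the state is stored as a 9-tuple of the characters `X`, `O` and `_` (the function name `valid_actions` is hypothetical):

```python
EMPTY = "_"


def valid_actions(state):
    """Return the tile numbers i (1..9) for which action p_i is legal in this state."""
    return [i + 1 for i, tile in enumerate(state) if tile == EMPTY]


empty_board = (EMPTY,) * 9
mid_game = ("X", "O", EMPTY,
            EMPTY, "X", EMPTY,
            "O", EMPTY, EMPTY)

print(valid_actions(empty_board))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(valid_actions(mid_game))     # [3, 4, 6, 8, 9]
```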

World Dynamics

Upon the agent's move → When the agent performs an action $p_i$ , tile $t_i$ is updated with the agent's symbol.

Upon the opponent's move → One of the available tiles is filled with the opponent's symbol (opponent's choice of tile depends on problem assumptions)

One possible assumption is that the opponent behaves with perfect rationality.

From the agent's point of view, the dynamics are not completely deterministic, as there are multiple possible squares that the opponent could choose between.

If the agent is playing against a human opponent, we need to model how the opponent will play (a human is neither deterministic nor consistent).

The dynamics are complicated to describe in formal mathematical notation, so we describe them in words instead.
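Although the dynamics are awkward to write mathematically, they are fairly direct to express in code. The sketch below folds the opponent's reply into the transition function, so that one call corresponds to one time step. Both opponent models are hypothetical stand-ins (neither is a perfectly rational player), and for brevity the sketch does not check whether the agent's move already ended the game.

```python
import random

EMPTY = "_"


def first_free_opponent(state):
    """Hypothetical opponent model: always takes the lowest-numbered free tile."""
    return next(i + 1 for i, tile in enumerate(state) if tile == EMPTY)


def random_opponent(state, rng=random):
    """Hypothetical opponent model: picks uniformly among the free tiles."""
    return rng.choice([i + 1 for i, tile in enumerate(state) if tile == EMPTY])


def transition(state, action, opponent=first_free_opponent, agent="X", other="O"):
    """T(s, a): apply the agent's move p_i, then the opponent's immediate reply."""
    tiles = list(state)
    if tiles[action - 1] != EMPTY:
        raise ValueError(f"tile {action} is already occupied")
    tiles[action - 1] = agent
    if EMPTY in tiles:                       # the opponent replies only if a tile is free
        tiles[opponent(tuple(tiles)) - 1] = other
    return tuple(tiles)


start = (EMPTY,) * 9
next_state = transition(start, 5)
print(next_state)  # ('O', '_', '_', '_', 'X', '_', '_', '_', '_')
```

With `first_free_opponent` the transition is deterministic; passing `opponent=random_opponent` makes the same call non-deterministic, which mirrors how the choice of opponent model changes the nature of the problem.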

Utility Function

Winning in the smallest number of moves (i.e. with the smallest number of occupied tiles).

One possible utility function is defined as:

$U(s)=\begin{cases} 0, \quad &\text{if no 3-in-a-row cases} \\ 10-numOccupied, \quad &\text{if agent has 3-in-a-row}\\ numOccupied-10, \quad &\text{if opponent has 3-in-a-row} \end{cases}$

Note that $U(s) = 0$ means that the game is either still in progress or has ended in a draw

The $numOccupied-10$ case handles the assumption that the best possible loss is one with the greatest number of moves.

The number 10 is intentionally chosen because the maximum number of moves is 9, so every win has positive utility and every loss has negative utility

Winning in the maximum number of moves yields $U(s)=1$ , which is better than in the draw case ( $U(s) = 0$ )

Any win is better than a draw

Any win with fewer moves is better than a win with more moves.
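The utility function above could be implemented roughly as follows, again assuming a 9-tuple board with the agent playing `X`. The `LINES` table and the example boards are illustrative assumptions.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def utility(state, agent="X"):
    """U(s) as defined above: 0, 10 - numOccupied, or numOccupied - 10."""
    num_occupied = sum(tile != "_" for tile in state)
    for a, b, c in LINES:
        if state[a] == state[b] == state[c] != "_":
            return 10 - num_occupied if state[a] == agent else num_occupied - 10
    return 0   # no 3-in-a-row: game in progress or drawn


fast_win = ("X", "X", "X",
            "O", "O", "_",
            "_", "_", "_")
slow_win = ("X", "O", "X",
            "O", "X", "O",
            "O", "X", "X")
loss = ("O", "O", "O",
        "X", "X", "_",
        "X", "_", "_")

print(utility(fast_win), utility(slow_win), utility(loss))  # 5 1 -4
```

The slowest possible win still scores 1, which is above the draw value of 0, and every loss scores below 0.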

Percept Space / Percept Function

Is the environment fully observable or partially observable? The game is fully observable as the agent is able to observe the exact state at every time step.

Therefore, percept space = state space and percept function = identity function

2.2 - Exercise 1.2

Consider a navigation app, like an app on your smartphone or car that you use to find your way around UQ or other places. This program is essentially a rational agent. Assume that:
1. Its goal is to find the shortest path to a goal location
2. The map used by the agent is 100% up to date
3. The location provided by the GPS is up to date

a) How will you design it? Use the design components listed earlier.
b) Select the type of environment this agent operates in (i.e. discrete/continuous, deterministic/non-deterministic, fully/partially observable, static/dynamic). Explain your selections, and think of the effect of each assumption above on this type.
c) Define the search problem and its corresponding state graph representation for this query.

About This Question

This question is a lot more open-ended than the previous question - intended to show how assumptions change the solution
Shortest path (distance) vs shortest path (time) - What if there is a longer road with a higher speed limit?

Assumptions and Details

The map used by the agent is 100% up to date → All information about traversable edges is correct
The location provided by the GPS is correct → (Since the environment is fully observable) → Current State is always correct

Other Details

What is the definition of a state? This depends on the use case of the navigation app
- Driving navigation → Street address
- Walking navigation → Landmarks?
- General GPS Nav → Latitude and Longitude
  - For this to make sense, the navigation should not be constrained by roads (e.g. navigation for boats and planes)
How precise does the navigation need to be?
- Can we use landmarks / streets / street addresses?
Does the user always follow the directions correctly? Deviate from the path? Change their mind half way through?
Do we need to account for traffic and pedestrians?

Agent Design Components

State Space

The set of all valid street addresses / set of all landmarks / set of all (latitude, longitude) combinations $\{(x,y) \space | \space x,y \in \R\}$

What about discretisation (resolution) of our navigation system / map / GPS?
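One way to think about discretisation is to snap each continuous (latitude, longitude) reading to a grid cell, so that the state space becomes a finite set of cells. The sketch below assumes a uniform grid in degrees; the function name `to_cell`, the default resolution of 0.001 degrees (on the order of 100 m in latitude) and the sample coordinates are all assumptions for illustration.

```python
import math


def to_cell(lat, lon, resolution_deg=0.001):
    """Map a continuous (latitude, longitude) to a discrete grid cell (row, col)."""
    return (math.floor(lat / resolution_deg), math.floor(lon / resolution_deg))


# Two nearby GPS readings (arbitrary values) that fall into the same cell.
reading_a = to_cell(-27.4975, 153.0137)
reading_b = to_cell(-27.4976, 153.0138)
print(reading_a, reading_a == reading_b)
```

A coarser resolution gives fewer states but less precise navigation; a finer one does the opposite.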

Action Space - Set of all actions (navigation) that can be performed by the agent

All legal driving manoeuvres

Movement and heading (with constraints for minimum and maximum value) for moving in 2d plane

World Dynamics

Position changes to the next node in the direction the agent selected (if we assume that the user always follows the instruction of the agent).

How does this change if the user doesn't necessarily follow the instructions of the agent?

Utility Function

Reach the goal state with the minimum distance travelled

$Utility=DistanceTravelled \times -1$
Convert the cost to a utility function by multiplying by -1

Percept Space / Percept Function

The location provided by the GPS is correct (and gives enough resolution to fully determine the state of the agent) → The percept exactly reveals the agent's state, so this is a fully observable problem.

Percept Space = State Space

Percept Function = Identity Function

Environment Type

Discrete vs Continuous
- Discrete → Finitely many different states
- Continuous → If state A and state B are valid states, then any state C in between A and B is also a valid state
- e.g. street addresses/landmarks are discrete; latitude and longitude may be continuous (if not discretised)
Deterministic vs Non-Deterministic
- Deterministic if the driver/user always follows the directions correctly
- Non-deterministic if driver/user makes a mistake or chooses not to follow directions
Fully vs Partially Observable
- GPS fully reveals the state (assuming adequate precision)
- Information about traffic and pedestrians is not observed - what are our assumptions?
Static vs Dynamic
- Episode: everything that happens between the initial state and reaching the goal state
- The map is 100% up to date, so connectivity (e.g. road connections) does not change during an episode → Static environment
- Traffic, pedestrians, etc. may change during an episode, so the environment becomes dynamic if these are considered → Re-planning may be necessary if we are considering the shortest travel time (here we are considering the shortest movement distance)
Search Problem & State Graph Representation

Initial State: User's current location
Goal State: User's desired location

- Landmarks are vertices, paths between landmarks are edges
- Street addresses and intersections are vertices, road sections are edges. Street addresses always have 1 or 2 connections; an intersection may have more
- Graph with grid structure, where each node has a latitude / longitude corresponding to a row and column in the grid → If continuous, a complete state graph is impossible as there are $\infty$ vertices

2.3 - Exercise 1.3

A web crawler is a program that systematically browses and downloads web pages from the internet. This is one of the programs that enables us to search the internet. A web crawler can be viewed as a rational agent. Please design a web crawler agent when the agent lives in:
a) An ideal world where no broken links exist and the internet connection always works
b) The real world, where both assumptions above are not valid

Agent Design Components

State Space: Set of all valid web addresses / URLs
- This may be hard to represent mathematically; we need to build it up as we go.
- The possible next states depend on the links on the current page.
- It is impossible to enumerate every possible sequence of characters that could form a URL.

Action Space: Set of all links which can be followed (changes depending on the current state)

World Dynamics: State changes to the URL of the selected link; in the non-ideal case, the link may be broken or the internet connection may be unavailable, causing the state to return to the previous webpage

Utility Function: Number of unique webpages visited (derived from the sequence of states by counting the distinct vertices / sites visited)

Percept Space / Percept Function: Fully observable → Percept Space = State Space, Percept Function = Identity Function
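The crawler design can be illustrated on a simulated web, where a dictionary stands in for the pages and their links so that no network access is needed. This is only a sketch under that assumption: the page names, the `broken` set and the breadth-first visiting order are all choices made for the example, not part of the exercise.

```python
from collections import deque


def crawl(links, start, broken=frozenset()):
    """Breadth-first crawl over a simulated web; returns the set of pages visited.

    links maps each page to the URLs it links to (the actions available there).
    broken lists URLs that fail to load, as in the non-ideal world.
    """
    visited = {start}
    frontier = deque([start])
    while frontier:
        page = frontier.popleft()
        for url in links.get(page, []):
            if url in broken:
                continue            # following fails: the state stays at the current page
            if url not in visited:  # build the state space up as pages are discovered
                visited.add(url)
                frontier.append(url)
    return visited


web = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}

ideal = crawl(web, "a")
real = crawl(web, "a", broken={"d"})
print(len(ideal), len(real))  # utility = number of unique pages visited: 4 3
```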

2.4 - Exercise 1.4

A poker bot is a program that automatically plays poker on the internet. Poker bots are software agents that typically use AI techniques to attempt to beat human poker players. Think about how to design a poker bot for the version of poker called Texas hold 'em, with the following rules:
- Every player is dealt two cards, for their eyes only
- The dealer spreads five cards face up for all to see in three stages: (i) three at once, (ii) a single card, (iii) another single card
- Before and after the card/s in each stage are revealed, players take turns to bet
- The best poker hand wins the pot (all the bets)

What complications arise when a poker bot tries to play against more than one poker player?

State Space: All combinations of:

Player's current hand/cards, opponent's current hand/cards, current dealer cards
Current and previous bets placed by both players
Number of chips held by both players and in the current pot

Action Space

$A = \{Check, Call, Raise, Fold\}$

This assumes players always raise by the same amount; otherwise there is one action for each raising amount

World Dynamics

Number of chips in pot changes based on player bets
Chips are awarded to the winner of the round
Cards are dealt again at the end of the round/start of the next round
Other player's actions (needs to be sophisticated enough to describe the actions that the other players might take)

Utility Function

Number of poker chips in hand at the end of all the rounds

Percept Space / Percept Function

Cards in the player's own hand are revealed + some of the common cards depending on the current state
Cards not dealt, cards in opponent's hand are not part of the percept.
Partially observable.
The same percept can arise from different underlying states (e.g. the opponent holding different cards)
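Partial observability can be made concrete with a percept function that drops the hidden parts of the state. The sketch below uses a simplified state (own hand, opponent hand, community cards and pot only, leaving out bet history and chip counts); the class name `PokerState`, the card strings and the pot size are assumptions for illustration.

```python
from typing import NamedTuple, Tuple


class PokerState(NamedTuple):
    own_hand: Tuple[str, str]
    opponent_hand: Tuple[str, str]   # hidden from the agent
    community: Tuple[str, ...]       # dealer cards revealed so far
    pot: int


def percept(state):
    """Z(s): everything in the state except the opponent's hidden cards."""
    return (state.own_hand, state.community, state.pot)


# Two different world states that differ only in the opponent's cards.
s1 = PokerState(("AS", "KS"), ("2C", "7D"), ("QS", "JS", "3H"), 40)
s2 = PokerState(("AS", "KS"), ("AH", "AD"), ("QS", "JS", "3H"), 40)

print(s1 != s2, percept(s1) == percept(s2))  # True True
```

Since the two states produce an identical percept, the agent cannot tell them apart from a single observation, which is exactly why the percept function is not injective here.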

Complications when there are multiple opponents

Size of state space grows exponentially as number of players increases
Modelling opponent behaviour becomes very complicated.