L12 - Reasoning About Other Agents

In order to reason about other agents, we need some model of what the other agent(s) are themselves thinking about.

0.1 - Strategic Uncertainty

1.0 - Single Agents vs Multiple Agents

1.2 - Game Theory

🧠 Key Idea: Game Theory provides a model for reasoning about other agents' actions and behaviours in AI, and is considered a foundation of multi-agent system design and analysis.

1.2.1 - Game Theory: Normal v Extensive Form

1.2.2 - Sharing Game with Perfect Information

🧠 Extensive form of a sharing game with perfect information

  1. Andy observes the initial state of the game and chooses to either keep, share or give the reward.
  2. Barb chooses whether the action will be performed.

1.2.3 - Sharing Game with Imperfect Information

1.3 - Multi-Agent Framework

2.0 - Minimax in Zero-Sum Games of Perfect Information

2.1 - Zero-Sum Games: Adversarial World

2.2 - The Zero-Sum Perfect Information Game Environment

2.4 - Online Search for Games?

🧠 We try to find intermediate states in the game that 'look good'.

2.4.1 - Choosing an Action: Basic Idea of Minimax Algorithm

  1. Using the current state as the initial state, build the game tree to the maximal depth h (called decision horizon) feasible within a computational time limit
  2. Evaluate the states of the leaf nodes
  3. Back-up the results from the leaves to the root
  4. Select the move toward a MIN node that has the largest backed-up value (from the leaf nodes)

2.5 - Evaluation Function


2.5.1 - Evaluation Function - Tic-Tac-Toe

$e(s)=\text{\# of rows, cols and diagonals where MAX can win}-\text{\# of rows, cols and diagonals where MIN can win}$
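As a sketch, the evaluation above can be computed directly. The board representation (a 3×3 grid of `'X'` for MAX, `'O'` for MIN, or `None` for an empty square) is an assumption for illustration, not from the notes:

```python
# A minimal sketch of the tic-tac-toe evaluation function e(s).
# Assumed representation: board is a 3x3 list of lists holding
# 'X' (MAX), 'O' (MIN) or None.

def all_lines():
    """The 8 winning lines: 3 rows, 3 columns, 2 diagonals."""
    ls = [[(r, c) for c in range(3)] for r in range(3)]   # rows
    ls += [[(r, c) for r in range(3)] for c in range(3)]  # columns
    ls.append([(i, i) for i in range(3)])                 # main diagonal
    ls.append([(i, 2 - i) for i in range(3)])             # anti-diagonal
    return ls

def evaluate(board):
    """e(s) = #lines still open for MAX ('X') - #lines still open for MIN ('O')."""
    def open_for(player):
        return sum(1 for line in all_lines()
                   if all(board[r][c] in (player, None) for r, c in line))
    return open_for('X') - open_for('O')
```

On the empty board e(s) = 8 − 8 = 0; with a single X in the centre, e(s) = 8 − 4 = 4, since the four lines through the centre are no longer open for MIN.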

2.6 - Minimax Algorithm

  1. Expand the game tree from the current state (where it is MAX's turn to play) to the depth h
  2. Compute the evaluation function at every leaf of the tree
  3. Back up the results from the leaves to the root node and pick the best action assuming the worst from MIN
  4. Select the move toward a MIN node that has the largest backed-up value.
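The four steps above can be sketched as a depth-limited recursion. Here `successors` (state → list of `(action, next_state)` pairs) and `evaluate` (static evaluation from MAX's point of view) are hypothetical helpers the caller supplies for a concrete game:

```python
# A sketch of depth-limited minimax: expand to depth h, evaluate the
# leaves, and back the values up assuming MIN plays its worst (for MAX).

def minimax(state, depth, maximizing, successors, evaluate):
    """Return (value, best_action) for `state`, searching to depth `depth`."""
    moves = successors(state)
    if depth == 0 or not moves:           # horizon reached or terminal state
        return evaluate(state), None
    best_value, best_action = None, None
    for action, nxt in moves:
        value, _ = minimax(nxt, depth - 1, not maximizing, successors, evaluate)
        better = (best_value is None or
                  (value > best_value if maximizing else value < best_value))
        if better:
            best_value, best_action = value, action
    return best_value, best_action
```

At the root (MAX's turn) the action returned is the move toward the child with the largest backed-up value, matching step 4.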

2.6.1 - Minimax for Game Playing

2.7 - Alpha-Beta Pruning

🧠 A heuristic that works in practice, but doesn't provide any guarantees on better worst-case performance.
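A sketch of minimax with alpha-beta pruning, using the same hypothetical `successors`/`evaluate` interface as plain minimax. Pruning never changes the value backed up to the root; it only skips branches that cannot affect it, which is why the worst case (bad move ordering, nothing pruned) is no better:

```python
# Minimax with alpha-beta pruning. alpha = best value MAX can already
# guarantee on the path; beta = best value MIN can already guarantee.

def alphabeta(state, depth, alpha, beta, maximizing, successors, evaluate):
    moves = successors(state)
    if depth == 0 or not moves:
        return evaluate(state)
    if maximizing:
        value = float('-inf')
        for _, nxt in moves:
            value = max(value, alphabeta(nxt, depth - 1, alpha, beta, False,
                                         successors, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:      # MIN would never let play reach here
                break
        return value
    else:
        value = float('inf')
        for _, nxt in moves:
            value = min(value, alphabeta(nxt, depth - 1, alpha, beta, True,
                                         successors, evaluate))
            beta = min(beta, value)
            if alpha >= beta:      # MAX already has a better option
                break
        return value
```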

3.0 - Equilibrium in General-Sum Games

🧠 Focusing on Normal-Form Games

3.1 - Simultaneous Moves and Normal Form Games

4.0 - Strategic Dominance

4.1 - Iterative Elimination of Weakly-Dominated Strategies

🧠 Iterative Elimination of Weakly-Dominated Strategies can be applied to any game
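A minimal sketch of iterated elimination for a two-player normal-form game, with payoffs given as matrices `A` (row player) and `B` (column player); the matrix representation and the Prisoner's-Dilemma-style test game are assumptions for illustration:

```python
# Iterated (simultaneous) elimination of weakly dominated strategies.
# A[r][c]: row player's payoff; B[r][c]: column player's payoff.

def iterated_elimination(A, B):
    """Return (surviving row strategies, surviving column strategies)."""
    rows, cols = set(range(len(A))), set(range(len(A[0])))
    while True:
        # rows weakly dominated by some other surviving row
        dr = {r for r in rows
              if any(all(A[t][c] >= A[r][c] for c in cols) and
                     any(A[t][c] > A[r][c] for c in cols)
                     for t in rows if t != r)}
        # columns weakly dominated by some other surviving column
        dc = {c for c in cols
              if any(all(B[r][t] >= B[r][c] for r in rows) and
                     any(B[r][t] > B[r][c] for r in rows)
                     for t in cols if t != c)}
        if not dr and not dc:
            return sorted(rows), sorted(cols)
        rows -= dr
        cols -= dc
```

One caveat worth noting: with weak (rather than strict) dominance, the order of elimination can matter, so different elimination orders may leave different surviving sets.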

4.2 - Dominant Strategy Equilibrium

🧠 This occurs when each agent has a dominant strategy - so each agent plays a dominant strategy.

5.0 - Nash Equilibrium

$u_i(\sigma^*_i, \sigma^*_{-i})\ge u_i(\sigma'_i,\sigma^*_{-i})$

$u_i(\sigma^*)-u_i(\sigma_i,\sigma^*_{-i})\ge 0\quad\forall\sigma_i\in\Delta(A_i),\ \forall i \in I$
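The Nash condition can be verified numerically: because a player's expected payoff is linear in their own mixed strategy, it suffices to check every pure-strategy deviation. A sketch for two-player games (matrix representation assumed):

```python
# Check whether a mixed-strategy profile (p, q) is a Nash equilibrium
# of a two-player game with payoff matrices A (row) and B (column).

def expected_payoff(M, p, q):
    """Expected payoff from matrix M when row mixes with p, column with q."""
    return sum(p[r] * q[c] * M[r][c]
               for r in range(len(M)) for c in range(len(M[0])))

def is_nash(A, B, p, q, tol=1e-9):
    """True iff no player gains by deviating to any pure strategy."""
    u_row = expected_payoff(A, p, q)
    u_col = expected_payoff(B, p, q)
    for r in range(len(A)):                      # row player's deviations
        e_r = [1.0 if i == r else 0.0 for i in range(len(p))]
        if expected_payoff(A, e_r, q) > u_row + tol:
            return False
    for c in range(len(A[0])):                   # column player's deviations
        e_c = [1.0 if j == c else 0.0 for j in range(len(q))]
        if expected_payoff(B, p, e_c) > u_col + tol:
            return False
    return True
```

For matching pennies (`A = [[1,-1],[-1,1]]`, `B = -A`), the uniform mix for both players passes this check, while any pure profile fails it.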

5.1 - Nash Equilibrium - Key Results

  1. One of the great results of game theory (proven by Nash in 1950) is that every finite (normal-form) game has at least one Nash equilibrium.
  2. A second great result (in the same paper) concerns symmetric games: games in which all players have the same actions and symmetric payoffs given each player's action. Every finite symmetric game has a symmetric Nash equilibrium, in which every player plays each action with the same probability, as in scissors, paper, rock.
  3. A zero-sum game is a game in which the payoffs for all players in each outcome sum to zero. Another useful result of game theory is that in every finite two-player zero-sum game, every Nash equilibrium is equivalent to a mixed-strategy minimax outcome.
  4. From (2) and (3), in the equilibrium of a two-player, symmetric, zero-sum game, each player must receive a payoff of 0, and such games always have equilibria in symmetric strategies.

Note: Any constant-sum game can be normalised to make it equivalent to a zero-sum game.

5.2 - Computing Nash Equilibrium

5.2.1 - Goalkeeper Example (Again)

$P(\text{goal}\mid\text{right})=P(\text{goal}\mid\text{left})\implies 0.9p_k+0.2(1-p_k)=0.3p_k+0.6(1-p_k)$

🧠 At what probability $p_g$ is the kicker indifferent between kicking left and right, i.e. what value of $p_g$ induces the kicker to randomise?

$0.2p_g+0.6(1-p_g)=0.9p_g+0.3(1-p_g)\implies p_g=0.3$
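A quick check of the indifference calculation, using the goal probabilities from the example (any indifference condition between two linear mixes has this closed form):

```python
# Solve an indifference condition a*p + b*(1-p) = c*p + d*(1-p) for p.
# Here (a, b, c, d) = (0.2, 0.6, 0.9, 0.3) are the example's goal
# probabilities for the kicker's two actions against each dive.

def indifference_prob(a, b, c, d):
    """Rearranging: p*(a - b - c + d) = d - b."""
    return (d - b) / (a - b - c + d)

p_g = indifference_prob(0.2, 0.6, 0.9, 0.3)  # the goalie's dive probability
```

At `p_g = 0.3` both of the kicker's actions yield the same scoring probability (0.48), so the kicker is willing to randomise.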

5.3 - Morra

🧠 In the game of Morra, each player shows either one or two fingers and announces a number between 2 and 4. If a player’s number is equal to the sum of the number of fingers shown, then her opponent must pay her that many dollars. The payoff is the net transfer, so that both players earn zero if both or neither guess the correct number of fingers shown. In this game, each player has 6 strategies:

  1. They may show one finger and guess 2;
  2. They may show one finger and guess 3;
  3. They may show one finger and guess 4;
  4. They may show two fingers and guess 2, 3 or 4 (three further strategies).

There are two weakly dominated strategies in Morra - what are they?

  1. It never pays to put out one finger and guess that the total number of fingers will be 4, because the other player cannot put out more than two fingers.
  2. Likewise, it never pays to put out two fingers and guess that the sum will be 2, because the other player must put down at least one finger.

Imagine that player A can read player B's mind and guess how he plays before he makes his move. What pure strategy should player B use?

Player B consults a textbook and decides to use randomisation to improve his performance in Morra. Ideally, if he can find the best mixed strategy to play, what would be his expected payoff?

One possible mixed strategy is to show one finger and call “three” with probability 0.6, and to show two fingers and call “three” with probability 0.4 (playing the other strategies with probability 0). Is this a Nash equilibrium strategy? Assume that Player B is risk-neutral with respect to the game payoffs.
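The candidate strategy can be checked mechanically: build the six pure strategies from the rules above and compute the opponent's expected payoff against each pure reply. In a symmetric zero-sum game the value is 0, so if no pure reply earns the opponent more than 0, the mix guarantees the game's value. A sketch:

```python
# Check the candidate Morra mix: (1 finger, call 3) with prob 0.6 and
# (2 fingers, call 3) with prob 0.4.

STRATS = [(f, g) for f in (1, 2) for g in (2, 3, 4)]  # (fingers, guess)

def payoff(a, b):
    """Row player's net transfer for pure strategies a, b = (fingers, guess)."""
    total = a[0] + b[0]
    a_hit, b_hit = a[1] == total, b[1] == total
    if a_hit and not b_hit:
        return total
    if b_hit and not a_hit:
        return -total
    return 0                       # both right or both wrong: no transfer

mix = {(1, 3): 0.6, (2, 3): 0.4}

# Opponent's expected payoff against each pure reply (by symmetry, the
# opponent earns the negative of the mixing player's payoff).
replies = {b: sum(p * -payoff(a, b) for a, p in mix.items()) for b in STRATS}
```

Every pure reply earns the opponent at most 0 (several exactly 0, the rest strictly negative), so the mix secures the game's value of 0, and a profile in which both players use it is a Nash equilibrium.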