Motivation & Introduction to Decision Theory

Why We Need Rational Decision-Making Under Uncertainty

Desert Navigation 🏜️

Interactive demo: navigate a 3×4 grid where desert winds push the agent in a random direction with some probability.

  • 🧍 Start: (1,1)
  • 🔥 Heat: -50
  • 🏛️ Landmark: +20
  • 🌴 Oasis: +2

The optimal policy (discount γ = 0.9) changes with the wind probability:

  • Wind = 0 (deterministic): go for the landmark! 🏛️ Example, (1,1)→East: V = 0 + 0.9×20 = 18
  • Wind = 0.2: still worth the risk. Example, (1,1)→East: 0.8×(0 + 0.9×20) + 0.2×(mixed) ≈ 14.4
  • Wind ≥ 0.4: too risky! Go safe → oasis 🌴. Example, (1,1)→East: 0.6×(0 + 0.9×20) + 0.4×(mixed) ≈ 10.8

Bellman Equation:

V(s) = maxₐ Σₛ' P(s'|s,a)[R(s,a,s') + γV(s')]

Key Insight: the algorithm finds the optimal actions based on the transition dynamics and rewards alone! 🧠
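The demo's optimal policy can be computed by value iteration on the Bellman equation above. Below is a minimal Python sketch, not the demo's actual code: the lecture does not give the grid layout, so the terminal cell positions are assumptions, chosen so that the wind-free worked example V((1,1)→East) = 0.9×20 = 18 comes out; the wind model assumed here is "intended move with probability 1−w, uniformly random move with probability w".

```python
# Value iteration for the 3x4 desert grid (a sketch under assumed layout).
from itertools import product

ROWS, COLS = 3, 4
GAMMA = 0.9                                   # discount used in the examples
ACTIONS = {'N': (-1, 0), 'S': (1, 0), 'E': (0, 1), 'W': (0, -1)}
# Assumed layout (1-indexed): landmark east of start, heat below it, oasis far.
TERMINALS = {(1, 2): 20, (2, 2): -50, (3, 4): 2}

def step(state, delta):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    r, c = state[0] + delta[0], state[1] + delta[1]
    return (r, c) if 1 <= r <= ROWS and 1 <= c <= COLS else state

def transitions(state, action, wind):
    """P(s'|s,a): intended move w.p. 1 - wind, uniform random move w.p. wind."""
    probs = {}
    for name, delta in ACTIONS.items():
        p = (1 - wind if name == action else 0.0) + wind / len(ACTIONS)
        s2 = step(state, delta)
        probs[s2] = probs.get(s2, 0.0) + p
    return probs

def value_iteration(wind, tol=1e-9):
    """Iterate V(s) = max_a sum_s' P(s'|s,a) * gamma * V(s') to convergence."""
    V = {s: float(TERMINALS.get(s, 0.0))
         for s in product(range(1, ROWS + 1), range(1, COLS + 1))}
    while True:
        worst = 0.0
        for s in V:
            if s in TERMINALS:
                continue                      # terminal values stay fixed
            best = max(sum(p * GAMMA * V[s2]
                           for s2, p in transitions(s, a, wind).items())
                       for a in ACTIONS)
            worst = max(worst, abs(best - V[s]))
            V[s] = best
        if worst < tol:
            return V

print(round(value_iteration(wind=0.0)[(1, 1)], 4))  # 18.0, the wind-free example
```

With wind > 0 the value at the start cell drops below 18, reproducing the qualitative behavior of the demo: the riskier the wind, the less attractive the landmark route.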

Interactive Demo: Decision Under Uncertainty

The dice game: each round you choose an action, then a fair die decides whether the game continues.

  • STAY: receive +40, then roll the die 🎲; the game ends if it shows 1-2 (P = 0.33) and continues otherwise (P = 0.67).
  • QUIT: receive +100 and the game ends.

Expected Value

  • E[stay] = 40 / 0.33 = 120
  • E[quit] = 100
  • Optimal: STAY

MDP Model: States & Transitions

The state transition diagram has two states. From Playing (round r), STAY yields reward +40 and moves to End with P = 0.33 or back to Playing with P = 0.67; QUIT yields reward +100 and always moves to End.
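The E[stay] = 120 figure follows from the one-state recurrence V = 40 + (2/3)·V. A quick check of that arithmetic in Python (a sketch of the calculation, not of the demo itself):

```python
# Expected values for the dice game: STAY pays +40 per round and ends the game
# with probability 1/3; QUIT pays +100 once.
P_END, R_STAY, R_QUIT = 1 / 3, 40, 100

# Closed form: V = R_STAY + (1 - P_END) * V  =>  V = R_STAY / P_END = 120.
v_closed = R_STAY / P_END

# The same number from repeatedly applying the Bellman backup.
v = 0.0
for _ in range(1000):
    v = R_STAY + (1 - P_END) * v

print(round(v_closed), round(v),
      '-> optimal:', 'STAY' if v_closed > R_QUIT else 'QUIT')
# 120 120 -> optimal: STAY
```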

Understanding Rewards & Expected Utility 🎁

A scaled-down version of the same game: reward +4 per round and 𝑃(end) = 0.33 per round gives 𝔼[𝑈] = 4 / 0.33 ≈ 12.

How It Works

  • Reward/round: +4
  • 𝑃(end): 0.33
  • Total: sum of rewards until termination
  • 𝔼[𝑈]: average total across outcomes

Formula

𝔼[𝑈] = Σᵢ 𝑃(𝑖) × 𝑈(𝑖) — the average value across all outcomes

Geometric Distribution

For per-round ending probability 𝑝 and per-round reward 𝑟:

𝔼[𝑈] = 𝑟 / 𝑝, and the average number of rounds is 1/𝑝.
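The closed form 𝔼[𝑈] = 𝑟/𝑝 can be sanity-checked by simulation. A minimal sketch, assuming the game above with r = 4 and p = 1/3:

```python
# Monte Carlo check of E[U] = r / p for the geometric reward game.
import random

random.seed(0)                     # reproducible runs
r, p, trials = 4, 1 / 3, 200_000

def play():
    """Play one game to termination and return the total reward."""
    total = 0
    while True:
        total += r
        if random.random() < p:
            return total

estimate = sum(play() for _ in range(trials)) / trials
print(round(estimate, 2), 'vs closed form', round(r / p, 2))  # both near 12.0
```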

1. Why Deterministic Logic and Search Fail

The Deterministic World Assumption

Classical AI planning (Lectures 4-5) assumes:

  • Fully observable: Agent knows the current state exactly
  • Deterministic: Actions have predictable outcomes
  • Static: World doesn't change while agent is planning
  • Goal-driven: Clear success criteria
Works Well For:
  • Chess, puzzles (known states, deterministic)
  • Robot assembly (controlled environment)
  • Route planning (static maps)
  • Mathematical proofs
Fails For:
  • Medical diagnosis (uncertain symptoms)
  • Self-driving cars (unpredictable traffic)
  • Stock trading (stochastic markets)
  • Robot navigation (sensor noise)
Example: Robot Navigation

Consider a robot trying to navigate from Point A to Point B:

flowchart LR
    A[Start Position A] -->|Move Forward| B{Actual Outcome?}
    B -->|90% Success| C[Moved Forward]
    B -->|5% Slip Left| D[Moved Left]
    B -->|5% Slip Right| E[Moved Right]
    C --> F[Goal?]
    D --> G[Wrong Location]
    E --> G

The Problem: Deterministic planning says "move forward 10 times to reach goal." But with 90% success per move, probability of 10 perfect moves = 0.9¹⁰ β‰ˆ 35%. The plan fails 65% of the time!
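The 35% figure is just the per-step success probability raised to the plan length:

```python
# Success probability of an open-loop plan: all n moves must succeed
# independently, so the plan succeeds with probability p_step ** n.
p_step, n = 0.9, 10
p_plan = p_step ** n
print(f"P(plan succeeds) = {p_plan:.2f}")  # 0.35 -- it fails ~65% of the time
```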

Key Insight

In the real world, uncertainty is ubiquitous. Actions have probabilistic outcomes, observations are noisy, and the environment is unpredictable. We need a framework that embraces uncertainty rather than ignoring it.

2. From Passive Inference to Active Decision-Making

Building on Lecture 11

Bayesian Networks (Lecture 11) taught us how to:

  • Represent uncertain knowledge
  • Update beliefs given evidence (inference)
  • Answer queries like P(Disease | Symptoms)
Inference (Lecture 11)

Question: What do I believe?

Example:

  • P(Flu | Fever, Cough) = ?
  • Update belief about disease
  • Passive observation
Decision Theory (Lecture 12)

Question: What should I do?

Example:

  • Should I treat for flu?
  • Or order more tests first?
  • Active decision-making
Medical Decision Example
flowchart TD
    S["Patient Symptoms: Fever, Cough"] -->|Inference| B["Belief Update: P Flu = 0.7"]
    B -->|Decision| D{What Action?}
    D -->|Option 1| T1[Prescribe Antiviral]
    D -->|Option 2| T2[Order Lab Test]
    D -->|Option 3| T3[Watchful Waiting]
    T1 --> O1["Outcomes: Recover / Side Effects / Costs"]
    T2 --> O2["Outcomes: More Info / Delay / Cost"]
    T3 --> O3["Outcomes: Natural Recovery / Worsening"]

Inference tells us beliefs. Decision theory tells us actions.

The Bridge

Decision Theory = Probability Theory (beliefs) + Utility Theory (preferences) + Action Selection
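This bridge fits in a few lines of code: combine a belief (from inference) with a utility table (preferences) and select the action with the highest expected utility. A sketch only; the belief matches the medical example above, but every utility number below is an illustrative assumption, not a clinical value:

```python
# Expected-utility action selection: beliefs + preferences -> action.
belief = {'flu': 0.7, 'not_flu': 0.3}        # from inference (Lecture 11)

# U[action][world]: utility of each action in each world (hypothetical values).
U = {'antiviral': {'flu': 80, 'not_flu': 40},
     'lab_test':  {'flu': 60, 'not_flu': 70},
     'wait':      {'flu': 20, 'not_flu': 90}}

def expected_utility(action):
    """E[U | action] = sum over worlds of P(world) * U(action, world)."""
    return sum(belief[w] * U[action][w] for w in belief)

best = max(U, key=expected_utility)
print(best, round(expected_utility(best), 1))  # antiviral 68.0
```

Note how the choice flips if the belief changes: at P(flu) low enough, watchful waiting would score highest, which is exactly why inference and decision-making must be combined.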

3. Real-World Decision Scenarios

Medical Diagnosis & Treatment

Decision: Surgery vs. Medication vs. Wait

Uncertainty:

  • Diagnosis not 100% certain
  • Treatment success varies by patient
  • Side effects are probabilistic

Preferences: Health outcome, cost, risk tolerance, quality of life

Robot Motion Planning

Decision: Safe path vs. Fast path vs. Explore

Uncertainty:

  • Sensor noise (position, obstacles)
  • Actuator errors (slip, drift)
  • Unknown obstacles

Preferences: Time to goal, energy consumption, collision risk, mission success

Business Strategy

Decision: Launch Product vs. More R&D vs. Pivot

Uncertainty:

  • Market demand unknown
  • Competitor actions unpredictable
  • Development costs variable

Preferences: Profit, market share, long-term growth, risk exposure

Autonomous Vehicles

Decision: Lane change vs. Brake vs. Maintain speed

Uncertainty:

  • Other drivers' intentions
  • Road conditions (wet, icy)
  • Sensor limitations

Preferences: Safety (maximize), travel time, passenger comfort, legality

Common Theme

All these scenarios share three elements: (1) uncertain outcomes, (2) multiple action choices, and (3) trade-offs between competing preferences. Decision theory provides a unified framework for all of them.

Summary & Next Steps

What We Learned
  1. Deterministic planning fails under uncertainty
  2. Real-world agents must act, not just infer
  3. Actions have uncertain outcomes
  4. Agents have preferences (not all outcomes equal)
  5. Need framework combining probability + preference
Coming Next
  • Topic 2: Core components (States, Actions, Utilities)
  • Topic 3: Utility theory and preferences
  • Topic 4: Expected utility maximization
  • Topic 5: Decision networks