Motivation & Introduction to Decision Theory

Why We Need Rational Decision-Making Under Uncertainty

Desert Navigation 🏜️

Interactive demo: navigate a 3×4 grid where desert winds push the agent in a random direction with some probability.

  • 🧍 Start: (1,1)
  • 🔥 Heat: -50
  • 🏛️ Landmark: +20
  • 🌴 Oasis: +2

The optimal policy (discount γ = 0.9) changes with the wind probability:

  • Wind = 0 (deterministic): go for the landmark! 🏛️ Example, (1,1)→East: V = 0 + 0.9×20 = 18
  • Wind = 0.2: still worth the risk. Example, (1,1)→East: 0.8×(0 + 0.9×20) + 0.2×(mixed) ≈ 14.4
  • Wind ≥ 0.4: too risky! Go safe → oasis 🌴. Example, (1,1)→East: 0.6×(0 + 0.9×20) + 0.4×(mixed) ≈ 10.8

Bellman Equation:

V(s) = maxₐ Σₛ' P(s'|s,a)[R(s,a,s') + γV(s')]

Key Insight: the algorithm finds the optimal actions based on the transition dynamics and rewards alone! 🧠
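The demo's optimal policy can be computed by value iteration on the Bellman equation above. Below is a minimal Python sketch, not the demo's actual code: the lecture does not give the grid layout, so the terminal cell positions are assumptions, chosen so that the wind-free worked example V((1,1)→East) = 0.9×20 = 18 comes out; the wind model assumed here is "intended move with probability 1−w, uniformly random move with probability w".

```python
# Value iteration for the 3x4 desert grid (a sketch under assumed layout).
from itertools import product

ROWS, COLS = 3, 4
GAMMA = 0.9                                   # discount used in the examples
ACTIONS = {'N': (-1, 0), 'S': (1, 0), 'E': (0, 1), 'W': (0, -1)}
# Assumed layout (1-indexed): landmark east of start, heat below it, oasis far.
TERMINALS = {(1, 2): 20, (2, 2): -50, (3, 4): 2}

def step(state, delta):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    r, c = state[0] + delta[0], state[1] + delta[1]
    return (r, c) if 1 <= r <= ROWS and 1 <= c <= COLS else state

def transitions(state, action, wind):
    """P(s'|s,a): intended move w.p. 1 - wind, uniform random move w.p. wind."""
    probs = {}
    for name, delta in ACTIONS.items():
        p = (1 - wind if name == action else 0.0) + wind / len(ACTIONS)
        s2 = step(state, delta)
        probs[s2] = probs.get(s2, 0.0) + p
    return probs

def value_iteration(wind, tol=1e-9):
    """Iterate V(s) = max_a sum_s' P(s'|s,a) * gamma * V(s') to convergence."""
    V = {s: float(TERMINALS.get(s, 0.0))
         for s in product(range(1, ROWS + 1), range(1, COLS + 1))}
    while True:
        worst = 0.0
        for s in V:
            if s in TERMINALS:
                continue                      # terminal values stay fixed
            best = max(sum(p * GAMMA * V[s2]
                           for s2, p in transitions(s, a, wind).items())
                       for a in ACTIONS)
            worst = max(worst, abs(best - V[s]))
            V[s] = best
        if worst < tol:
            return V

print(round(value_iteration(wind=0.0)[(1, 1)], 4))  # 18.0, the wind-free example
```

With wind > 0 the value at the start cell drops below 18, reproducing the qualitative behavior of the demo: the riskier the wind, the less attractive the landmark route.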

Interactive Demo: Decision Under Uncertainty

The dice game: each round you choose an action, then a fair die decides whether the game continues.

  • STAY: receive +40, then roll the die 🎲; the game ends if it shows 1-2 (P = 0.33) and continues otherwise (P = 0.67).
  • QUIT: receive +100 and the game ends.

Expected Value

  • E[stay] = 40 / 0.33 = 120
  • E[quit] = 100
  • Optimal: STAY

MDP Model: States & Transitions

The state transition diagram has two states. From Playing (round r), STAY yields reward +40 and moves to End with P = 0.33 or back to Playing with P = 0.67; QUIT yields reward +100 and always moves to End.
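The E[stay] = 120 figure follows from the one-state recurrence V = 40 + (2/3)·V. A quick check of that arithmetic in Python (a sketch of the calculation, not of the demo itself):

```python
# Expected values for the dice game: STAY pays +40 per round and ends the game
# with probability 1/3; QUIT pays +100 once.
P_END, R_STAY, R_QUIT = 1 / 3, 40, 100

# Closed form: V = R_STAY + (1 - P_END) * V  =>  V = R_STAY / P_END = 120.
v_closed = R_STAY / P_END

# The same number from repeatedly applying the Bellman backup.
v = 0.0
for _ in range(1000):
    v = R_STAY + (1 - P_END) * v

print(round(v_closed), round(v),
      '-> optimal:', 'STAY' if v_closed > R_QUIT else 'QUIT')
# 120 120 -> optimal: STAY
```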

Understanding Rewards & Expected Utility 🎁

A scaled-down version of the same game: reward +4 per round and 𝑃(end) = 0.33 per round gives 𝔼[𝑈] = 4 / 0.33 ≈ 12.

How It Works

  • Reward/round: +4
  • 𝑃(end): 0.33
  • Total: sum of rewards until termination
  • 𝔼[𝑈]: average total across outcomes

Formula

𝔼[𝑈] = Σᵢ 𝑃(𝑖) × 𝑈(𝑖) — the average value across all outcomes

Geometric Distribution

For per-round ending probability 𝑝 and per-round reward 𝑟:

𝔼[𝑈] = 𝑟 / 𝑝, and the average number of rounds is 1/𝑝.
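The closed form 𝔼[𝑈] = 𝑟/𝑝 can be sanity-checked by simulation. A minimal sketch, assuming the game above with r = 4 and p = 1/3:

```python
# Monte Carlo check of E[U] = r / p for the geometric reward game.
import random

random.seed(0)                     # reproducible runs
r, p, trials = 4, 1 / 3, 200_000

def play():
    """Play one game to termination and return the total reward."""
    total = 0
    while True:
        total += r
        if random.random() < p:
            return total

estimate = sum(play() for _ in range(trials)) / trials
print(round(estimate, 2), 'vs closed form', round(r / p, 2))  # both near 12.0
```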

1. Why Deterministic Logic and Search Fail

The Deterministic World Assumption

Classical AI planning (Lectures 4-5) assumes:

  • Fully observable: Agent knows the current state exactly
  • Deterministic: Actions have predictable outcomes
  • Static: World doesn't change while agent is planning
  • Goal-driven: Clear success criteria
Works Well For:
  • Chess, puzzles (known states, deterministic)
  • Robot assembly (controlled environment)
  • Route planning (static maps)
  • Mathematical proofs
Fails For:
  • Medical diagnosis (uncertain symptoms)
  • Self-driving cars (unpredictable traffic)
  • Stock trading (stochastic markets)
  • Robot navigation (sensor noise)
Example: Robot Navigation

Consider a robot trying to navigate from Point A to Point B:

flowchart LR
    A[Start Position A] -->|Move Forward| B{Actual Outcome?}
    B -->|90% Success| C[Moved Forward]
    B -->|5% Slip Left| D[Moved Left]
    B -->|5% Slip Right| E[Moved Right]
    C --> F[Goal?]
    D --> G[Wrong Location]
    E --> G

The Problem: Deterministic planning says "move forward 10 times to reach goal." But with 90% success per move, probability of 10 perfect moves = 0.9¹⁰ β‰ˆ 35%. The plan fails 65% of the time!
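The 35% figure is just the per-step success probability raised to the plan length:

```python
# Success probability of an open-loop plan: all n moves must succeed
# independently, so the plan succeeds with probability p_step ** n.
p_step, n = 0.9, 10
p_plan = p_step ** n
print(f"P(plan succeeds) = {p_plan:.2f}")  # 0.35 -- it fails ~65% of the time
```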

Key Insight

In the real world, uncertainty is ubiquitous. Actions have probabilistic outcomes, observations are noisy, and the environment is unpredictable. We need a framework that embraces uncertainty rather than ignoring it.

2. From Passive Inference to Active Decision-Making

Building on Lecture 11

Bayesian Networks (Lecture 11) taught us how to:

  • Represent uncertain knowledge
  • Update beliefs given evidence (inference)
  • Answer queries like P(Disease | Symptoms)
Inference (Lecture 11)

Question: What do I believe?

Example:

  • P(Flu | Fever, Cough) = ?
  • Update belief about disease
  • Passive observation
Decision Theory (Lecture 12)

Question: What should I do?

Example:

  • Should I treat for flu?
  • Or order more tests first?
  • Active decision-making
Medical Decision Example
flowchart TD
    S["Patient Symptoms: Fever, Cough"] -->|Inference| B["Belief Update: P Flu = 0.7"]
    B -->|Decision| D{What Action?}
    D -->|Option 1| T1[Prescribe Antiviral]
    D -->|Option 2| T2[Order Lab Test]
    D -->|Option 3| T3[Watchful Waiting]
    T1 --> O1["Outcomes: Recover / Side Effects / Costs"]
    T2 --> O2["Outcomes: More Info / Delay / Cost"]
    T3 --> O3["Outcomes: Natural Recovery / Worsening"]

Inference tells us beliefs. Decision theory tells us actions.

The Bridge

Decision Theory = Probability Theory (beliefs) + Utility Theory (preferences) + Action Selection
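This bridge fits in a few lines of code: combine a belief (from inference) with a utility table (preferences) and select the action with the highest expected utility. A sketch only; the belief matches the medical example above, but every utility number below is an illustrative assumption, not a clinical value:

```python
# Expected-utility action selection: beliefs + preferences -> action.
belief = {'flu': 0.7, 'not_flu': 0.3}        # from inference (Lecture 11)

# U[action][world]: utility of each action in each world (hypothetical values).
U = {'antiviral': {'flu': 80, 'not_flu': 40},
     'lab_test':  {'flu': 60, 'not_flu': 70},
     'wait':      {'flu': 20, 'not_flu': 90}}

def expected_utility(action):
    """E[U | action] = sum over worlds of P(world) * U(action, world)."""
    return sum(belief[w] * U[action][w] for w in belief)

best = max(U, key=expected_utility)
print(best, round(expected_utility(best), 1))  # antiviral 68.0
```

Note how the choice flips if the belief changes: at P(flu) low enough, watchful waiting would score highest, which is exactly why inference and decision-making must be combined.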

3. Real-World Decision Scenarios

Medical Diagnosis & Treatment

Decision: Surgery vs. Medication vs. Wait

Uncertainty:

  • Diagnosis not 100% certain
  • Treatment success varies by patient
  • Side effects are probabilistic

Preferences: Health outcome, cost, risk tolerance, quality of life

Robot Motion Planning

Decision: Safe path vs. Fast path vs. Explore

Uncertainty:

  • Sensor noise (position, obstacles)
  • Actuator errors (slip, drift)
  • Unknown obstacles

Preferences: Time to goal, energy consumption, collision risk, mission success

Business Strategy

Decision: Launch Product vs. More R&D vs. Pivot

Uncertainty:

  • Market demand unknown
  • Competitor actions unpredictable
  • Development costs variable

Preferences: Profit, market share, long-term growth, risk exposure

Autonomous Vehicles

Decision: Lane change vs. Brake vs. Maintain speed

Uncertainty:

  • Other drivers' intentions
  • Road conditions (wet, icy)
  • Sensor limitations

Preferences: Safety (maximize), travel time, passenger comfort, legality

Common Theme

All these scenarios share three elements: (1) uncertain outcomes, (2) multiple action choices, and (3) trade-offs between competing preferences. Decision theory provides a unified framework for all of them.

Summary & Next Steps

What We Learned
  1. Deterministic planning fails under uncertainty
  2. Real-world agents must act, not just infer
  3. Actions have uncertain outcomes
  4. Agents have preferences (not all outcomes equal)
  5. Need framework combining probability + preference
Coming Next
  • Topic 2: Core components (States, Actions, Utilities)
  • Topic 3: Utility theory and preferences
  • Topic 4: Expected utility maximization
  • Topic 5: Decision networks