Summary & Applications

Putting It All Together: From Theory to Practice

Course Summary

We've covered Markov Decision Processes (MDPs) - a powerful framework for sequential decision-making under uncertainty. Now let's compare approaches and see real-world applications!

Core Concepts:
  • States, Actions, Transitions, Rewards
  • Value Functions: Vπ, V*, Q
  • Bellman Equations
  • Policies: fixed vs optimal
Algorithms:
  • Policy Evaluation
  • Value Iteration
  • Policy Iteration
  • Model-based planning
Applications:
  • Robotics & Navigation
  • Healthcare & Treatment
  • Autonomous Vehicles
  • Reinforcement Learning
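For reference, the Bellman equations that tie these concepts together (standard definitions, written here in LaTeX):

```latex
% Bellman expectation equation for a fixed policy \pi:
V^{\pi}(s) = \sum_{s'} T(s, \pi(s), s') \left[ R(s, \pi(s), s') + \gamma V^{\pi}(s') \right]

% Bellman optimality equation:
V^{*}(s) = \max_{a} \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V^{*}(s') \right]

% Q-values relate to V* via:
Q^{*}(s, a) = \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V^{*}(s') \right],
\qquad V^{*}(s) = \max_{a} Q^{*}(s, a)
```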

Demo 1: Algorithm Comparison Dashboard

Compare Policy Evaluation vs Value Iteration side-by-side!
Run both algorithms on the same MDP and compare performance
Policy Evaluation
  • Input: fixed policy π
  • Output: Vπ(s) for all s
  • Use case: evaluate a given policy
Value Iteration
  • Input: MDP dynamics
  • Output: V*(s) and π*
  • Use case: find the optimal policy
Policy Iteration
  • Input: MDP dynamics
  • Output: π* (converges fast)
  • Use case: when the policy stabilizes early
Key Differences
Policy Evaluation: Fixed policy, computes Vπ
Value Iteration: Implicit policy, computes V*
Policy Iteration: Explicit policy update, often faster
Complexity:
Value iteration and policy iteration both cost O(|S|²|A|) per iteration
Policy iteration typically needs fewer iterations to converge
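A minimal side-by-side sketch of policy evaluation and value iteration on a hypothetical two-state, two-action MDP (all transition probabilities and rewards below are made-up illustration values, not from the course demo):

```python
import numpy as np

# Hypothetical MDP: T[s, a, s'] = transition probability, R[s, a] = expected reward.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def policy_evaluation(pi, tol=1e-8):
    """Iteratively compute V^pi for a fixed deterministic policy pi."""
    V = np.zeros(2)
    iters = 0
    while True:
        V_new = np.array([R[s, pi[s]] + gamma * T[s, pi[s]] @ V for s in range(2)])
        iters += 1
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, iters
        V = V_new

def value_iteration(tol=1e-8):
    """Compute V* and a greedy optimal policy via Bellman backups."""
    V = np.zeros(2)
    iters = 0
    while True:
        Q = R + gamma * (T @ V)        # Q[s, a] = R[s, a] + gamma * sum_s' T[s,a,s'] V[s']
        V_new = Q.max(axis=1)
        iters += 1
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1), iters
        V = V_new

V_pi, n1 = policy_evaluation(pi=np.array([0, 0]))   # evaluate "always action 0"
V_star, pi_star, n2 = value_iteration()
print("V_pi:", V_pi, "in", n1, "iterations")
print("V*:", V_star, "policy:", pi_star, "in", n2, "iterations")
```

As expected, V*(s) ≥ Vπ(s) in every state: the optimal value dominates the value of any fixed policy.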

Demo 2: Autonomous Vehicle Lane Changing

Should the car change lanes? MDP decides!
Lane changing with traffic uncertainty
Traffic Scenario
States: {Left Lane, Center Lane, Right Lane}
Actions: {Stay, Change Left, Change Right}
Uncertainty: Other vehicles may block lane changes
Rewards
Stay in lane: +1 (safe)
Successful change: +5 (faster)
Blocked change: -10 (dangerous)
Collision risk: -50 (critical)
Real-World Insight
Autonomous vehicles use MDPs (or POMDPs for partial observability) to make safe driving decisions. The policy balances speed (changing to faster lane) with safety (risk of collision). High traffic → more conservative policy!
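The lane-changing scenario can be sketched as a small value-iteration problem. The reward values mirror the demo's reward table; the blocking probability, discount factor, and state layout are assumptions for illustration:

```python
import numpy as np

# Lanes: 0 = Left, 1 = Center, 2 = Right; actions: 0 = Stay, 1 = Change Left, 2 = Change Right.
def lane_change_policy(p_block=0.3, gamma=0.9, tol=1e-8):
    """Solve the lane-change MDP; p_block = chance another car blocks the change (assumed)."""
    n = 3
    stay_r, change_r, blocked_r = 1.0, 5.0, -10.0   # rewards from the demo's table
    V = np.zeros(n)
    while True:
        Q = np.full((n, 3), -np.inf)                # -inf marks unavailable actions
        for s in range(n):
            Q[s, 0] = stay_r + gamma * V[s]                       # Stay
            if s > 0:                                             # Change Left
                Q[s, 1] = ((1 - p_block) * (change_r + gamma * V[s - 1])
                           + p_block * (blocked_r + gamma * V[s]))
            if s < n - 1:                                         # Change Right
                Q[s, 2] = ((1 - p_block) * (change_r + gamma * V[s + 1])
                           + p_block * (blocked_r + gamma * V[s]))
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1)
        V = V_new

print(lane_change_policy(p_block=0.1))   # light traffic: changes are worth the risk
print(lane_change_policy(p_block=0.9))   # heavy traffic: staying put dominates
```

Varying p_block reproduces the insight above: under heavy traffic the optimal policy stays in lane everywhere, while under light traffic lane changes become attractive.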

Demo 3: Medical Treatment Planning

Optimal treatment sequence for chronic disease!
Balance treatment efficacy vs side effects
Disease States
Healthy (goal) ↕ Mild (monitor) ↕ Severe (urgent)
Transitions occur between adjacent severity levels.
Treatment Actions
No Treatment: Monitor only
Medication A: Effective, mild side effects
Medication B: Very effective, strong side effects
Surgery: High efficacy, high risk
Clinical Decision Support
MDPs help doctors choose treatment plans that maximize patient quality of life while accounting for treatment efficacy, side effects, and uncertainty in disease progression.
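One way to sketch the treatment-planning MDP with policy iteration. All transition probabilities and quality-of-life rewards below are invented illustrative numbers, not clinical data:

```python
import numpy as np

# States: 0 = Healthy, 1 = Mild, 2 = Severe.
# Actions: 0 = No Treatment, 1 = Medication A, 2 = Medication B, 3 = Surgery.
T = np.zeros((3, 4, 3))
T[0, 0] = [0.90, 0.10, 0.00]; T[1, 0] = [0.10, 0.60, 0.30]; T[2, 0] = [0.00, 0.10, 0.90]  # no treatment: tends to progress
T[0, 1] = [0.95, 0.05, 0.00]; T[1, 1] = [0.40, 0.50, 0.10]; T[2, 1] = [0.10, 0.40, 0.50]  # Medication A: moderate improvement
T[0, 2] = [0.95, 0.05, 0.00]; T[1, 2] = [0.60, 0.35, 0.05]; T[2, 2] = [0.20, 0.50, 0.30]  # Medication B: stronger improvement
T[0, 3] = [0.90, 0.10, 0.00]; T[1, 3] = [0.70, 0.20, 0.10]; T[2, 3] = [0.60, 0.20, 0.20]  # Surgery: best for severe disease

# Reward = quality of life in the current state minus treatment burden (side effects / risk).
state_r = np.array([10.0, 4.0, -5.0])
action_cost = np.array([0.0, 1.0, 3.0, 8.0])
R = state_r[:, None] - action_cost[None, :]      # R[s, a]
gamma = 0.95

def policy_iteration():
    pi = np.zeros(3, dtype=int)
    while True:
        # Exact policy evaluation: solve (I - gamma * T_pi) V = R_pi.
        T_pi = T[np.arange(3), pi]
        R_pi = R[np.arange(3), pi]
        V = np.linalg.solve(np.eye(3) - gamma * T_pi, R_pi)
        # Policy improvement: greedy with respect to the current value function.
        pi_new = (R + gamma * (T @ V)).argmax(axis=1)
        if np.array_equal(pi_new, pi):
            return pi, V
        pi = pi_new

pi_star, V_star = policy_iteration()
print("policy:", pi_star, "values:", V_star)
```

Because policy evaluation is done exactly here (a linear solve), this variant typically converges in only a handful of improvement steps.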

Demo 4: Bridge to Reinforcement Learning

From MDPs (model-based) to RL (model-free)!
Preview of Q-learning: learning the optimal policy without knowing T or R
Model-Based (MDP)
Given: T(s,a,s'), R(s,a,s'), γ
Compute: V*(s), π*(s)
Methods: Value iteration, policy iteration
Pros: Guaranteed optimal, fast if model known
Cons: Need accurate model
Model-Free (RL)
Given: Ability to interact with environment
Learn: Q*(s,a) from experience
Methods: Q-learning, SARSA, Actor-Critic
Pros: No model needed, learns from data
Cons: Slower, needs exploration
Q-Learning Algorithm
Initialize: Q(s,a) = 0 for all s, a
Repeat:
  Take action a, observe reward r and next state s'
  Q(s,a) ← Q(s,a) + α[r + γ·max_{a'} Q(s',a') − Q(s,a)]
Policy: π(s) = argmax_a Q(s,a)
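The update rule above can be made concrete on a toy corridor environment. The environment, ε-greedy exploration, hyperparameters, and episode cap are all assumptions for illustration; note the agent only ever samples transitions, never reads T or R:

```python
import random

# Toy corridor: states 0..3, actions 0 = left / 1 = right.
# Entering state 3 yields reward +10 and ends the episode.
N, GOAL = 4, 3
alpha, gamma, eps = 0.5, 0.9, 0.1

def step(s, a):
    """Environment dynamics, hidden from the agent (it only samples them)."""
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    reward = 10.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL

def greedy(Q, s):
    best = max(Q[s])
    return random.choice([a for a in (0, 1) if Q[s][a] == best])  # break ties randomly

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N)]
for _ in range(500):                                  # episodes
    s = 0
    for _ in range(100):                              # cap episode length
        # epsilon-greedy behavior policy
        a = random.randrange(2) if random.random() < eps else greedy(Q, s)
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])         # TD update from the slide
        if done:
            break
        s = s2

policy = [greedy(Q, s) for s in range(N)]
print("learned policy:", policy)
```

After training, the greedy policy heads right toward the goal from every non-terminal state, even though the agent never saw the transition or reward model.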
Next Steps in AI
Deep RL:
Q-learning + Neural Networks
Learn from raw pixels/sensors
Applications:
Game AI (AlphaGo, Dota 2)
Robotics control
Resource optimization
Advanced Topics:
POMDPs (partial observability)
Multi-agent RL
Transfer learning