Compare Policy Evaluation, Value Iteration, and Policy Iteration side-by-side!
Run the algorithms on the same MDP and compare performance
Policy Evaluation
Input: Fixed policy π
Output: Vπ(s) for all s
Use case: Evaluate given policy
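A minimal sketch of iterative policy evaluation on a tiny chain MDP (the chain, state count, rewards, and γ = 0.9 are illustrative assumptions, not taken from this page): the policy is held fixed and Bellman expectation backups are swept until the value function stops changing.

```python
import numpy as np

# Illustrative 3-state chain MDP (an assumption, not from this page):
# states 0 and 1 are non-terminal, state 2 is terminal; action 0 moves
# left, action 1 moves right; entering the terminal state yields +1.
N_STATES, TERMINAL, GAMMA = 3, 2, 0.9

def step(s, a):
    """Deterministic transition for a non-terminal s: (next_state, reward)."""
    s2 = min(s + 1, TERMINAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == TERMINAL else 0.0)

def policy_evaluation(policy, theta=1e-8):
    """Sweep Bellman expectation backups V(s) <- r + gamma * V(s')
    until the largest per-state change drops below theta."""
    V = np.zeros(N_STATES)
    while True:
        delta = 0.0
        for s in range(TERMINAL):          # terminal value stays 0
            s2, r = step(s, policy[s])
            v_new = r + GAMMA * V[s2]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

V = policy_evaluation(policy=[1, 1])       # fixed policy: always move right
print(V)                                   # Vπ for the "always right" policy
```

For this policy, V(1) = 1 (one step to the terminal reward) and V(0) = γ · V(1) = 0.9, which the sweep reaches after a couple of iterations.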
Value Iteration
Input: MDP dynamics
Output: V*(s) and π*
Use case: Find optimal policy
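A matching sketch of value iteration on the same assumed chain MDP (again illustrative, not the page's own example): here no explicit policy is maintained; each sweep applies the Bellman optimality backup, and π* is extracted greedily from V* at the end.

```python
import numpy as np

# Same illustrative 3-state chain MDP as assumed for the evaluation sketch.
N_STATES, TERMINAL, GAMMA = 3, 2, 0.9

def step(s, a):
    """Deterministic transition for a non-terminal s: (next_state, reward)."""
    s2 = min(s + 1, TERMINAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == TERMINAL else 0.0)

def value_iteration(theta=1e-8):
    """Sweep Bellman optimality backups V(s) <- max_a [r + gamma * V(s')];
    the policy stays implicit until it is extracted from V* at the end."""
    V = np.zeros(N_STATES)
    while True:
        delta = 0.0
        for s in range(TERMINAL):
            v_new = max(r + GAMMA * V[s2]
                        for s2, r in (step(s, a) for a in (0, 1)))
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            break
    # greedy policy extraction: pick the action maximizing r + gamma * V(s')
    pi = [max((0, 1), key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
          for s in range(TERMINAL)]
    return V, pi

V, pi = value_iteration()
print(V, pi)                    # optimal policy moves right in both states
```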
Policy Iteration
Input: MDP dynamics
Output: π* (converges fast)
Use case: When policy stabilizes early
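Policy iteration combines the two pieces above: fully evaluate the current policy, then improve it greedily, and stop as soon as the policy stops changing. A sketch on the same assumed chain MDP (the MDP and the "start with always-left" initialization are illustrative choices):

```python
import numpy as np

# Same illustrative 3-state chain MDP as in the sketches above.
N_STATES, TERMINAL, GAMMA = 3, 2, 0.9

def step(s, a):
    """Deterministic transition for a non-terminal s: (next_state, reward)."""
    s2 = min(s + 1, TERMINAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == TERMINAL else 0.0)

def evaluate(policy, theta=1e-8):
    """Iterative policy evaluation for the current fixed policy."""
    V = np.zeros(N_STATES)
    while True:
        delta = 0.0
        for s in range(TERMINAL):
            s2, r = step(s, policy[s])
            v_new = r + GAMMA * V[s2]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

def policy_iteration():
    """Alternate full evaluation with greedy improvement until the
    policy is stable; returns (Vπ*, π*, outer iteration count)."""
    pi, it = [0] * TERMINAL, 0            # start from "always left"
    while True:
        it += 1
        V = evaluate(pi)
        new_pi = [max((0, 1),
                      key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
                  for s in range(TERMINAL)]
        if new_pi == pi:                  # policy stable -> optimal
            return V, pi, it
        pi = new_pi

V, pi, iters = policy_iteration()
print(pi, iters)
```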
Comparison Results
[Interactive chart: iterations to converge for each algorithm]
Key Differences
Policy Evaluation: Fixed policy, computes Vπ
Value Iteration: Implicit policy, computes V*
Policy Iteration: Explicit policy update, often faster
Complexity:
Value iteration and policy improvement: O(|S|²|A|) per sweep
Policy evaluation: O(|S|²) per sweep (one fixed action per state)
Policy iteration typically needs far fewer outer iterations than value iteration
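To make the "fewer iterations" claim concrete, here is a rough comparison of outer-iteration counts on a randomly generated dense MDP (the MDP, its sizes, the seed, and γ = 0.95 are all illustrative assumptions); this policy iteration uses exact evaluation via a linear solve instead of iterative sweeps:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA, THETA = 20, 4, 0.95, 1e-8

# Random dense MDP (illustrative): P[a, s, s'] transition probabilities,
# R[s, a] expected rewards.
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))

def value_iteration():
    """Count full Bellman-optimality sweeps until convergence."""
    V, it = np.zeros(S), 0
    while True:
        it += 1
        Q = R + GAMMA * np.einsum('ast,t->sa', P, V)   # Q[s,a]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < THETA:
            return V_new, it
        V = V_new

def policy_iteration():
    """Count evaluate/improve rounds until the policy is stable.
    Evaluation solves (I - gamma * P_pi) V = R_pi exactly."""
    pi, it = np.zeros(S, dtype=int), 0
    while True:
        it += 1
        P_pi = P[pi, np.arange(S), :]                  # row s: P[pi[s], s, :]
        V = np.linalg.solve(np.eye(S) - GAMMA * P_pi, R[np.arange(S), pi])
        new_pi = (R + GAMMA * np.einsum('ast,t->sa', P, V)).argmax(axis=1)
        if np.array_equal(new_pi, pi):
            return V, pi, it
        pi = new_pi

V_vi, vi_iters = value_iteration()
V_pi, pi, pi_iters = policy_iteration()
print(vi_iters, pi_iters)   # policy iteration finishes in far fewer outer iterations
```

Each outer iteration of this policy iteration is more expensive (a full linear solve), which is the usual trade-off behind its smaller iteration count.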