How Value Iteration Works
Initialize
Set $V^{(0)}(s) = 0$ for all states
Compute Q-values
For each action $a$, compute the expected value $Q(s,a) = \sum_{s'} P(s' \mid s, a)\left[R(s,a,s') + \gamma V^{(k)}(s')\right]$
Take Maximum
$V^{(k+1)}(s) = \max_a Q(s,a)$
Repeat
Until the maximum value change across all states falls below $\varepsilon$
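The four steps above can be sketched in plain Python. The grid size, discount factor $\gamma = 0.9$, and deterministic transitions are assumptions for illustration; the rewards (+10 goal, -10 obstacle, -0.1 per move) and the threshold $\varepsilon = 0.01$ match the demo's settings.

```python
# Minimal value iteration sketch on a 4x4 grid world.
# Assumed: grid size, gamma = 0.9, deterministic moves; the goal is terminal.
GOAL, OBSTACLE = (3, 3), (1, 1)
STEP_COST, GAMMA, EPS = -0.1, 0.9, 0.01
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
N = 4

def reward(s):
    """Reward for entering state s."""
    if s == GOAL:
        return 10.0
    if s == OBSTACLE:
        return -10.0
    return STEP_COST

def step(s, a):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    r, c = s[0] + a[0], s[1] + a[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else s

V = {(r, c): 0.0 for r in range(N) for c in range(N)}  # V^(0)(s) = 0
while True:
    delta = 0.0  # max value change this sweep
    for s in V:
        if s == GOAL:  # terminal state keeps value 0 after its entry reward
            continue
        q = [reward(step(s, a)) + GAMMA * V[step(s, a)] for a in ACTIONS]
        new_v = max(q)  # V^(k+1)(s) = max_a Q(s, a)
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < EPS:  # converged: max change below epsilon
        break
```

With deterministic moves the state just left of the goal backs up the full +10 on the first sweep, and values farther away fill in over later sweeps, discounted by $\gamma$ and reduced by the accumulated movement cost.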
Goal State
Reward: +10
Obstacle
Reward: -10
Movement
Reward: -0.1 per step
Iteration Progress
Current iteration number
Max Value Change
Convergence threshold: $\varepsilon = 0.01$
What's Happening
Before Starting:
All values are initialized to 0. Click "Step" to update values using the Bellman equation, or "Auto Run" to watch the values converge automatically.
Algorithm Step
$V_{new}(s) \leftarrow \max_a Q(s,a)$
$\pi(s) \leftarrow \arg\max_a Q(s,a)$