MDP State-Action Diagram
Visualizing the Dice Game as a Markov Decision Process
MDP Components
States: {in, end}
Actions: {stay, quit}
STAY Action:
• P(continue) = 2/3, R = $4
• P(end) = 1/3, R = $4
QUIT Action:
• P(end) = 1, R = $10
Expected Values:
• V(stay) = R / (1 − γ·p(continue)) = $4 / (1 − γ·2/3)
• V(quit) = $10 (immediate)
• With γ = 1: V(stay) = $4 / (1/3) = $12 > $10, so the optimal action is STAY
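A minimal sketch of these expected values, assuming the rules from the diagram (STAY pays $4 with a 2/3 chance of continuing; QUIT pays $10 and ends the game). The function names `value_stay` and `value_quit` are illustrative:

```python
def value_stay(reward=4.0, p_continue=2/3, gamma=1.0):
    """Expected discounted return of always choosing STAY.

    Solves V = R + gamma * p_continue * V, i.e. the closed form
    V = R / (1 - gamma * p_continue).
    """
    return reward / (1.0 - gamma * p_continue)

def value_quit(reward=10.0):
    """QUIT ends the game immediately with a fixed payout."""
    return reward

v_stay = value_stay()   # 4 / (1 - 2/3) = 12 (up to float rounding)
v_quit = value_quit()   # 10
best = "STAY" if v_stay > v_quit else "QUIT"
```

With the diagram's numbers and γ = 1 this reproduces V(stay) = $12 versus V(quit) = $10, so `best` is `"STAY"`.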
Discount Factor Ξ³:
• γ = 1: future rewards equal present value
• γ < 1: future rewards worth less
• γ → 0: only immediate rewards matter
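The discount factor actually flips the optimal policy in this game. Using the closed-form V(stay) = R / (1 − γ·p(continue)), STAY and QUIT tie when 4 / (1 − (2/3)·γ) = 10, i.e. at γ = 0.9; below that, quitting wins. A small sweep (a sketch, assuming the diagram's rewards and probabilities):

```python
def v_stay(gamma, reward=4.0, p_continue=2/3):
    # Closed form: V = R / (1 - gamma * p_continue)
    return reward / (1.0 - gamma * p_continue)

V_QUIT = 10.0

# Break-even at gamma = 0.9: 4 / (1 - (2/3)*0.9) = 4 / 0.4 = 10.
for gamma in (0.0, 0.5, 0.95, 1.0):
    best = "STAY" if v_stay(gamma) > V_QUIT else "QUIT"
    print(f"gamma={gamma:.2f}  V(stay)={v_stay(gamma):5.2f}  best={best}")
```

For γ = 0 and γ = 0.5 the sweep picks QUIT (V(stay) is $4 and $6 respectively); near γ = 1 it picks STAY.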
🎲 Play the Dice Game!
Experience the MDP in action: choose STAY or QUIT each round.
Expected Value (γ = 1):
• V(stay) = $12
• V(quit) = $10
• Optimal: STAY
• Raw Total: Σ rₜ (undiscounted sum of rewards)
• Discounted Total: Σ γᵗ × rₜ
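The raw and discounted totals can be accumulated with a short simulation, sketched here under the diagram's assumed rules (STAY pays $4 with a 2/3 chance of continuing, QUIT pays $10 and ends the game); `play_episode` is an illustrative name:

```python
import random

def play_episode(gamma=1.0, policy="stay", seed=0):
    """Simulate one episode; returns (raw_total, discounted_total)."""
    rng = random.Random(seed)
    raw = discounted = 0.0
    t = 0
    while True:
        if policy == "quit":
            raw += 10.0
            discounted += gamma**t * 10.0
            break
        # STAY: collect $4, then the game ends with probability 1/3.
        raw += 4.0
        discounted += gamma**t * 4.0
        t += 1
        if rng.random() >= 2/3:
            break
    return raw, discounted
```

With γ < 1 the discounted total falls below the raw total as the episode runs longer, which is exactly the gap the two readouts illustrate.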
[Chart: Cumulative Rewards (with discount)]