Transition Probabilities

Understanding T(s, a, s') and the Markov Property

What are Transition Probabilities?

Transition probabilities T(s, a, s') specify the probability of ending up in state s' if we take action a from state s. This is the core of stochasticity in MDPs!

Key Properties:
  • Probability distribution: 0 ≤ T(s, a, s') ≤ 1
  • Must sum to 1: Σ_{s'∈S} T(s, a, s') = 1 for every (s, a)
  • Markov property: the next state depends only on the current state and action, not on earlier history
  • Successors: States with T(s, a, s') > 0
Notation:
  • T(s, a, s') = probability of transitioning from s to s' under action a
  • s = current state
  • a = action taken
  • s' = next state (successor)
  • Succ(s, a) = {s' : T(s, a, s') > 0}, the set of possible successors
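The definitions above can be sketched directly in Python: store T as a dict from (s, a) pairs to a distribution over successors, make the sum-to-1 check explicit, and compute Succ(s, a). The states, actions, and numbers below are made-up placeholders, not values from any demo.

```python
# A minimal sketch of T(s, a, s') as a nested dict.
# All states, actions, and probabilities here are illustrative.
T = {
    ("A", "right"): {"B": 0.8, "A": 0.2},
    ("B", "right"): {"Goal": 0.7, "B": 0.2, "A": 0.1},
}

def succ(s, a):
    """Succ(s, a) = {s' : T(s, a, s') > 0}."""
    return {s2 for s2, p in T[(s, a)].items() if p > 0}

# Each (s, a) entry must be a probability distribution over successors.
for (s, a), dist in T.items():
    assert all(0 <= p <= 1 for p in dist.values())
    assert abs(sum(dist.values()) - 1.0) < 1e-9

print(sorted(succ("B", "right")))  # ['A', 'B', 'Goal']
```

The dict-of-dicts layout keeps zero-probability successors implicit, which is exactly what the Succ(s, a) definition exploits.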

Demo 1: Transition Matrix Visualizer 📊

📊 Visualize T(s, a, s') as a heatmap!
(Interactive demo: a heatmap with one row per (s, a) pair and one column per successor state A, B, C, Goal, plus a Σ Total column. Cell color shows probability strength; each row's Σ Total is 1. Clicking a cell shows the selected transition and its successor set Succ(s, a) = {s' : T(s, a, s') > 0}. Transition probabilities are editable.)
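A matrix view like the heatmap's can be sketched as nested lists: one row per current state, one column per successor state, with each row summing to 1. The numbers below are illustrative, not the demo's actual values.

```python
# Hypothetical transition matrix for a single action over states A, B, C, Goal.
# Rows = current state, columns = successor state; each row sums to 1.
states = ["A", "B", "C", "Goal"]
T_move = [
    [0.1, 0.8, 0.1, 0.0],   # from A
    [0.0, 0.2, 0.6, 0.2],   # from B
    [0.0, 0.0, 0.3, 0.7],   # from C
    [0.0, 0.0, 0.0, 1.0],   # Goal is absorbing
]

# Sanity check: every row is a valid probability distribution.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in T_move)

# Most likely successor from B under this action:
j = max(range(len(states)), key=lambda k: T_move[1][k])
print(states[j])  # C
```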

Demo 2: Transportation Network 🚊

🚊 Magic trams with reliability issues!
Weather affects transition probabilities
(Interactive demo: pick a weather condition and watch the transition probabilities change along the Home → University route.)
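The idea behind the demo can be sketched as follows: the same action on the same route gets a different successor distribution depending on the weather. The outcome names and probabilities below are invented for illustration.

```python
# Hypothetical tram outcomes for the Home -> University leg.
# All outcome names and probabilities are made up.
TRAM = {
    "sunny": {"arrive": 0.9, "delayed": 0.1},
    "rainy": {"arrive": 0.6, "delayed": 0.3, "breakdown": 0.1},
}

def p_arrive(weather):
    """Probability of arriving on time given the weather."""
    return TRAM[weather].get("arrive", 0.0)

print(p_arrive("sunny") > p_arrive("rainy"))  # True
```

Note that weather here acts as part of the state: if it changes over time, it must be folded into s for T(s, a, s') to remain well-defined.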

Demo 3: Markov Property Explorer 🎯

🎯 Why "Markov"? Understanding history-independence
Compare Markov vs Non-Markov transitions
Markov Property ✓

P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, ..., s_0, a_0)

Example: Weather Model
Current: ☀️ Sunny
Action: Wait 1 day

P(Tomorrow = Rainy | Today = Sunny) = 0.2
History doesn't matter! Whether yesterday was rainy or sunny, the probability only depends on TODAY being sunny.
Different histories ending in Sunny:
☁️ Cloudy → ☀️ Sunny → 🌧️ Rainy tomorrow: P = 0.2
🌧️ Rainy → ☀️ Sunny → 🌧️ Rainy tomorrow: P = 0.2
☀️ Sunny → ☀️ Sunny → 🌧️ Rainy tomorrow: P = 0.2
Same probability every time!
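This history-independence can be checked directly in code. Only P(Rainy | Sunny) = 0.2 is taken from the example above; the other transition numbers are assumed for the sketch.

```python
# Markov weather chain: P(tomorrow | today), independent of yesterday.
# Only sunny -> rainy = 0.2 comes from the text; the rest is assumed.
P = {
    "sunny":  {"sunny": 0.8, "rainy": 0.2},
    "rainy":  {"sunny": 0.5, "rainy": 0.5},
    "cloudy": {"sunny": 0.6, "rainy": 0.4},
}

def p_rain_tomorrow(history):
    """history is a list of past weather states; only the last one matters."""
    today = history[-1]
    return P[today]["rainy"]

# Three different histories ending in "sunny" give the same prediction:
for h in (["cloudy", "sunny"], ["rainy", "sunny"], ["sunny", "sunny"]):
    print(p_rain_tomorrow(h))  # 0.2 each time
```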
Non-Markov ✗

P depends on history, not just current state

Example: Mood Model
Current: 😊 Happy
Action: Attend meeting

P(After = Sad | Now = Happy, Yesterday = Sad) = 0.7
P(After = Sad | Now = Happy, Yesterday = Happy) = 0.2
History matters! Even though current state is "Happy", the transition probability depends on yesterday's mood.
Different histories ending in Happy:
😢 Sad → 😊 Happy → 😢 Sad after: P = 0.7
😊 Happy → 😊 Happy → 😢 Sad after: P = 0.2
Different probabilities!
Why MDPs Use Markov Property
Benefits:
  • Compact state representation
  • Efficient computation
  • Well-defined transition probabilities
  • Tractable algorithms (value iteration, etc.)
If history matters:
  • Encode history into state: s = (current, previous)
  • State space grows (exponentially in history length)
  • Trade-off: expressiveness vs complexity
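The history-encoding trick can be sketched in a few lines: over {Happy, Sad} alone the mood model is non-Markov, but over the augmented state s = (yesterday, today) it becomes Markov again, at the cost of squaring the state space. The 0.7 and 0.2 values come from the mood example above; the complementary probabilities are implied.

```python
# Augmented-state version of the mood model: s = (yesterday, today).
# The 0.7 / 0.2 values are from the example; complements are implied.
P_aug = {
    ("sad", "happy"):   {"sad": 0.7, "happy": 0.3},
    ("happy", "happy"): {"sad": 0.2, "happy": 0.8},
}

def p_next(yesterday, today, nxt):
    """P(next mood | augmented state (yesterday, today)) -- now Markov."""
    return P_aug[(yesterday, today)][nxt]

print(p_next("sad", "happy", "sad"))    # 0.7
print(p_next("happy", "happy", "sad"))  # 0.2
```

With k steps of history folded in, the augmented state space has |S|^k entries, which is the expressiveness-vs-complexity trade-off noted above.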