Transition Probabilities

Understanding T(s, a, s') and the Markov Property

What are Transition Probabilities?

Transition probabilities T(s, a, s') specify the probability of ending up in state s' if we take action a from state s. This is the core of stochasticity in MDPs!

Key Properties:
  • Probability distribution: 0 ≤ T(s, a, s') ≤ 1
  • Must sum to 1: Σ_{s'∈S} T(s, a, s') = 1 for every (s, a)
  • Markov property: the next state depends only on the current state and action, not on earlier history
  • Successors: States with T(s, a, s') > 0
Notation:
  • T(s, a, s') = probability of transitioning from s to s' under action a
  • s = current state
  • a = action taken
  • s' = next state (successor)
  • Succ(s, a) = {s' : T(s, a, s') > 0}, the set of possible successors
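The definitions above can be sketched directly in Python: store T as a dict from (s, a) pairs to a distribution over successors, make the sum-to-1 check explicit, and compute Succ(s, a). The states, actions, and numbers below are made-up placeholders, not values from any demo.

```python
# A minimal sketch of T(s, a, s') as a nested dict.
# All states, actions, and probabilities here are illustrative.
T = {
    ("A", "right"): {"B": 0.8, "A": 0.2},
    ("B", "right"): {"Goal": 0.7, "B": 0.2, "A": 0.1},
}

def succ(s, a):
    """Succ(s, a) = {s' : T(s, a, s') > 0}."""
    return {s2 for s2, p in T[(s, a)].items() if p > 0}

# Each (s, a) entry must be a probability distribution over successors.
for (s, a), dist in T.items():
    assert all(0 <= p <= 1 for p in dist.values())
    assert abs(sum(dist.values()) - 1.0) < 1e-9

print(sorted(succ("B", "right")))  # ['A', 'B', 'Goal']
```

The dict-of-dicts layout keeps zero-probability successors implicit, which is exactly what the Succ(s, a) definition exploits.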

Demo 1: Transition Matrix Visualizer 📊

📊 Visualize T(s, a, s') as a heatmap!
(Interactive demo: a heatmap with one row per (s, a) pair and one column per successor state A, B, C, Goal, plus a Σ Total column. Cell color shows probability strength; each row's Σ Total is 1. Clicking a cell shows the selected transition and its successor set Succ(s, a) = {s' : T(s, a, s') > 0}. Transition probabilities are editable.)
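A matrix view like the heatmap's can be sketched as nested lists: one row per current state, one column per successor state, with each row summing to 1. The numbers below are illustrative, not the demo's actual values.

```python
# Hypothetical transition matrix for a single action over states A, B, C, Goal.
# Rows = current state, columns = successor state; each row sums to 1.
states = ["A", "B", "C", "Goal"]
T_move = [
    [0.1, 0.8, 0.1, 0.0],   # from A
    [0.0, 0.2, 0.6, 0.2],   # from B
    [0.0, 0.0, 0.3, 0.7],   # from C
    [0.0, 0.0, 0.0, 1.0],   # Goal is absorbing
]

# Sanity check: every row is a valid probability distribution.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in T_move)

# Most likely successor from B under this action:
j = max(range(len(states)), key=lambda k: T_move[1][k])
print(states[j])  # C
```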

Demo 2: Transportation Network 🚊

🚊 Magic trams with reliability issues!
Weather affects transition probabilities
(Interactive demo: pick a weather condition and watch the transition probabilities change along the Home → University route.)
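The idea behind the demo can be sketched as follows: the same action on the same route gets a different successor distribution depending on the weather. The outcome names and probabilities below are invented for illustration.

```python
# Hypothetical tram outcomes for the Home -> University leg.
# All outcome names and probabilities are made up.
TRAM = {
    "sunny": {"arrive": 0.9, "delayed": 0.1},
    "rainy": {"arrive": 0.6, "delayed": 0.3, "breakdown": 0.1},
}

def p_arrive(weather):
    """Probability of arriving on time given the weather."""
    return TRAM[weather].get("arrive", 0.0)

print(p_arrive("sunny") > p_arrive("rainy"))  # True
```

Note that weather here acts as part of the state: if it changes over time, it must be folded into s for T(s, a, s') to remain well-defined.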

Demo 3: Markov Property Explorer 🎯

🎯 Why "Markov"? Understanding history-independence
Compare Markov vs Non-Markov transitions
Markov Property ✓

P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, ..., s_0, a_0)

Example: Weather Model
Current: ☀️ Sunny
Action: Wait 1 day

P(Tomorrow = Rainy | Today = Sunny) = 0.2
History doesn't matter! Whether yesterday was rainy or sunny, the probability only depends on TODAY being sunny.
Different histories ending in Sunny:
☁️ Cloudy → ☀️ Sunny → 🌧️ Rainy tomorrow: P = 0.2
🌧️ Rainy → ☀️ Sunny → 🌧️ Rainy tomorrow: P = 0.2
☀️ Sunny → ☀️ Sunny → 🌧️ Rainy tomorrow: P = 0.2
Same probability every time!
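This history-independence can be checked directly in code. Only P(Rainy | Sunny) = 0.2 is taken from the example above; the other transition numbers are assumed for the sketch.

```python
# Markov weather chain: P(tomorrow | today), independent of yesterday.
# Only sunny -> rainy = 0.2 comes from the text; the rest is assumed.
P = {
    "sunny":  {"sunny": 0.8, "rainy": 0.2},
    "rainy":  {"sunny": 0.5, "rainy": 0.5},
    "cloudy": {"sunny": 0.6, "rainy": 0.4},
}

def p_rain_tomorrow(history):
    """history is a list of past weather states; only the last one matters."""
    today = history[-1]
    return P[today]["rainy"]

# Three different histories ending in "sunny" give the same prediction:
for h in (["cloudy", "sunny"], ["rainy", "sunny"], ["sunny", "sunny"]):
    print(p_rain_tomorrow(h))  # 0.2 each time
```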
Non-Markov ✗

P depends on history, not just current state

Example: Mood Model
Current: 😊 Happy
Action: Attend meeting

P(After = Sad | Now = Happy, Yesterday = Sad) = 0.7
P(After = Sad | Now = Happy, Yesterday = Happy) = 0.2
History matters! Even though current state is "Happy", the transition probability depends on yesterday's mood.
Different histories ending in Happy:
😢 Sad → 😊 Happy → 😢 Sad after: P = 0.7
😊 Happy → 😊 Happy → 😢 Sad after: P = 0.2
Different probabilities!
Why MDPs Use Markov Property
Benefits:
  • Compact state representation
  • Efficient computation
  • Well-defined transition probabilities
  • Tractable algorithms (value iteration, etc.)
If history matters:
  • Encode history into state: s = (current, previous)
  • State space grows (exponentially in history length)
  • Trade-off: expressiveness vs complexity
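The history-encoding trick can be sketched in a few lines: over {Happy, Sad} alone the mood model is non-Markov, but over the augmented state s = (yesterday, today) it becomes Markov again, at the cost of squaring the state space. The 0.7 and 0.2 values come from the mood example above; the complementary probabilities are implied.

```python
# Augmented-state version of the mood model: s = (yesterday, today).
# The 0.7 / 0.2 values are from the example; complements are implied.
P_aug = {
    ("sad", "happy"):   {"sad": 0.7, "happy": 0.3},
    ("happy", "happy"): {"sad": 0.2, "happy": 0.8},
}

def p_next(yesterday, today, nxt):
    """P(next mood | augmented state (yesterday, today)) -- now Markov."""
    return P_aug[(yesterday, today)][nxt]

print(p_next("sad", "happy", "sad"))    # 0.7
print(p_next("happy", "happy", "sad"))  # 0.2
```

With k steps of history folded in, the augmented state space has |S|^k entries, which is the expressiveness-vs-complexity trade-off noted above.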