Working with Multiple Random Variables
In AI, we rarely deal with single variables in isolation. We need to reason about:
- **Joint distributions**: a complete probability model over multiple variables
- **Marginalization**: extracting the probability of one variable by "summing out" the others
- **Conditioning**: focusing on specific values, "given evidence"
A joint probability distribution specifies the probability of every possible combination of values for multiple variables.
"Probability of X AND Y occurring together"
Scenario: We observe weather (Sunny/Rainy) and traffic (Light/Heavy) in Riyadh. Here's the complete joint distribution:
| Weather | 🚗 Light | 🚙 Heavy |
|---|---|---|
| ☀️ Sunny | 0.30 | 0.50 |
| 🌧️ Rainy | 0.15 | 0.05 |
Reading the table: P(Sunny, Light) = 0.30 means 30% of days are both sunny AND have light traffic. All entries must sum to 1.0 (0.30 + 0.50 + 0.15 + 0.05 = 1.00) ✓
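The joint table can be represented directly in code. A minimal Python sketch (the dictionary layout and variable names are illustrative, not from the original):

```python
# Joint distribution over (Weather, Traffic), using the values from the table above.
joint = {
    ("Sunny", "Light"): 0.30,
    ("Sunny", "Heavy"): 0.50,
    ("Rainy", "Light"): 0.15,
    ("Rainy", "Heavy"): 0.05,
}

# Every valid joint distribution must sum to 1 (up to floating-point error).
total = sum(joint.values())
assert abs(total - 1.0) < 1e-9
```

Storing each cell under a tuple key makes the later operations (summing out a variable, selecting evidence) one-line loops over `joint.items()`.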
Marginalization extracts the probability distribution of one variable by summing over all possible values of other variables.
"Sum over all possible values of Y"
Goal: Find P(Weather) by summing out Traffic from the joint distribution
| Weather | 🚗 Light | 🚙 Heavy |
|---|---|---|
| ☀️ Sunny | 0.30 | 0.50 |
| 🌧️ Rainy | 0.15 | 0.05 |
P(Sunny) = 0.30 + 0.50 = 0.80
P(Rainy) = 0.15 + 0.05 = 0.20
✓ Result: P(Sunny) = 0.80, P(Rainy) = 0.20
Interpretation: 80% of days are sunny, 20% are rainy (regardless of traffic)
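The sum-out step can be sketched in Python, continuing the dictionary representation of the joint table (names are illustrative):

```python
from collections import defaultdict

# Joint distribution from the table above.
joint = {
    ("Sunny", "Light"): 0.30,
    ("Sunny", "Heavy"): 0.50,
    ("Rainy", "Light"): 0.15,
    ("Rainy", "Heavy"): 0.05,
}

# Marginalize: P(Weather) = sum over Traffic of P(Weather, Traffic).
p_weather = defaultdict(float)
for (weather, traffic), p in joint.items():
    p_weather[weather] += p  # "sum out" Traffic

# p_weather now holds {'Sunny': 0.80, 'Rainy': 0.20}, up to floating-point rounding.
```

The same loop marginalizes out Weather instead if you accumulate on `traffic`, which is why marginalization scales to any variable in the joint.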
Conditioning restricts the distribution to cases where some variable has a specific value. It uses the conditional probability formula:
P(X | Y = y) = P(X, Y = y) / P(Y = y)
"Distribution of X given that Y = y"
Goal: Find P(Traffic | Sunny) - distribution of traffic on sunny days
From the joint table, get all entries where Weather = Sunny: P(Sunny, Light) = 0.30 and P(Sunny, Heavy) = 0.50
Sum over traffic to get the marginal: P(Sunny) = 0.30 + 0.50 = 0.80
P(Light | Sunny) = 0.30 / 0.80 = 0.375
P(Heavy | Sunny) = 0.50 / 0.80 = 0.625
✓ Result: On sunny days, 37.5% have light traffic and 62.5% have heavy traffic. Note: 0.375 + 0.625 = 1.0 (conditional distribution sums to 1)
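Conditioning is the same select-and-renormalize recipe in code. A sketch using the dictionary form of the joint table (names are illustrative):

```python
# Joint distribution from the table above.
joint = {
    ("Sunny", "Light"): 0.30,
    ("Sunny", "Heavy"): 0.50,
    ("Rainy", "Light"): 0.15,
    ("Rainy", "Heavy"): 0.05,
}

# Step 1: select the entries consistent with the evidence Weather = Sunny.
selected = {traffic: p for (weather, traffic), p in joint.items() if weather == "Sunny"}

# Step 2: renormalize by P(Sunny) so the conditional distribution sums to 1.
p_sunny = sum(selected.values())  # the marginal P(Sunny) = 0.80
p_traffic_given_sunny = {t: p / p_sunny for t, p in selected.items()}

# p_traffic_given_sunny is approximately {'Light': 0.375, 'Heavy': 0.625}
```

Note that the denominator is exactly the marginal from the previous section, so conditioning is "marginalize, then divide".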
Marginalization: P(X) = Σ_y P(X, Y = y), which collapses the joint onto one variable.
Conditioning: P(X | Y = y) = P(X, Y = y) / P(Y = y), which restricts the joint to the evidence and renormalizes.
The chain rule (also called the product rule) decomposes joint probabilities into products of conditional probabilities:
P(X, Y) = P(X | Y) × P(Y)
or equivalently
P(X, Y) = P(Y | X) × P(X)
Example: Calculate P(Sunny, Heavy) using the chain rule
Using P(Sunny | Heavy) × P(Heavy): (0.50 / 0.55) × 0.55 = 0.50
Using P(Heavy | Sunny) × P(Sunny): 0.625 × 0.80 = 0.50
Both methods give the same answer! The chain rule is symmetric - you can condition in either order.
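Both factorizations can be checked numerically against the joint table; every number below comes from the running example:

```python
joint = {
    ("Sunny", "Light"): 0.30,
    ("Sunny", "Heavy"): 0.50,
    ("Rainy", "Light"): 0.15,
    ("Rainy", "Heavy"): 0.05,
}

# Marginals needed by the two factorizations.
p_sunny = joint[("Sunny", "Light")] + joint[("Sunny", "Heavy")]  # 0.80
p_heavy = joint[("Sunny", "Heavy")] + joint[("Rainy", "Heavy")]  # 0.55

# Conditionals derived from the joint.
p_sunny_given_heavy = joint[("Sunny", "Heavy")] / p_heavy  # about 0.909
p_heavy_given_sunny = joint[("Sunny", "Heavy")] / p_sunny  # 0.625

order1 = p_sunny_given_heavy * p_heavy  # P(Sunny | Heavy) * P(Heavy)
order2 = p_heavy_given_sunny * p_sunny  # P(Heavy | Sunny) * P(Sunny)
# Both recover the same joint entry P(Sunny, Heavy) = 0.50.
```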
The chain rule generalizes to any number of variables:
Example: Three variables (Weather, Traffic, Mood):
P(Weather, Traffic, Mood) = P(Weather) × P(Traffic | Weather) × P(Mood | Weather, Traffic)
Read as: "Weather probability × Traffic given Weather × Mood given Weather and Traffic"
For random variables A, B, C, D, the chain rule states:
P(A, B, C, D) = P(A) × P(B | A) × P(C | A, B) × P(D | A, B, C)
This is the full expansion.
Key insight: You can reorder the variables in any way, as long as every variable is conditioned on all previous ones. Any order is correct!
The chain rule breaks a large joint probability into smaller conditional probabilities.
Any order works, as long as each variable is conditioned on all earlier ones.
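The decomposition can be used constructively: build a three-variable joint from its chain-rule factors. In the sketch below, P(Weather) and P(Traffic | Weather) follow from the running example (P(Traffic | Rainy) is 0.15/0.20 and 0.05/0.20), while the P(Mood | Weather, Traffic) table is made up purely for illustration:

```python
# P(Weather) and P(Traffic | Weather), derived from the running example.
p_w = {"Sunny": 0.80, "Rainy": 0.20}
p_t_given_w = {
    "Sunny": {"Light": 0.375, "Heavy": 0.625},
    "Rainy": {"Light": 0.75,  "Heavy": 0.25},
}
# P(Mood | Weather, Traffic): hypothetical numbers, one row per (Weather, Traffic).
p_m_given_wt = {
    ("Sunny", "Light"): {"Happy": 0.9, "Grumpy": 0.1},
    ("Sunny", "Heavy"): {"Happy": 0.6, "Grumpy": 0.4},
    ("Rainy", "Light"): {"Happy": 0.7, "Grumpy": 0.3},
    ("Rainy", "Heavy"): {"Happy": 0.3, "Grumpy": 0.7},
}

# Chain rule: P(W, T, M) = P(W) * P(T | W) * P(M | W, T)
joint3 = {
    (w, t, m): p_w[w] * p_t_given_w[w][t] * p_m_given_wt[(w, t)][m]
    for w in p_w
    for t in p_t_given_w[w]
    for m in p_m_given_wt[(w, t)]
}

# Because every factor is a valid (conditional) distribution,
# the product is automatically a valid joint: it sums to 1.
total = sum(joint3.values())
```

This is exactly how a Bayesian network stores a large joint: as small conditional tables that are multiplied together on demand.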
Why this matters: The chain rule is fundamental to Bayesian networks! It shows how complex joint distributions can be built from simpler conditional distributions, which is why BNs can handle hundreds of variables efficiently.
Joint distributions + Marginalization + Conditioning + Chain Rule = Complete probabilistic reasoning toolkit
These operations let AI systems reason about complex, uncertain worlds with multiple interacting variables!