
Joint Distributions & Probability Operations

Working with Multiple Random Variables

Why Joint Distributions Matter

Real-World Problems Involve Multiple Variables

In AI, we rarely deal with single variables in isolation. We need to reason about:

  • 🌤️ Weather & Traffic: P(Traffic, Weather) - How do they relate?
  • 🏥 Symptoms & Diseases: P(Fever, Cough, Disease) - Multiple evidence
  • 🤖 Sensor Readings: P(Sensor1, Sensor2, Location) - Sensor fusion
  • 📊 Joint Distribution: complete probability model over multiple variables
  • Marginalization: extract the probability of one variable by "summing out" the others
  • 🔍 Conditioning: focus on specific values, "given evidence"

Joint Probability Tables

Definition

A joint probability distribution specifies the probability of every possible combination of values for multiple variables.

P(X, Y)

"Probability of X AND Y occurring together"

Example: Weather & Traffic

Scenario: We observe weather (Sunny/Rainy) and traffic (Light/Heavy) in Riyadh. Here's the complete joint distribution:

Weather \ Traffic    🚗 Light    🚙 Heavy
☀️ Sunny             0.30        0.50
🌧️ Rainy             0.15        0.05

Reading the table: P(Sunny, Light) = 0.30 means 30% of days are both sunny AND have light traffic. All entries must sum to 1.0 (0.30 + 0.50 + 0.15 + 0.05 = 1.00) ✓

Key Properties
  • Complete representation: Contains all information about the variables
  • Normalization: All probabilities sum to 1
  • Non-negative: Each entry is between 0 and 1
  • Size: For n binary variables, the table has 2^n entries
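A minimal sketch of how the Weather-Traffic joint table could be represented in Python (the dict name `joint` is just an illustrative choice) and how the key properties above can be checked:

```python
# Joint distribution over (Weather, Traffic), using the values from the table.
joint = {
    ("Sunny", "Light"): 0.30,
    ("Sunny", "Heavy"): 0.50,
    ("Rainy", "Light"): 0.15,
    ("Rainy", "Heavy"): 0.05,
}

# Normalization: all entries sum to 1
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Non-negative: every entry lies in [0, 1]
assert all(0.0 <= p <= 1.0 for p in joint.values())

# Size: n binary variables give 2**n entries (here n = 2)
assert len(joint) == 2 ** 2
```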

Marginalization: Summing Out Variables

What is Marginalization?

Marginalization extracts the probability distribution of one variable by summing over all possible values of other variables.

P(X) = Σ_y P(X, y)

"Sum over all possible values of Y"

Example: Marginalizing Traffic

Goal: Find P(Weather) by summing out Traffic from the joint distribution

Joint Distribution P(Weather, Traffic)

Weather \ Traffic    Light    Heavy
☀️ Sunny             0.30     0.50
🌧️ Rainy             0.15     0.05
Marginal Distribution P(Weather)

P(Sunny) = P(Sunny, Light) + P(Sunny, Heavy) = 0.30 + 0.50 = 0.80

P(Rainy) = P(Rainy, Light) + P(Rainy, Heavy) = 0.15 + 0.05 = 0.20

✓ Result: P(Sunny) = 0.80, P(Rainy) = 0.20
Interpretation: 80% of days are sunny, 20% are rainy (regardless of traffic)
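The summing-out step above can be sketched in Python. The function name `marginal_weather` is an illustrative choice, not part of any library:

```python
# Same joint table as before.
joint = {
    ("Sunny", "Light"): 0.30,
    ("Sunny", "Heavy"): 0.50,
    ("Rainy", "Light"): 0.15,
    ("Rainy", "Heavy"): 0.05,
}

def marginal_weather(joint):
    """Sum out Traffic to get the marginal P(Weather)."""
    p = {}
    for (weather, _traffic), prob in joint.items():
        p[weather] = p.get(weather, 0.0) + prob
    return p

m = marginal_weather(joint)
# Matches the worked example: P(Sunny) = 0.80, P(Rainy) = 0.20
assert abs(m["Sunny"] - 0.80) < 1e-9
assert abs(m["Rainy"] - 0.20) < 1e-9
```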


Conditioning: Focusing on Evidence

What is Conditioning?

Conditioning restricts the distribution to cases where some variable has a specific value. It uses the conditional probability formula:

P(X | Y=y) = P(X, Y=y) / P(Y=y)

"Distribution of X given that Y = y"

Example: Traffic Given Sunny Weather

Goal: Find P(Traffic | Sunny) - distribution of traffic on sunny days

Step 1: Extract Relevant Joint Probabilities

From the joint table, get all entries where Weather = Sunny:

  • P(Sunny, Light) = 0.30
  • P(Sunny, Heavy) = 0.50
Step 2: Get Marginal P(Sunny)

Sum over traffic to get marginal:

P(Sunny) = 0.30 + 0.50 = 0.80
Step 3: Normalize (Divide by Marginal)

P(Light | Sunny) = 0.30 / 0.80 = 0.375

P(Heavy | Sunny) = 0.50 / 0.80 = 0.625

✓ Result: On sunny days, 37.5% have light traffic and 62.5% have heavy traffic. Note: 0.375 + 0.625 = 1.0 (conditional distribution sums to 1)
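The three steps above (select, sum, normalize) fit in a few lines of Python. The function name `condition_on_weather` is an illustrative choice:

```python
joint = {
    ("Sunny", "Light"): 0.30,
    ("Sunny", "Heavy"): 0.50,
    ("Rainy", "Light"): 0.15,
    ("Rainy", "Heavy"): 0.05,
}

def condition_on_weather(joint, weather):
    """P(Traffic | Weather=weather): select matching entries, then renormalize."""
    selected = {t: p for (w, t), p in joint.items() if w == weather}
    total = sum(selected.values())  # this sum is the marginal P(weather)
    return {t: p / total for t, p in selected.items()}

cond = condition_on_weather(joint, "Sunny")
# Matches the worked example: 0.30/0.80 = 0.375, 0.50/0.80 = 0.625
assert abs(cond["Light"] - 0.375) < 1e-9
assert abs(cond["Heavy"] - 0.625) < 1e-9
```

Note that the division by `total` is exactly the normalization step: it guarantees the conditional distribution sums to 1.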

Marginalization vs Conditioning

Marginalization:

  • "Ignore" a variable by summing it out
  • Result: Marginal distribution P(X)
  • Always sums to 1

Conditioning:

  • "Focus" on specific value of a variable
  • Result: Conditional distribution P(X|Y=y)
  • Normalized to sum to 1

Chain Rule & Product Rule

Breaking Down Joint Distributions

The chain rule (also called product rule) decomposes joint probabilities into products of conditional probabilities.

P(X, Y) = P(X | Y) × P(Y)

or equivalently

P(X, Y) = P(Y | X) × P(X)
Two Variables

Example: Calculate P(Sunny, Heavy) using the chain rule

Method 1: P(X, Y) = P(X|Y) × P(Y)

Using: P(Sunny | Heavy) × P(Heavy)

  • P(Heavy) = 0.50 + 0.05 = 0.55
  • P(Sunny | Heavy) = 0.50 / 0.55 ≈ 0.909
= 0.909 × 0.55 ≈ 0.50
Method 2: P(X, Y) = P(Y|X) × P(X)

Using: P(Heavy | Sunny) × P(Sunny)

  • P(Sunny) = 0.30 + 0.50 = 0.80
  • P(Heavy | Sunny) = 0.50 / 0.80 = 0.625
= 0.625 × 0.80 = 0.50

Both methods give the same answer! The chain rule works in either direction: you can condition in either order.
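Both decompositions can be verified numerically from the joint table (plain arithmetic, no assumptions beyond the table values):

```python
# Marginals, computed by summing out the other variable.
p_heavy = 0.50 + 0.05                  # P(Heavy)
p_sunny = 0.30 + 0.50                  # P(Sunny)

# Conditionals, from the conditioning formula P(X|Y) = P(X,Y) / P(Y).
p_sunny_given_heavy = 0.50 / p_heavy   # P(Sunny | Heavy)
p_heavy_given_sunny = 0.50 / p_sunny   # P(Heavy | Sunny)

method1 = p_sunny_given_heavy * p_heavy  # P(X|Y) * P(Y)
method2 = p_heavy_given_sunny * p_sunny  # P(Y|X) * P(X)

# Both reconstruct the joint entry P(Sunny, Heavy) = 0.50.
assert abs(method1 - 0.50) < 1e-9
assert abs(method2 - 0.50) < 1e-9
```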

Multiple Variables (Generalization)

The chain rule generalizes to any number of variables:

P(X1, X2, ..., Xn) = P(X1) × P(X2|X1) × P(X3|X1,X2) × ...

Example: Three variables (Weather, Traffic, Mood)

P(W, T, M) = P(W) × P(T|W) × P(M|W,T)

Read as: "Weather probability × Traffic given Weather × Mood given Weather and Traffic"

Chain Rule for 4 Variables

For random variables A, B, C, D, the chain rule states:

P(A, B, C, D) = P(A) × P(B|A) × P(C|A,B) × P(D|A,B,C)

This is the full expansion.

1️⃣ Order A → B → C → D: P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)
2️⃣ Order D → C → B → A: P(A,B,C,D) = P(D) P(C|D) P(B|C,D) P(A|B,C,D)
3️⃣ Order C → A → D → B: P(A,B,C,D) = P(C) P(A|C) P(D|A,C) P(B|A,C,D)

Key insight: You can reorder the variables in any way, as long as every variable is conditioned on all previous ones. Any order is correct!
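This claim can be checked numerically. The sketch below builds a random (hypothetical) joint distribution over four binary variables, then verifies that every one of the 24 variable orderings reproduces the same joint probability; the names `prob` and `chain_rule` are illustrative choices:

```python
import itertools
import random

random.seed(0)
vars_ = ["A", "B", "C", "D"]

# Random joint table over 4 binary variables, normalized to sum to 1.
raw = {bits: random.random() for bits in itertools.product([0, 1], repeat=4)}
z = sum(raw.values())
joint = {bits: v / z for bits, v in raw.items()}

def prob(event):
    """P(event) for a partial assignment, by marginalizing out the rest."""
    idx = {v: i for i, v in enumerate(vars_)}
    return sum(p for bits, p in joint.items()
               if all(bits[idx[v]] == val for v, val in event.items()))

def chain_rule(order, assignment):
    """Product of P(X_k | X_1..X_{k-1}) following the given variable order."""
    result, seen = 1.0, {}
    for v in order:
        numer = prob({**seen, v: assignment[v]})
        denom = prob(seen) if seen else 1.0
        result *= numer / denom
        seen[v] = assignment[v]
    return result

assignment = {"A": 1, "B": 0, "C": 1, "D": 0}
target = prob(assignment)
# Every ordering of the four variables gives the same joint probability.
for order in itertools.permutations(vars_):
    assert abs(chain_rule(order, assignment) - target) < 1e-9
```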

Why This Works

The chain rule breaks a large joint probability into smaller conditional probabilities.

  • Instead of one huge P(A,B,C,D)
  • We get manageable pieces
  • Each piece is easier to estimate
  • Based on real-world relationships
General form:
P(X₁,X₂,X₃,X₄) = P(X₁) × P(X₂|X₁) × P(X₃|X₁,X₂) × P(X₄|X₁,X₂,X₃)

Any order works, as long as each variable is conditioned on all earlier ones.

Why this matters: The chain rule is fundamental to Bayesian networks! It shows how complex joint distributions can be built from simpler conditional distributions. This is why Bayesian networks can handle hundreds of variables efficiently!

Key Takeaways

Essential Operations
  1. Joint Distribution: P(X,Y) - complete model
  2. Marginalization: Sum out variables to get P(X)
  3. Conditioning: Focus on evidence P(X|Y=y)
  4. Chain Rule: P(X,Y) = P(X|Y) × P(Y)
  5. All entries sum to 1.0
Why This Powers AI
  • Foundation of probabilistic reasoning
  • Enables multi-variable inference
  • Core of Bayesian networks
  • Used in sensor fusion, decision-making
  • Handles partial observations elegantly
The Complete Picture

Joint distributions + Marginalization + Conditioning + Chain Rule = Complete probabilistic reasoning toolkit
These operations let AI systems reason about complex, uncertain worlds with multiple interacting variables!

Next: Now you understand probability operations! Continue exploring Bayesian networks and probabilistic graphical models.