Back to Lecture 10

Independence & Conditional Independence

Simplifying Complex Probabilistic Models

Why Independence Matters in AI

The Curse of Dimensionality

Problem: Joint probability tables grow exponentially!

  • 2 binary variables → 2² = 4 entries
  • 10 binary variables → 2¹⁰ = 1,024 entries
  • 20 binary variables → 2²⁰ = 1,048,576 entries 💥
  • 100 binary variables → 2¹⁰⁰ ≈ 1.3 × 10³⁰ entries, far more than any computer could ever store! 🌌

Solution: Independence assumptions drastically reduce complexity!

  • 🔓 Independence: variables don't influence each other
  • 🔐 Conditional Independence: independent given some evidence
  • 🔀 d-Separation: graph-based independence testing

Independence

Definition

Two random variables X and Y are independent if knowing the value of one provides no information about the other.

P(X | Y) = P(X)   (whenever P(Y) > 0)

equivalently

P(X, Y) = P(X) × P(Y)

Notation: X ⊥ Y (X is independent of Y)
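
Both forms of the definition can be verified mechanically on any finite joint table. A minimal Python sketch (the weather/traffic joint values are assumed for illustration, chosen to match the counter-example probabilities used later in this section):

```python
def independent(joint, tol=1e-12):
    """True if P(x, y) == P(x) * P(y) for every cell of a joint table
    mapping (x, y) -> probability."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal P(X)
        py[y] = py.get(y, 0.0) + p   # marginal P(Y)
    return all(abs(p - px[x] * py[y]) < tol for (x, y), p in joint.items())

# Two fair coins: every cell is 0.25 -> independent.
coins = {(c1, c2): 0.25 for c1 in "HT" for c2 in "HT"}

# Weather vs. traffic with P(rain)=0.3, P(heavy)=0.4, P(heavy|rain)=0.7
# (joint cell values assumed for illustration) -> NOT independent.
weather = {("rain", "heavy"): 0.21, ("rain", "light"): 0.09,
           ("dry", "heavy"): 0.19, ("dry", "light"): 0.51}

print(independent(coins))    # True
print(independent(weather))  # False: 0.21 != 0.3 * 0.4
```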

Examples of Independent Events
🪙 Example 1: Coin Flips

Scenario: Flip two coins

  • P(Coin2=Heads) = 0.5
  • P(Coin2=Heads | Coin1=Heads) = 0.5

✓ Independent!
Coin1 result doesn't affect Coin2

P(H₁, H₂) = P(H₁) × P(H₂)
= 0.5 × 0.5 = 0.25
🎲 Example 2: Dice Rolls

Scenario: Roll two dice

  • P(Die2=6) = 1/6
  • P(Die2=6 | Die1=6) = 1/6

✓ Independent!
Die1 doesn't affect Die2

P(6, 6) = P(6) × P(6)
= 1/6 × 1/6 = 1/36
❌ Counter-Example: Weather & Traffic

Scenario: Rainy day affects traffic

  • P(Heavy Traffic) = 0.40
  • P(Heavy Traffic | Rainy) = 0.70

✗ NOT Independent!
Rain increases traffic probability

P(Heavy, Rainy) ≠ P(Heavy) × P(Rainy)
❌ Counter-Example: Smoke & Fire

Scenario: Smoke indicates fire

  • P(Fire) = 0.001
  • P(Fire | Smoke) = 0.85

✗ NOT Independent!
Smoke strongly suggests fire

P(Fire, Smoke) ≠ P(Fire) × P(Smoke)
Parameter Reduction

Without independence:

  • 2 binary variables: Need 3 parameters (4 entries - 1 for normalization)
  • 10 binary variables: Need 1,023 parameters

With full independence:

  • 2 binary variables: Need 2 parameters (1 per variable)
  • 10 binary variables: Need only 10 parameters! 🎉
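
The counts above follow directly from the table sizes; a quick sketch of the two formulas:

```python
def joint_params(n):
    """Free parameters in a full joint table over n binary variables:
    2**n entries, minus 1 because they must sum to 1."""
    return 2 ** n - 1

def indep_params(n):
    """Free parameters if all n binary variables are independent:
    one P(X_i = 1) per variable."""
    return n

for n in (2, 10, 20):
    print(f"{n} vars: joint={joint_params(n):>9,}  independent={indep_params(n)}")
```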

Conditional Independence

Definition

X and Y are conditionally independent given Z if knowing Z makes them independent.

P(X | Y, Z) = P(X | Z)

equivalently

P(X, Y | Z) = P(X | Z) × P(Y | Z)

Notation: X ⊥ Y | Z (X independent of Y given Z)
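
The product form can be checked on a three-variable joint table without any division, using the equivalent identity P(x,y,z)·P(z) = P(x,z)·P(y,z). A minimal sketch, with assumed sensor-style probabilities for a fork Z → X, Z → Y:

```python
def cond_independent(joint, tol=1e-12):
    """True if P(x, y | z) == P(x | z) * P(y | z) for every cell of a
    joint table mapping (x, y, z) -> probability.  Uses the equivalent
    cross-multiplied form P(x,y,z) * P(z) == P(x,z) * P(y,z)."""
    pz, pxz, pyz = {}, {}, {}
    for (x, y, z), p in joint.items():
        pz[z] = pz.get(z, 0.0) + p
        pxz[x, z] = pxz.get((x, z), 0.0) + p
        pyz[y, z] = pyz.get((y, z), 0.0) + p
    return all(abs(p * pz[z] - pxz[x, z] * pyz[y, z]) < tol
               for (x, y, z), p in joint.items())

# Fork Z -> X, Z -> Y: X and Y are noisy readings of a fair Z, so they
# are conditionally independent given Z by construction (values assumed).
joint = {}
for z in (0, 1):
    px1, py1 = (0.9, 0.8) if z else (0.2, 0.3)   # assumed P(X=1|z), P(Y=1|z)
    for x in (0, 1):
        for y in (0, 1):
            joint[x, y, z] = 0.5 * (px1 if x else 1 - px1) * (py1 if y else 1 - py1)

print(cond_independent(joint))  # True
```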

Classic Example: Fire Alarm Network
🔥 Fire     🍳 Cooking
      ↘     ↙
      🚨 Alarm
         ⬇️
      📞 Call
Scenario Analysis

Setup: Fire and Cooking both can trigger alarm. Alarm triggers neighbor call.

Question 1: Are Fire and Cooking independent?

✓ Yes! Fire ⊥ Cooking
Whether there's a fire doesn't depend on whether you're cooking (they're separate causes)

Question 2: Are Fire and Cooking independent given Alarm is ON?

✗ No! Fire ⊥̸ Cooking | Alarm
If alarm rings and we learn it's cooking, fire becomes less likely (explaining away!)

Question 3: Are Fire and Call independent given Alarm?

✓ Yes! Fire ⊥ Call | Alarm
Once we know alarm state, fire doesn't tell us more about call (alarm screens off fire)

Key Insight: Explaining Away

Fire and Cooking are independent causes, BUT become dependent when we observe their common effect (Alarm). This is called "explaining away" - if one cause explains the effect, the other becomes less likely!
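
Explaining away can be reproduced with a tiny enumeration. All probabilities below are assumed for illustration; only the qualitative drop in P(Fire | Alarm) once Cooking is also observed matters:

```python
# Assumed probabilities for the Fire/Cooking/Alarm network.
p_fire, p_cook = 0.01, 0.30
p_alarm = {(1, 1): 0.99, (1, 0): 0.95, (0, 1): 0.80, (0, 0): 0.001}  # P(A=1|F,C)

def joint(f, c):
    """P(Fire=f, Cooking=c, Alarm=1)."""
    pf = p_fire if f else 1 - p_fire
    pc = p_cook if c else 1 - p_cook
    return pf * pc * p_alarm[f, c]

p_alarm_on = sum(joint(f, c) for f in (0, 1) for c in (0, 1))
p_fire_given_alarm = (joint(1, 1) + joint(1, 0)) / p_alarm_on
p_fire_given_alarm_and_cooking = joint(1, 1) / (joint(1, 1) + joint(0, 1))

print(round(p_fire_given_alarm, 4))              # 0.0388
print(round(p_fire_given_alarm_and_cooking, 4))  # 0.0123 -- fire "explained away"
```

Learning that someone was cooking drops the fire probability by a factor of three, even though Fire and Cooking were independent before the alarm was observed.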

Medical Example: Symptoms & Disease
      🦠 Disease
      ↙       ↘
🤒 Fever     😷 Cough

Setup: Disease causes both Fever and Cough independently.

Without knowing disease status:

Fever and Cough are dependent
If you have fever, cough is more likely (both suggest disease)

Given we know disease status:

Fever ⊥ Cough | Disease
Once we know disease state, fever doesn't tell us about cough (they're independent symptoms)
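
The same screening-off behavior can be demonstrated numerically. The CPT values below are assumed for illustration:

```python
# Assumed CPTs: Disease causes Fever and Cough independently.
p_d = 0.1                       # P(Disease)
p_f = {1: 0.8, 0: 0.05}         # P(Fever=1 | Disease)
p_c = {1: 0.7, 0: 0.10}         # P(Cough=1 | Disease)

joint = {(d, f, c): (p_d if d else 1 - p_d)
                    * (p_f[d] if f else 1 - p_f[d])
                    * (p_c[d] if c else 1 - p_c[d])
         for d in (0, 1) for f in (0, 1) for c in (0, 1)}

def prob(pred, given=lambda e: True):
    """P(pred | given) by summing cells of the joint table."""
    den = sum(p for e, p in joint.items() if given(e))
    return sum(p for e, p in joint.items() if pred(e) and given(e)) / den

fever = lambda e: e[1] == 1
cough = lambda e: e[2] == 1
sick  = lambda e: e[0] == 1

print(round(prob(fever), 3))                                        # 0.125
print(round(prob(fever, given=cough), 3))                           # 0.378 (dependent!)
print(round(prob(fever, given=lambda e: cough(e) and sick(e)), 3))  # 0.8 = P(Fever|Disease)
```

Marginally, seeing a cough raises the fever probability from 0.125 to about 0.378; once the disease status is known, the cough adds nothing.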

d-Separation: Graph-Based Independence

What is d-Separation?

d-Separation (directed separation) is a graphical method to determine conditional independence in Bayesian networks by analyzing graph structure. If nodes X and Y are d-separated by Z, then X ⊥ Y | Z.

Three Basic Structures
1️⃣ Chain: X → Z → Y

Example: Weather → Traffic → Mood

X ⊥̸ Y (dependent)
Weather affects mood through traffic

X ⊥ Y | Z (independent given Z)
Given traffic, weather doesn't affect mood

2️⃣ Common Cause (Fork): X ← Z → Y

Example: Season → Temperature, Season → Flowers

X ⊥̸ Y (dependent)
Temperature and Flowers correlated (common cause: season)

X ⊥ Y | Z (independent given Z)
Given season, temperature and flowers independent

3️⃣ V-Structure (Common Effect): X → Z ← Y

Example: Fire → Alarm ← Cooking

X ⊥ Y (independent)
Fire and Cooking are independent causes

X ⊥̸ Y | Z (dependent given Z!)
Given alarm, they become dependent (explaining away)

⚠️ Special! V-structure is opposite: conditioning creates dependence!

d-Separation Rules (Simplified)

X and Y are d-separated by Z (i.e., X ⊥ Y | Z) if all paths between X and Y are "blocked" by Z:

  • Chain/Fork: path blocked if the middle node is observed (conditioned on)
  • V-structure: path blocked if the collider node is NOT observed AND none of its descendants are observed (opposite!)
  • To test independence: Check if all paths are blocked
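
These rules translate directly into a path-blocking checker. A minimal sketch that enumerates all simple undirected paths (fine for small graphs, exponential in general) and applies the chain/fork and collider criteria, including the collider's descendants:

```python
def d_separated(g, x, y, z):
    """True if x and y are d-separated by evidence set z in DAG g,
    where g maps each node to a list of its children."""
    parents = {}
    for p, cs in g.items():
        for c in cs:
            parents.setdefault(c, set()).add(p)
    nodes = set(g) | set(parents)
    adj = {n: set(g.get(n, ())) | parents.get(n, set()) for n in nodes}

    def descendants(n):
        out, stack = set(), [n]
        while stack:
            for c in g.get(stack.pop(), ()):
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def paths(cur, path):                  # all simple undirected paths to y
        if cur == y:
            yield path
        else:
            for nb in adj[cur]:
                if nb not in path:
                    yield from paths(nb, path + [nb])

    for path in paths(x, [x]):
        blocked = False
        for a, m, b in zip(path, path[1:], path[2:]):
            collider = a in parents.get(m, ()) and b in parents.get(m, ())
            if collider:                   # a -> m <- b
                blocked = m not in z and not (descendants(m) & z)
            else:                          # chain or fork through m
                blocked = m in z
            if blocked:
                break
        if not blocked:
            return False                   # found an active (unblocked) path
    return True

g = {"Fire": ["Alarm"], "Cooking": ["Alarm"], "Alarm": ["Call"]}
print(d_separated(g, "Fire", "Cooking", set()))      # True  (marginally independent)
print(d_separated(g, "Fire", "Cooking", {"Alarm"}))  # False (explaining away)
print(d_separated(g, "Fire", "Cooking", {"Call"}))   # False (descendant of collider)
print(d_separated(g, "Fire", "Call", {"Alarm"}))     # True  (alarm screens off fire)
```

Note the third query: observing Call, a descendant of the collider Alarm, is enough to unblock the Fire–Cooking path.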

How Independence Reduces Parameters

Comparison: With vs Without Independence
❌ Without Independence Assumptions

Full Joint Distribution

Variables Parameters
2 binary 2² - 1 = 3
5 binary 2⁵ - 1 = 31
10 binary 2¹⁰ - 1 = 1,023
20 binary 2²⁰ - 1 = 1,048,575 💥

Problem: Exponential explosion!

✓ With Full Independence

Product of Marginals

Variables Parameters
2 binary 2 × 1 = 2
5 binary 5 × 1 = 5
10 binary 10 × 1 = 10
20 binary 20 × 1 = 20 🎉

Solution: Linear growth!

Realistic Example: Bayesian Network

Scenario: Medical diagnosis network with 5 binary variables

         🦠 Disease
         ↙       ↘
🤒 Fever         😷 Cough
    ⬇️               ⬇️
🩺 Test1         💉 Test2

Parameter Count:

Full Joint (no structure):

2⁵ - 1 = 31 parameters

Bayesian Network (with structure):

1 (Disease) + 2 (Fever | Disease) + 2 (Cough | Disease) + 2 (Test1 | Fever) + 2 (Test2 | Cough) = 9 parameters

✓ Result: Using conditional independence, we reduced from 31 to 9 parameters - 71% reduction!
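
The count can be computed from the structure alone: a node with k parents of cardinality 2 needs (2 − 1) · 2ᵏ free CPT entries. A minimal sketch:

```python
def bn_params(parents, card=2):
    """Free parameters of a Bayesian network: each node with k parents
    needs (card - 1) * card**k conditional probability entries."""
    return sum((card - 1) * card ** len(ps) for ps in parents.values())

network = {"Disease": [], "Fever": ["Disease"], "Cough": ["Disease"],
           "Test1": ["Fever"], "Test2": ["Cough"]}

structured = bn_params(network)     # 1 + 2 + 2 + 2 + 2 = 9
full_joint = 2 ** len(network) - 1  # 31
print(structured, full_joint)       # 9 31
```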

Why This Matters for AI
  • Scalability: Can model large systems that would be impossible otherwise
  • Learning: Fewer parameters → need less training data
  • Inference: Faster computation with factored representations
  • Interpretability: Structure reveals causal relationships
  • Real AI systems: Spam filters, medical diagnosis, speech recognition all use these ideas

Key Takeaways

Core Concepts
  1. Independence: X ⊥ Y means P(X|Y) = P(X)
  2. Conditional Independence: X ⊥ Y | Z is more common in practice
  3. d-Separation: Graph structure reveals independence
  4. Three structures: Chain, Fork, V-structure
  5. V-structure is special: Conditioning creates dependence
The Big Picture
  • Independence assumptions are NECESSARY for scalability
  • Without them: exponential explosion
  • With them: linear or polynomial growth
  • Bayesian networks encode independence through structure
  • This enables modern probabilistic AI!
From Theory to Practice

Independence assumptions transform intractable problems into solvable ones.
Every successful probabilistic AI system (spam filters, medical diagnosis, speech recognition, robotics) relies on carefully chosen independence assumptions to make computation feasible!

Next: You now understand the foundation of Bayesian networks! These concepts enable all of probabilistic AI.