Introduction to Bayesian Networks

From Exponential Complexity to Elegant Graphs

1. Why Do We Need Bayesian Networks for Modeling Uncertainty?

Very Simple One-Sentence Explanation
A Bayesian network is a map that shows how things influence each other and helps us handle uncertainty in a structured way.

🌍 Real-World Example: Medical Diagnosis

Bayesian networks give us three major advantages:
1. Reduction of Complexity

Instead of calculating probabilities for hundreds or thousands of combined cases, we only compute the direct relationships between variables.

This leads to:
• Less computation
• Less memory usage
• Faster inference
• Simpler models
A Bayesian network breaks a huge problem into small local probability relationships.
2. Powerful Inference (Reasoning)

Bayesian networks allow us to reason in both directions:

• Forward reasoning → Prediction: "If rain → P(wet grass)?"
• Backward reasoning ← Diagnosis: "If wet grass → rain or sprinkler?"
This is very similar to how humans think when we explain or predict things.
3. Causal Modeling

Bayesian networks explicitly show:

• What causes what
• Why events happen
• What will change if we intervene
This makes them ideal for causal reasoning in uncertain environments.
Watch the Complexity Reduction!

With 5 binary variables (the Burglary-Alarm example):

Traditional Approach: 2⁵ − 1 = 31 parameters needed

Bayesian Network: 1 + 1 + 4 + 2 + 2 = 10 parameters needed
Real Impact:

With 5 variables (Burglary-Alarm example), traditional needs 31 parameters

With 20 variables (medical diagnosis system), traditional needs over 1 million parameters!

Bayesian Networks with sparse structure: Just ~40-80 parameters! 🚀
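These parameter counts are easy to verify in a few lines of Python. The sketch below assumes all variables are binary and uses the parent counts of the Burglary-Alarm network described above (two root nodes, one node with 2 parents, two nodes with 1 parent each):

```python
def full_joint_params(n):
    # Full joint table over n binary variables: 2^n entries,
    # minus 1 because probabilities must sum to 1
    return 2 ** n - 1

def bn_params(parent_counts):
    # Binary variables: one free parameter per combination of parent values
    return sum(2 ** k for k in parent_counts)

# Burglary-Alarm network: Burglary and Earthquake are roots,
# Alarm has 2 parents, JohnCalls and MaryCalls each have 1 parent
print(full_joint_params(5))        # 31
print(bn_params([0, 0, 2, 1, 1]))  # 10 = 1+1+4+2+2
print(full_joint_params(20))       # 1048575 (over 1 million)
```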

🎯 Summary: Why We Use Bayesian Networks
They Represent Relationships
Between events and variables
Help Understand Causes
Behind observations
Work with Incomplete Info
Perfect for uncertainty
Support Prediction & Diagnosis
Forward and backward reasoning
Make Decisions Systematic
Under uncertainty
Strong Mathematical Foundation
Based on probability theory

2. What is a Bayesian Network?

Intuitive Definition:

A Bayesian Network is a graph that shows how variables depend on each other, along with small tables that describe those dependencies numerically.

Formal Definition:
A Bayesian Network is a pair (G, Θ) where:
G = Directed Acyclic Graph (DAG)
Θ = Set of Conditional Probability Tables (CPTs)
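The pair (G, Θ) can be sketched directly in code. Below is a minimal Python representation using the medical-diagnosis numbers that appear later in this section (10%, 80%, 90%, 95%); the remaining CPT rows are illustrative assumptions, not values from the text:

```python
# G: each node maps to its parent list.
# Theta: each node's CPT stores P(node=True | parent values),
# keyed by a tuple of parent values.
network = {
    "Disease": {"parents": [],                 "cpt": {(): 0.10}},
    "Fever":   {"parents": ["Disease"],        "cpt": {(True,): 0.80, (False,): 0.10}},
    "Cough":   {"parents": ["Disease"],        "cpt": {(True,): 0.90, (False,): 0.15}},
    "Test":    {"parents": ["Fever", "Cough"], "cpt": {(True, True): 0.95,
                                                       (True, False): 0.70,
                                                       (False, True): 0.60,
                                                       (False, False): 0.05}},
}

def p(node, value, assignment):
    """Look up P(node=value | parents' values in `assignment`)."""
    key = tuple(assignment[parent] for parent in network[node]["parents"])
    p_true = network[node]["cpt"][key]
    return p_true if value else 1.0 - p_true

print(p("Fever", True, {"Disease": True}))  # 0.8
```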

2.1. The Graph Structure (G)

Components:
  • Nodes (circles): Represent random variables
  • Directed edges (arrows): Show direct influence or dependency
  • Acyclic: No cycles allowed (can't loop back to earlier nodes)
DAG Represents Dependencies Between Events

The directed acyclic graph structure captures causal and probabilistic relationships between events:

📈 Direct Dependencies:
"Rain" → "Sprinkler" (Rain affects sprinkler decisions)
🔄 Joint Dependencies:
Multiple causes can jointly affect an outcome (Rain + Sprinkler → Grass Wet)
The graph structure tells us which events directly influence others, enabling efficient probability calculations.
Valid BN (Acyclic)
flowchart TD
    Rain[Rain] --> Sprinkler[Sprinkler]
    Rain --> Grass["Grass Wet"]
    Sprinkler --> Grass
    classDef valid fill:#32d583,color:white,stroke:#32d583,stroke-width:3px
    class Rain,Sprinkler,Grass valid

No cycles - valid DAG!

Invalid BN (Has Cycle)
flowchart TD
    A[A] --> B[B]
    B --> C[C]
    C --> A
    classDef invalid fill:#dc3545,color:white,stroke:#dc3545,stroke-width:3px
    class A,B,C invalid

Has cycle A→B→C→A - invalid!
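The acyclicity requirement is easy to check programmatically. Here is a sketch using Kahn's topological-sort algorithm (the node and edge names follow the two examples above):

```python
from collections import deque

def is_dag(edges, nodes):
    """Return True if the directed graph has no cycles (Kahn's algorithm)."""
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Repeatedly remove nodes with no remaining incoming edges
    queue = deque(v for v in nodes if indegree[v] == 0)
    visited = 0
    while queue:
        v = queue.popleft()
        visited += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # If a cycle exists, its nodes never reach indegree 0
    return visited == len(nodes)

valid = [("Rain", "Sprinkler"), ("Rain", "GrassWet"), ("Sprinkler", "GrassWet")]
invalid = [("A", "B"), ("B", "C"), ("C", "A")]
print(is_dag(valid, ["Rain", "Sprinkler", "GrassWet"]))  # True
print(is_dag(invalid, ["A", "B", "C"]))                  # False
```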

2.2. Conditional Probability Tables (Θ)

What are CPTs?

Each variable has a Conditional Probability Table that specifies P(Variable | Parents) - the probability of each value given the parent values.

The CPT Equation:
θᵢ = P(Xᵢ | Parents(Xᵢ))
Where:
• θᵢ = The conditional probability table for variable Xᵢ
• Xᵢ = The i-th variable in our Bayesian network
• Parents(Xᵢ) = All variables that directly influence Xᵢ
What This Means:

Each CPT answers: "What are the probabilities of this variable's values, given specific values of its parent variables?"

Has Disease
No parents (root node)
θ₁ = P(Has Disease)
Question: What's the probability someone has COVID?
Answer: 10% (disease prevalence)
Fever
Parent: Has Disease
θ₂ = P(Fever | Disease)
Question: If someone has COVID, what's P(fever)?
Answer: 80% (COVID symptom)
Cough
Parent: Has Disease
θ₃ = P(Cough | Disease)
Question: If someone has COVID, what's P(cough)?
Answer: 90% (COVID symptom)
Test Result
Parents: Fever, Cough
θ₄ = P(Test Result | Fever, Cough)
Question: If fever AND cough present, what's P(positive test)?
Answer: 95% (high symptom correlation)
Intuition: Root nodes (no parents) are like prior beliefs or base rates. Nodes with parents are like conditional probabilities that combine multiple influences.
Key Point: The CPT size depends on the parents: if a variable has k parents, each with d values, the CPT needs dᵏ rows (one for each combination of parent values).

Key Constraint:

Each row in a CPT must sum to 1.0 (probabilities must be normalized).
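This normalization constraint can be checked mechanically. A small sketch, using the 80% P(Fever | Disease) figure from this section (the remaining row values are illustrative assumptions):

```python
def check_cpt(cpt, tol=1e-9):
    """Verify every row (one per parent-value combination) sums to 1."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in cpt.values())

# CPT for P(Fever | Disease), keyed by the parent assignment
fever_cpt = {
    ("Disease=T",): {"T": 0.80, "F": 0.20},
    ("Disease=F",): {"T": 0.10, "F": 0.90},  # assumed base rate
}
print(check_cpt(fever_cpt))  # True

bad_cpt = {("Disease=T",): {"T": 0.50, "F": 0.40}}  # sums to 0.9, invalid
print(check_cpt(bad_cpt))    # False
```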

2.3. From Problem to Bayesian Network

The Process:

Start with a real-world problem → identify variables → draw relationships → add probabilities


Building BNs: Key Steps
1. Understand the problem and domain
2. List all variables (outcomes and factors)
3. Identify causal links (what influences what)
4. Draw the graph (arrows show dependencies)
5. Specify CPTs from data or experts
6. Validate & use for inference

3. The Magic: How BNs Represent Joint Distributions

The Fundamental Equation:
P(X₁, X₂, ..., Xₙ) = ∏ᵢ₌₁ⁿ P(Xᵢ | Parents(Xᵢ))
Read as: The joint probability equals the product of all local conditional probabilities

Intuition:

Instead of storing one giant table with all combinations, we store many small tables (one per variable). The graph tells us how to multiply them together to get any probability we need.

Worked Example: Computing a Joint Probability

Question: What is P(Rain=T, Sprinkler=F, GrassWet=T)?

Step-by-Step Calculation:
P(R=T, S=F, G=T) = P(R=T) × P(S=F | R=T) × P(G=T | R=T, S=F)
• P(Rain = T) = 0.2 (prior probability)
• P(Sprinkler = F | Rain = T) = 0.9 (Sprinkler's CPT, rain row)
• P(GrassWet = T | Rain = T, Sprinkler = F) = 0.95 (GrassWet's CPT, R=T, S=F row)
Final Result (multiply all): 0.2 × 0.9 × 0.95 = 0.171
Answer: P(Rain=T, Sprinkler=F, GrassWet=T) = 17.1%
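The same calculation can be reproduced in a few lines of Python. The 0.2, 0.9, and 0.95 figures come from the worked example above; the other CPT rows are illustrative assumptions:

```python
# CPTs for the Rain -> Sprinkler -> GrassWet network
p_rain = {True: 0.2, False: 0.8}                      # P(Rain)
p_sprinkler = {True: {True: 0.1, False: 0.9},         # P(Sprinkler | Rain)
               False: {True: 0.4, False: 0.6}}
p_grass = {(True, True): {True: 0.99, False: 0.01},   # P(GrassWet | Rain, Sprinkler)
           (True, False): {True: 0.95, False: 0.05},
           (False, True): {True: 0.90, False: 0.10},
           (False, False): {True: 0.00, False: 1.00}}

def joint(r, s, g):
    # Chain-rule factorization: P(R) * P(S | R) * P(G | R, S)
    return p_rain[r] * p_sprinkler[r][s] * p_grass[(r, s)][g]

print(round(joint(True, False, True), 3))  # 0.171
```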

Why This Works:

The graph structure encodes conditional independence assumptions. Each variable only depends on its direct parents, not on all other variables. This is what makes the factorization valid and dramatically reduces complexity.

4. Visual Comparison: Full Joint vs. Bayesian Network

Full Joint Table

Storage: 2ⁿ entries

For 3 variables: 8 entries

For 10 variables: 1,024 entries

For 20 variables: 1,048,576 entries!

Bayesian Network

Storage: Local CPTs only

For 3 variables: ~6 entries

For 10 variables: ~30 entries

For 20 variables: ~60 entries!

The Winner: Bayesian Networks!

By exploiting the structure of dependencies, Bayesian Networks can represent complex probability distributions with exponentially fewer parameters than the full joint table. This makes them practical for real-world applications with dozens or even hundreds of variables!

5. Key Takeaways

What is a Bayesian Network?
  • A DAG (Directed Acyclic Graph)
  • Plus CPTs (one per node)
  • Compactly represents joint distributions
  • Encodes conditional independence
Why Use Bayesian Networks?
  • Avoids exponential explosion
  • Mirrors causal structure
  • Enables efficient inference
  • Human-interpretable representation
Remember the Factorization:
P(X₁, ..., Xₙ) = ∏ᵢ P(Xᵢ | Parents(Xᵢ))
Next Topic: We'll dive deeper into how this factorization works and explore the joint distribution factorization in detail.