Introduction to Bayesian Networks

From Exponential Complexity to Elegant Graphs

1. Why Do We Need Bayesian Networks for Modeling Uncertainty?

Very Simple One-Sentence Explanation
A Bayesian network is a map that shows how things influence each other and helps us handle uncertainty in a structured way.

🌍 Real-World Example: Medical Diagnosis

Bayesian networks give us three major advantages:
1. Reduction of Complexity

Instead of calculating probabilities for hundreds or thousands of combined cases, we only compute the direct relationships between variables.

This leads to:
• Less computation
• Less memory usage
• Faster inference
• Simpler models
A Bayesian network breaks a huge problem into small local probability relationships.
2. Powerful Inference (Reasoning)

Bayesian networks allow us to reason in both directions:

• Forward reasoning → Prediction: "If rain → P(wet grass)?"
• Backward reasoning ← Diagnosis: "If wet grass → rain or sprinkler?"
This is very similar to how humans think when we explain or predict things.
3. Causal Modeling

Bayesian networks explicitly show:

• What causes what
• Why events happen
• What will change if we intervene
This makes them ideal for causal reasoning in uncertain environments.
Watch the Complexity Reduction!

With 5 binary variables (the Burglary-Alarm example):

Traditional Approach: 2⁵ − 1 = 31 parameters needed

Bayesian Network: 1 + 1 + 4 + 2 + 2 = 10 parameters needed
Real Impact:

With 5 variables (Burglary-Alarm example), traditional needs 31 parameters

With 20 variables (medical diagnosis system), traditional needs over 1 million parameters!

Bayesian Networks with sparse structure: Just ~40-80 parameters! 🚀
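These parameter counts are easy to verify in a few lines of Python. The sketch below assumes all variables are binary and uses the parent counts of the Burglary-Alarm network described above (two root nodes, one node with 2 parents, two nodes with 1 parent each):

```python
def full_joint_params(n):
    # Full joint table over n binary variables: 2^n entries,
    # minus 1 because probabilities must sum to 1
    return 2 ** n - 1

def bn_params(parent_counts):
    # Binary variables: one free parameter per combination of parent values
    return sum(2 ** k for k in parent_counts)

# Burglary-Alarm network: Burglary and Earthquake are roots,
# Alarm has 2 parents, JohnCalls and MaryCalls each have 1 parent
print(full_joint_params(5))        # 31
print(bn_params([0, 0, 2, 1, 1]))  # 10 = 1+1+4+2+2
print(full_joint_params(20))       # 1048575 (over 1 million)
```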

🎯 Summary: Why We Use Bayesian Networks
They Represent Relationships
Between events and variables
Help Understand Causes
Behind observations
Work with Incomplete Info
Perfect for uncertainty
Support Prediction & Diagnosis
Forward and backward reasoning
Make Decisions Systematic
Under uncertainty
Strong Mathematical Foundation
Based on probability theory

2. What is a Bayesian Network?

Intuitive Definition:

A Bayesian Network is a graph that shows how variables depend on each other, along with small tables that describe those dependencies numerically.

Formal Definition:
A Bayesian Network is a pair (G, Θ) where:
G = Directed Acyclic Graph (DAG)
Θ = Set of Conditional Probability Tables (CPTs)
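The pair (G, Θ) can be sketched directly in code. Below is a minimal Python representation using the medical-diagnosis numbers that appear later in this section (10%, 80%, 90%, 95%); the remaining CPT rows are illustrative assumptions, not values from the text:

```python
# G: each node maps to its parent list.
# Theta: each node's CPT stores P(node=True | parent values),
# keyed by a tuple of parent values.
network = {
    "Disease": {"parents": [],                 "cpt": {(): 0.10}},
    "Fever":   {"parents": ["Disease"],        "cpt": {(True,): 0.80, (False,): 0.10}},
    "Cough":   {"parents": ["Disease"],        "cpt": {(True,): 0.90, (False,): 0.15}},
    "Test":    {"parents": ["Fever", "Cough"], "cpt": {(True, True): 0.95,
                                                       (True, False): 0.70,
                                                       (False, True): 0.60,
                                                       (False, False): 0.05}},
}

def p(node, value, assignment):
    """Look up P(node=value | parents' values in `assignment`)."""
    key = tuple(assignment[parent] for parent in network[node]["parents"])
    p_true = network[node]["cpt"][key]
    return p_true if value else 1.0 - p_true

print(p("Fever", True, {"Disease": True}))  # 0.8
```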

2.1. The Graph Structure (G)

Components:
  • Nodes (circles): Represent random variables
  • Directed edges (arrows): Show direct influence or dependency
  • Acyclic: No cycles allowed (can't loop back to earlier nodes)
DAG Represents Dependencies Between Events

The directed acyclic graph structure captures causal and probabilistic relationships between events:

📈 Direct Dependencies:
"Rain" → "Sprinkler" (Rain affects sprinkler decisions)
🔄 Joint Dependencies:
Multiple causes can jointly affect an outcome (Rain + Sprinkler → Grass Wet)
The graph structure tells us which events directly influence others, enabling efficient probability calculations.
Valid BN (Acyclic)
flowchart TD
    Rain[Rain] --> Sprinkler[Sprinkler]
    Rain --> Grass["Grass Wet"]
    Sprinkler --> Grass
    classDef valid fill:#32d583,color:white,stroke:#32d583,stroke-width:3px
    class Rain,Sprinkler,Grass valid

No cycles - valid DAG!

Invalid BN (Has Cycle)
flowchart TD
    A[A] --> B[B]
    B --> C[C]
    C --> A
    classDef invalid fill:#dc3545,color:white,stroke:#dc3545,stroke-width:3px
    class A,B,C invalid

Has cycle A→B→C→A - invalid!
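The acyclicity requirement is easy to check programmatically. Here is a sketch using Kahn's topological-sort algorithm (the node and edge names follow the two examples above):

```python
from collections import deque

def is_dag(edges, nodes):
    """Return True if the directed graph has no cycles (Kahn's algorithm)."""
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Repeatedly remove nodes with no remaining incoming edges
    queue = deque(v for v in nodes if indegree[v] == 0)
    visited = 0
    while queue:
        v = queue.popleft()
        visited += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # If a cycle exists, its nodes never reach indegree 0
    return visited == len(nodes)

valid = [("Rain", "Sprinkler"), ("Rain", "GrassWet"), ("Sprinkler", "GrassWet")]
invalid = [("A", "B"), ("B", "C"), ("C", "A")]
print(is_dag(valid, ["Rain", "Sprinkler", "GrassWet"]))  # True
print(is_dag(invalid, ["A", "B", "C"]))                  # False
```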

2.2. Conditional Probability Tables (Θ)

What are CPTs?

Each variable has a Conditional Probability Table that specifies P(Variable | Parents) - the probability of each value given the parent values.

The CPT Equation:
θᵢ = P(Xᵢ | Parents(Xᵢ))
Where:
• θᵢ = The conditional probability table for variable Xᵢ
• Xᵢ = The i-th variable in our Bayesian network
• Parents(Xᵢ) = All variables that directly influence Xᵢ
What This Means:

Each CPT answers: "What are the probabilities of this variable's values, given specific values of its parent variables?"

Has Disease
No parents (root node)
θ₁ = P(Has Disease)
Question: What's the probability someone has COVID?
Answer: 10% (disease prevalence)
Fever
Parent: Has Disease
θ₂ = P(Fever | Disease)
Question: If someone has COVID, what's P(fever)?
Answer: 80% (COVID symptom)
Cough
Parent: Has Disease
θ₃ = P(Cough | Disease)
Question: If someone has COVID, what's P(cough)?
Answer: 90% (COVID symptom)
Test Result
Parents: Fever, Cough
θ₄ = P(Test Result | Fever, Cough)
Question: If fever AND cough present, what's P(positive test)?
Answer: 95% (high symptom correlation)
Intuition: Root nodes (no parents) are like prior beliefs or base rates. Nodes with parents are like conditional probabilities that combine multiple influences.
Key Point: The CPT size depends on the parents: if a variable has k parents, each with d values, the CPT needs dᵏ rows (one for each combination of parent values).

Key Constraint:

Each row in a CPT must sum to 1.0 (probabilities must be normalized).
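This normalization constraint can be checked mechanically. A small sketch, using the 80% P(Fever | Disease) figure from this section (the remaining row values are illustrative assumptions):

```python
def check_cpt(cpt, tol=1e-9):
    """Verify every row (one per parent-value combination) sums to 1."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in cpt.values())

# CPT for P(Fever | Disease), keyed by the parent assignment
fever_cpt = {
    ("Disease=T",): {"T": 0.80, "F": 0.20},
    ("Disease=F",): {"T": 0.10, "F": 0.90},  # assumed base rate
}
print(check_cpt(fever_cpt))  # True

bad_cpt = {("Disease=T",): {"T": 0.50, "F": 0.40}}  # sums to 0.9, invalid
print(check_cpt(bad_cpt))    # False
```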

2.3. From Problem to Bayesian Network

The Process:

Start with a real-world problem → identify variables → draw relationships → add probabilities


Building BNs: Key Steps
1. Understand the problem and domain
2. List all variables (outcomes and factors)
3. Identify causal links (what influences what)
4. Draw the graph (arrows show dependencies)
5. Specify CPTs from data or experts
6. Validate & use for inference

3. The Magic: How BNs Represent Joint Distributions

The Fundamental Equation:
P(X₁, X₂, ..., Xₙ) = ∏ᵢ₌₁ⁿ P(Xᵢ | Parents(Xᵢ))
Read as: The joint probability equals the product of all local conditional probabilities

Intuition:

Instead of storing one giant table with all combinations, we store many small tables (one per variable). The graph tells us how to multiply them together to get any probability we need.

Worked Example: Computing a Joint Probability

Question: What is P(Rain=T, Sprinkler=F, GrassWet=T)?

Step-by-Step Calculation:
P(R=T, S=F, G=T) = P(R=T) × P(S=F | R=T) × P(G=T | R=T, S=F)
• P(Rain = T) = 0.2 (prior probability)
• P(Sprinkler = F | Rain = T) = 0.9 (Sprinkler's CPT, rain row)
• P(GrassWet = T | Rain = T, Sprinkler = F) = 0.95 (GrassWet's CPT, R=T, S=F row)
Final Result (multiply all): 0.2 × 0.9 × 0.95 = 0.171
Answer: P(Rain=T, Sprinkler=F, GrassWet=T) = 17.1%
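The same calculation can be reproduced in a few lines of Python. The 0.2, 0.9, and 0.95 figures come from the worked example above; the other CPT rows are illustrative assumptions:

```python
# CPTs for the Rain -> Sprinkler -> GrassWet network
p_rain = {True: 0.2, False: 0.8}                      # P(Rain)
p_sprinkler = {True: {True: 0.1, False: 0.9},         # P(Sprinkler | Rain)
               False: {True: 0.4, False: 0.6}}
p_grass = {(True, True): {True: 0.99, False: 0.01},   # P(GrassWet | Rain, Sprinkler)
           (True, False): {True: 0.95, False: 0.05},
           (False, True): {True: 0.90, False: 0.10},
           (False, False): {True: 0.00, False: 1.00}}

def joint(r, s, g):
    # Chain-rule factorization: P(R) * P(S | R) * P(G | R, S)
    return p_rain[r] * p_sprinkler[r][s] * p_grass[(r, s)][g]

print(round(joint(True, False, True), 3))  # 0.171
```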

Why This Works:

The graph structure encodes conditional independence assumptions. Each variable only depends on its direct parents, not on all other variables. This is what makes the factorization valid and dramatically reduces complexity.

4. Visual Comparison: Full Joint vs. Bayesian Network

Full Joint Table

Storage: 2ⁿ entries

For 3 variables: 8 entries

For 10 variables: 1,024 entries

For 20 variables: 1,048,576 entries!

Bayesian Network

Storage: Local CPTs only

For 3 variables: ~6 entries

For 10 variables: ~30 entries

For 20 variables: ~60 entries!

The Winner: Bayesian Networks!

By exploiting the structure of dependencies, Bayesian Networks can represent complex probability distributions with exponentially fewer parameters than the full joint table. This makes them practical for real-world applications with dozens or even hundreds of variables!

5. Key Takeaways

What is a Bayesian Network?
  • A DAG (Directed Acyclic Graph)
  • Plus CPTs (one per node)
  • Compactly represents joint distributions
  • Encodes conditional independence
Why Use Bayesian Networks?
  • Avoids exponential explosion
  • Mirrors causal structure
  • Enables efficient inference
  • Human-interpretable representation
Remember the Factorization:
P(X₁, ..., Xₙ) = ∏ᵢ P(Xᵢ | Parents(Xᵢ))
Next Topic: We'll dive deeper into how this factorization works and explore the joint distribution factorization in detail.