Exact Inference: Enumeration

Computing P(Query | Evidence) by Systematic Enumeration

Introduction: The Inference Problem

The Central Question

You've built a Bayesian Network. Now what? How do you use it to answer questions?

P(Query | Evidence) = ?

Inference is the process of computing these conditional probabilities from the BN.

Query: Burglary
What we want to know ("Is there a burglary?")

Evidence: JohnCalls, MaryCalls
What we observed ("Both neighbors called")

Hidden: Earthquake, Alarm
What we don't know ("Sum over all possibilities")

Goal of This Topic

Learn Enumeration — the most straightforward exact inference algorithm. It's conceptually simple but computationally expensive. Understanding it is crucial for learning faster algorithms later!

Understanding Hidden Variables

📌 What Are Hidden Variables?

Hidden variables are variables that are:

  • Not queried — not the variable we want to know about (not Q)
  • Not observed — we have no evidence for them (not E)
  • Must be summed out — we marginalize over all their possible values
Key Insight: Hidden variables are the ones we "don't care about" for our specific query, but we still need to account for them because they affect the probabilities!
Example: Identifying Hidden Variables
graph TD
    B[🚨 Burglary]
    E[🌍 Earthquake]
    A[🔔 Alarm]
    J[📞 John Calls]
    M[📞 Mary Calls]
    
    B --> A
    E --> A
    A --> J
    A --> M
    
    style B fill:#635bff,stroke:#0a2540,stroke-width:2px,color:#fff
    style E fill:#ffc107,stroke:#0a2540,stroke-width:2px,color:#000
    style A fill:#ffc107,stroke:#0a2540,stroke-width:2px,color:#000
    style J fill:#32D583,stroke:#0a2540,stroke-width:2px,color:#fff
    style M fill:#32D583,stroke:#0a2540,stroke-width:2px,color:#fff
                                
Query: P(Burglary | JohnCalls, MaryCalls)

Query Variable (Q): Burglary (what we want to know)
Evidence Variables (E): JohnCalls, MaryCalls (what we observed)
Hidden Variables (H): Earthquake, Alarm (must be summed out!)
Why We Sum Over Hidden Variables

To compute P(Burglary | JohnCalls, MaryCalls), we need:

P(B | J, M) = Σe,a P(B, J, M, e, a) / Σb,e,a P(b, J, M, e, a)

We sum over all assignments of Earthquake and Alarm
(2 hidden vars × 2 values each = 4 combinations to sum)

⚠️ The Limitation of Enumeration
Exponential Time Complexity

Enumeration requires exploring every possible assignment of hidden variables.

For n binary hidden variables:
2ⁿ assignments
  • 5 hidden vars → 32 assignments
  • 10 hidden vars → 1,024 assignments
  • 20 hidden vars → 1,048,576 assignments
  • Quickly becomes intractable!
Practical Limitation
When is Enumeration Viable?
GOOD: Small networks (≤ 10 variables)
BAD: Large networks or many hidden variables
Solution: More efficient algorithms exist:
  • Variable Elimination
  • Belief Propagation
  • Approximate methods (sampling)
Interactive: Exponential Growth Visualization

(Interactive widget: a slider sets the number of hidden variables. Example reading: 5 hidden variables → 2⁵ = 32 assignments to enumerate; computation time about 0.001 milliseconds, nearly instant.)
Understanding the Impact

Each additional hidden variable doubles the number of assignments to enumerate. This exponential growth is why enumeration is only practical for small networks. In real-world applications (medical diagnosis, robot navigation, etc.), we need more efficient algorithms!
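The doubling is easy to check directly with a short sketch:

```python
# Number of hidden-variable assignments to enumerate: 2^n for n binary hidden vars.
# Each extra hidden variable doubles the count.
for n in (5, 10, 20):
    print(f"{n} hidden vars -> {2 ** n} assignments")
```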

1. The Enumeration Algorithm (3 Steps)

Core Insight

Inference by Enumeration is essentially doing Bayes' Rule + Chain Rule of Probability in the most brute-force, naïve way possible.

Enumeration Formula → 3 Steps

P(Q | e) = Σh P(Q, e, h) / Σq' Σh P(q', e, h)

Step 1: Identify Variables (Q = Query, e = Evidence, h = Hidden)
Step 2: Compute Numerator, Σh P(Q, e, h) (sum over all hidden values)
Step 2b: Compute Denominator, Σq' Σh P(q', e, h) (sum over all Q values and hidden values)
Step 3: Normalize (Divide): divide the numerator (Step 2) by the denominator (Step 2b)

Key Insight: The Σh symbol means we sum over all possible values of the hidden variables. This "enumeration" gives the algorithm its name!
Detailed Step Breakdown

Step 1: Identify Q, e, h
Step 2: Compute the top of the fraction (numerator)
Step 3: Divide to get the final probability

Step 2: Compute Unnormalized Probabilities

For each value q of the query variable Q:

P(Q = q, e) = Σh ∏i P(xi | parents(Xi))

Summation: sum over all assignments to the hidden variables H
Product: multiply CPT entries for all variables, using the BN factorization

Step 3: Normalize

Ensure the probabilities sum to 1.0:

P(Q = q | e) = P(Q = q, e) / Σq' P(Q = q', e)

Divide each unnormalized probability by the sum over all query values

🕒 Time Complexity
Worst Case:
O(2ⁿ)

where n is the total number of variables in the network

Importance:

Despite its exponential complexity, enumeration matters because it is:

  • The baseline for understanding BN inference
  • Workable for small networks
  • The foundation for more efficient algorithms
Visual Process Flow

Identify (Q, E, H) → Sum Over (hidden vars) → Multiply (CPT values) → Normalize (sum to 1.0)

Result: P(Query | Evidence)
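The flow above can be sketched in Python. This is a minimal illustration, not a reference implementation: the network encoding (a dict mapping each variable to a CPT keyed by (value, parent-values)) and the tiny Rain/WetGrass toy network with invented numbers are assumptions made here for demonstration.

```python
from itertools import product

def enumerate_ask(query, evidence, variables, parents, cpt):
    """Identify -> sum over hidden -> multiply CPTs -> normalize.
    All variables are Boolean; cpt[X][(x, parent_tuple)] = P(X=x | parents)."""
    hidden = [v for v in variables if v != query and v not in evidence]
    unnorm = {}
    for q in (True, False):                                  # each query value
        total = 0.0
        for h_vals in product((True, False), repeat=len(hidden)):
            world = {**evidence, query: q, **dict(zip(hidden, h_vals))}
            p = 1.0
            for X in variables:                              # BN factorization
                pa = tuple(world[u] for u in parents[X])
                p *= cpt[X][(world[X], pa)]
            total += p                                       # sum over hidden
        unnorm[q] = total
    Z = sum(unnorm.values())                                 # normalize
    return {q: v / Z for q, v in unnorm.items()}

# Hypothetical toy network: Rain -> WetGrass (CPT numbers invented).
variables = ["Rain", "WetGrass"]
parents = {"Rain": (), "WetGrass": ("Rain",)}
cpt = {
    "Rain": {(True, ()): 0.2, (False, ()): 0.8},
    "WetGrass": {(True, (True,)): 0.9, (False, (True,)): 0.1,
                 (True, (False,)): 0.1, (False, (False,)): 0.9},
}
posterior = enumerate_ask("Rain", {"WetGrass": True}, variables, parents, cpt)
print(round(posterior[True], 3))  # 0.18 / 0.26 = 0.692
```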

Simple Example: Medical Diagnosis

Scenario

A patient shows symptoms. We want to determine the probability of having the flu.

Bayesian Network
graph TD
    F[Flu]
    Fe[Fever]
    C[Cough]
    
    F --> Fe
    F --> C
    
    style F fill:#dc3545,stroke:#0a2540,stroke-width:3px,color:#fff
    style Fe fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
    style C fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
                                
Probability Tables

P(Flu)
Flu  P
Yes  0.1
No   0.9

P(Fever|Flu)
Flu  P(F=Y)
Yes  0.8
No   0.2

P(Cough|Flu)
Flu  P(C=Y)
Yes  0.7
No   0.3
Given: Patient has Fever = Yes and Cough = Yes
Query: P(Flu = Yes | Fever = Yes, Cough = Yes) = ?
Step 1: Compute Numerator
Using: Chain Rule
P(Flu=Yes, Fever=Yes, Cough=Yes) = P(Flu=Yes) × P(Fever=Yes|Flu=Yes) × P(Cough=Yes|Flu=Yes)
Theorem: Using the Chain Rule (Product Rule) to factorize the joint probability based on the network structure.
= 0.1 × 0.8 × 0.7
= 0.056
Step 2: Compute Denominator
Using: Marginalization
P(Fever=Yes, Cough=Yes) = ΣFlu P(Flu, Fever=Yes, Cough=Yes)
Theorem: Using Marginalization (Sum Rule) — summing over all possible values of the query variable (Flu).
When Flu = Yes:
0.1 × 0.8 × 0.7 = 0.056
When Flu = No:
0.9 × 0.2 × 0.3 = 0.054
Total: 0.056 + 0.054 = 0.110
Step 3: Normalize
Using: Bayes' Rule
Theorem: Applying Bayes' Rule — dividing by the total probability to get the conditional probability.
P(Flu=Yes | Fever=Yes, Cough=Yes) = 0.056 / 0.110 ≈ 0.509
≈ 51%
Interpretation

If a patient has both fever and cough, there's approximately a 51% probability they have the flu.

Note: Although flu is relatively rare (10% prior probability), observing both symptoms increases the probability to about 50%, since these symptoms are much more likely when someone has the flu (80% and 70%) compared to when they don't (20% and 30%).

Complete Calculation Breakdown
Flu P(Flu) P(Fever=Yes|Flu) P(Cough=Yes|Flu) Joint Probability
Yes 0.1 0.8 0.7 0.056
Numerator P(Flu=Yes, Evidence): 0.056
No 0.9 0.2 0.3 0.054
Sum for Flu=No: 0.054
Total Denominator P(Fever=Yes, Cough=Yes): 0.110
Final Answer: P(Flu=Yes | Fever=Yes, Cough=Yes): 0.509
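The calculation in the table can be reproduced in a few lines of Python (a minimal sketch using the CPT values given above; variable names are my own):

```python
# P(Flu | Fever=Yes, Cough=Yes) by enumeration, using the CPTs above.
p_flu = {True: 0.1, False: 0.9}
p_fever = {True: 0.8, False: 0.2}   # P(Fever=Yes | Flu)
p_cough = {True: 0.7, False: 0.3}   # P(Cough=Yes | Flu)

# Unnormalized joint P(Flu=f, Fever=Yes, Cough=Yes) for each f (chain rule)
unnorm = {f: p_flu[f] * p_fever[f] * p_cough[f] for f in (True, False)}

Z = sum(unnorm.values())            # denominator: 0.056 + 0.054 = 0.110
print(round(unnorm[True] / Z, 3))   # 0.509
```

There are no hidden variables in this network, so the sum over hidden values is trivial; only the normalization over Flu remains.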

Medical Diagnosis: Numerical Example

Bayesian Network
graph TD
    D[Disease]
    I[Infection]
    F[Fever]
    T[Test]
    
    D --> I
    I --> F
    I --> T
    
    style D fill:#635bff,stroke:#0a2540,stroke-width:3px,color:#fff
    style I fill:#ffc107,stroke:#0a2540,stroke-width:3px,color:#000
    style F fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
    style T fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
                                
Variables:
  • D = Disease {T, F}
  • I = Infection {T, F} Hidden
  • F = Fever {T, F}
  • T = Test {+, −}
Conditional Probability Tables (CPTs)
P(D)
D  P
T  0.1
F  0.9

P(I|D)
D  P(I=T|D)
T  0.8
F  0.2

P(F|I)
I  P(F=T|I)
T  0.9
F  0.3

P(T|I)
I  P(T=+|I)
T  0.95
F  0.1
Query:
P(Disease = Yes | Fever = Yes, Test = Positive)
Query = Disease
Evidence = {Fever, Test}
Hidden = Infection
Step 1: Identify Variables
Query: Disease
Evidence: Fever=Yes, Test=Positive
Hidden: Infection
Step 2: Sum Over Hidden Variable: Infection
P(Disease, Fever, Test) = ΣInfection∈{Yes,No} P(Disease, Infection, Fever, Test)
For Disease = Yes:
When Infection = Yes:
P(Disease=Yes) · P(Infection=Yes|Disease=Yes) · P(Fever=Yes|Infection=Yes) · P(Test=+|Infection=Yes)
= 0.1 × 0.8 × 0.9 × 0.95 = 0.0684
When Infection = No:
P(Disease=Yes) · P(Infection=No|Disease=Yes) · P(Fever=Yes|Infection=No) · P(Test=+|Infection=No)
= 0.1 × 0.2 × 0.3 × 0.1 = 0.0006
Sum: 0.0684 + 0.0006 = 0.069
For Disease = No:
When Infection = Yes:
P(Disease=No) · P(Infection=Yes|Disease=No) · P(Fever=Yes|Infection=Yes) · P(Test=+|Infection=Yes)
= 0.9 × 0.2 × 0.9 × 0.95 = 0.1539
When Infection = No:
P(Disease=No) · P(Infection=No|Disease=No) · P(Fever=Yes|Infection=No) · P(Test=+|Infection=No)
= 0.9 × 0.8 × 0.3 × 0.1 = 0.0216
Sum: 0.1539 + 0.0216 = 0.1755
Step 3: Normalize
Normalizing Constant:
Z = P(Disease=Yes, Evidence) + P(Disease=No, Evidence)
= 0.069 + 0.1755
Z = 0.2445
Final Probability:
P(Disease=Yes | Evidence) = 0.069 / 0.2445 ≈ 0.282
(28.2%)
Complete Computation Table
Disease Infection P(Disease) P(Infection|Disease) P(Fever=Yes|Infection) P(Test=+|Infection) Joint Probability
Yes Yes 0.1 0.8 0.9 0.95 0.0684
No 0.1 0.2 0.3 0.1 0.0006
Sum for Disease=Yes: 0.069
No Yes 0.9 0.2 0.9 0.95 0.1539
No 0.9 0.8 0.3 0.1 0.0216
Sum for Disease=No: 0.1755
Total (Z): 0.2445
P(Disease=Yes | Fever=Yes, Test=+): 0.282
Interpretation: Despite the fever and the positive test, the disease probability is only 28.2%. The prior is low (10%), and the evidence bears on Disease only through the hidden variable Infection (which we summed over), and Infection can also arise without the disease (20% of the time).
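The computation above, including the sum over the hidden variable Infection, can be sketched as follows (CPT values are taken from the tables above; the helper names are my own):

```python
# P(Disease | Fever=Yes, Test=+) by enumeration over the hidden variable Infection.
P_D = {True: 0.1, False: 0.9}           # P(Disease)
P_I_given_D = {True: 0.8, False: 0.2}   # P(I=T | D)
P_F_given_I = {True: 0.9, False: 0.3}   # P(F=T | I)
P_T_given_I = {True: 0.95, False: 0.1}  # P(Test=+ | I)

def joint(d, i):
    """Joint probability via the BN factorization, with evidence Fever=T, Test=+."""
    p_i = P_I_given_D[d] if i else 1 - P_I_given_D[d]
    return P_D[d] * p_i * P_F_given_I[i] * P_T_given_I[i]

# Sum out the hidden variable Infection for each value of Disease
unnorm = {d: sum(joint(d, i) for i in (True, False)) for d in (True, False)}

Z = sum(unnorm.values())            # normalizing constant: 0.069 + 0.1755 = 0.2445
print(round(unnorm[True] / Z, 3))   # 0.282
```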

2. Simple Example: P(Burglary | Alarm)

Simplified Scenario

Let's start with a minimal network: Just 3 variables to make the process crystal clear.

graph TD
    B[Burglary<br/>P = 0.001]
    E[Earthquake<br/>P = 0.002]
    A[Alarm]

    B --> A
    E --> A

    style B fill:#635bff,stroke:#0a2540,stroke-width:3px,color:#fff
    style E fill:#00d4ff,stroke:#0a2540,stroke-width:3px,color:#fff
    style A fill:#ffc107,stroke:#0a2540,stroke-width:3px,color:#000
Variables:
  • Burglary — What we want to know
  • Alarm — What we observed (Alarm = true)
  • Earthquake — Hidden variable
CPTs:
P(B): 0.001
P(E): 0.002
P(A|B,E):
• B=T, E=T → 0.95
• B=T, E=F → 0.94
• B=F, E=T → 0.29
• B=F, E=F → 0.001
Computing P(Burglary = true | Alarm = true)
Step 0: Write the Formula
P(B | A) = P(B, A) / P(A) = Σe P(B, A, e) / Σb Σe P(b, A, e)

We need to sum over Earthquake (the hidden variable)

Step 1: Compute Numerator — P(B=T, A=T)

Sum over both values of Earthquake:

Case 1: E = true
P(B=T, A=T, E=T)
= P(B=T) × P(E=T) × P(A=T | B=T, E=T)
= 0.001 × 0.002 × 0.95
= 0.0000019
Case 2: E = false
P(B=T, A=T, E=F)
= P(B=T) × P(E=F) × P(A=T | B=T, E=F)
= 0.001 × 0.998 × 0.94
= 0.00093812
Numerator: 0.0000019 + 0.00093812 = 0.00094002
Step 2: Compute Denominator — P(A=T)

Sum over all combinations of B and E:

B E P(B) P(E) P(A=T|B,E) Product
T T 0.001 0.002 0.95 0.0000019
T F 0.001 0.998 0.94 0.00093812
F T 0.999 0.002 0.29 0.00057942
F F 0.999 0.998 0.001 0.00099700
Sum (Denominator): 0.00251644
Step 3: Normalize
P(B=T | A=T) = 0.00094002 / 0.00251644 ≈ 0.374

37.4% chance of burglary

P(B=F | A=T) = (0.00057942 + 0.00099700) / 0.00251644 ≈ 0.626

62.6% chance NO burglary

Verification: 0.374 + 0.626 = 1.000 ✓ (Probabilities sum to 1!)

Interpretation

Even though the alarm went off, there's only a 37.4% chance of burglary. Why? Because burglaries are rare (0.1%), and the alarm can be triggered by other causes (earthquake, or just randomly). This shows the importance of considering base rates and alternative explanations!
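The three steps for this small network can be sketched directly (CPT values from the tables above; the `prior` helper is my own):

```python
# P(Burglary | Alarm=true) in the 3-variable network above.
P_B = 0.001   # P(Burglary=T)
P_E = 0.002   # P(Earthquake=T)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)

def prior(p, value):
    """Probability of a Boolean variable with P(var=T) = p."""
    return p if value else 1 - p

unnorm = {}
for b in (True, False):
    # Sum out the hidden variable Earthquake
    unnorm[b] = sum(prior(P_B, b) * prior(P_E, e) * P_A[(b, e)]
                    for e in (True, False))

Z = sum(unnorm.values())            # denominator P(A=T) = 0.00251644
print(round(unnorm[True] / Z, 3))   # 0.374
```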

3. Complete Example: P(Burglary | JohnCalls, MaryCalls)

The Full Network

Now let's tackle the complete 5-variable Burglary-Alarm network — the classic example from Russell & Norvig's AI textbook!

graph TD
    B[🚨 Burglary<br/>P = 0.001]
    E[🌍 Earthquake<br/>P = 0.002]
    A[🔔 Alarm]
    J[📞 JohnCalls]
    M[📞 MaryCalls]

    B --> A
    E --> A
    A --> J
    A --> M

    style B fill:#635bff,stroke:#0a2540,stroke-width:3px,color:#fff
    style E fill:#00d4ff,stroke:#0a2540,stroke-width:3px,color:#fff
    style A fill:#ffc107,stroke:#0a2540,stroke-width:3px,color:#000
    style J fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
    style M fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
Question:
Given that both John and Mary called, what's the probability of a burglary?
Query: Burglary
Evidence: J = true, M = true
Hidden: Earthquake, Alarm
Strategy

We need to sum over all combinations of Earthquake and Alarm (the hidden variables). That's 2 × 2 = 4 combinations to enumerate.

P(B | J=T, M=T) = Σe Σa P(B, J=T, M=T, e, a) / Σb Σe Σa P(b, J=T, M=T, e, a)
e = Earthquake (2 values)
a = Alarm (2 values)
Step-by-Step Enumeration

For each value of Burglary, enumerate all four (Earthquake, Alarm) combinations, multiply the CPT entries P(b) · P(e) · P(a|b,e) · P(J=T|a) · P(M=T|a), sum the products, and normalize.
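The CPTs for JohnCalls and MaryCalls are not reproduced above; the sketch below assumes the standard values from Russell & Norvig (P(J=T|A=T)=0.90, P(J=T|A=F)=0.05, P(M=T|A=T)=0.70, P(M=T|A=F)=0.01), under which the answer comes out to about 0.284:

```python
# Full 5-variable enumeration for P(Burglary | JohnCalls=T, MaryCalls=T).
# Call CPTs assume the standard Russell & Norvig values (not shown above).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def prior(p, value):
    """Probability of a Boolean variable with P(var=T) = p."""
    return p if value else 1 - p

unnorm = {}
for b in (True, False):
    total = 0.0
    for e in (True, False):          # hidden: Earthquake
        for a in (True, False):      # hidden: Alarm
            p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += prior(P_B, b) * prior(P_E, e) * p_a * P_J[a] * P_M[a]
    unnorm[b] = total

Z = sum(unnorm.values())
print(round(unnorm[True] / Z, 3))    # 0.284
```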

4. Complexity: The Exponential Problem

The Bad News

Enumeration has exponential time complexity: O(2ⁿ) where n is the number of variables. This makes it impractical for large networks.

Enumeration Cost
Variables Entries to Sum Feasible?
5 32 ✓ Yes
10 1,024 ✓ Yes
20 1,048,576 ⚠️ Slow
30 1,073,741,824 ✗ No
50 1.13 × 10¹⁵ ✗ No
Why So Slow?
  • Full enumeration: Must sum over exponentially many combinations of hidden variables (in the worst case, 2ʰ where h is the number of hidden vars)
  • Each hidden variable doubles the work: Every additional hidden variable multiplies computation by 2
  • No structure exploitation: Doesn't reuse intermediate calculations
  • Redundant computation: Recalculates same subexpressions multiple times
The Good News

Better algorithms exist!
• Variable Elimination: Exploits BN structure
• Belief Propagation: Efficient for tree-like networks
• Approximate methods: Trade accuracy for speed

These will be covered in upcoming topics!

When to Use Enumeration
✓ Good For:
  • Small networks (< 15 variables)
  • Teaching and understanding
  • Verifying other algorithms
  • Exact ground truth
✗ Bad For:
  • Large networks (> 20 variables)
  • Real-time applications
  • Production systems
  • Resource-constrained devices

Summary & Key Takeaways

What We Learned
  1. Inference problem: Compute P(Query | Evidence)
  2. 3-step algorithm: Select → Sum → Normalize
  3. Uses BN factorization: P(all) = ∏ P(Xᵢ | Parents)
  4. Marginalizes hidden variables: Sum over all possibilities
  5. Exponential complexity: O(2ⁿ) — too slow for large networks
  6. But conceptually simple: Foundation for understanding faster algorithms
Key Insights
  • Exact inference is possible but can be expensive
  • Base rates matter: Even with evidence, rare events stay rare
  • Alternative explanations: Evidence can have multiple causes
  • Marginalization is key: Sum out what you don't need
  • BN structure helps: Factorization simplifies calculations
  • Better methods exist: Variable elimination, belief propagation, etc.
The Foundation of BN Inference

Enumeration is the simplest exact inference algorithm.
While too slow for practical use on large networks, it's essential for understanding how BN inference works.
Every advanced algorithm (Variable Elimination, Junction Tree, Belief Propagation) is essentially a smarter way to do enumeration — avoiding redundant calculations and exploiting network structure.

Master enumeration, and you'll understand the foundation of all BN inference!