Exact Inference: Enumeration

Computing P(Query | Evidence) by Systematic Enumeration

Introduction: The Inference Problem

The Central Question

You've built a Bayesian Network. Now what? How do you use it to answer questions?

P(Query | Evidence) = ?

Inference is the process of computing these conditional probabilities from the BN.

Query: Burglary
What we want to know ("Is there a burglary?")

Evidence: JohnCalls, MaryCalls
What we observed ("Both neighbors called")

Hidden: Earthquake, Alarm
What we don't know ("Sum over all possibilities")

Goal of This Topic

Learn Enumeration — the most straightforward exact inference algorithm. It's conceptually simple but computationally expensive. Understanding it is crucial for learning faster algorithms later!

Understanding Hidden Variables

📌 What Are Hidden Variables?

Hidden variables are variables that are:

  • Not queried — not the variable we want to know about (not Q)
  • Not observed — we have no evidence for them (not E)
  • Must be summed out — we marginalize over all their possible values
Key Insight: Hidden variables are the ones we "don't care about" for our specific query, but we still need to account for them because they affect the probabilities!
Example: Identifying Hidden Variables
graph TD
    B[🚨 Burglary]
    E[🌍 Earthquake]
    A[🔔 Alarm]
    J[📞 John Calls]
    M[📞 Mary Calls]
    
    B --> A
    E --> A
    A --> J
    A --> M
    
    style B fill:#635bff,stroke:#0a2540,stroke-width:2px,color:#fff
    style E fill:#ffc107,stroke:#0a2540,stroke-width:2px,color:#000
    style A fill:#ffc107,stroke:#0a2540,stroke-width:2px,color:#000
    style J fill:#32D583,stroke:#0a2540,stroke-width:2px,color:#fff
    style M fill:#32D583,stroke:#0a2540,stroke-width:2px,color:#fff
                                
Query: P(Burglary | JohnCalls, MaryCalls)

Query Variable (Q): Burglary (what we want to know)
Evidence Variables (E): JohnCalls, MaryCalls (what we observed)
Hidden Variables (H): Earthquake, Alarm (must be summed out!)
Why We Sum Over Hidden Variables

To compute P(Burglary | JohnCalls, MaryCalls), we need:

P(B | J, M) = Σe,a P(B, J, M, e, a) / Σb,e,a P(b, J, M, e, a)

We sum over all assignments of Earthquake and Alarm
(2 hidden vars × 2 values each = 4 combinations to sum)

⚠️ The Limitation of Enumeration
Exponential Time Complexity

Enumeration requires exploring every possible assignment of hidden variables.

For n binary hidden variables:
2ⁿ assignments
  • 5 hidden vars → 32 assignments
  • 10 hidden vars → 1,024 assignments
  • 20 hidden vars → 1,048,576 assignments
  • Quickly becomes intractable!
Practical Limitation
When is Enumeration Viable?
GOOD: Small networks (≤ 10 variables)
BAD: Large networks or many hidden variables
Solution: More efficient algorithms exist:
  • Variable Elimination
  • Belief Propagation
  • Approximate methods (sampling)
Interactive: Exponential Growth Visualization

(Interactive widget: a slider sets the number of hidden variables. Example reading: 5 hidden variables → 2⁵ = 32 assignments to enumerate; computation time about 0.001 milliseconds, nearly instant.)
Understanding the Impact

Each additional hidden variable doubles the number of assignments to enumerate. This exponential growth is why enumeration is only practical for small networks. In real-world applications (medical diagnosis, robot navigation, etc.), we need more efficient algorithms!
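The doubling is easy to check directly with a short sketch:

```python
# Number of hidden-variable assignments to enumerate: 2^n for n binary hidden vars.
# Each extra hidden variable doubles the count.
for n in (5, 10, 20):
    print(f"{n} hidden vars -> {2 ** n} assignments")
```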

1. The Enumeration Algorithm (3 Steps)

Core Insight

Inference by Enumeration is essentially doing Bayes' Rule + Chain Rule of Probability in the most brute-force, naïve way possible.

Enumeration Formula → 3 Steps

P(Q | e) = Σh P(Q, e, h) / Σq' Σh P(q', e, h)

Step 1: Identify Variables (Q = Query, e = Evidence, h = Hidden)
Step 2: Compute Numerator, Σh P(Q, e, h) (sum over all hidden values)
Step 2b: Compute Denominator, Σq' Σh P(q', e, h) (sum over all Q values and hidden values)
Step 3: Normalize (Divide): divide the numerator (Step 2) by the denominator (Step 2b)

Key Insight: The Σh symbol means we sum over all possible values of the hidden variables. This "enumeration" gives the algorithm its name!
Detailed Step Breakdown

Step 1: Identify Q, e, h
Step 2: Compute the top of the fraction (numerator)
Step 3: Divide to get the final probability

Step 2: Compute Unnormalized Probabilities

For each value q of the query variable Q:

P(Q = q, e) = Σh ∏i P(xi | parents(Xi))

Summation: sum over all assignments to the hidden variables H
Product: multiply CPT entries for all variables, using the BN factorization

Step 3: Normalize

Ensure the probabilities sum to 1.0:

P(Q = q | e) = P(Q = q, e) / Σq' P(Q = q', e)

Divide each unnormalized probability by the sum over all query values

🕒 Time Complexity
Worst Case:
O(2ⁿ)

where n is the total number of variables in the network

Importance:

Despite its exponential complexity, enumeration matters because it is:

  • The baseline for understanding BN inference
  • Workable for small networks
  • The foundation for more efficient algorithms
Visual Process Flow

Identify (Q, E, H) → Sum Over (hidden vars) → Multiply (CPT values) → Normalize (sum to 1.0)

Result: P(Query | Evidence)
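The flow above can be sketched in Python. This is a minimal illustration, not a reference implementation: the network encoding (a dict mapping each variable to a CPT keyed by (value, parent-values)) and the tiny Rain/WetGrass toy network with invented numbers are assumptions made here for demonstration.

```python
from itertools import product

def enumerate_ask(query, evidence, variables, parents, cpt):
    """Identify -> sum over hidden -> multiply CPTs -> normalize.
    All variables are Boolean; cpt[X][(x, parent_tuple)] = P(X=x | parents)."""
    hidden = [v for v in variables if v != query and v not in evidence]
    unnorm = {}
    for q in (True, False):                                  # each query value
        total = 0.0
        for h_vals in product((True, False), repeat=len(hidden)):
            world = {**evidence, query: q, **dict(zip(hidden, h_vals))}
            p = 1.0
            for X in variables:                              # BN factorization
                pa = tuple(world[u] for u in parents[X])
                p *= cpt[X][(world[X], pa)]
            total += p                                       # sum over hidden
        unnorm[q] = total
    Z = sum(unnorm.values())                                 # normalize
    return {q: v / Z for q, v in unnorm.items()}

# Hypothetical toy network: Rain -> WetGrass (CPT numbers invented).
variables = ["Rain", "WetGrass"]
parents = {"Rain": (), "WetGrass": ("Rain",)}
cpt = {
    "Rain": {(True, ()): 0.2, (False, ()): 0.8},
    "WetGrass": {(True, (True,)): 0.9, (False, (True,)): 0.1,
                 (True, (False,)): 0.1, (False, (False,)): 0.9},
}
posterior = enumerate_ask("Rain", {"WetGrass": True}, variables, parents, cpt)
print(round(posterior[True], 3))  # 0.18 / 0.26 = 0.692
```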

Simple Example: Medical Diagnosis

Scenario

A patient shows symptoms. We want to determine the probability of having the flu.

Bayesian Network
graph TD
    F[Flu]
    Fe[Fever]
    C[Cough]
    
    F --> Fe
    F --> C
    
    style F fill:#dc3545,stroke:#0a2540,stroke-width:3px,color:#fff
    style Fe fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
    style C fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
                                
Probability Tables

P(Flu)
Flu  P
Yes  0.1
No   0.9

P(Fever|Flu)
Flu  P(F=Y)
Yes  0.8
No   0.2

P(Cough|Flu)
Flu  P(C=Y)
Yes  0.7
No   0.3
Given: Patient has Fever = Yes and Cough = Yes
Query: P(Flu = Yes | Fever = Yes, Cough = Yes) = ?
Step 1: Compute Numerator
Using: Chain Rule
P(Flu=Yes, Fever=Yes, Cough=Yes) = P(Flu=Yes) × P(Fever=Yes|Flu=Yes) × P(Cough=Yes|Flu=Yes)
Theorem: Using the Chain Rule (Product Rule) to factorize the joint probability based on the network structure.
= 0.1 × 0.8 × 0.7
= 0.056
Step 2: Compute Denominator
Using: Marginalization
P(Fever=Yes, Cough=Yes) = ΣFlu P(Flu, Fever=Yes, Cough=Yes)
Theorem: Using Marginalization (Sum Rule) — summing over all possible values of the query variable (Flu).
When Flu = Yes:
0.1 × 0.8 × 0.7 = 0.056
When Flu = No:
0.9 × 0.2 × 0.3 = 0.054
Total: 0.056 + 0.054 = 0.110
Step 3: Normalize
Using: Bayes' Rule
Theorem: Applying Bayes' Rule — dividing by the total probability to get the conditional probability.
P(Flu=Yes | Fever=Yes, Cough=Yes) = 0.056 / 0.110 ≈ 0.509
≈ 51%
Interpretation

If a patient has both fever and cough, there's approximately a 51% probability they have the flu.

Note: Although flu is relatively rare (10% prior probability), observing both symptoms increases the probability to about 50%, since these symptoms are much more likely when someone has the flu (80% and 70%) compared to when they don't (20% and 30%).

Complete Calculation Breakdown
Flu P(Flu) P(Fever=Yes|Flu) P(Cough=Yes|Flu) Joint Probability
Yes 0.1 0.8 0.7 0.056
Numerator P(Flu=Yes, Evidence): 0.056
No 0.9 0.2 0.3 0.054
Sum for Flu=No: 0.054
Total Denominator P(Fever=Yes, Cough=Yes): 0.110
Final Answer: P(Flu=Yes | Fever=Yes, Cough=Yes): 0.509
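The calculation in the table can be reproduced in a few lines of Python (a minimal sketch using the CPT values given above; variable names are my own):

```python
# P(Flu | Fever=Yes, Cough=Yes) by enumeration, using the CPTs above.
p_flu = {True: 0.1, False: 0.9}
p_fever = {True: 0.8, False: 0.2}   # P(Fever=Yes | Flu)
p_cough = {True: 0.7, False: 0.3}   # P(Cough=Yes | Flu)

# Unnormalized joint P(Flu=f, Fever=Yes, Cough=Yes) for each f (chain rule)
unnorm = {f: p_flu[f] * p_fever[f] * p_cough[f] for f in (True, False)}

Z = sum(unnorm.values())            # denominator: 0.056 + 0.054 = 0.110
print(round(unnorm[True] / Z, 3))   # 0.509
```

There are no hidden variables in this network, so the sum over hidden values is trivial; only the normalization over Flu remains.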

Medical Diagnosis: Numerical Example

Bayesian Network
graph TD
    D[Disease]
    I[Infection]
    F[Fever]
    T[Test]
    
    D --> I
    I --> F
    I --> T
    
    style D fill:#635bff,stroke:#0a2540,stroke-width:3px,color:#fff
    style I fill:#ffc107,stroke:#0a2540,stroke-width:3px,color:#000
    style F fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
    style T fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
                                
Variables:
  • D = Disease {T, F}
  • I = Infection {T, F} Hidden
  • F = Fever {T, F}
  • T = Test {+, −}
Conditional Probability Tables (CPTs)
P(D)
D  P
T  0.1
F  0.9

P(I|D)
D  P(I=T|D)
T  0.8
F  0.2

P(F|I)
I  P(F=T|I)
T  0.9
F  0.3

P(T|I)
I  P(T=+|I)
T  0.95
F  0.1
Query:
P(Disease = Yes | Fever = Yes, Test = Positive)
Query = Disease
Evidence = {Fever, Test}
Hidden = Infection
Step 1: Identify Variables
Query: Disease
Evidence: Fever=Yes, Test=Positive
Hidden: Infection
Step 2: Sum Over Hidden Variable: Infection
P(Disease, Fever, Test) = ΣInfection∈{Yes,No} P(Disease, Infection, Fever, Test)
For Disease = Yes:
When Infection = Yes:
P(Disease=Yes) · P(Infection=Yes|Disease=Yes) · P(Fever=Yes|Infection=Yes) · P(Test=+|Infection=Yes)
= 0.1 × 0.8 × 0.9 × 0.95 = 0.0684
When Infection = No:
P(Disease=Yes) · P(Infection=No|Disease=Yes) · P(Fever=Yes|Infection=No) · P(Test=+|Infection=No)
= 0.1 × 0.2 × 0.3 × 0.1 = 0.0006
Sum: 0.0684 + 0.0006 = 0.069
For Disease = No:
When Infection = Yes:
P(Disease=No) · P(Infection=Yes|Disease=No) · P(Fever=Yes|Infection=Yes) · P(Test=+|Infection=Yes)
= 0.9 × 0.2 × 0.9 × 0.95 = 0.1539
When Infection = No:
P(Disease=No) · P(Infection=No|Disease=No) · P(Fever=Yes|Infection=No) · P(Test=+|Infection=No)
= 0.9 × 0.8 × 0.3 × 0.1 = 0.0216
Sum: 0.1539 + 0.0216 = 0.1755
Step 3: Normalize
Normalizing Constant:
Z = P(Disease=Yes, Evidence) + P(Disease=No, Evidence)
= 0.069 + 0.1755
Z = 0.2445
Final Probability:
P(Disease=Yes | Evidence) = 0.069 / 0.2445 ≈ 0.282
(28.2%)
Complete Computation Table
Disease Infection P(Disease) P(Infection|Disease) P(Fever=Yes|Infection) P(Test=+|Infection) Joint Probability
Yes Yes 0.1 0.8 0.9 0.95 0.0684
No 0.1 0.2 0.3 0.1 0.0006
Sum for Disease=Yes: 0.069
No Yes 0.9 0.2 0.9 0.95 0.1539
No 0.9 0.8 0.3 0.1 0.0216
Sum for Disease=No: 0.1755
Total (Z): 0.2445
P(Disease=Yes | Fever=Yes, Test=+): 0.282
Interpretation: Despite the fever and the positive test, the disease probability is only 28.2%. The prior is low (10%), and the evidence bears on Disease only through the hidden variable Infection (which we summed over), and Infection can also arise without the disease (20% of the time).
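The computation above, including the sum over the hidden variable Infection, can be sketched as follows (CPT values are taken from the tables above; the helper names are my own):

```python
# P(Disease | Fever=Yes, Test=+) by enumeration over the hidden variable Infection.
P_D = {True: 0.1, False: 0.9}           # P(Disease)
P_I_given_D = {True: 0.8, False: 0.2}   # P(I=T | D)
P_F_given_I = {True: 0.9, False: 0.3}   # P(F=T | I)
P_T_given_I = {True: 0.95, False: 0.1}  # P(Test=+ | I)

def joint(d, i):
    """Joint probability via the BN factorization, with evidence Fever=T, Test=+."""
    p_i = P_I_given_D[d] if i else 1 - P_I_given_D[d]
    return P_D[d] * p_i * P_F_given_I[i] * P_T_given_I[i]

# Sum out the hidden variable Infection for each value of Disease
unnorm = {d: sum(joint(d, i) for i in (True, False)) for d in (True, False)}

Z = sum(unnorm.values())            # normalizing constant: 0.069 + 0.1755 = 0.2445
print(round(unnorm[True] / Z, 3))   # 0.282
```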

2. Simple Example: P(Burglary | Alarm)

Simplified Scenario

Let's start with a minimal network: Just 3 variables to make the process crystal clear.

graph TD
    B[Burglary<br/>P = 0.001]
    E[Earthquake<br/>P = 0.002]
    A[Alarm]

    B --> A
    E --> A

    style B fill:#635bff,stroke:#0a2540,stroke-width:3px,color:#fff
    style E fill:#00d4ff,stroke:#0a2540,stroke-width:3px,color:#fff
    style A fill:#ffc107,stroke:#0a2540,stroke-width:3px,color:#000
Variables:
  • Burglary — What we want to know
  • Alarm — What we observed (Alarm = true)
  • Earthquake — Hidden variable
CPTs:
P(B): 0.001
P(E): 0.002
P(A|B,E):
• B=T, E=T → 0.95
• B=T, E=F → 0.94
• B=F, E=T → 0.29
• B=F, E=F → 0.001
Computing P(Burglary = true | Alarm = true)
Step 0: Write the Formula
P(B | A) = P(B, A) / P(A) = Σe P(B, A, e) / Σb Σe P(b, A, e)

We need to sum over Earthquake (the hidden variable)

Step 1: Compute Numerator — P(B=T, A=T)

Sum over both values of Earthquake:

Case 1: E = true
P(B=T, A=T, E=T)
= P(B=T) × P(E=T) × P(A=T | B=T, E=T)
= 0.001 × 0.002 × 0.95
= 0.0000019
Case 2: E = false
P(B=T, A=T, E=F)
= P(B=T) × P(E=F) × P(A=T | B=T, E=F)
= 0.001 × 0.998 × 0.94
= 0.00093812
Numerator: 0.0000019 + 0.00093812 = 0.00094002
Step 2: Compute Denominator — P(A=T)

Sum over all combinations of B and E:

B E P(B) P(E) P(A=T|B,E) Product
T T 0.001 0.002 0.95 0.0000019
T F 0.001 0.998 0.94 0.00093812
F T 0.999 0.002 0.29 0.00057942
F F 0.999 0.998 0.001 0.00099700
Sum (Denominator): 0.00251644
Step 3: Normalize
P(B=T | A=T) = 0.00094002 / 0.00251644 ≈ 0.374

37.4% chance of burglary

P(B=F | A=T) = (0.00057942 + 0.00099700) / 0.00251644 ≈ 0.626

62.6% chance NO burglary

Verification: 0.374 + 0.626 = 1.000 ✓ (Probabilities sum to 1!)

Interpretation

Even though the alarm went off, there's only a 37.4% chance of burglary. Why? Because burglaries are rare (0.1%), and the alarm can be triggered by other causes (earthquake, or just randomly). This shows the importance of considering base rates and alternative explanations!
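The three steps for this small network can be sketched directly (CPT values from the tables above; the `prior` helper is my own):

```python
# P(Burglary | Alarm=true) in the 3-variable network above.
P_B = 0.001   # P(Burglary=T)
P_E = 0.002   # P(Earthquake=T)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)

def prior(p, value):
    """Probability of a Boolean variable with P(var=T) = p."""
    return p if value else 1 - p

unnorm = {}
for b in (True, False):
    # Sum out the hidden variable Earthquake
    unnorm[b] = sum(prior(P_B, b) * prior(P_E, e) * P_A[(b, e)]
                    for e in (True, False))

Z = sum(unnorm.values())            # denominator P(A=T) = 0.00251644
print(round(unnorm[True] / Z, 3))   # 0.374
```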

3. Complete Example: P(Burglary | JohnCalls, MaryCalls)

The Full Network

Now let's tackle the complete 5-variable Burglary-Alarm network — the classic example from Russell & Norvig's AI textbook!

graph TD
    B[🚨 Burglary<br/>P = 0.001]
    E[🌍 Earthquake<br/>P = 0.002]
    A[🔔 Alarm]
    J[📞 JohnCalls]
    M[📞 MaryCalls]

    B --> A
    E --> A
    A --> J
    A --> M

    style B fill:#635bff,stroke:#0a2540,stroke-width:3px,color:#fff
    style E fill:#00d4ff,stroke:#0a2540,stroke-width:3px,color:#fff
    style A fill:#ffc107,stroke:#0a2540,stroke-width:3px,color:#000
    style J fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
    style M fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
Question:
Given that both John and Mary called, what's the probability of a burglary?
Query: Burglary
Evidence: J = true, M = true
Hidden: Earthquake, Alarm
Strategy

We need to sum over all combinations of Earthquake and Alarm (the hidden variables). That's 2 × 2 = 4 combinations to enumerate.

P(B | J=T, M=T) = Σe Σa P(B, J=T, M=T, e, a) / Σb Σe Σa P(b, J=T, M=T, e, a)
e = Earthquake (2 values)
a = Alarm (2 values)
Step-by-Step Enumeration

For each value of Burglary, enumerate all four (Earthquake, Alarm) combinations, multiply the CPT entries P(b) · P(e) · P(a|b,e) · P(J=T|a) · P(M=T|a), sum the products, and normalize.
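The CPTs for JohnCalls and MaryCalls are not reproduced above; the sketch below assumes the standard values from Russell & Norvig (P(J=T|A=T)=0.90, P(J=T|A=F)=0.05, P(M=T|A=T)=0.70, P(M=T|A=F)=0.01), under which the answer comes out to about 0.284:

```python
# Full 5-variable enumeration for P(Burglary | JohnCalls=T, MaryCalls=T).
# Call CPTs assume the standard Russell & Norvig values (not shown above).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def prior(p, value):
    """Probability of a Boolean variable with P(var=T) = p."""
    return p if value else 1 - p

unnorm = {}
for b in (True, False):
    total = 0.0
    for e in (True, False):          # hidden: Earthquake
        for a in (True, False):      # hidden: Alarm
            p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += prior(P_B, b) * prior(P_E, e) * p_a * P_J[a] * P_M[a]
    unnorm[b] = total

Z = sum(unnorm.values())
print(round(unnorm[True] / Z, 3))    # 0.284
```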

4. Complexity: The Exponential Problem

The Bad News

Enumeration has exponential time complexity: O(2ⁿ) where n is the number of variables. This makes it impractical for large networks.

Enumeration Cost
Variables Entries to Sum Feasible?
5 32 ✓ Yes
10 1,024 ✓ Yes
20 1,048,576 ⚠️ Slow
30 1,073,741,824 ✗ No
50 1.13 × 10¹⁵ ✗ No
Why So Slow?
  • Full enumeration: Must sum over exponentially many combinations of hidden variables (in the worst case, 2ʰ where h is the number of hidden vars)
  • Each hidden variable doubles the work: Every additional hidden variable multiplies computation by 2
  • No structure exploitation: Doesn't reuse intermediate calculations
  • Redundant computation: Recalculates same subexpressions multiple times
The Good News

Better algorithms exist!
• Variable Elimination: Exploits BN structure
• Belief Propagation: Efficient for tree-like networks
• Approximate methods: Trade accuracy for speed

These will be covered in upcoming topics!

When to Use Enumeration
✓ Good For:
  • Small networks (< 15 variables)
  • Teaching and understanding
  • Verifying other algorithms
  • Exact ground truth
✗ Bad For:
  • Large networks (> 20 variables)
  • Real-time applications
  • Production systems
  • Resource-constrained devices

Summary & Key Takeaways

What We Learned
  1. Inference problem: Compute P(Query | Evidence)
  2. 3-step algorithm: Select → Sum → Normalize
  3. Uses BN factorization: P(all) = ∏ P(Xᵢ | Parents)
  4. Marginalizes hidden variables: Sum over all possibilities
  5. Exponential complexity: O(2ⁿ) — too slow for large networks
  6. But conceptually simple: Foundation for understanding faster algorithms
Key Insights
  • Exact inference is possible but can be expensive
  • Base rates matter: Even with evidence, rare events stay rare
  • Alternative explanations: Evidence can have multiple causes
  • Marginalization is key: Sum out what you don't need
  • BN structure helps: Factorization simplifies calculations
  • Better methods exist: Variable elimination, belief propagation, etc.
The Foundation of BN Inference

Enumeration is the simplest exact inference algorithm.
While too slow for practical use on large networks, it's essential for understanding how BN inference works.
Every advanced algorithm (Variable Elimination, Junction Tree, Belief Propagation) is essentially a smarter way to do enumeration — avoiding redundant calculations and exploiting network structure.

Master enumeration, and you'll understand the foundation of all BN inference!