Variable Elimination Algorithm

The Smart Way to Do Exact Inference

The Big Idea

Variable Elimination improves on enumeration by avoiding redundant computation. Instead of recomputing the same sub-expressions many times, it computes each one once and reuses the result!

🌟 The Best Intuition: VE = Cleaning Up Before You Calculate

Enumeration

Doing a huge messy calculation as-is

Variable Elimination

Simplifying the expression before calculating

Just like in high school algebra!
Enumeration Approach
Expand Everything First
(a + b)(c + d)(e + f)
= ace + acf + ade + adf
  + bce + bcf + bde + bdf
8 terms to compute!
Lots of repeated work
VE Approach
Simplify & Eliminate First
(a + b)(c + d)(e + f)
Step 1: Let x = (e + f)
Step 2: Let y = (c + d)x
Step 3: Result = (a + b)y
Compute once, reuse!
Much more efficient
Key Insight

VE is literally the "factor & simplify" trick from high school algebra, but applied to probabilities. Instead of expanding everything and doing redundant work, we eliminate unneeded variables and reuse intermediate results.
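Here is a quick numeric check of the algebra analogy, using arbitrary made-up values for a through f. The expanded form needs all 8 product terms, while the factored form evaluates each bracket once and reuses it.

```python
from itertools import product

# Arbitrary illustrative values (not from the notes)
a, b, c, d, e, f = 2.0, 3.0, 5.0, 7.0, 11.0, 13.0

# "Enumeration": expand into all 2*2*2 = 8 terms (2 multiplications each), then add them
terms = [x * y * z for x, y, z in product((a, b), (c, d), (e, f))]
expanded = sum(terms)

# "Variable elimination": evaluate the innermost bracket first and reuse it
x = e + f                 # Step 1
y = (c + d) * x           # Step 2
factored = (a + b) * y    # Step 3: only 3 additions and 2 multiplications in total

print(len(terms), expanded, factored)
```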

1. From Enumeration to Variable Elimination

The Redundancy Problem in Enumeration
Example Query:

Compute P(B | A) in a simple chain network: A → B → C → D

P(B | A) = α · P(B, A)

where α is the normalization constant

Enumeration Approach (Wasteful!)

Enumerate over all hidden variables (C and D):

P(B,A) = P(A,B,C=T,D=T) + P(A,B,C=T,D=F)
         + P(A,B,C=F,D=T) + P(A,B,C=F,D=F)
Problem: The first two terms both compute P(A,B,C=T) and then multiply it by different D values (P(D=T|C=T) and P(D=F|C=T)); the last two terms repeat this for P(A,B,C=F). This is redundant: we're computing the same sub-expression multiple times.
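To see the redundancy concretely, here is a minimal sketch of enumeration on the chain A → B → C → D with evidence A=True. The CPT numbers and the helper names `bern`, `joint` and `enumerate_query` are hypothetical, chosen only for illustration; every call to `joint` rebuilds the full four-factor product from scratch. Because C and D sit downstream of B, the answer should simply come back as the CPT row P(B | A=True), i.e. about 0.9 / 0.1.

```python
from itertools import product

# Hypothetical CPTs for the chain A -> B -> C -> D (Boolean variables).
# Each entry is P(child=True | parent value); P(child=False | ...) = 1 - that.
P_A = 0.3
P_B_given_A = {True: 0.9, False: 0.2}
P_C_given_B = {True: 0.6, False: 0.1}
P_D_given_C = {True: 0.8, False: 0.4}

def bern(p_true, value):
    """Probability that a Boolean variable takes `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d):
    """Full joint via the chain rule: P(A) P(B|A) P(C|B) P(D|C)."""
    return (bern(P_A, a) * bern(P_B_given_A[a], b)
            * bern(P_C_given_B[b], c) * bern(P_D_given_C[c], d))

def enumerate_query(a):
    """P(B | A=a) by summing the full joint over every (C, D) combination."""
    unnormalized = {}
    for b in (True, False):
        # Four full joint terms per value of B, each recomputed from scratch
        unnormalized[b] = sum(joint(a, b, c, d)
                              for c, d in product((True, False), repeat=2))
    alpha = 1.0 / sum(unnormalized.values())   # normalization constant
    return {b: alpha * p for b, p in unnormalized.items()}

print(enumerate_query(True))
```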
The Key Transformation
Step 1
Enumeration: Expand Everything
P(B,A) = ΣC,D P(A,B,C,D)

Sum over all combinations of C and D

Step 2
Factor by Chain Rule
= ΣC,D P(A)·P(B|A)·P(C|B)·P(D|C)

Break joint into conditional probabilities

Step 3
Push Summations Inside
= P(A)·P(B|A)·ΣC P(C|B)·[ΣD P(D|C)]

Move each factor outside every sum over a variable it does not mention

Step 4
Result: Product of Factors
= f1 × f3
Where:
f1(A,B) = P(A) · P(B|A)
f2(C) = ΣD P(D|C) (after eliminating D; equals 1 for every value of C)
f3(B) = ΣC P(C|B) · f2(C) (after eliminating C, reusing f2)

Each factor is computed once and reused
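The same query with the sums pushed inside can be sketched as follows, using the same hypothetical CPTs as an enumeration version would (repeated here so the block runs on its own). f2(C) is computed once per value of C and then reused inside f3(B), instead of being re-derived inside every joint term. The helper names are illustrative.

```python
# Hypothetical CPTs for the chain A -> B -> C -> D; each entry is P(child=True | parent)
P_A = 0.3
P_B_given_A = {True: 0.9, False: 0.2}
P_C_given_B = {True: 0.6, False: 0.1}
P_D_given_C = {True: 0.8, False: 0.4}
VALUES = (True, False)

def bern(p_true, value):
    """Probability that a Boolean variable takes `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

# f2(C) = sum_D P(D|C): computed once per value of C (each entry is 1)
f2 = {c: sum(bern(P_D_given_C[c], d) for d in VALUES) for c in VALUES}

# f3(B) = sum_C P(C|B) * f2(C): reuses f2 instead of re-summing over D
f3 = {b: sum(bern(P_C_given_B[b], c) * f2[c] for c in VALUES) for b in VALUES}

def ve_query(a):
    """P(B | A=a) = alpha * f1(a, B) * f3(B), with f1 = P(A) P(B|A)."""
    unnormalized = {b: bern(P_A, a) * bern(P_B_given_A[a], b) * f3[b] for b in VALUES}
    alpha = 1.0 / sum(unnormalized.values())
    return {b: alpha * p for b, p in unnormalized.items()}

print(f2, f3)
print(ve_query(True))
```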

Side-by-Side Comparison
Enumeration
P(B,A) = P(A,B,C=T,D=T)
         + P(A,B,C=T,D=F)
         + P(A,B,C=F,D=T)
         + P(A,B,C=F,D=F)
Computation: 4 full joint probabilities
Redundancy: Computes each P(A,B,C=c) twice (once per value of D)!
Variable Elimination
P(B,A) = P(A,B,C=T) × [P(D=T|C=T) + P(D=F|C=T)]
         + P(A,B,C=F) × [P(D=T|C=F) + P(D=F|C=F)]
Computation: Eliminate D first (ΣD P(D|C) = 1)
Efficiency: Computes each P(A,B,C=c) once!
The Big Insight

Enumeration computes the same intermediate results many times because it sums over all hidden variables at once, expanding every combination.

Variable Elimination computes each intermediate result exactly once by:

  1. Rearranging summations to eliminate variables one at a time
  2. Storing intermediate factors and reusing them
  3. Dynamic programming on the computational graph

2. Key Concepts: Factors

What is a Factor?

A factor is just a multi-dimensional table of numbers. It can represent:

  • A CPT: P(B|A)
  • A joint probability: P(A,B,C)
  • Any function of variables: f(X,Y,Z)
Two Simple Operations:
1. Factor Product (×)

Multiply factors to combine variables

f₁(A,B) × f₂(B,C) = f₃(A,B,C)
Each entry is f₃(a,b,c) = f₁(a,b) · f₂(b,c): entries are multiplied wherever they agree on the shared variable B
2. Factor Marginalization (Σ)

Sum out a variable to eliminate it

ΣB f(A,B,C) = f′(A,C)
Sum over all values of B
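Both operations can be sketched in a few lines of Python. The (variables, table) representation, the function names `factor_product` and `sum_out`, and the numbers in f1 and f2 are illustrative assumptions rather than anything defined in these notes; all variables are taken to be Boolean.

```python
from itertools import product

# Assumed representation: a factor is (variables, table), where table maps a
# tuple of Boolean values (ordered like `variables`) to a number.

def factor_product(f1, f2):
    """Factor product: the result is defined over the union of both scopes."""
    vars1, table1 = f1
    vars2, table2 = f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    out_table = {}
    for values in product((True, False), repeat=len(out_vars)):
        assignment = dict(zip(out_vars, values))
        # Each input is looked up with the values of its own variables, so
        # entries are paired only where they agree on shared variables (here B)
        p1 = table1[tuple(assignment[v] for v in vars1)]
        p2 = table2[tuple(assignment[v] for v in vars2)]
        out_table[values] = p1 * p2
    return out_vars, out_table

def sum_out(var, f):
    """Factor marginalization: add up entries that differ only in `var`."""
    variables, table = f
    i = variables.index(var)
    out_table = {}
    for values, p in table.items():
        key = values[:i] + values[i + 1:]   # drop the summed-out variable's value
        out_table[key] = out_table.get(key, 0.0) + p
    return variables[:i] + variables[i + 1:], out_table

# Hypothetical tables for f1(A,B) and f2(B,C)
f1 = (("A", "B"), {(True, True): 0.5, (True, False): 0.8,
                   (False, True): 0.1, (False, False): 0.0})
f2 = (("B", "C"), {(True, True): 0.3, (True, False): 0.9,
                   (False, True): 0.7, (False, False): 0.2})

f3 = factor_product(f1, f2)      # f3(A,B,C)
f_prime = sum_out("B", f3)       # f'(A,C) = sum over B of f3(A,B,C)
print(f3[0], f_prime[0])         # ('A', 'B', 'C') ('A', 'C')
print(f_prime[1][(True, True)])  # f1(T,T)*f2(T,T) + f1(T,F)*f2(F,T) = 0.5*0.3 + 0.8*0.7
```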

3. The Variable Elimination Algorithm

Goal

Compute P(Q | e) efficiently by eliminating variables one at a time

1
Initialize Factors
What to do:

Create one factor for each CPT in the network; if there is evidence, fix those variables to their observed values

Example: For network A → B → C
fA = P(A)
fB = P(B|A)
fC = P(C|B)
Why:

Each factor represents a piece of the joint probability. We'll combine them strategically to avoid redundant computation.
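A sketch of initialization for the A → B → C example, with made-up CPT numbers. Each CPT becomes one (variables, table) factor; `cpt_to_factor` and `restrict` are hypothetical helper names, and variables are assumed Boolean. `restrict` shows how an observed evidence value can be fixed inside a factor, which is what the worked example in Section 4 does with Fever=Yes.

```python
from itertools import product

# Hypothetical CPTs for the chain A -> B -> C, all Boolean.
# cpts[X] = (parents of X, {tuple of parent values: P(X=True | parents)})
cpts = {
    "A": ((), {(): 0.4}),
    "B": (("A",), {(True,): 0.7, (False,): 0.2}),
    "C": (("B",), {(True,): 0.9, (False,): 0.5}),
}

def cpt_to_factor(var, parents, p_true):
    """One factor per CPT, over (parents..., var), holding P(var | parents)."""
    variables = parents + (var,)
    table = {}
    for values in product((True, False), repeat=len(variables)):
        p = p_true[values[:-1]]              # look up the row for these parent values
        table[values] = p if values[-1] else 1.0 - p
    return variables, table

def restrict(f, var, value):
    """Fix an evidence variable to its observed value, dropping it from the scope."""
    variables, table = f
    if var not in variables:
        return f
    i = variables.index(var)
    new_table = {vals[:i] + vals[i + 1:]: p for vals, p in table.items() if vals[i] == value}
    return variables[:i] + variables[i + 1:], new_table

factors = {x: cpt_to_factor(x, *cpts[x]) for x in cpts}   # fA, fB, fC
print(factors["B"])
print(restrict(factors["B"], "A", True))  # with evidence A=True, fB is a factor over B only
```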

2
Eliminate Hidden Variables (One at a Time)
What to do:

For each hidden variable X:

  1. Multiply all factors that mention X
  2. Sum out X from the product
  3. Store the result as a new factor
Why:

By eliminating variables one at a time, we keep intermediate results small and reuse them, avoiding much of the exponential blowup of enumeration (how much depends on the elimination order).

Mini Example: Eliminating C
Before:
f1(B,C)
f2(C,D)
After eliminating C:
fnew(B,D) = ΣC [f1 × f2]
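The elimination step itself can be sketched in Python under an assumed (variables, table) factor representation over Boolean variables: collect every factor that mentions the variable, multiply them, sum the variable out, and keep the other factors unchanged. The function names and the tables for f1(B,C) and f2(C,D) are hypothetical; for instance fnew(B=T, D=T) = 0.2·0.5 + 0.8·0.9 = 0.82.

```python
from functools import reduce
from itertools import product

# Assumed representation: (variables, table) with tuples of Boolean values as keys.

def multiply(f1, f2):
    """Pointwise product over the union of the two scopes."""
    v1, t1 = f1
    v2, t2 = f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    out = {}
    for vals in product((True, False), repeat=len(out_vars)):
        a = dict(zip(out_vars, vals))
        out[vals] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return out_vars, out

def sum_out(var, f):
    """Marginalize `var` out of f."""
    variables, table = f
    i = variables.index(var)
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return variables[:i] + variables[i + 1:], out

def eliminate(var, factors):
    """Multiply every factor that mentions `var`, sum `var` out, keep the rest."""
    touching = [f for f in factors if var in f[0]]
    others = [f for f in factors if var not in f[0]]
    if not touching:
        return factors
    return others + [sum_out(var, reduce(multiply, touching))]

# Hypothetical mini example: f1(B,C) and f2(C,D) -> fnew(B,D)
f1 = (("B", "C"), {(True, True): 0.2, (True, False): 0.8,
                   (False, True): 0.6, (False, False): 0.4})
f2 = (("C", "D"), {(True, True): 0.5, (True, False): 0.5,
                   (False, True): 0.9, (False, False): 0.1})
result = eliminate("C", [f1, f2])
print(result)
```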
3
Multiply & Normalize
What to do:
  1. Multiply all remaining factors
  2. Normalize to get probability distribution
P(Q|e) = α · [f1 × f2 × ...]
where α ensures sum = 1
Why:

After eliminating all hidden variables, the remaining factors mention only the query variable (the evidence variables are fixed at their observed values). Normalizing gives us the final conditional probability.
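A minimal sketch of this final step, assuming (for brevity) that each remaining factor is a dict from query value to number; `normalize_query` is a hypothetical name. With f₁ and f₂ from the Flu example in Section 4 it should reproduce P(Flu=Yes | Fever=Yes) = 0.08 / 0.26 ≈ 0.308.

```python
from math import prod

def normalize_query(factors, values=(True, False)):
    """Multiply factors over the query Q pointwise, then rescale so they sum to 1."""
    unnormalized = {q: prod(f[q] for f in factors) for q in values}
    alpha = 1.0 / sum(unnormalized.values())   # the normalization constant α
    return {q: alpha * p for q, p in unnormalized.items()}

# f1 = P(Flu) and f2 = P(Fever=Yes | Flu), keyed by Flu=True/False
posterior = normalize_query([{True: 0.1, False: 0.9}, {True: 0.8, False: 0.2}])
print(posterior)
```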

The Key Pattern
Initialize → Eliminate X1 → Eliminate X2 → ... → Normalize

Each elimination step reduces the problem size!
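Putting the pattern together, here is a compact, illustrative implementation of the full loop (a sketch, not a reference implementation). It assumes Boolean variables, a (variables, table) factor representation, and a caller-supplied elimination order that lists every hidden variable; all function names and CPT numbers are hypothetical. On a chain A → B → C, the query P(C | A=True) eliminates only B and should give 0.7·0.9 + 0.3·0.5 = 0.78 for C=True.

```python
from functools import reduce
from itertools import product

def multiply(f1, f2):
    """Pointwise product over the union of the two scopes."""
    v1, t1 = f1
    v2, t2 = f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    out = {}
    for vals in product((True, False), repeat=len(out_vars)):
        a = dict(zip(out_vars, vals))
        out[vals] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return out_vars, out

def sum_out(var, f):
    """Marginalize `var` out of f."""
    variables, table = f
    i = variables.index(var)
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return variables[:i] + variables[i + 1:], out

def restrict(f, var, value):
    """Fix an evidence variable to its observed value (no-op if f lacks it)."""
    variables, table = f
    if var not in variables:
        return f
    i = variables.index(var)
    kept = {vals[:i] + vals[i + 1:]: p for vals, p in table.items() if vals[i] == value}
    return variables[:i] + variables[i + 1:], kept

def variable_elimination(factors, query, evidence, order):
    """P(query | evidence); `order` must list every hidden variable."""
    # 1. Initialize: instantiate evidence in every factor
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    # 2. Eliminate hidden variables one at a time
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if touching:
            factors = [f for f in factors if var not in f[0]]
            factors.append(sum_out(var, reduce(multiply, touching)))
    # 3. Multiply the remaining factors (now over the query only) and normalize
    variables, table = reduce(multiply, factors)
    total = sum(table.values())
    i = variables.index(query)
    return {vals[i]: p / total for vals, p in table.items()}

# Hypothetical chain A -> B -> C: one factor per CPT, P(child | parent)
fA = (("A",), {(True,): 0.4, (False,): 0.6})
fB = (("A", "B"), {(True, True): 0.7, (True, False): 0.3,
                   (False, True): 0.2, (False, False): 0.8})
fC = (("B", "C"), {(True, True): 0.9, (True, False): 0.1,
                   (False, True): 0.5, (False, False): 0.5})

posterior = variable_elimination([fA, fB, fC], query="C", evidence={"A": True}, order=["B"])
print(posterior)
```

The same function also handles "diagnostic" queries against the arrows, such as P(A | C=True), since elimination does not care about edge direction.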

4. Complete Example

Query: P(Flu | Fever=Yes)

A simple 3-node network: Flu → Fever, Flu → Cough

Network
Fever ← Flu → Cough
Given CPTs
P(Flu)
  Flu=Yes: 0.1
  Flu=No: 0.9
P(Fever=Yes | Flu)
  Flu=Yes: 0.8
  Flu=No: 0.2
P(Cough=Yes | Flu)
  Flu=Yes: 0.7
  Flu=No: 0.3
Step 1: Initialize Factors & Set Evidence

Create factors and instantiate evidence (Fever=Yes)

f₁ = P(Flu)
Values:
f₁(Flu=Yes) = 0.1
f₁(Flu=No) = 0.9
f₂ = P(Fever=Yes|Flu)
Evidence set!
Values:
f₂(Flu=Yes) = 0.8
f₂(Flu=No) = 0.2
f₃ = P(Cough|Flu)
Values:
f₃(Flu=Yes, Cough=Yes) = 0.7
f₃(Flu=Yes, Cough=No) = 0.3
f₃(Flu=No, Cough=Yes) = 0.3
f₃(Flu=No, Cough=No) = 0.7
Note: We set Fever=Yes in factor f₂, so it becomes a function only of Flu
Step 2: Eliminate Cough (Hidden Variable)

Only f₃ contains Cough, so we sum it out:

f₄(Flu) = ΣCough f₃(Flu, Cough)
Result: f₄(Flu=Yes) = 0.7 + 0.3 = 1 and f₄(Flu=No) = 0.3 + 0.7 = 1 (each CPT row sums to 1)
Step 3: Multiply Remaining Factors

Multiply all factors over Flu:

P(Flu, Fever=Yes) = f₁(Flu) × f₂(Flu) × f₄(Flu)

Flu=Yes:

P(Flu=Yes) × P(Fever=Yes|Flu=Yes) × 1
= 0.1 × 0.8 × 1 = 0.08

Flu=No:

P(Flu=No) × P(Fever=Yes|Flu=No) × 1
= 0.9 × 0.2 × 1 = 0.18
Step 4: Normalize
P(Flu=Yes | Fever=Yes) = 0.08 / (0.08 + 0.18) ≈ 0.308 (30.8%)
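The four steps of this example can be replayed directly in Python. This sketch hard-codes the factors above as dicts keyed by "Yes"/"No" (a representation chosen here for brevity) and should reproduce f₄ = 1, the unnormalized values 0.08 and 0.18, and a posterior of about 0.308, up to floating-point rounding.

```python
VALUES = ("Yes", "No")

# Step 1: factors from the example (Fever=Yes already fixed in f2)
f1 = {"Yes": 0.1, "No": 0.9}                       # P(Flu)
f2 = {"Yes": 0.8, "No": 0.2}                       # P(Fever=Yes | Flu)
f3 = {("Yes", "Yes"): 0.7, ("Yes", "No"): 0.3,     # P(Cough | Flu), keyed (Flu, Cough)
      ("No", "Yes"): 0.3, ("No", "No"): 0.7}

# Step 2: eliminate the hidden variable Cough
f4 = {flu: sum(f3[(flu, cough)] for cough in VALUES) for flu in VALUES}

# Step 3: multiply the remaining factors -> P(Flu, Fever=Yes)
joint = {flu: f1[flu] * f2[flu] * f4[flu] for flu in VALUES}

# Step 4: normalize
alpha = 1.0 / sum(joint.values())
posterior = {flu: alpha * p for flu, p in joint.items()}
print(f4, joint, posterior)
```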

5. Why Variable Elimination Wins

Enumeration
  • Complexity: O(2^n), exponential in ALL n variables
  • Repeats same calculations many times
  • Simple but extremely inefficient
Variable Elimination
  • Complexity: roughly O(n · 2^w), exponential only in w, the number of variables in the largest intermediate factor (at best, tree-width + 1)
  • Computes each sub-expression exactly once
  • Much more efficient in practice!
Key Insight

The order in which you eliminate variables matters! Smart ordering can make the difference between tractable and intractable computation.
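To illustrate why ordering matters, this sketch tracks only factor scopes (which variables each factor mentions) and reports the largest scope built during elimination; `largest_factor_scope` is a hypothetical helper. The star-shaped network H → X1, ..., H → X8 with query X1 is a made-up example: eliminating the hub H first merges every factor into one over 9 variables (2^9 table entries), while eliminating the leaves first never builds a factor over more than 2 variables.

```python
def largest_factor_scope(scopes, order):
    """Simulate elimination on scopes only; return the most variables any product mentions."""
    scopes = [frozenset(s) for s in scopes]
    largest = max(len(s) for s in scopes)
    for var in order:
        touching = [s for s in scopes if var in s]
        if not touching:
            continue
        merged = frozenset().union(*touching)   # scope of the product before summing out
        largest = max(largest, len(merged))
        scopes = [s for s in scopes if var not in s] + [merged - {var}]
    return largest

# Hypothetical star network: P(H) and P(Xi | H) for i = 1..8; query X1
n = 8
scopes = [{"H"}] + [{"H", f"X{i}"} for i in range(1, n + 1)]
hub_first = ["H"] + [f"X{i}" for i in range(2, n + 1)]
leaves_first = [f"X{i}" for i in range(2, n + 1)] + ["H"]
print(largest_factor_scope(scopes, hub_first))     # 9
print(largest_factor_scope(scopes, leaves_first))  # 2
```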
