Variable Elimination

The Big Idea

Variable Elimination improves on enumeration by avoiding redundant computation. Instead of summing the same sub-expressions multiple times, we compute once and reuse!

🌟 The Best Intuition: VE = Cleaning Up Before You Calculate

Enumeration

Doing a huge messy calculation as-is

Variable Elimination

Simplifying the expression before calculating

Just like in high school algebra!

Enumeration Approach

Expand Everything First

(a + b)(c + d)(e + f)

↓

= ace + acf + ade + adf
+ bce + bcf + bde + bdf

8 terms to compute, each needing 2 multiplications (16 in total)!
Lots of repeated work

VE Approach

Simplify & Eliminate First

(a + b)(c + d)(e + f)

↓

Step 1: Let x = (e + f)
Step 2: Let y = (c + d)x
Step 3: Result = (a + b)y

Compute each sum once, reuse it!
Only 2 multiplications and 3 additions
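The algebra above can be checked numerically. This is a minimal Python sketch; the values assigned to a through f are arbitrary and chosen only for illustration. It evaluates the expanded eight-term sum and the factored form, and records how many multiplications each one needs.

```python
# Illustrative values only; any numbers would behave the same way.
a, b, c, d, e, f = 2, 3, 5, 7, 11, 13

# Enumeration style: expand into 8 terms, each needing 2 multiplications.
terms = [(p, q, r) for p in (a, b) for q in (c, d) for r in (e, f)]
expanded = sum(p * q * r for p, q, r in terms)
expanded_mults = 2 * len(terms)

# VE style: add inside each bracket first, then multiply the partial results.
x = e + f
y = (c + d) * x
factored = (a + b) * y
factored_mults = 2

print(expanded, factored)              # both forms give the same value
print(expanded_mults, factored_mults)  # multiplication counts for each form
```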

Key Insight

VE is literally the "factor & simplify" trick from high school algebra, but applied to probabilities. Instead of expanding everything and doing redundant work, we eliminate unneeded variables and reuse intermediate results.

1. From Enumeration to Variable Elimination

The Redundancy Problem in Enumeration

Example Query:

Compute P(B | A) in a simple network: A → B → C → D

P(B | A) = α · P(B, A)

where α = 1 / P(A) is the normalization constant

Enumeration Approach (Wasteful!)

Enumerate over all hidden variables (C and D):

P(B,A) = P(A,B,C=T,D=T) + P(A,B,C=T,D=F)
+ P(A,B,C=F,D=T) + P(A,B,C=F,D=F)

Problem: The first two terms both compute P(A,B,C=T) and then multiply it by P(D|C=T) for different values of D; the last two terms do the same with C=F. This is redundant: we're computing the same sub-expression multiple times.

The Key Transformation

Step 1

Enumeration: Expand Everything

P(B,A) = Σ_C,D P(A,B,C,D)

Sum over all combinations of C and D

Step 2

Factor by Chain Rule

= Σ_C,D P(A)·P(B|A)·P(C|B)·P(D|C)

Break joint into conditional probabilities

Step 3

Push Summations Inside

= P(A)·P(B|A)·Σ_C P(C|B)·[Σ_D P(D|C)]

Move each term outside every sum whose variable it does not mention

Step 4

Result: Product of Factors

= f₁ × f₃

Where:
f₁(A,B) = P(A) · P(B|A)
f₂(C) = Σ_D P(D|C) (after eliminating D; equals 1 for every value of C)
f₃(B) = Σ_C P(C|B) · f₂(C) (after eliminating C; f₂ is already folded in)

Each factor is computed once and reused
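The four steps above can be checked on concrete numbers. The sketch below assumes binary variables and uses made-up CPT values for the chain A → B → C → D (none of these numbers come from the text). It computes P(A,B) once by brute-force enumeration and once with the sums pushed inside, and confirms the two agree.

```python
from itertools import product

# Hypothetical CPTs for the chain A -> B -> C -> D (values are illustrative only).
P_A = {True: 0.6, False: 0.4}
P_B_given_A = {True: {True: 0.7, False: 0.3}, False: {True: 0.2, False: 0.8}}  # [a][b]
P_C_given_B = {True: {True: 0.9, False: 0.1}, False: {True: 0.5, False: 0.5}}  # [b][c]
P_D_given_C = {True: {True: 0.4, False: 0.6}, False: {True: 0.3, False: 0.7}}  # [c][d]

VALUES = (True, False)

def joint_by_enumeration(a, b):
    """Steps 1-2: sum the full chain-rule product over every (c, d) pair."""
    return sum(
        P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c] * P_D_given_C[c][d]
        for c, d in product(VALUES, VALUES)
    )

def joint_by_elimination(a, b):
    """Steps 3-4: push the sums inward, eliminating D first and then C."""
    f2 = {c: sum(P_D_given_C[c][d] for d in VALUES) for c in VALUES}  # function of C
    f3 = sum(P_C_given_B[b][c] * f2[c] for c in VALUES)               # value for this b
    f1 = P_A[a] * P_B_given_A[a][b]
    return f1 * f3

# The two methods agree for every assignment of A and B.
for a, b in product(VALUES, VALUES):
    assert abs(joint_by_enumeration(a, b) - joint_by_elimination(a, b)) < 1e-12

print(joint_by_elimination(True, True))  # approximately P(A=T) * P(B=T|A=T)
```

Because D and C are descendants of B with no evidence on them, both inner sums come out to 1 here, which is why the result collapses to P(A) · P(B|A).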

Side-by-Side Comparison

Enumeration

P(B,A) = P(A,B,C=T,D=T)
         + P(A,B,C=T,D=F)
         + P(A,B,C=F,D=T)
         + P(A,B,C=F,D=F)

Computation: 4 full joint probabilities
Redundancy: Computes P(A,B,C) twice for each value of C!

Variable Elimination

P(B,A) = P(A,B,C=T) × [P(D=T|C=T) + P(D=F|C=T)]
         + P(A,B,C=F) × [P(D=T|C=F) + P(D=F|C=F)]

Computation: Eliminate D first (Σ_D P(D|C) = 1)
Efficiency: Computes P(A,B,C) once for each value of C!

The Big Insight

Enumeration computes the same intermediate results many times because it sums over all variables simultaneously.

Variable Elimination computes each intermediate result exactly once by:

Rearranging summations to eliminate variables one at a time
Storing intermediate factors and reusing them
Applying dynamic programming: each stored factor is a solved sub-problem

2. Key Concepts: Factors

What is a Factor?

A factor is just a multi-dimensional table of numbers. It can represent:

A CPT: P(B|A)
A joint probability: P(A,B,C)
Any function of variables: f(X,Y,Z)

Two Simple Operations:

1. Factor Product (×)

Multiply factors to combine variables

f₁(A,B) × f₂(B,C) = f₃(A,B,C)

Entry-wise multiplication for matching values of B

2. Factor Marginalization (Σ)

Sum out a variable to eliminate it

Σ_B f(A,B,C) = f′(A,C)

Sum over all values of B
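Both operations are short to write down in code. The following is a sketch, not a reference implementation: the factor representation (a tuple of variable names plus a dict from value tuples to numbers), the helper names `factor_product` and `sum_out`, and the numbers in `f1` and `f2` are all assumptions made for illustration, and every variable is taken to be binary.

```python
from itertools import product

DOMAIN = (True, False)  # every variable is binary in this sketch

def factor_product(f, g):
    """Multiply two factors, matching rows on the variables they share."""
    f_vars, f_table = f
    g_vars, g_table = g
    out_vars = f_vars + tuple(v for v in g_vars if v not in f_vars)
    out_table = {}
    for values in product(DOMAIN, repeat=len(out_vars)):
        row = dict(zip(out_vars, values))
        f_key = tuple(row[v] for v in f_vars)
        g_key = tuple(row[v] for v in g_vars)
        out_table[values] = f_table[f_key] * g_table[g_key]
    return out_vars, out_table

def sum_out(var, f):
    """Marginalize: add up the rows that differ only in the value of `var`."""
    f_vars, f_table = f
    idx = f_vars.index(var)
    out_vars = f_vars[:idx] + f_vars[idx + 1:]
    out_table = {}
    for values, p in f_table.items():
        key = values[:idx] + values[idx + 1:]
        out_table[key] = out_table.get(key, 0.0) + p
    return out_vars, out_table

# Made-up numbers for f1(A,B) and f2(B,C).
f1 = (("A", "B"), {(True, True): 0.5, (True, False): 0.5,
                   (False, True): 0.25, (False, False): 0.75})
f2 = (("B", "C"), {(True, True): 1.0, (True, False): 2.0,
                   (False, True): 3.0, (False, False): 4.0})

f3 = factor_product(f1, f2)  # a factor over (A, B, C) with 8 rows
f4 = sum_out("B", f3)        # a factor over (A, C) with 4 rows
print(f3[0], len(f3[1]))
print(f4[0], f4[1][(True, True)])
```

Note that the product grows the table (three variables, eight rows) while marginalization shrinks it again (two variables, four rows).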

3. The Variable Elimination Algorithm

Goal

Compute P(Q | e) efficiently by eliminating variables one at a time

Step 1: Initialize Factors

What to do:

Create one factor for each CPT in the network, then fix any evidence variables to their observed values

Example: For network A → B → C
• f_A = P(A)
• f_B = P(B|A)
• f_C = P(C|B)

Why:

Each factor represents a piece of the joint probability. We'll combine them strategically to avoid redundant computation.

Step 2: Eliminate Hidden Variables (One at a Time)

What to do:

For each hidden variable X:

Multiply all factors that mention X
Sum out X from the product
Replace the factors you multiplied with the resulting new factor

Why:

By eliminating variables one at a time, we work with small intermediate factors that we can reuse. With a good elimination order, this avoids the exponential blowup of enumeration.

Mini Example: Eliminating C

Before:

f₁(B,C)
f₂(C,D)

After eliminating C:

f_new(B,D) = Σ_C [f₁ × f₂]
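For two-variable factors like these, eliminating the shared variable is the same arithmetic as a matrix product. A small sketch with made-up table entries (indices 0 and 1 stand for the two values of each variable; the numbers are illustrative only):

```python
# Made-up tables indexed as f1[b][c] and f2[c][d].
f1 = [[0.5, 0.5],
      [0.25, 0.75]]
f2 = [[1.0, 2.0],
      [3.0, 4.0]]

# Multiply the factors that mention C and sum C out in one pass.
f_new = [[sum(f1[b][c] * f2[c][d] for c in range(2)) for d in range(2)]
         for b in range(2)]

print(f_new)  # a 2x2 table over (B, D); C no longer appears
```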

Step 3: Multiply & Normalize

What to do:

Multiply all remaining factors
Normalize to get probability distribution

P(Q|e) = α · [f₁ × f₂ × ...]

where α ensures sum = 1

Why:

After eliminating all hidden variables, we're left with factors only over the query variables (the evidence variables were already fixed to their observed values). Normalizing gives us the final conditional probability.

The Key Pattern

Initialize → Eliminate X₁ → Eliminate X₂ → ... → Normalize

Each elimination step removes one variable from the problem!
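The whole pattern fits in a short program. The sketch below is one possible way to code it, under several assumptions: all variables are binary, a factor is a pair of (variable names, table), and the function names and the CPT numbers for the A → B → C network are hypothetical. The elimination order is supplied by the caller.

```python
from itertools import product

DOMAIN = (True, False)  # all variables binary in this sketch

def multiply(f, g):
    """Factor product: match rows on shared variables."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = f_vars + tuple(v for v in g_vars if v not in f_vars)
    out_tab = {}
    for vals in product(DOMAIN, repeat=len(out_vars)):
        row = dict(zip(out_vars, vals))
        out_tab[vals] = (f_tab[tuple(row[v] for v in f_vars)]
                         * g_tab[tuple(row[v] for v in g_vars)])
    return out_vars, out_tab

def sum_out(var, f):
    """Marginalization: add rows that differ only in `var`."""
    f_vars, f_tab = f
    i = f_vars.index(var)
    out_tab = {}
    for vals, p in f_tab.items():
        key = vals[:i] + vals[i + 1:]
        out_tab[key] = out_tab.get(key, 0.0) + p
    return f_vars[:i] + f_vars[i + 1:], out_tab

def restrict(f, var, value):
    """Keep only rows consistent with the evidence var=value, then drop var."""
    f_vars, f_tab = f
    if var not in f_vars:
        return f
    i = f_vars.index(var)
    return (f_vars[:i] + f_vars[i + 1:],
            {vals[:i] + vals[i + 1:]: p
             for vals, p in f_tab.items() if vals[i] == value})

def variable_elimination(factors, evidence, order):
    # Initialize: instantiate the evidence in every factor.
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    # Eliminate each hidden variable in the given order.
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        rest = [f for f in factors if var not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(var, prod)]
    # Multiply what is left (factors over the query only) and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    total = sum(result[1].values())
    return {vals: p / total for vals, p in result[1].items()}

# Hypothetical CPTs for A -> B -> C.
f_A = (("A",), {(True,): 0.6, (False,): 0.4})
f_B = (("A", "B"), {(True, True): 0.7, (True, False): 0.3,
                    (False, True): 0.2, (False, False): 0.8})
f_C = (("B", "C"), {(True, True): 0.9, (True, False): 0.1,
                    (False, True): 0.5, (False, False): 0.5})

# Query P(A | C=True): B is the only hidden variable.
posterior = variable_elimination([f_A, f_B, f_C], {"C": True}, ["B"])
print(posterior)
```

The keys of the returned dict are one-element tuples because the remaining factor ranges over the single query variable A.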

4. Complete Example

Query: P(Flu | Fever=Yes)

A simple 3-node network: Flu → Fever, Flu → Cough

Network

graph TD
    F[Flu]
    Fe[Fever]
    C[Cough]
    
    F --> Fe
    F --> C
    
    style F fill:#dc3545,stroke:#0a2540,stroke-width:3px,color:#fff
    style Fe fill:#32D583,stroke:#0a2540,stroke-width:3px,color:#fff
    style C fill:#ffc107,stroke:#0a2540,stroke-width:3px,color:#000

Given CPTs

P(Flu)
Flu	P(Flu)
Yes	0.1
No	0.9

P(Fever | Flu)
Flu	P(Fever=Yes | Flu)
Yes	0.8
No	0.2

P(Cough | Flu)
Flu	P(Cough=Yes | Flu)
Yes	0.7
No	0.3

Step 1: Initialize Factors & Set Evidence

Create factors and instantiate evidence (Fever=Yes)

f₁ = P(Flu)

Values:

f₁(Flu=Yes) = 0.1
f₁(Flu=No) = 0.9

f₂ = P(Fever=Yes|Flu)

Evidence set!

Values:

f₂(Flu=Yes) = 0.8
f₂(Flu=No) = 0.2

f₃ = P(Cough|Flu)

Values:

f₃(Flu=Yes, Cough=Yes) = 0.7
f₃(Flu=Yes, Cough=No) = 0.3
f₃(Flu=No, Cough=Yes) = 0.3
f₃(Flu=No, Cough=No) = 0.7

Note: We set Fever=Yes in factor f₂, so it becomes a function only of Flu

Step 2: Eliminate Cough (Hidden Variable)

Only f₃ contains Cough, so we sum it out:

f₄(Flu) = Σ_Cough f₃(Flu, Cough)

Result: f₄(Flu=Yes) = 0.7 + 0.3 = 1 and f₄(Flu=No) = 0.3 + 0.7 = 1 (each row of a CPT sums to 1)

Step 3: Multiply Remaining Factors

Multiply all factors over Flu:

P(Flu, Fever=Yes) = f₁(Flu) × f₂(Flu) × f₄(Flu)

Flu=Yes:

P(Flu=Yes) × P(Fever=Yes|Flu=Yes) × 1
= 0.1 × 0.8 × 1 = 0.08

Flu=No:

P(Flu=No) × P(Fever=Yes|Flu=No) × 1
= 0.9 × 0.2 × 1 = 0.18

Step 4: Normalize

P(Flu=Yes | Fever=Yes) = 0.08 / (0.08 + 0.18) = 0.08 / 0.26 ≈ 0.308 (30.8%)

P(Flu=No | Fever=Yes) = 0.18 / 0.26 ≈ 0.692 (69.2%)
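The same four steps can be reproduced in a few lines of Python. This sketch uses the CPT values given in this example; the variable names are chosen here for illustration.

```python
# CPT values from the example.
p_flu = {"Yes": 0.1, "No": 0.9}
p_fever_yes_given_flu = {"Yes": 0.8, "No": 0.2}  # evidence Fever=Yes already set
p_cough_yes_given_flu = {"Yes": 0.7, "No": 0.3}

# Eliminate Cough: P(Cough=Yes | Flu) + P(Cough=No | Flu) for each Flu value.
f4 = {flu: p + (1 - p) for flu, p in p_cough_yes_given_flu.items()}

# Multiply the remaining factors over Flu.
unnormalized = {flu: p_flu[flu] * p_fever_yes_given_flu[flu] * f4[flu]
                for flu in p_flu}

# Normalize so the two entries sum to 1.
total = sum(unnormalized.values())
posterior = {flu: value / total for flu, value in unnormalized.items()}

print(round(posterior["Yes"], 3), round(posterior["No"], 3))  # 0.308 0.692
```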

5. Why Variable Elimination Wins

Enumeration

Complexity: O(2ⁿ) for n binary variables, exponential in ALL variables
Repeats same calculations many times
Simple but extremely inefficient

Variable Elimination

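Complexity: O(n · 2^w), where w is the number of variables in the largest intermediate factor; with a good elimination order this is governed by the network's tree-width, not by n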
Computes each sub-expression exactly once
Much more efficient in practice!

Key Insight

The order in which you eliminate variables matters! Smart ordering can make the difference between tractable and intractable computation.
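To see how much the order can matter, it is enough to track which variables each factor mentions, without any numbers. The sketch below assumes a hypothetical star-shaped network (a hub H with six children X1 to X6, query on X6) and a helper `max_factor_size` written for this illustration; it reports the largest number of variables that ever appear together in one intermediate product.

```python
def max_factor_size(scopes, order):
    """Simulate elimination on variable scopes only.

    Returns the largest number of variables appearing in any product
    formed while eliminating the variables in `order`.
    """
    scopes = [set(s) for s in scopes]
    largest = 0
    for var in order:
        touching = [s for s in scopes if var in s]
        if not touching:
            continue
        merged = set().union(*touching)  # scope of the multiplied factor
        largest = max(largest, len(merged))
        # Replace the multiplied factors with the summed-out result.
        scopes = [s for s in scopes if var not in s] + [merged - {var}]
    return largest

# Hypothetical star network: P(H) and P(Xi | H) for i = 1..6.
leaves = [f"X{i}" for i in range(1, 7)]
scopes = [{"H"}] + [{"H", x} for x in leaves]

# Query variable is X6, so H and X1..X5 are eliminated.
hub_first = max_factor_size(scopes, ["H"] + leaves[:-1])
leaves_first = max_factor_size(scopes, leaves[:-1] + ["H"])
print(hub_first, leaves_first)  # 7 2
```

With binary variables, eliminating the hub first builds a table with 2⁷ = 128 entries, while eliminating the leaves first never needs more than 2² = 4. The gap widens exponentially as leaves are added.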