Lecture 11: Bayesian Networks & Probabilistic Inference

Compact Representations and Efficient Reasoning Under Uncertainty

From exponential joint distributions to elegant graphical models

Lecture Overview

Building on probability fundamentals from Lecture 10, we now tackle the central challenge of probabilistic AI: How do we represent and reason with complex probability distributions over many variables? Bayesian Networks provide the answer through elegant graph structures that encode conditional independence, enabling both compact representation and efficient inference algorithms.

Key Concepts: DAG Structure, Conditional Independence, d-Separation, Variable Elimination, Belief Propagation
Applications: Medical Expert Systems, Robot Localization, Causal Reasoning, Fault Diagnosis, NLP

The Fundamental Challenge

Consider a domain with n binary variables. A full joint probability distribution requires:

O(2ⁿ) parameters

For just 30 variables, that's 2³⁰ ≈ 1.07 billion probabilities to specify and store!

The Problem:
  • Exponential storage requirements
  • Exponential computation for inference
  • Infeasible to learn from data (far too many parameters)
  • Not how humans reason
The Solution:
  • Exploit conditional independence
  • Use graph structure to factorize the joint distribution (see the parameter-count sketch after this list)
  • Local conditional probability tables (CPTs)
  • Efficient inference algorithms
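To make the savings concrete, here is a minimal Python sketch; the at-most-k-parents bound and the numbers are illustrative assumptions, not lecture material:

```python
# Parameter counting: full joint vs. factored Bayesian Network.
# Assumes n binary variables; in the factored case every node is
# assumed to have at most k parents, so its CPT needs one free
# parameter per parent configuration.

def full_joint_params(n: int) -> int:
    """Free parameters in a full joint over n binary variables."""
    return 2 ** n - 1  # all entries minus the sum-to-one constraint

def bn_params(n: int, k: int) -> int:
    """Upper bound on total CPT parameters with at most k parents per node."""
    return n * 2 ** k  # one Bernoulli parameter per parent configuration

print(full_joint_params(30))  # 1_073_741_823 -- over a billion
print(bn_params(30, k=3))     # 240 -- with at most 3 parents each
```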

"Bayesian Networks are to probabilistic reasoning what propositional logic is to deterministic reasoning: a compact, modular representation that mirrors how humans naturally decompose complex problems."

📊 Lecture 11 Presentation Slides

Complete lecture presentation (32 slides) covering Bayesian Networks, conditional independence, d-separation, inference algorithms (enumeration & variable elimination). Perfect for reviewing lecture content!

Main Topics

8. Belief Propagation on Trees (Not Included)

Pearl's message-passing algorithm for exact inference on tree-structured networks. Understanding message flow, convergence, and computational efficiency.

Message Passing · Tree Networks · Pearl's Algorithm · Linear Time
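As a toy illustration of the message-passing idea, here is a minimal sum-product sketch on the binary chain A → B → C; the CPT values are invented for this example, and on a chain Pearl's algorithm reduces to exactly these two matrix-vector products:

```python
import numpy as np

# Sum-product message passing on the binary chain A -> B -> C.
# CPT values below are invented for illustration.

p_a = np.array([0.6, 0.4])            # P(A)
p_b_given_a = np.array([[0.9, 0.1],   # P(B | A=0)
                        [0.2, 0.8]])  # P(B | A=1)
p_c_given_b = np.array([[0.7, 0.3],   # P(C | B=0)
                        [0.1, 0.9]])  # P(C | B=1)

m_ab = p_a @ p_b_given_a   # message into B: m(b) = Σ_a P(a) P(b|a)
p_c = m_ab @ p_c_given_b   # marginal at C:  P(c) = Σ_b m(b) P(c|b)

print(p_c)   # [0.472 0.528] -- each node is touched once: linear time
```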
9. Computational Complexity of Inference (Not Included)

Understanding when exact inference is tractable and when it becomes intractable. NP-hardness results, treewidth, and the need for approximate methods.

NP-Hardness · Treewidth · Complexity Classes · Tractability
10. Approximate Inference: Sampling Methods (Not Included)

When exact inference is too expensive, use sampling. Forward sampling, rejection sampling, likelihood weighting, and their convergence properties.

Forward Sampling · Rejection Sampling · Likelihood Weighting · Convergence
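A minimal sketch of forward and rejection sampling on a hypothetical two-node network Rain → WetGrass; all probabilities are invented for illustration:

```python
import random

# Forward and rejection sampling on the toy network Rain -> WetGrass.

P_RAIN = 0.3
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.2}

def forward_sample():
    """Sample variables in topological order: parents before children."""
    rain = random.random() < P_RAIN
    wet = random.random() < P_WET_GIVEN_RAIN[rain]
    return rain, wet

def rejection_estimate(n=100_000):
    """Estimate P(Rain=true | WetGrass=true) by discarding mismatches."""
    kept = hits = 0
    for _ in range(n):
        rain, wet = forward_sample()
        if wet:              # keep only samples consistent with the evidence
            kept += 1
            hits += rain
    return hits / kept

print(rejection_estimate())  # converges to 0.27 / 0.41 ≈ 0.659
```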
11. MCMC & Gibbs Sampling (Not Included)

Advanced sampling techniques using Markov chains. Understanding Gibbs sampling, mixing time, burn-in, and why MCMC is the workhorse of modern probabilistic inference.

Markov Chains · Gibbs Sampling · Mixing & Burn-in · Practical MCMC
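A minimal Gibbs-sampling sketch on a two-variable toy model whose joint is known exactly, so the chain's estimate can be checked against the true marginal; the joint table is invented for illustration:

```python
import random

# Gibbs sampling on a two-variable toy model with a known joint.

JOINT = {(0, 0): 0.30, (0, 1): 0.10,   # P(A, B)
         (1, 0): 0.20, (1, 1): 0.40}

def resample_a(b):
    """Draw A from its full conditional P(A | B=b)."""
    p0 = JOINT[(0, b)] / (JOINT[(0, b)] + JOINT[(1, b)])
    return 0 if random.random() < p0 else 1

def resample_b(a):
    """Draw B from its full conditional P(B | A=a)."""
    p0 = JOINT[(a, 0)] / (JOINT[(a, 0)] + JOINT[(a, 1)])
    return 0 if random.random() < p0 else 1

def gibbs(n_steps=50_000, burn_in=1_000):
    a, b = 0, 0                   # arbitrary initial state
    hits = kept = 0
    for t in range(n_steps):
        a = resample_a(b)         # one Gibbs sweep: update each variable
        b = resample_b(a)         #   given the current values of the rest
        if t >= burn_in:          # discard early samples before mixing
            kept += 1
            hits += a
    return hits / kept

print(gibbs())  # converges to P(A=1) = 0.20 + 0.40 = 0.60
```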
12. Variational Inference (Advanced) (Not Included)

Stanford CS228-level topic: Casting inference as optimization. Mean-field approximations, KL divergence minimization, and connections to modern deep learning (VAEs).

Optimization · KL Divergence · Mean Field · VAEs
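The standard identity behind this view (general background, not taken from the slides): the log evidence splits into the ELBO plus a KL term, so maximizing the ELBO over q is equivalent to minimizing the KL divergence to the true posterior:

```latex
% log p(x) does not depend on q, so pushing the ELBO up pushes the KL down.
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\text{ELBO}(q)}
  + \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big)
```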

Interactive Demonstrations

Visualize Bayesian Networks, explore d-separation, and watch inference algorithms in action.

Practical Exercises

Master Bayesian Networks through hands-on problem-solving.

BN Construction Problems

Build Bayesian Networks from scratch for various domains.

d-Separation Problems

Determine conditional independence from graph structure.
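As a warm-up, the chain pattern A → B → C can be checked numerically: conditioning on B makes A and C independent, i.e. P(C | A, B) = P(C | B). A minimal check with invented CPT values (the same chain as in the message-passing sketch above):

```python
# Numeric check of the chain pattern A -> B -> C.

P_A = {0: 0.6, 1: 0.4}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_C_GIVEN_B = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}

def joint(a, b, c):
    # Bayesian-network factorization of the chain.
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]

for b in (0, 1):
    for a in (0, 1):
        num = joint(a, b, 1)
        den = sum(joint(a, b, c) for c in (0, 1))
        # The conditional is the same for both values of a:
        print(f"P(C=1|A={a},B={b}) = {num/den:.2f}   "
              f"P(C=1|B={b}) = {P_C_GIVEN_B[b][1]:.2f}")
```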

Variable Elimination Practice

Step through the VE algorithm by hand and implement it in Python.
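A minimal sketch of variable elimination on the same illustrative chain A → B → C used above, with factors stored as plain dictionaries; this is a hand-rolled example, not the course's reference implementation:

```python
# Variable elimination on the chain A -> B -> C, eliminating A then B.

f_A = {(0,): 0.6, (1,): 0.4}              # P(A)
f_B = {(0, 0): 0.9, (0, 1): 0.1,          # P(B | A)
       (1, 0): 0.2, (1, 1): 0.8}
f_C = {(0, 0): 0.7, (0, 1): 0.3,          # P(C | B)
       (1, 0): 0.1, (1, 1): 0.9}

# Eliminate A: tau1(b) = Σ_a f_A(a) · f_B(a, b)
tau1 = {(b,): sum(f_A[(a,)] * f_B[(a, b)] for a in (0, 1)) for b in (0, 1)}

# Eliminate B: tau2(c) = Σ_b tau1(b) · f_C(b, c)
tau2 = {(c,): sum(tau1[(b,)] * f_C[(b, c)] for b in (0, 1)) for c in (0, 1)}

print(tau2)  # P(C) ≈ {(0,): 0.472, (1,): 0.528} -- matches message passing
```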

Sampling & MCMC Problems

Implement and analyze forward sampling, rejection sampling, and Gibbs sampling.

Key Concepts Summary

Bayesian Network Definition:
  • DAG (Directed Acyclic Graph) structure
  • Each node has a CPT P(X | Parents(X))
  • Joint factorizes: P(X₁, …, Xₙ) = ∏ᵢ P(Xᵢ | Parents(Xᵢ))
Conditional Independence:
  • Enables compact representation
  • Read from graph via d-separation
  • Chain, fork, collider patterns
Exact Inference:
  • Enumeration (exponential)
  • Variable Elimination (dynamic programming)
  • Belief Propagation (trees, linear time)
  • NP-hard in general
Approximate Inference:
  • Forward, rejection, likelihood weighting
  • MCMC & Gibbs sampling
  • Variational inference (optimization)
  • Essential for large networks
SE444: Artificial Intelligence | Lecture 11: Bayesian Networks & Probabilistic Inference