Lecture 11: Bayesian Networks & Probabilistic Inference

Compact Representations and Efficient Reasoning Under Uncertainty

From exponential joint distributions to elegant graphical models

Lecture Overview

Building on probability fundamentals from Lecture 10, we now tackle the central challenge of probabilistic AI: How do we represent and reason with complex probability distributions over many variables? Bayesian Networks provide the answer through elegant graph structures that encode conditional independence, enabling both compact representation and efficient inference algorithms.

Key Concepts: DAG Structure, Conditional Independence, d-Separation, Variable Elimination, Belief Propagation
Applications: Medical Expert Systems, Robot Localization, Causal Reasoning, Fault Diagnosis, NLP

The Fundamental Challenge

Consider a domain with n binary variables. A full joint probability distribution requires:

O(2ⁿ) parameters

For just 30 variables, that's 2³⁰ ≈ 1.07 billion probabilities to specify and store!

The Problem:
  • Exponential storage requirements
  • Exponential computation for inference
  • Infeasible to learn from data (far too many parameters)
  • Not how humans reason
The Solution:
  • Exploit conditional independence
  • Use graph structure to factorize the joint distribution (see the parameter-count sketch after this list)
  • Local conditional probability tables (CPTs)
  • Efficient inference algorithms
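To make the savings concrete, here is a minimal Python sketch; the at-most-k-parents bound and the numbers are illustrative assumptions, not lecture material:

```python
# Parameter counting: full joint vs. factored Bayesian Network.
# Assumes n binary variables; in the factored case every node is
# assumed to have at most k parents, so its CPT needs one free
# parameter per parent configuration.

def full_joint_params(n: int) -> int:
    """Free parameters in a full joint over n binary variables."""
    return 2 ** n - 1  # all entries minus the sum-to-one constraint

def bn_params(n: int, k: int) -> int:
    """Upper bound on total CPT parameters with at most k parents per node."""
    return n * 2 ** k  # one Bernoulli parameter per parent configuration

print(full_joint_params(30))  # 1_073_741_823 -- over a billion
print(bn_params(30, k=3))     # 240 -- with at most 3 parents each
```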

"Bayesian Networks are to probabilistic reasoning what propositional logic is to deterministic reasoning: a compact, modular representation that mirrors how humans naturally decompose complex problems."

📊 Lecture 11 Presentation Slides

Complete lecture presentation (32 slides) covering Bayesian Networks, conditional independence, d-separation, inference algorithms (enumeration & variable elimination). Perfect for reviewing lecture content!

Main Topics

8. Belief Propagation on Trees (Not Included)

Pearl's message-passing algorithm for exact inference on tree-structured networks. Understanding message flow, convergence, and computational efficiency.

Message Passing · Tree Networks · Pearl's Algorithm · Linear Time
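As a toy illustration of the message-passing idea, here is a minimal sum-product sketch on the binary chain A → B → C; the CPT values are invented for this example, and on a chain Pearl's algorithm reduces to exactly these two matrix-vector products:

```python
import numpy as np

# Sum-product message passing on the binary chain A -> B -> C.
# CPT values below are invented for illustration.

p_a = np.array([0.6, 0.4])            # P(A)
p_b_given_a = np.array([[0.9, 0.1],   # P(B | A=0)
                        [0.2, 0.8]])  # P(B | A=1)
p_c_given_b = np.array([[0.7, 0.3],   # P(C | B=0)
                        [0.1, 0.9]])  # P(C | B=1)

m_ab = p_a @ p_b_given_a   # message into B: m(b) = Σ_a P(a) P(b|a)
p_c = m_ab @ p_c_given_b   # marginal at C:  P(c) = Σ_b m(b) P(c|b)

print(p_c)   # [0.472 0.528] -- each node is touched once: linear time
```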
9. Computational Complexity of Inference (Not Included)

Understanding when exact inference is tractable and when it becomes intractable. NP-hardness results, treewidth, and the need for approximate methods.

NP-Hardness · Treewidth · Complexity Classes · Tractability
10. Approximate Inference: Sampling Methods (Not Included)

When exact inference is too expensive, use sampling. Forward sampling, rejection sampling, likelihood weighting, and their convergence properties.

Forward Sampling · Rejection Sampling · Likelihood Weighting · Convergence
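A minimal sketch of forward and rejection sampling on a hypothetical two-node network Rain → WetGrass; all probabilities are invented for illustration:

```python
import random

# Forward and rejection sampling on the toy network Rain -> WetGrass.

P_RAIN = 0.3
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.2}

def forward_sample():
    """Sample variables in topological order: parents before children."""
    rain = random.random() < P_RAIN
    wet = random.random() < P_WET_GIVEN_RAIN[rain]
    return rain, wet

def rejection_estimate(n=100_000):
    """Estimate P(Rain=true | WetGrass=true) by discarding mismatches."""
    kept = hits = 0
    for _ in range(n):
        rain, wet = forward_sample()
        if wet:              # keep only samples consistent with the evidence
            kept += 1
            hits += rain
    return hits / kept

print(rejection_estimate())  # converges to 0.27 / 0.41 ≈ 0.659
```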
11. MCMC & Gibbs Sampling (Not Included)

Advanced sampling techniques using Markov chains. Understanding Gibbs sampling, mixing time, burn-in, and why MCMC is the workhorse of modern probabilistic inference.

Markov Chains · Gibbs Sampling · Mixing & Burn-in · Practical MCMC
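A minimal Gibbs-sampling sketch on a two-variable toy model whose joint is known exactly, so the chain's estimate can be checked against the true marginal; the joint table is invented for illustration:

```python
import random

# Gibbs sampling on a two-variable toy model with a known joint.

JOINT = {(0, 0): 0.30, (0, 1): 0.10,   # P(A, B)
         (1, 0): 0.20, (1, 1): 0.40}

def resample_a(b):
    """Draw A from its full conditional P(A | B=b)."""
    p0 = JOINT[(0, b)] / (JOINT[(0, b)] + JOINT[(1, b)])
    return 0 if random.random() < p0 else 1

def resample_b(a):
    """Draw B from its full conditional P(B | A=a)."""
    p0 = JOINT[(a, 0)] / (JOINT[(a, 0)] + JOINT[(a, 1)])
    return 0 if random.random() < p0 else 1

def gibbs(n_steps=50_000, burn_in=1_000):
    a, b = 0, 0                   # arbitrary initial state
    hits = kept = 0
    for t in range(n_steps):
        a = resample_a(b)         # one Gibbs sweep: update each variable
        b = resample_b(a)         #   given the current values of the rest
        if t >= burn_in:          # discard early samples before mixing
            kept += 1
            hits += a
    return hits / kept

print(gibbs())  # converges to P(A=1) = 0.20 + 0.40 = 0.60
```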
12. Variational Inference (Advanced) (Not Included)

Stanford CS228-level topic: Casting inference as optimization. Mean-field approximations, KL divergence minimization, and connections to modern deep learning (VAEs).

Optimization · KL Divergence · Mean Field · VAEs
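The standard identity behind this view (general background, not taken from the slides): the log evidence splits into the ELBO plus a KL term, so maximizing the ELBO over q is equivalent to minimizing the KL divergence to the true posterior:

```latex
% log p(x) does not depend on q, so pushing the ELBO up pushes the KL down.
\log p(x)
  = \underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\text{ELBO}(q)}
  + \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big)
```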

Interactive Demonstrations

Visualize Bayesian Networks, explore d-separation, and watch inference algorithms in action.

Practical Exercises

Master Bayesian Networks through hands-on problem-solving.

BN Construction Problems

Build Bayesian Networks from scratch for various domains.

d-Separation Problems

Determine conditional independence from graph structure.
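As a warm-up, the chain pattern A → B → C can be checked numerically: conditioning on B makes A and C independent, i.e. P(C | A, B) = P(C | B). A minimal check with invented CPT values (the same chain as in the message-passing sketch above):

```python
# Numeric check of the chain pattern A -> B -> C.

P_A = {0: 0.6, 1: 0.4}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_C_GIVEN_B = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}

def joint(a, b, c):
    # Bayesian-network factorization of the chain.
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]

for b in (0, 1):
    for a in (0, 1):
        num = joint(a, b, 1)
        den = sum(joint(a, b, c) for c in (0, 1))
        # The conditional is the same for both values of a:
        print(f"P(C=1|A={a},B={b}) = {num/den:.2f}   "
              f"P(C=1|B={b}) = {P_C_GIVEN_B[b][1]:.2f}")
```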

Variable Elimination Practice

Step through the VE algorithm by hand and implement it in Python.
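A minimal sketch of variable elimination on the same illustrative chain A → B → C used above, with factors stored as plain dictionaries; this is a hand-rolled example, not the course's reference implementation:

```python
# Variable elimination on the chain A -> B -> C, eliminating A then B.

f_A = {(0,): 0.6, (1,): 0.4}              # P(A)
f_B = {(0, 0): 0.9, (0, 1): 0.1,          # P(B | A)
       (1, 0): 0.2, (1, 1): 0.8}
f_C = {(0, 0): 0.7, (0, 1): 0.3,          # P(C | B)
       (1, 0): 0.1, (1, 1): 0.9}

# Eliminate A: tau1(b) = Σ_a f_A(a) · f_B(a, b)
tau1 = {(b,): sum(f_A[(a,)] * f_B[(a, b)] for a in (0, 1)) for b in (0, 1)}

# Eliminate B: tau2(c) = Σ_b tau1(b) · f_C(b, c)
tau2 = {(c,): sum(tau1[(b,)] * f_C[(b, c)] for b in (0, 1)) for c in (0, 1)}

print(tau2)  # P(C) ≈ {(0,): 0.472, (1,): 0.528} -- matches message passing
```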

Sampling & MCMC Problems

Implement and analyze forward sampling, rejection sampling, and Gibbs sampling.

Key Concepts Summary

Bayesian Network Definition:
  • DAG (Directed Acyclic Graph) structure
  • Each node has a CPT P(X | Parents(X))
  • Joint factorizes: P(X₁, …, Xₙ) = ∏ᵢ P(Xᵢ | Parents(Xᵢ))
Conditional Independence:
  • Enables compact representation
  • Read from graph via d-separation
  • Chain, fork, collider patterns
Exact Inference:
  • Enumeration (exponential)
  • Variable Elimination (dynamic programming)
  • Belief Propagation (trees, linear time)
  • NP-hard in general
Approximate Inference:
  • Forward, rejection, likelihood weighting
  • MCMC & Gibbs sampling
  • Variational inference (optimization)
  • Essential for large networks
SE444: Artificial Intelligence | Lecture 11: Bayesian Networks & Probabilistic Inference