Probability & Uncertainty Cheat Sheet

Quick Reference Guide for SE444 Lecture 10


1. Probability Axioms (Kolmogorov)

Axiom 1: Non-negativity
\[ 0 \leq P(A) \leq 1 \text{ for all events } A \]
Axiom 2: Certainty
\[ P(\Omega) = 1 \text{ (probability of sample space)} \] \[ P(\text{true}) = 1, \quad P(\text{false}) = 0 \]
Axiom 3: Additivity
\[ \text{If } A \cap B = \emptyset: \quad P(A \cup B) = P(A) + P(B) \] \[ \text{In general (inclusion-exclusion): } P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Derived Rule: \( P(\neg A) = 1 - P(A) \)
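The axioms and the derived complement rule can be sanity-checked on a toy sample space; the sketch below uses a fair six-sided die (an assumed example, not from the lecture):

```python
from fractions import Fraction

# Uniform probability on a fair six-sided die (assumed toy example)
omega = frozenset({1, 2, 3, 4, 5, 6})
def P(event):
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})   # "roll is even"
B = frozenset({1, 2})      # "roll is at most 2"

assert 0 <= P(A) <= 1                              # Axiom 1: non-negativity
assert P(omega) == 1                               # Axiom 2: certainty
assert P(A | B) == P(A) + P(B) - P(A & B)          # additivity (| is set union here)
assert P(omega - A) == 1 - P(A)                    # derived: P(not A) = 1 - P(A)
```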

2. Conditional Probability

Definition
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{if } P(B) > 0 \]
Read as: "Probability of A given B"
Product Rule (Chain Rule)
\[ P(A \cap B) = P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A) \]
General Chain Rule
\[ P(A_1, A_2, \ldots, A_n) = P(A_1) \cdot P(A_2 \mid A_1) \cdot P(A_3 \mid A_1, A_2) \cdots P(A_n \mid A_1, \ldots, A_{n-1}) \]
Example: P(cavity ∧ toothache) = P(cavity | toothache) × P(toothache)
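As a quick numeric sketch of the product rule (the numbers below are illustrative assumptions, not from the lecture):

```python
# Product rule: P(cavity ∧ toothache) = P(cavity | toothache) * P(toothache)
# (illustrative numbers, not from the lecture)
p_toothache = 0.20
p_cavity_given_toothache = 0.60

p_cavity_and_toothache = p_cavity_given_toothache * p_toothache
print(round(p_cavity_and_toothache, 4))  # 0.12
```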

3. Bayes' Rule ⭐ (Most Important)

Bayes' Theorem (Standard Form)
\[ P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \]
Bayes' Theorem (Expanded Form)
\[ P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B \mid A) \cdot P(A) + P(B \mid \neg A) \cdot P(\neg A)} \]
Bayesian Inference Form
\[ P(\text{hypothesis} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{hypothesis}) \cdot P(\text{hypothesis})}{P(\text{evidence})} \] \[ \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}} \]
Key Insight: Bayes' rule allows us to invert conditional probabilities. If we know P(symptoms | disease), we can compute P(disease | symptoms) using prior knowledge P(disease).
Medical Diagnosis Example:
Given: P(positive test | disease) = 0.99, P(disease) = 0.001, P(positive test | no disease) = 0.05
Find: P(disease | positive test)

\[ P(D \mid +) = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} = \frac{0.00099}{0.05094} \approx 0.019 \]
Interpretation: Only 1.9% chance of disease despite positive test! (due to low base rate)

4. Marginalization (Summing Out)

Marginalization Formula
\[ P(A) = \sum_{b \in B} P(A, b) = \sum_{b} P(A \mid b) \cdot P(b) \]
Sum over all possible values of B to eliminate it
Example: Given joint P(Weather, Traffic), compute P(Weather):
P(sunny) = P(sunny, light) + P(sunny, heavy)
P(rainy) = P(rainy, light) + P(rainy, heavy)

5. Independence & Conditional Independence

Independence
\[ A \perp B \iff P(A \cap B) = P(A) \cdot P(B) \] \[ \text{Equivalently: } P(A \mid B) = P(A) \]
Conditional Independence (⭐ Very Important)
\[ A \perp B \mid C \iff P(A \cap B \mid C) = P(A \mid C) \cdot P(B \mid C) \] \[ \text{Equivalently: } P(A \mid B, C) = P(A \mid C) \]
"A and B are independent given C"
Why it matters: Independence reduces parameters exponentially!
• Without independence: n binary variables need \(2^n - 1\) parameters
• With independence: only n parameters
• Conditional independence: enables efficient inference in Bayesian networks
Example: Toothache and Catch are conditionally independent given Cavity:
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) × P(Catch | Cavity)
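A small numeric sketch (with assumed toy parameters) can verify both directions: build a joint distribution that factorizes through Cavity, then check that Toothache and Catch are independent given Cavity but not marginally:

```python
# Assumed toy parameters: the joint is constructed so that
# P(Toothache, Catch | Cavity) = P(Toothache | Cavity) * P(Catch | Cavity)
p_cavity = [0.2, 0.8]    # P(Cavity = yes), P(Cavity = no)
p_tooth = [0.9, 0.1]     # P(Toothache = 1 | Cavity = yes/no)
p_catch = [0.8, 0.05]    # P(Catch = 1 | Cavity = yes/no)

def joint(t, c, k):
    """P(Toothache = t, Catch = c, Cavity index k), for t, c in {0, 1}."""
    pt = p_tooth[k] if t else 1 - p_tooth[k]
    pc = p_catch[k] if c else 1 - p_catch[k]
    return p_cavity[k] * pt * pc

# Conditionally independent given Cavity = yes (index 0):
lhs = joint(1, 1, 0) / p_cavity[0]
assert abs(lhs - p_tooth[0] * p_catch[0]) < 1e-12

# ...but NOT marginally independent:
p_t1 = sum(joint(1, c, k) for c in (0, 1) for k in (0, 1))   # 0.26
p_c1 = sum(joint(t, 1, k) for t in (0, 1) for k in (0, 1))   # 0.20
p_t1c1 = sum(joint(1, 1, k) for k in (0, 1))                 # 0.148
assert abs(p_t1c1 - p_t1 * p_c1) > 0.01                      # 0.148 != 0.052
```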

6. Probability Distributions

Joint Distribution
\[ P(X_1, X_2, \ldots, X_n) \]
Complete probability model: specifies probability of every possible state
Marginal Distribution
\[ P(X) = \sum_{y} P(X, y) \]
Probability of subset of variables (others summed out)
Conditional Distribution
\[ P(X \mid Y) = \frac{P(X, Y)}{P(Y)} \]
Distribution over X given fixed value of Y
Example: Joint Probability Table
Weather | Traffic | P(W, T)
sunny | light | 0.3
sunny | heavy | 0.1
rainy | light | 0.2
rainy | heavy | 0.4
Marginal: P(sunny) = 0.3 + 0.1 = 0.4
Conditional: P(heavy | rainy) = 0.4 / (0.2 + 0.4) = 0.67

7. Law of Total Probability

Law of Total Probability
\[ P(A) = \sum_{i} P(A \mid B_i) \cdot P(B_i) \]
Where \(B_1, B_2, \ldots, B_n\) partition the sample space
Example: P(alarm) = P(alarm | burglary) × P(burglary) + P(alarm | no burglary) × P(no burglary)
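The alarm example in code, with assumed illustrative numbers:

```python
# Law of total probability with assumed illustrative numbers
p_burglary = 0.001
p_alarm_given_burglary = 0.95
p_alarm_given_no_burglary = 0.01

# P(alarm) = P(alarm | burglary) P(burglary) + P(alarm | no burglary) P(no burglary)
p_alarm = (p_alarm_given_burglary * p_burglary
           + p_alarm_given_no_burglary * (1 - p_burglary))
print(f"P(alarm) = {p_alarm:.5f}")  # P(alarm) = 0.01094
```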

8. Common Bayesian Inference Patterns

Pattern 1: Medical Diagnosis
\[ P(\text{disease} \mid \text{symptoms}) = \frac{P(\text{symptoms} \mid \text{disease}) \cdot P(\text{disease})}{P(\text{symptoms})} \]
Given: Sensitivity P(+ | disease), Specificity P(- | no disease), Base rate P(disease)
Find: P(disease | +)
Pattern 2: Naive Bayes Classifier
\[ P(\text{class} \mid x_1, \ldots, x_n) \propto P(\text{class}) \prod_{i=1}^{n} P(x_i \mid \text{class}) \]
Assumes features are conditionally independent given class
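A minimal sketch of the pattern with assumed toy parameters (two hypothetical binary features, e.g. "contains a link" and "contains a typo" for spam filtering):

```python
# Naive Bayes sketch with assumed toy parameters (binary features)
priors = {"spam": 0.3, "ham": 0.7}                     # P(class)
likelihoods = {"spam": [0.8, 0.6], "ham": [0.1, 0.2]}  # P(x_i = 1 | class)

def posterior(x):
    """Score each class with P(class) * prod_i P(x_i | class), then normalize."""
    scores = {}
    for c in priors:
        score = priors[c]
        for p_i, x_i in zip(likelihoods[c], x):
            score *= p_i if x_i else 1 - p_i
        scores[c] = score
    total = sum(scores.values())  # normalization constant (the "evidence")
    return {c: s / total for c, s in scores.items()}

print(posterior([1, 1]))  # "spam" gets the higher posterior (about 0.91)
```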
Pattern 3: Sequential Bayesian Update
\[ P(H \mid e_1, e_2) = \frac{P(e_2 \mid H, e_1) \cdot P(H \mid e_1)}{P(e_2 \mid e_1)} \]
Update posterior with new evidence (posterior becomes new prior)
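The update can be sketched numerically; the numbers below are assumed, and the evidence pieces are treated as conditionally independent given H, so P(e2 | H, e1) reduces to P(e2 | H):

```python
# Sequential Bayesian update (assumed numbers; e1, e2 conditionally independent given H)
p_h = 0.01                                     # prior P(H)
p_e1_given_h, p_e1_given_not_h = 0.9, 0.1      # likelihoods for evidence e1
p_e2_given_h, p_e2_given_not_h = 0.8, 0.2      # likelihoods for evidence e2

def update(prior, like_h, like_not_h):
    """One Bayes step: P(H | e) via the expanded-form denominator."""
    num = like_h * prior
    return num / (num + like_not_h * (1 - prior))

p_h_given_e1 = update(p_h, p_e1_given_h, p_e1_given_not_h)
p_h_given_e1_e2 = update(p_h_given_e1, p_e2_given_h, p_e2_given_not_h)  # posterior becomes prior
print(f"after e1: {p_h_given_e1:.4f}, after e1 and e2: {p_h_given_e1_e2:.4f}")
```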

9. Key Statistical Theorems

Law of Large Numbers (LLN)
\[ \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = \mathbb{E}[X] \]
For i.i.d. variables with finite mean, the sample average converges (almost surely) to the expected value as n increases
Central Limit Theorem (CLT)
\[ \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \]
The standardized sample mean of i.i.d. variables with finite variance converges in distribution to the standard normal
Why these matter:
• LLN: Justifies using sample statistics to estimate population parameters
• CLT: Explains why the normal distribution appears so often in practice (averages of many independent effects are approximately normal)
• Both are foundations of machine learning and statistical inference
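Both theorems are easy to see in a quick simulation; the sketch below uses a fair coin (Bernoulli with p = 0.5, so mu = 0.5 and sigma = 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# LLN: the sample mean of many coin flips approaches E[X] = 0.5
flips = rng.integers(0, 2, size=100_000)
print(flips.mean())  # close to 0.5

# CLT: standardized sample means look standard normal
n = 50
means = rng.integers(0, 2, size=(10_000, n)).mean(axis=1)
z = (means - 0.5) / (0.5 / np.sqrt(n))  # standardize with mu = 0.5, sigma = 0.5
print(z.mean(), z.std())  # close to 0 and 1
```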

10. Python Implementation

Basic Probability Calculations
import numpy as np

# Joint probability table: rows = Weather (sunny, rainy), columns = Traffic (light, heavy)
joint = np.array([[0.3, 0.1], [0.2, 0.4]])

# Marginalization (sum out one variable along its axis)
p_weather = joint.sum(axis=1)  # [0.4, 0.6] = P(sunny), P(rainy)
p_traffic = joint.sum(axis=0)  # [0.5, 0.5] = P(light), P(heavy)

# Conditional probability: P(Traffic | Weather = rainy)
p_traffic_given_rainy = joint[1, :] / joint[1, :].sum()  # [1/3, 2/3]

# Bayes' rule: posterior = likelihood * prior / evidence
def bayes_rule(likelihood, prior, evidence):
    return (likelihood * prior) / evidence
Bayesian Inference Example
# Medical diagnosis
p_disease = 0.001  # prior
sensitivity = 0.99  # P(+ | disease)
specificity = 0.95  # P(- | no disease)

# P(+)
p_positive = sensitivity * p_disease + (1 - specificity) * (1 - p_disease)

# P(disease | +): only about 1.9% despite the highly sensitive test
posterior = (sensitivity * p_disease) / p_positive
print(f"P(disease | positive test) = {posterior:.4f}")

11. Logic vs Probability Comparison

Aspect | Formal Logic | Probability
Knowledge | Complete, certain | Incomplete, uncertain
Truth Values | True / False (binary) | 0 to 1 (degrees of belief)
Inference | Deduction (certain conclusions) | Induction (probable conclusions)
Contradictions | System fails | Handled gracefully
Real World | Struggles with noise, incompleteness | Natural fit for uncertain data
Examples | Theorem provers, expert systems | ML models, Bayesian networks
Key Philosophy: "Intelligence is not about absolute certainty, but about reasoning optimally under uncertainty." Logic and probability are complementary, not competing approaches.

12. Common Mistakes to Avoid

❌ Confusion of the Inverse:
P(A | B) ≠ P(B | A) in general
Example: P(spots | measles) ≠ P(measles | spots)
❌ Base Rate Neglect:
Ignoring P(disease) when computing P(disease | symptoms)
Result: Overestimating rare diseases
❌ Assuming Independence:
P(A, B) = P(A) × P(B) only if A and B are independent
Must verify independence, not assume it
❌ Unnormalized Probabilities:
Probabilities must sum to 1
Always normalize: P(A | B) = P(A, B) / P(B)
❌ Conditional Independence Confusion:
A ⊥ B doesn't imply A ⊥ B | C
A ⊥ B | C doesn't imply A ⊥ B
Must check each separately