
Bayes' Rule & Conditional Probability

The Most Important Formula in Probabilistic AI

Why Bayes' Rule is the Foundation of AI

🎯
Bayes' Theorem
P(A|B) = [P(B|A) × P(A)] / P(B)

"Invert conditional probabilities"

The Four Components of Bayes' Rule
🔴
P(A|B)

POSTERIOR

What we WANT
P(Disease|+Test)

🟣
P(B|A)

LIKELIHOOD

What we KNOW
P(+Test|Disease)

🔵
P(A)

PRIOR

Base Rate
P(Disease)=1%

🟢
P(B)

EVIDENCE

Normalizer
P(+Test) Total

Bayesian Inference Flow:

🔵 Prior → 🟣 Likelihood × Prior → 🟢 ÷ Evidence → 🔴 Posterior
Worked Examples: Applying Each Term
Example 1: Medical Diagnosis

Scenario: Testing for a rare disease

🔵

Prior

P(D) = 0.01

1% have disease


Absolute probability
BEFORE seeing test

🟣

Likelihood

P(+|D) = 0.95

95% sensitivity


Test accuracy
IF disease present

🟢

Evidence

P(+) = 0.059

5.9% test positive


Overall positive rate
(all causes)

🔴

Posterior

P(D|+) = 0.161

16.1% chance!


Updated belief
AFTER seeing test

⚠️ Key Insight: Despite a 95% accurate test, there is only a 16.1% chance of disease, because the base rate is low (1%)!
Calculation: (0.95 × 0.01) / 0.059 = 0.161

📖 Understanding the Bayesian Reasoning:

🔵 Prior (1%): Before any test, we know only 1 in 100 people have this disease (the base rate in the population).
🟣 Likelihood (95%): We know the test correctly identifies 95% of sick patients (but what about healthy ones?).
🟢 Evidence (5.9%): Overall, 5.9% of all people test positive (a mix of true positives from the 1% who are sick and false positives from the 99% who are healthy).
🔴 Posterior (16.1%): After testing positive, we update our belief to 16.1%: much higher than 1%, but still most likely a false positive!
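The arithmetic above can be checked in a few lines of Python. This is a minimal sketch: the 5% false positive rate is an assumption chosen so that the evidence term works out to the 5.9% used in the example.

```python
# Example 1 in code: posterior probability of disease after a positive test.
prior = 0.01        # P(D): 1% base rate
sensitivity = 0.95  # P(+|D): test catches 95% of sick patients
false_pos = 0.05    # P(+|not D): assumed false positive rate

# Evidence via the law of total probability: true positives + false positives
evidence = sensitivity * prior + false_pos * (1 - prior)

posterior = sensitivity * prior / evidence
print(f"P(+) = {evidence:.3f}, P(D|+) = {posterior:.3f}")
```

Running this prints P(+) = 0.059 and P(D|+) = 0.161, matching the numbers above.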

Example 2: Weather Prediction in Riyadh

Scenario: Sky is cloudy. What's probability of rain?

🔵

Prior

P(R) = 0.10

10% of days rain


Absolute probability
BEFORE seeing sky

🟣

Likelihood

P(C|R) = 0.80

80% rainy days cloudy


Cloud probability
IF raining

🟢

Evidence

P(C) = 0.35

35% of days cloudy


Overall cloudy rate
(observed now)

🔴

Posterior

P(R|C) = 0.229

22.9% chance rain


Updated belief
AFTER seeing clouds

☁️ Key Insight: Cloudy sky increases rain probability from 10% (base rate) to 22.9%!
Calculation: (0.80 × 0.10) / 0.35 = 0.229

📖 Understanding the Bayesian Reasoning:

🔵 Prior (10%): In Riyadh, historically 10% of all days have rain (the base rate from past data).
🟣 Likelihood (80%): We know that when it rains, the sky is cloudy 80% of the time (clouds are common with rain).
🟢 Evidence (35%): Overall, 35% of all days are cloudy (whether it rains or not - clouds happen for many reasons).
🔴 Posterior (22.9%): Now that we see clouds, we update our belief: rain is more likely than the 10% base rate, but still not probable!
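The same three numbers reproduce the 22.9% figure; a quick check in Python, with all values taken directly from the example above:

```python
# Example 2 in code: P(Rain | Cloudy) for Riyadh.
prior_rain = 0.10   # P(R): 10% of days rain
likelihood = 0.80   # P(C|R): 80% of rainy days are cloudy
evidence = 0.35     # P(C): 35% of all days are cloudy

posterior = likelihood * prior_rain / evidence
print(f"P(R|C) = {posterior:.3f}")  # 0.229
```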

Why This Matters for AI

Bayes' rule is the most important formula in probabilistic AI. It allows us to:

  • 🔄 Update beliefs with new evidence (Bayesian inference)
  • ⚡ Invert probabilities (infer cause from effect)
  • 🤖 Build intelligent systems that learn from data
  • 🎯 Make optimal decisions under uncertainty

Foundation: Conditional Probability

What is P(A|B)?

Conditional probability is the probability of A happening, given that B has already happened.

P(A|B) = P(A ∩ B) / P(B)

Read as: "Probability of A given B"

Where Do These Probabilities Come From?

Not from axioms - from DATA! Unlike the three axioms of probability (which are assumptions), specific probabilities like P(Disease)=1% or P(Cloudy|Rain)=80% come from observations: historical records, clinical studies, weather data, or expert knowledge. Bayes' rule then combines these observed probabilities to make inferences.

Visual: Understanding Conditional Probability

Example: Weather in Riyadh

Data Source: Table values from 365 days of weather observations in Riyadh. These are empirical probabilities (observed frequencies), not theoretical.

Joint Probability Table
β˜€οΈ Sunny ☁️ Cloudy Total
🌬️ Windy 0.15 0.25 0.40
😌 Calm 0.35 0.25 0.60
Total 0.50 0.50 1.00
Conditional Probabilities

P(Windy | Sunny) = ?

Formula: P(W|S) = P(W ∩ S) / P(S)

Step 1: Find P(Windy AND Sunny) from table = 0.15

Step 2: Find P(Sunny) from total column = 0.50

Step 3: Apply formula:

P(Windy|Sunny) = 0.15 / 0.50 = 0.30

"30% of sunny days are windy"

P(Sunny | Windy) = ?

Formula: P(S|W) = P(S ∩ W) / P(W)

Step 1: Find P(Sunny AND Windy) from table = 0.15

Step 2: Find P(Windy) from total row = 0.40

Step 3: Apply formula:

P(Sunny|Windy) = 0.15 / 0.40 = 0.375

"37.5% of windy days are sunny"

Key Point

P(A|B) ≠ P(B|A) in general! Knowing B occurred changes our belief about A, but the degree of change is different in each direction. This is why we need Bayes' rule.
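Both directions can be computed from the joint table, which makes the asymmetry concrete. A sketch in Python using the table values above:

```python
# Conditional probabilities from the joint weather table.
joint = {
    ("windy", "sunny"): 0.15, ("windy", "cloudy"): 0.25,
    ("calm",  "sunny"): 0.35, ("calm",  "cloudy"): 0.25,
}

# Marginals: sum the joint probabilities over the other variable
p_sunny = sum(p for (wind, sky), p in joint.items() if sky == "sunny")   # 0.50
p_windy = sum(p for (wind, sky), p in joint.items() if wind == "windy")  # 0.40

# P(A|B) = P(A ∩ B) / P(B): same numerator, different denominators
p_windy_given_sunny = joint[("windy", "sunny")] / p_sunny  # ≈ 0.30
p_sunny_given_windy = joint[("windy", "sunny")] / p_windy  # ≈ 0.375
print(p_windy_given_sunny, p_sunny_given_windy)
```

The numerator P(Windy ∩ Sunny) = 0.15 is shared; only the denominator changes, which is why the two conditionals differ.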

Independence vs Dependence

What Does Independence Mean?

Two events A and B are independent if knowing that one event occurred does not change the probability of the other event.

P(A | B) = P(A)

"Learning B happened doesn't affect probability of A"

Independent vs Dependent Events
Independent Events

Definition:

P(A|B) = P(A)

Examples:

  • 🎲 Two dice rolls
  • 🪙 Coin flips
  • 🌧️ Rain in Riyadh vs Tokyo

Key Property:

Events don't influence each other!

Dependent Events

Definition:

P(A|B) ≠ P(A)

Examples:

  • ☁️ Cloudy sky → Rain
  • 🔥 Smoke → Fire
  • 🩺 Positive test → Disease

Key Property:

Events influence each other!

How Independence Affects Probability Calculations
1️⃣ Effect on Intersection: P(A ∩ B)

✅ If INDEPENDENT:

P(AB) = P(A) × P(B)

Example: P(Heads on coin 1 AND Heads on coin 2)
= 0.5 × 0.5 = 0.25

❌ If DEPENDENT:

P(AB) = P(A) × P(B|A)

Example: P(Cloudy AND Rain)
≠ P(Cloudy) × P(Rain)
Must use P(Rain|Cloudy)!

2️⃣ Effect on Conditional Probability: P(A|B)

✅ If INDEPENDENT:

P(A | B) = P(A)

Example: P(Heads on coin 2 | Heads on coin 1)
= P(Heads on coin 2) = 0.5
Coin 1 doesn't affect coin 2!

❌ If DEPENDENT:

P(A | B) ≠ P(A)

Example: P(Rain | Cloudy) ≠ P(Rain)
If cloudy, rain is more likely!
Clouds affect rain probability!

3️⃣ Effect on Union: P(A ∪ B)

General Formula (always true):

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

✅ If INDEPENDENT:

P(A ∪ B) = P(A) + P(B) - P(A)×P(B)

Example: P(Heads on coin 1 OR Heads on coin 2)
= 0.5 + 0.5 - (0.5×0.5) = 0.75

❌ If DEPENDENT:

Use general formula
with actual P(A ∩ B)

Example: P(Cloudy OR Rain)
Must find P(Cloudy ∩ Rain) from data,
cannot use P(Cloudy)×P(Rain)!
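All three effects can be verified numerically. This sketch uses the two-coin numbers from above plus the windy/sunny figures from the joint table earlier in this section:

```python
# Independent events (two fair coins): multiply for AND, then the union formula.
p_a = p_b = 0.5
p_and_indep = p_a * p_b               # P(A ∩ B) = 0.25
p_or_indep = p_a + p_b - p_and_indep  # P(A ∪ B) = 0.75

# Dependent events (weather table): P(Windy ∩ Sunny) must come from data.
p_windy, p_sunny = 0.40, 0.50
p_windy_and_sunny = 0.15              # observed in the table, NOT p_windy * p_sunny

# Independence test: is P(A ∩ B) equal to P(A) × P(B)?
is_independent = abs(p_windy_and_sunny - p_windy * p_sunny) < 1e-9
print(p_and_indep, p_or_indep, is_independent)  # the last value is False: dependent
```

Since 0.15 ≠ 0.40 × 0.50 = 0.20, windy and sunny are dependent, so the product rule would give the wrong answer.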

Interactive: Testing Independence

Select a scenario and see if events are independent by comparing P(A|B) with P(A).

Why This Matters for AI

Bayesian Networks and Probabilistic Graphical Models explicitly model dependencies between variables. Understanding independence:

  • Simplifies calculations: Independent events are much easier to compute
  • Reduces model complexity: Fewer parameters needed for independent variables
  • Enables inference: Conditional independence allows efficient reasoning in large networks
  • Real-world impact: Most AI systems assume certain independences (e.g., Naive Bayes classifier)

Deriving Bayes' Rule (Step-by-Step)

The Goal

We want to flip a conditional probability: given P(B|A), find P(A|B). This is like converting "If it rains, there are clouds" into "If there are clouds, will it rain?"

Algebraic Derivation

Follow the mathematical steps from conditional probability to Bayes' Rule

Step 1: Definition of Conditional Probability

By definition, the probability of A given B is:

P(A|B) = P(A ∩ B) / P(B)

"What fraction of times B happens does A also happen?"

Step 2: Reverse Direction (Symmetry of Intersection)

We can also write the conditional probability in reverse:

P(B|A) = P(A ∩ B) / P(A)

"A ∩ B is the same as B ∩ A (intersection is symmetric)"

Step 3: Solve for the Intersection

Multiply both sides of Step 2 by P(A):

P(A ∩ B) = P(B|A) × P(A)

"This is the multiplication rule for dependent events!"

Step 4: Substitute into Step 1 → BAYES' RULE!

Replace P(A ∩ B) in Step 1 with the expression from Step 3:

From Step 1:

P(A|B) = P(A ∩ B) / P(B)

Replace P(A ∩ B) with the expression from Step 3:

P(A|B) = [P(B|A) × P(A)] / P(B)

🎉 This is Bayes' Rule!
We've successfully inverted the conditional probability!
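The derivation can be sanity-checked numerically: computing P(A|B) directly from the definition and via Bayes' rule must give the same number. Here we use the windy/sunny figures from the earlier table, purely as a check:

```python
# Numerical check of the derivation, with A = Windy, B = Sunny.
p_a, p_b, p_a_and_b = 0.40, 0.50, 0.15

direct = p_a_and_b / p_b             # Step 1: definition of P(A|B)
p_b_given_a = p_a_and_b / p_a        # Step 2: reverse conditional P(B|A)
via_bayes = p_b_given_a * p_a / p_b  # Step 4: Bayes' rule

# Both routes agree up to floating-point rounding
assert abs(direct - via_bayes) < 1e-12
print(direct, via_bayes)
```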

Concrete Example: Medical Diagnosis

Let's apply each step with actual values

What We Know

Given information:

  • 🟣 P(+Test|Disease) = 0.95
    Test sensitivity
  • 🔵 P(Disease) = 0.01
    Base rate
  • 🟢 P(+Test) = 0.059
    Overall positive rate
What We Want

Goal:

Find: P(Disease|+Test) = ?

If test is positive, what's the probability of actually having the disease?

Apply Bayes' Rule:

P(D|+) = (0.95 × 0.01) / 0.059 = 0.0095 / 0.059 = 0.161

Result: Only 16.1% chance of disease despite positive test!

The Power of Bayes' Rule

Bayes' rule lets us invert conditional probabilities:

✅ What we can easily observe:

  • P(symptom|disease)
  • P(test result|condition)
  • P(effect|cause)

🎯 What we actually need:

  • P(disease|symptom)
  • P(condition|test result)
  • P(cause|effect)

This "probability inversion" is fundamental to diagnosis, prediction, machine learning, and AI reasoning!

Understanding the Components

Posterior = (Likelihood × Prior) / Evidence
🔵
Prior: P(A)

What we believed before seeing evidence B

🟣
Likelihood: P(B|A)

How likely is evidence B if A is true?

🟢
Evidence: P(B)

Total probability of seeing evidence B

🔴
Posterior: P(A|B)

Updated belief after seeing evidence B

Classic Example: Medical Diagnosis

The Medical Testing Paradox

A patient tests positive for a disease. What's the probability they actually have it? The answer is often counterintuitive - Bayes' rule reveals the truth!

Interactive: Disease Testing Calculator

Adjust the sliders to see how base rates dramatically affect diagnosis!

Defaults: base rate 1.0%, sensitivity 95%, false positive rate 5%
Bayes' Rule Calculation
Step 1: Calculate P(+)

Law of Total Probability

P(+) = P(+|D)×P(D) + P(+|¬D)×P(¬D)
Intuitive Meaning:
"What percentage of ALL people test positive?"
This combines two groups: sick people who test positive (true positives) + healthy people who test positive (false positives).
Step 2: Apply Bayes' Rule

Posterior Probability

P(D|+) = [P(+|D) × P(D)] / P(+)
Intuitive Meaning:
"Given that I tested positive, what's the chance I actually have the disease?"
This flips the perspective: from "test accuracy" to "disease probability after testing."

The Counterintuitive Result

Even with a 95% accurate test, if the disease is rare (1%), a positive test only means ~16% chance of disease!
This is because false positives from the 99% healthy population outnumber true positives from the 1% sick population. Base rates matter! This is why Bayes' rule is essential.
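The slider experiment can be reproduced in code: hold the test fixed and vary only the base rate. A sketch, using the same 95% sensitivity and 5% false positive rate as the example:

```python
# How the base rate drives the posterior for a fixed 95%-sensitive test.
def posterior(prior, sensitivity=0.95, false_pos=0.05):
    """P(D|+) via Bayes' rule, with P(+) from the law of total probability."""
    evidence = sensitivity * prior + false_pos * (1 - prior)
    return sensitivity * prior / evidence

for base_rate in (0.001, 0.01, 0.10, 0.50):
    print(f"base rate {base_rate:6.1%} -> P(D|+) = {posterior(base_rate):.1%}")
```

At a 1% base rate this gives 16.1%; at a 50% base rate the very same test gives 95.0%, which is exactly the base-rate effect the calculator demonstrates.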

Classic Example: Burglary-Alarm Network

πŸ›‘οΈ The Burglary-Alarm Mystery

It's 2 AM. Your home alarm blares! 🚨
Your heart races - is it a burglar 🦹‍♂️ or just an earthquake 🌍 shaking the house?
Bayes' rule helps you figure out what really caused that alarm!

Why This Matters

Real alarms don't just say "burglar!" - they can be triggered by many causes.
Bayes' rule lets us update our beliefs about multiple possible causes when we get evidence.

πŸ” Interactive: What's Really Causing My Alarm?
🦹
Burglar 1% chance
🌍
Earthquake 0.1% chance
⬇️
Triggers
🔔
ALARM SOUNDS!
Your evidence
Before Alarm: Base Rates
Burglar base rate: 0.1% to 5% (slider)
Earthquake base rate: 0.01% to 1% (slider)
Alarm Sensitivity
P(Alarm|Burglar): 90% to 100% (slider)
P(Alarm|Earthquake): 50% to 95% (slider)
Step 1: P(Alarm) - Total Probability

What % of time does alarm go off?

P(A) = P(A|B)×P(B) + P(A|E)×P(E)
Step 2: What Really Caused It?

Updated beliefs after hearing alarm

P(B|A) = [P(A|B)×P(B)] / P(A)
P(E|A) = [P(A|E)×P(E)] / P(A)
πŸ¦Ήβ€β™‚οΈ Burglar Caused Alarm?

🌍 Earthquake Caused Alarm?

🔑 Key Insights

Multiple Causes:

Alarms don't just detect burglars - they respond to any trigger. Bayes helps us disentangle competing explanations.

Evidence Updates:

The alarm is evidence that shifts our beliefs, but doesn't give perfect certainty about what caused it.
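Under the page's simplified two-cause model, both posteriors follow from the same two steps. A sketch in Python; the two alarm sensitivities here are illustrative values within the slider ranges, not values given in the text:

```python
# Burglary-alarm posteriors under the two-cause model above.
p_burglar, p_quake = 0.01, 0.001   # base rates before the alarm
p_alarm_b, p_alarm_e = 0.95, 0.70  # assumed P(Alarm|Burglar), P(Alarm|Earthquake)

# Step 1: total probability of the alarm (only the two causes are modeled)
p_alarm = p_alarm_b * p_burglar + p_alarm_e * p_quake

# Step 2: Bayes' rule for each competing explanation
p_b_given_a = p_alarm_b * p_burglar / p_alarm
p_e_given_a = p_alarm_e * p_quake / p_alarm
print(f"P(Burglar|Alarm) = {p_b_given_a:.1%}, P(Earthquake|Alarm) = {p_e_given_a:.1%}")
```

With these numbers the burglar explanation dominates (roughly 93% vs 7%), mainly because its base rate is ten times higher than the earthquake's.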

Key Takeaways

Bayes' Rule Essentials
  1. Formula: P(A|B) = [P(B|A) × P(A)] / P(B)
  2. Purpose: Invert conditional probabilities
  3. Components: Prior, Likelihood, Evidence, Posterior
  4. Base Rates Matter: P(A) can dominate the result
  5. Update Beliefs: Posterior becomes new prior
Why This Powers AI
  • Foundation of Bayesian networks
  • Enables learning from data
  • Handles uncertainty in reasoning
  • Used in spam filters, medical AI, robotics
  • The "learning rule" for probabilistic AI
The Most Important Formula in AI

"Bayes' rule is the mathematical foundation for how AI systems update beliefs with evidence. Every time a spam filter learns, a medical AI diagnoses, or a robot localizes itself, Bayes' rule is working behind the scenes."

Next: Now you understand Bayes' rule! Let's explore probability distributions and joint/marginal probabilities. Continue to Topic 6 →