
Bayes' Rule & Conditional Probability

The Most Important Formula in Probabilistic AI

Why Bayes' Rule is the Foundation of AI

🎯
Bayes' Theorem
P(A|B) = [P(B|A) × P(A)] / P(B)

"Invert conditional probabilities"

The Four Components of Bayes' Rule
🔴
P(A|B)

POSTERIOR

What we WANT
P(Disease|+Test)

🟣
P(B|A)

LIKELIHOOD

What we KNOW
P(+Test|Disease)

🔵
P(A)

PRIOR

Base Rate
P(Disease)=1%

🟢
P(B)

EVIDENCE

Normalizer
P(+Test) Total

Bayesian Inference Flow:

🔵 Prior → 🟣 Likelihood × Prior → 🟢 ÷ Evidence → 🔴 Posterior
Worked Examples: Applying Each Term
Example 1: Medical Diagnosis

Scenario: Testing for a rare disease

🔵

Prior

P(D) = 0.01

1% have disease


Absolute probability
BEFORE seeing test

🟣

Likelihood

P(+|D) = 0.95

95% sensitivity


Test accuracy
IF disease present

🟢

Evidence

P(+) = 0.059

5.9% test positive


Overall positive rate
(all causes)

🔴

Posterior

P(D|+) = 0.161

16.1% chance!


Updated belief
AFTER seeing test

⚠️ Key Insight: Despite a 95% accurate test, there is only a 16.1% chance of disease, because the base rate is low (1%)!
Calculation: (0.95 × 0.01) / 0.059 = 0.161

📖 Understanding the Bayesian Reasoning:

🔵 Prior (1%): Before any test, we know only 1 in 100 people have this disease (the base rate in the population).
🟣 Likelihood (95%): We know the test correctly identifies 95% of sick patients (but what about healthy ones?).
🟢 Evidence (5.9%): Overall, 5.9% of all people test positive (a mix of true positives from the 1% who are sick and false positives from the 99% who are healthy).
🔴 Posterior (16.1%): After testing positive, we update our belief to 16.1%: much higher than 1%, but still most likely a false positive!
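The arithmetic above can be checked in a few lines of Python. This is a minimal sketch: the 5% false positive rate is an assumption chosen so that the evidence term works out to the 5.9% used in the example.

```python
# Example 1 in code: posterior probability of disease after a positive test.
prior = 0.01        # P(D): 1% base rate
sensitivity = 0.95  # P(+|D): test catches 95% of sick patients
false_pos = 0.05    # P(+|not D): assumed false positive rate

# Evidence via the law of total probability: true positives + false positives
evidence = sensitivity * prior + false_pos * (1 - prior)

posterior = sensitivity * prior / evidence
print(f"P(+) = {evidence:.3f}, P(D|+) = {posterior:.3f}")
```

Running this prints P(+) = 0.059 and P(D|+) = 0.161, matching the numbers above.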

Example 2: Weather Prediction in Riyadh

Scenario: Sky is cloudy. What's probability of rain?

🔵

Prior

P(R) = 0.10

10% of days rain


Absolute probability
BEFORE seeing sky

🟣

Likelihood

P(C|R) = 0.80

80% rainy days cloudy


Cloud probability
IF raining

🟢

Evidence

P(C) = 0.35

35% of days cloudy


Overall cloudy rate
(observed now)

🔴

Posterior

P(R|C) = 0.229

22.9% chance rain


Updated belief
AFTER seeing clouds

☁️ Key Insight: Cloudy sky increases rain probability from 10% (base rate) to 22.9%!
Calculation: (0.80 × 0.10) / 0.35 = 0.229

📖 Understanding the Bayesian Reasoning:

🔵 Prior (10%): In Riyadh, historically 10% of all days have rain (the base rate from past data).
🟣 Likelihood (80%): We know that when it rains, the sky is cloudy 80% of the time (clouds are common with rain).
🟢 Evidence (35%): Overall, 35% of all days are cloudy (whether it rains or not - clouds happen for many reasons).
🔴 Posterior (22.9%): Now that we see clouds, we update our belief: rain is more likely than the 10% base rate, but still not probable!
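The same three numbers reproduce the 22.9% figure; a quick check in Python, with all values taken directly from the example above:

```python
# Example 2 in code: P(Rain | Cloudy) for Riyadh.
prior_rain = 0.10   # P(R): 10% of days rain
likelihood = 0.80   # P(C|R): 80% of rainy days are cloudy
evidence = 0.35     # P(C): 35% of all days are cloudy

posterior = likelihood * prior_rain / evidence
print(f"P(R|C) = {posterior:.3f}")  # 0.229
```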

Why This Matters for AI

Bayes' rule is the most important formula in probabilistic AI. It allows us to:

  • 🔄 Update beliefs with new evidence (Bayesian inference)
  • ⚡ Invert probabilities (infer cause from effect)
  • 🤖 Build intelligent systems that learn from data
  • 🎯 Make optimal decisions under uncertainty

Foundation: Conditional Probability

What is P(A|B)?

Conditional probability is the probability of A happening, given that B has already happened.

P(A|B) = P(A ∩ B) / P(B)

Read as: "Probability of A given B"

Where Do These Probabilities Come From?

Not from axioms - from DATA! Unlike the three axioms of probability (which are assumptions), specific probabilities like P(Disease)=1% or P(Cloudy|Rain)=80% come from observations: historical records, clinical studies, weather data, or expert knowledge. Bayes' rule then combines these observed probabilities to make inferences.

Visual: Understanding Conditional Probability

Example: Weather in Riyadh

Data Source: Table values from 365 days of weather observations in Riyadh. These are empirical probabilities (observed frequencies), not theoretical.

Joint Probability Table
β˜€οΈ Sunny ☁️ Cloudy Total
🌬️ Windy 0.15 0.25 0.40
😌 Calm 0.35 0.25 0.60
Total 0.50 0.50 1.00
Conditional Probabilities

P(Windy | Sunny) = ?

Formula: P(W|S) = P(W ∩ S) / P(S)

Step 1: Find P(Windy AND Sunny) from table = 0.15

Step 2: Find P(Sunny) from total column = 0.50

Step 3: Apply formula:

P(Windy|Sunny) = 0.15 / 0.50 = 0.30

"30% of sunny days are windy"

P(Sunny | Windy) = ?

Formula: P(S|W) = P(S ∩ W) / P(W)

Step 1: Find P(Sunny AND Windy) from table = 0.15

Step 2: Find P(Windy) from total row = 0.40

Step 3: Apply formula:

P(Sunny|Windy) = 0.15 / 0.40 = 0.375

"37.5% of windy days are sunny"

Key Point

P(A|B) ≠ P(B|A) in general! Knowing B occurred changes our belief about A, but the degree of change is different in each direction. This is why we need Bayes' rule.
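Both directions can be computed from the joint table, which makes the asymmetry concrete. A sketch in Python using the table values above:

```python
# Conditional probabilities from the joint weather table.
joint = {
    ("windy", "sunny"): 0.15, ("windy", "cloudy"): 0.25,
    ("calm",  "sunny"): 0.35, ("calm",  "cloudy"): 0.25,
}

# Marginals: sum the joint probabilities over the other variable
p_sunny = sum(p for (wind, sky), p in joint.items() if sky == "sunny")   # 0.50
p_windy = sum(p for (wind, sky), p in joint.items() if wind == "windy")  # 0.40

# P(A|B) = P(A ∩ B) / P(B): same numerator, different denominators
p_windy_given_sunny = joint[("windy", "sunny")] / p_sunny  # ≈ 0.30
p_sunny_given_windy = joint[("windy", "sunny")] / p_windy  # ≈ 0.375
print(p_windy_given_sunny, p_sunny_given_windy)
```

The numerator P(Windy ∩ Sunny) = 0.15 is shared; only the denominator changes, which is why the two conditionals differ.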

Independence vs Dependence

What Does Independence Mean?

Two events A and B are independent if knowing that one event occurred does not change the probability of the other event.

P(A | B) = P(A)

"Learning B happened doesn't affect probability of A"

Independent vs Dependent Events
Independent Events

Definition:

P(A|B) = P(A)

Examples:

  • 🎲 Two dice rolls
  • 🪙 Coin flips
  • 🌧️ Rain in Riyadh vs Tokyo

Key Property:

Events don't influence each other!

Dependent Events

Definition:

P(A|B) ≠ P(A)

Examples:

  • ☁️ Cloudy sky → Rain
  • 🔥 Smoke → Fire
  • 🩺 Positive test → Disease

Key Property:

Events influence each other!

How Independence Affects Probability Calculations
1️⃣ Effect on Intersection: P(A ∩ B)

✅ If INDEPENDENT:

P(AB) = P(A) × P(B)

Example: P(Heads on coin 1 AND Heads on coin 2)
= 0.5 × 0.5 = 0.25

❌ If DEPENDENT:

P(AB) = P(A) × P(B|A)

Example: P(Cloudy AND Rain)
≠ P(Cloudy) × P(Rain)
Must use P(Rain|Cloudy)!

2️⃣ Effect on Conditional Probability: P(A|B)

✅ If INDEPENDENT:

P(A | B) = P(A)

Example: P(Heads on coin 2 | Heads on coin 1)
= P(Heads on coin 2) = 0.5
Coin 1 doesn't affect coin 2!

❌ If DEPENDENT:

P(A | B) ≠ P(A)

Example: P(Rain | Cloudy) ≠ P(Rain)
If cloudy, rain is more likely!
Clouds affect rain probability!

3️⃣ Effect on Union: P(A ∪ B)

General Formula (always true):

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

✅ If INDEPENDENT:

P(A ∪ B) = P(A) + P(B) - P(A)×P(B)

Example: P(Heads on coin 1 OR Heads on coin 2)
= 0.5 + 0.5 - (0.5×0.5) = 0.75

❌ If DEPENDENT:

Use general formula
with actual P(A ∩ B)

Example: P(Cloudy OR Rain)
Must find P(Cloudy ∩ Rain) from data,
cannot use P(Cloudy)×P(Rain)!
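All three effects can be verified numerically. This sketch uses the two-coin numbers from above plus the windy/sunny figures from the joint table earlier in this section:

```python
# Independent events (two fair coins): multiply for AND, then the union formula.
p_a = p_b = 0.5
p_and_indep = p_a * p_b               # P(A ∩ B) = 0.25
p_or_indep = p_a + p_b - p_and_indep  # P(A ∪ B) = 0.75

# Dependent events (weather table): P(Windy ∩ Sunny) must come from data.
p_windy, p_sunny = 0.40, 0.50
p_windy_and_sunny = 0.15              # observed in the table, NOT p_windy * p_sunny

# Independence test: is P(A ∩ B) equal to P(A) × P(B)?
is_independent = abs(p_windy_and_sunny - p_windy * p_sunny) < 1e-9
print(p_and_indep, p_or_indep, is_independent)  # the last value is False: dependent
```

Since 0.15 ≠ 0.40 × 0.50 = 0.20, windy and sunny are dependent, so the product rule would give the wrong answer.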

Interactive: Testing Independence

Select a scenario and see if events are independent by comparing P(A|B) with P(A).

Why This Matters for AI

Bayesian Networks and Probabilistic Graphical Models explicitly model dependencies between variables. Understanding independence:

  • Simplifies calculations: Independent events are much easier to compute
  • Reduces model complexity: Fewer parameters needed for independent variables
  • Enables inference: Conditional independence allows efficient reasoning in large networks
  • Real-world impact: Most AI systems assume certain independences (e.g., Naive Bayes classifier)

Deriving Bayes' Rule (Step-by-Step)

The Goal

We want to flip a conditional probability: given P(B|A), find P(A|B). This is like converting "If it rains, there are clouds" into "If there are clouds, will it rain?"

Algebraic Derivation

Follow the mathematical steps from conditional probability to Bayes' Rule

Step 1: Definition of Conditional Probability

By definition, the probability of A given B is:

P(A|B) = P(A ∩ B) / P(B)

"What fraction of times B happens does A also happen?"

Step 2: Reverse Direction (Symmetry of Intersection)

We can also write the conditional probability in reverse:

P(B|A) = P(A ∩ B) / P(A)

"A ∩ B is the same as B ∩ A (intersection is symmetric)"

Step 3: Solve for the Intersection

Multiply both sides of Step 2 by P(A):

P(A ∩ B) = P(B|A) × P(A)

"This is the multiplication rule for dependent events!"

Step 4: Substitute into Step 1 → BAYES' RULE!

Replace P(A ∩ B) in Step 1 with the expression from Step 3:

From Step 1:

P(A|B) = P(A ∩ B) / P(B)

Replace P(A ∩ B) with the expression from Step 3:

P(A|B) = [P(B|A) × P(A)] / P(B)

🎉 This is Bayes' Rule!
We've successfully inverted the conditional probability!
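The derivation can be sanity-checked numerically: computing P(A|B) directly from the definition and via Bayes' rule must give the same number. Here we use the windy/sunny figures from the earlier table, purely as a check:

```python
# Numerical check of the derivation, with A = Windy, B = Sunny.
p_a, p_b, p_a_and_b = 0.40, 0.50, 0.15

direct = p_a_and_b / p_b             # Step 1: definition of P(A|B)
p_b_given_a = p_a_and_b / p_a        # Step 2: reverse conditional P(B|A)
via_bayes = p_b_given_a * p_a / p_b  # Step 4: Bayes' rule

# Both routes agree up to floating-point rounding
assert abs(direct - via_bayes) < 1e-12
print(direct, via_bayes)
```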

Concrete Example: Medical Diagnosis

Let's apply each step with actual values

What We Know

Given information:

  • 🟣 P(+Test|Disease) = 0.95
    Test sensitivity
  • 🔵 P(Disease) = 0.01
    Base rate
  • 🟢 P(+Test) = 0.059
    Overall positive rate
What We Want

Goal:

Find: P(Disease|+Test) = ?

If test is positive, what's the probability of actually having the disease?

Apply Bayes' Rule:

P(D|+) = (0.95 × 0.01) / 0.059 = 0.0095 / 0.059 = 0.161

Result: Only 16.1% chance of disease despite positive test!

The Power of Bayes' Rule

Bayes' rule lets us invert conditional probabilities:

✅ What we can easily observe:

  • P(symptom|disease)
  • P(test result|condition)
  • P(effect|cause)

🎯 What we actually need:

  • P(disease|symptom)
  • P(condition|test result)
  • P(cause|effect)

This "probability inversion" is fundamental to diagnosis, prediction, machine learning, and AI reasoning!

Understanding the Components

Posterior = (Likelihood × Prior) / Evidence
🔵
Prior: P(A)

What we believed before seeing evidence B

🟣
Likelihood: P(B|A)

How likely is evidence B if A is true?

🟢
Evidence: P(B)

Total probability of seeing evidence B

🔴
Posterior: P(A|B)

Updated belief after seeing evidence B

Classic Example: Medical Diagnosis

The Medical Testing Paradox

A patient tests positive for a disease. What's the probability they actually have it? The answer is often counterintuitive - Bayes' rule reveals the truth!

Interactive: Disease Testing Calculator

Adjust the sliders to see how base rates dramatically affect diagnosis!

Defaults: base rate 1.0%, sensitivity 95%, false positive rate 5%
Bayes' Rule Calculation
Step 1: Calculate P(+)

Law of Total Probability

P(+) = P(+|D)×P(D) + P(+|¬D)×P(¬D)
Intuitive Meaning:
"What percentage of ALL people test positive?"
This combines two groups: sick people who test positive (true positives) + healthy people who test positive (false positives).
Step 2: Apply Bayes' Rule

Posterior Probability

P(D|+) = [P(+|D) × P(D)] / P(+)
Intuitive Meaning:
"Given that I tested positive, what's the chance I actually have the disease?"
This flips the perspective: from "test accuracy" to "disease probability after testing."

The Counterintuitive Result

Even with a 95% accurate test, if the disease is rare (1%), a positive test only means ~16% chance of disease!
This is because false positives from the 99% healthy population outnumber true positives from the 1% sick population. Base rates matter! This is why Bayes' rule is essential.
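The slider experiment can be reproduced in code: hold the test fixed and vary only the base rate. A sketch, using the same 95% sensitivity and 5% false positive rate as the example:

```python
# How the base rate drives the posterior for a fixed 95%-sensitive test.
def posterior(prior, sensitivity=0.95, false_pos=0.05):
    """P(D|+) via Bayes' rule, with P(+) from the law of total probability."""
    evidence = sensitivity * prior + false_pos * (1 - prior)
    return sensitivity * prior / evidence

for base_rate in (0.001, 0.01, 0.10, 0.50):
    print(f"base rate {base_rate:6.1%} -> P(D|+) = {posterior(base_rate):.1%}")
```

At a 1% base rate this gives 16.1%; at a 50% base rate the very same test gives 95.0%, which is exactly the base-rate effect the calculator demonstrates.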

Classic Example: Burglary-Alarm Network

πŸ›‘οΈ The Burglary-Alarm Mystery

It's 2 AM. Your home alarm blares! 🚨
Your heart races - is it a burglar 🦹‍♂️ or just an earthquake 🌍 shaking the house?
Bayes' rule helps you figure out what really caused that alarm!

Why This Matters

Real alarms don't just say "burglar!" - they can be triggered by many causes.
Bayes' rule lets us update our beliefs about multiple possible causes when we get evidence.

πŸ” Interactive: What's Really Causing My Alarm?
🦹
Burglar 1% chance
🌍
Earthquake 0.1% chance
⬇️
Triggers
🔔
ALARM SOUNDS!
Your evidence
Before Alarm: Base Rates
Burglar base rate: 0.1% to 5% (slider)
Earthquake base rate: 0.01% to 1% (slider)
Alarm Sensitivity
P(Alarm|Burglar): 90% to 100% (slider)
P(Alarm|Earthquake): 50% to 95% (slider)
Step 1: P(Alarm) - Total Probability

What % of time does alarm go off?

P(A) = P(A|B)×P(B) + P(A|E)×P(E)
Step 2: What Really Caused It?

Updated beliefs after hearing alarm

P(B|A) = [P(A|B)×P(B)] / P(A)
P(E|A) = [P(A|E)×P(E)] / P(A)
πŸ¦Ήβ€β™‚οΈ Burglar Caused Alarm?

🌍 Earthquake Caused Alarm?

🔑 Key Insights

Multiple Causes:

Alarms don't just detect burglars - they respond to any trigger. Bayes helps us disentangle competing explanations.

Evidence Updates:

The alarm is evidence that shifts our beliefs, but doesn't give perfect certainty about what caused it.
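Under the page's simplified two-cause model, both posteriors follow from the same two steps. A sketch in Python; the two alarm sensitivities here are illustrative values within the slider ranges, not values given in the text:

```python
# Burglary-alarm posteriors under the two-cause model above.
p_burglar, p_quake = 0.01, 0.001   # base rates before the alarm
p_alarm_b, p_alarm_e = 0.95, 0.70  # assumed P(Alarm|Burglar), P(Alarm|Earthquake)

# Step 1: total probability of the alarm (only the two causes are modeled)
p_alarm = p_alarm_b * p_burglar + p_alarm_e * p_quake

# Step 2: Bayes' rule for each competing explanation
p_b_given_a = p_alarm_b * p_burglar / p_alarm
p_e_given_a = p_alarm_e * p_quake / p_alarm
print(f"P(Burglar|Alarm) = {p_b_given_a:.1%}, P(Earthquake|Alarm) = {p_e_given_a:.1%}")
```

With these numbers the burglar explanation dominates (roughly 93% vs 7%), mainly because its base rate is ten times higher than the earthquake's.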

Key Takeaways

Bayes' Rule Essentials
  1. Formula: P(A|B) = [P(B|A) × P(A)] / P(B)
  2. Purpose: Invert conditional probabilities
  3. Components: Prior, Likelihood, Evidence, Posterior
  4. Base Rates Matter: P(A) can dominate the result
  5. Update Beliefs: Posterior becomes new prior
Why This Powers AI
  • Foundation of Bayesian networks
  • Enables learning from data
  • Handles uncertainty in reasoning
  • Used in spam filters, medical AI, robotics
  • The "learning rule" for probabilistic AI
The Most Important Formula in AI

"Bayes' rule is the mathematical foundation for how AI systems update beliefs with evidence. Every time a spam filter learns, a medical AI diagnoses, or a robot localizes itself, Bayes' rule is working behind the scenes."

Next: Now you understand Bayes' rule! Let's explore probability distributions and joint/marginal probabilities. Continue to Topic 6 →