Real-world applications of Bayes' theorem: Learning from evidence and updating beliefs
Bayesian inference is the process of updating our beliefs about the world as we gather new evidence. It's how rational agents should think when faced with uncertainty!
Bayes' theorem: P(H | E) = P(E | H) × P(H) / P(E), where H = Hypothesis and E = Evidence.
Bayesian inference is the optimal way to learn from data. It's used in: medical diagnosis, spam filters, machine learning, robotics, autonomous vehicles, weather forecasting, and more!
During the COVID-19 pandemic, you take a rapid antigen test. It's positive. What's the probability you actually have COVID-19? The answer might surprise you!
1% of population has COVID
95% true positive rate
5% false positive rate
We want to find: P(COVID | +) = Probability of COVID given positive test
Use the law of total probability:
P(+) = P(+ | COVID) × P(COVID) + P(+ | ¬COVID) × P(¬COVID) = 0.95 × 0.01 + 0.05 × 0.99 = 0.059
So 5.9% of all people (sick and healthy) will test positive.
P(COVID | +) = (0.95 × 0.01) / 0.059 ≈ 0.161. Even with a positive test, there's only a 16.1% chance of having COVID!
Key insight: Most positive tests come from false positives, not true disease cases!
Bottom line: Of every 5.9 positive tests, only 0.95 (16.1%) come from actual COVID cases. The remaining 4.95 (83.9%) are false positives from the 99% healthy population! This is why rare diseases make positive tests unreliable.
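The arithmetic above can be checked with a short script (variable names are my own):

```python
# Posterior probability of COVID given a positive rapid test.
prevalence = 0.01        # P(COVID): 1% of the population is infected
sensitivity = 0.95       # P(+ | COVID): true positive rate
false_positive = 0.05    # P(+ | no COVID): false positive rate

# Law of total probability: P(+) over both infected and healthy people
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' theorem: P(COVID | +)
posterior = sensitivity * prevalence / p_positive

print(f"P(+) = {p_positive:.3f}")          # 0.059
print(f"P(COVID | +) = {posterior:.3f}")   # 0.161
```

Rerunning with a higher prevalence shows the same effect as the interactive slider: the posterior rises sharply as the disease becomes more common.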
Adjust the parameters and see how the posterior probability changes!
Step 1: Calculate P(+)
Step 2: Apply Bayes' Theorem
P(COVID | Positive Test)
Try this: Increase the COVID prevalence to 50% and observe how the posterior probability changes dramatically! With common diseases, positive tests are much more reliable.
Note: Sensitivity P(+|COVID) and False Positive P(+|¬COVID) are independent parameters. They don't need to sum to 1; each measures a different aspect of test performance.
You open your email and see this subject line:
"CONGRATULATIONS! You Won FREE iPhone - CLICK HERE to Claim!"
Should you trust this email? Bayesian spam filters (like Gmail's) analyze word patterns to protect you from scams!
Spam filters learn from thousands of emails. They ask: "How often does each word appear in spam vs. legitimate emails?" This creates a "word fingerprint" that identifies suspicious patterns.
What these numbers mean:
40% of emails are SPAM: P(Spam) = 0.40
60% of emails are LEGITIMATE: P(Ham) = 0.60
Based on historical email data, we know that 4 out of 10 emails are typically spam.
In reality, words in emails are often dependent (e.g., "FREE" and "WIN" often appear together). But Naive Bayes makes a simplifying assumption: words appear independently.
Instead of calculating the joint likelihood P(FREE, WIN, CLICK | Spam) directly, we simplify to a product of per-word likelihoods:
P(FREE | Spam) × P(WIN | Spam) × P(CLICK | Spam) = 0.80 × 0.70 × 0.65 = 0.364
Why it works: Even though the assumption is "naive," it performs remarkably well for text classification and is very fast to compute!
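Under the independence assumption, the joint likelihood collapses to a simple product. A minimal sketch (pairing the 0.80/0.70/0.65 values with FREE/WIN/CLICK in that order is my assumption):

```python
import math

# Per-word spam likelihoods P(word | Spam) from the worked example
word_likelihoods = {"FREE": 0.80, "WIN": 0.70, "CLICK": 0.65}

# Naive Bayes: words are assumed independent given the class,
# so the joint likelihood is just the product of per-word terms
joint = math.prod(word_likelihoods.values())
print(f"P(FREE, WIN, CLICK | Spam) = {joint:.3f}")  # 0.364
```

Note how the product shrinks as more words are added; real filters work in log-space to avoid underflow with hundreds of words.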
Given that we see "FREE", "WIN", and "CLICK" together, what's the probability this email is spam vs. legitimate? We use Bayes' theorem to weigh the evidence!
How likely are these words in spam?
36.4% chance of seeing this word combo in spam
Multiply by prior belief:
Weighted spam likelihood
How likely are these words in legitimate emails?
0.075% chance of seeing this word combo in legitimate emails
Multiply by prior belief:
Weighted legitimate likelihood
Total probability of observing these words:
Probability it's SPAM:
Probability it's HAM:
99.7% confidence this email is spam. The combination of "FREE", "WIN", and "CLICK" is extremely indicative of spam emails. These words are much more common in spam than in legitimate emails.
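Putting the numbers together, the posterior follows from the two prior-weighted likelihoods (the ham likelihood 0.00075 comes from the 0.075% figure above):

```python
# Likelihoods of seeing FREE + WIN + CLICK in each class
p_words_given_spam = 0.364     # 0.80 * 0.70 * 0.65
p_words_given_ham = 0.00075    # 0.075% in legitimate email

# Priors from historical data
p_spam, p_ham = 0.40, 0.60

# Weight each likelihood by its prior, then normalize
spam_score = p_words_given_spam * p_spam    # 0.1456
ham_score = p_words_given_ham * p_ham       # 0.00045
p_spam_posterior = spam_score / (spam_score + ham_score)

print(f"P(Spam | words) = {p_spam_posterior:.1%}")  # 99.7%
```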
High confidence spam
Very unlikely legitimate
Bayesian spam filters learn from examples. They track word frequencies in spam vs. ham emails, then use Bayes' theorem to classify new emails. Modern filters use thousands of features (words, phrases, metadata) and achieve >99% accuracy!
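A toy version of that learning loop might look like this; the tiny training corpus and the Laplace smoothing constant are illustrative assumptions, not from the text:

```python
from collections import Counter

# Tiny labeled corpus (illustrative only)
spam_emails = ["free win click now", "win free prize click"]
ham_emails = ["meeting at noon", "project update attached", "lunch tomorrow"]

def word_probs(emails, vocab, alpha=1.0):
    """Estimate P(word | class) from counts, with Laplace smoothing."""
    counts = Counter(w for e in emails for w in e.split())
    total = sum(counts.values())
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}

vocab = {w for e in spam_emails + ham_emails for w in e.split()}
p_w_spam = word_probs(spam_emails, vocab)
p_w_ham = word_probs(ham_emails, vocab)

def classify(email, p_spam=0.4, p_ham=0.6):
    """Return P(spam | words) via naive Bayes with the priors above."""
    spam_score, ham_score = p_spam, p_ham
    for w in email.split():
        if w in vocab:
            spam_score *= p_w_spam[w]
            ham_score *= p_w_ham[w]
    return spam_score / (spam_score + ham_score)

print(classify("free click"))      # spam-like words score well above 0.5
print(classify("meeting at noon")) # ham-like words score well below 0.5
```

Production filters follow the same recipe at scale, just with far more features and log-probabilities.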
Your autonomous car is driving through a parking garage.
GPS says you're near the entrance, but cameras see blue walls (typical of the back section).
Which sensor should you trust? Neither is perfect! Bayesian sensor fusion combines both for accurate localization!
GPS gives a rough estimate of your general area, while cameras provide detailed visual information about your immediate surroundings. Neither is perfect, but together they give accurate localization!
GPS gives a general area estimate
GPS says you're in this general vicinity:
GPS Insight: Satellite positioning gives rough area estimates but can be inaccurate in enclosed spaces like parking garages.
Camera sees specific wall colors
Camera detects: "Blue wall visible"
How likely is this at each location?
Camera Insight: Visual sensors provide detailed local information but can be affected by lighting and obstructions.
GPS says you're probably at the entrance, but camera sees blue walls. Bayes' theorem combines these conflicting signals to give the most accurate location estimate!
Likelihood × Prior for each location:
Location A (Entrance): GPS said 50% likely, camera says 10% chance of blue wall → 0.50 × 0.10 = 0.05
Location B (Back Section): GPS said 30% likely, camera says 80% chance of blue wall → 0.30 × 0.80 = 0.24
Location C (Middle Area): GPS said 20% likely, camera says 40% chance of blue wall → 0.20 × 0.40 = 0.08
Total evidence strength: 0.05 + 0.24 + 0.08 = 0.37 (sum of all raw scores)
Final Location Probabilities:
Normalization: Divide each raw score by total to get valid probabilities that sum to 100%.
GPS thought Location A was most likely
Camera evidence completely changed our belief!
The camera's visual evidence overrode GPS! Even though GPS was more confident about Location A (50%), the camera's strong signal for blue walls (80% at Location B) completely shifted our belief to Location B (64.9%).
Key Insight: Multiple sensors working together can be more accurate than any single sensor alone. This is why self-driving cars use many sensors!
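The fusion step above can be sketched in a few lines (location labels and all numbers are taken from the example):

```python
# GPS prior P(location) and camera likelihood P(blue wall | location)
prior = {"A (Entrance)": 0.50, "B (Back Section)": 0.30, "C (Middle)": 0.20}
likelihood = {"A (Entrance)": 0.10, "B (Back Section)": 0.80, "C (Middle)": 0.40}

# Raw scores: likelihood × prior for each location
scores = {loc: likelihood[loc] * prior[loc] for loc in prior}
total = sum(scores.values())  # 0.37: total evidence strength

# Normalize so the posteriors sum to 1
posterior = {loc: s / total for loc, s in scores.items()}
for loc, p in posterior.items():
    print(f"{loc}: {p:.1%}")
# Location B (Back Section) comes out on top at about 64.9%
```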
Notice how the posterior (green) differs from the prior (blue) after incorporating camera evidence
Experiment with different sensor characteristics and see how they affect localization accuracy!
Posterior Probability
Individual sensors are imperfect, but when combined intelligently using Bayesian methods, they create systems that are more reliable than any single sensor alone. This is why modern autonomous systems use 10+ different sensors working together!
Bayesian inference is the mathematically optimal way to learn from data.
It's the foundation of modern AI, from medical diagnosis to spam filters to self-driving cars.
Every time you see new evidence, ask: "How should this update my beliefs?"