
The Missing Chapter

How to Change Your Mind

There is exactly one formula that tells you the right amount to update your beliefs. It fits on a napkin. Almost nobody uses it correctly.


In 2009, a radiologist in San Diego looked at a mammogram and said the word no patient wants to hear: positive. The woman was 40 years old, no family history, no symptoms. She went home and Googled "mammogram accuracy" and found a number — 90% — and spent two weeks believing she was almost certainly dying. She wasn't. The probability she actually had cancer was about 9%. Bayes' theorem could have told her that in thirty seconds. Nobody taught her Bayes' theorem.

· · ·
Chapter 1

The Reverend Who Solved Everything

Thomas Bayes was not a mathematician. Well, not primarily. He was a Presbyterian minister in Tunbridge Wells, England, who spent his days tending to souls and his evenings tending to a problem that had nagged natural philosophers since the dawn of probability theory: how do you reason backwards from evidence to causes?

Forward reasoning is easy. If I tell you a bag is 90% red balls, you can predict what you'll pull out. But life never works that way. You pull out a red ball and need to figure out which bag you're dealing with. You see a symptom and need to figure out the disease. You observe data and need to figure out the theory. The world gives you effects and demands you deduce causes.

Bayes figured out the math. He wrote it up in a manuscript called "An Essay towards solving a Problem in the Doctrine of Chances" and then, as mathematicians sometimes do, he died before publishing it. His friend Richard Price found the manuscript, polished it up, and sent it to the Royal Society in 1763.[1]

Bayes' original essay didn't contain the formula we now call "Bayes' theorem" in its modern form. That was worked out independently by Pierre-Simon Laplace in 1774, who — characteristically for French mathematicians — made it more elegant and more general. But history is kind to the person who had the idea first, even if he needed posthumous editing.

The core insight is simple enough to state in English: what you should believe after seeing the evidence depends on what you believed before. This sounds obvious. It is obvious. And yet it is the single most frequently violated principle in all of human reasoning.

· · ·
Chapter 2

The Formula on the Napkin

Here it is. The whole thing. The theorem that launched a thousand arguments in statistics departments and quietly runs half the algorithms on your phone:

Bayes' Theorem
P(H|E) = P(E|H) × P(H) / P(E)

The probability of the hypothesis given the evidence equals the probability of the evidence given the hypothesis, times the prior probability of the hypothesis, divided by the total probability of the evidence.

P(H|E) — Posterior: your updated belief after seeing the evidence
P(H) — Prior: your belief before seeing the evidence
P(E|H) — Likelihood: probability of seeing this evidence if the hypothesis is true
P(E) — Evidence: total probability of seeing this evidence under all scenarios

Let me translate this into something you can hold in your head. Imagine you're at a party and someone mentions they went to Harvard. You want to know: are they actually brilliant, or did they just have rich parents?

The prior is what you believed before they opened their mouth — maybe a 20% chance any given person at this party is genuinely brilliant. The likelihood is: how probable is it that a brilliant person would mention Harvard? (Pretty high — they're proud of it.) But here's the crucial part: the evidence term P(E) accounts for the fact that people of all intelligence levels mention Harvard at parties, possibly with even greater frequency if they have nothing else to brag about.

The denominator is where all the magic lives. It forces you to ask: what else could explain this evidence? Skip that question, and you get the wrong answer. Every time.

The prior is not bias. The prior is bookkeeping. It's the universe reminding you what's common before you go looking for what's special.
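The napkin formula drops straight into code. A minimal sketch of the party example — the specific numbers (80% of brilliant guests mention Harvard; so do 30% of everyone else) are invented for illustration:

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E),
    with P(E) expanded by total probability over H and not-H."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# The party: 20% prior that a guest is brilliant; suppose 80% of brilliant
# guests mention Harvard, but so do 30% of everyone else (invented numbers).
p = posterior(prior=0.20, p_e_given_h=0.80, p_e_given_not_h=0.30)
print(f"P(brilliant | mentions Harvard) = {p:.0%}")  # 40%
```

Note that "mentions Harvard" only doubles the probability, from 20% to 40% — because the denominator remembers that plenty of non-brilliant people mention Harvard too.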

There's a beautiful alternative form that makes the machinery even clearer. Instead of thinking about raw probabilities, think about odds and likelihood ratios:

The Odds Form
Posterior Odds = Prior Odds × Likelihood Ratio
Your new odds equal your old odds multiplied by the strength of the evidence. Evidence that's 10× more likely under H than under ¬H multiplies your odds by 10.

This form is what practicing Bayesians actually use, because it makes the update step trivially simple. If you started at 1:99 odds (1% prior) and the evidence has a likelihood ratio of 20 — meaning it's 20 times more likely if the hypothesis is true — your new odds are 20:99, or about 17%. One multiplication. No need to compute P(E) at all.
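In code the odds form really is one multiplication — a sketch using the 1:99 example above:

```python
def odds_to_prob(odds):
    """Convert odds (e.g. 20:99 expressed as 20/99) to a probability."""
    return odds / (1 + odds)

# Start at 1:99 odds (a 1% prior); the evidence has a likelihood ratio of 20.
prior_odds = 1 / 99
posterior_odds = prior_odds * 20              # the entire update step
print(f"{odds_to_prob(posterior_odds):.1%}")  # 16.8% — "about 17%"
```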

[Figure: the Bayesian tree. Of 100,000 people, 100 are sick and 99,900 are healthy; 99% sensitivity catches 99 of the sick (1 is missed), while a 5% false-positive rate flags 4,995 of the healthy. All positive tests: 99 real out of 5,094 total ≈ 2%.]

The Bayesian tree: follow the branches to see why a positive test on a rare disease almost always means "false alarm."

· · ·
Chapter 3

The Doctor's Dilemma

This is the example that makes Bayes' theorem famous, and it is the example that makes doctors squirm.

A rare disease affects 1 in 1,000 people. There's a screening test: if you have the disease, it correctly returns positive 99% of the time (sensitivity = 99%). If you don't have the disease, it correctly returns negative 95% of the time (specificity = 95%, meaning a 5% false positive rate).

You take the test. It comes back positive.

Quick: what's the probability you actually have the disease?

In a classic 1978 study by Casscells, Schoenberger, and Graboys, 60 staff and students at Harvard Medical School were given essentially this problem. Almost half said 95% — the single most common answer. The actual answer is about 2%.[2]

Your "positive" result means you almost certainly don't have the disease. If that doesn't unsettle you, you're not paying attention.

Here's why. Picture a stadium with 100,000 people — a useful number because it makes the arithmetic visible:

100 have the disease. The test catches 99 of them. (One unlucky person is missed.)

99,900 are healthy. But the 5% false positive rate tags 4,995 of them anyway.

Total positive results: 99 + 4,995 = 5,094. Of those, only 99 are truly sick. So if your test is positive, the probability you actually have the disease is 99 / 5,094 ≈ 1.9%.

The intuition-killer is the base rate. The disease is so rare that even a good test generates a flood of false alarms from the vast healthy population. The 4,995 false positives swamp the 99 true positives. It's not that the test is bad — it's that the prior is doing almost all the work.
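The stadium arithmetic, as a short sketch in counts:

```python
population = 100_000
prevalence = 1 / 1_000       # 1 in 1,000 have the disease
sensitivity = 0.99           # P(positive | sick)
specificity = 0.95           # P(negative | healthy)

sick = population * prevalence                             # 100 people
true_positives = sick * sensitivity                        # 99 caught
false_positives = (population - sick) * (1 - specificity)  # 4,995 flagged

p = true_positives / (true_positives + false_positives)
print(f"P(sick | positive) = {p:.1%}")  # 1.9%
```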

Base Rate Neglect

The accuracy of a test means nothing without the base rate. A 95%-specific test for a 1-in-1,000 disease is practically useless as a standalone screen. This error — ignoring priors and focusing only on the test's accuracy — is called base rate neglect, and it is arguably the most dangerous systematic error in human reasoning. It affects medical diagnosis, criminal justice, airport security, and any domain where we screen for rare events.

[Figure: Where do positive tests come from? Of the 100 sick, 99% are caught: 99 true positives. Of the 99,900 healthy, 5% are falsely flagged: 4,995 false positives. Total "positive" results: 5,094, of which only 2% are real.]

Base rate neglect: when the condition is rare, false positives outnumber true positives by 50 to 1.

Gerd Gigerenzer, the German psychologist who has spent decades studying this problem, discovered something hopeful. When you rephrase the problem using natural frequencies — "Out of every 10,000 people, 10 have the disease; essentially all 10 of them test positive; meanwhile, about 500 healthy people also test positive" — suddenly people get it right.[3] Our brains aren't bad at Bayesian reasoning. They're bad at probabilities. Give them counts, and they do fine. Evolution trained us on frequencies, not fractions.

· · ·
Chapter 4

See It for Yourself

Enough reading. Play with the numbers. Each dot below represents a person. Watch what happens to the crowd of positive results as you change the base rate — that's where your intuition fails hardest.

[Interactive: Bayesian Diagnosis Machine. Adjust the disease prevalence, test sensitivity, and specificity; the grid shows a population of 1,000, one dot per person — true positive (sick + caught), false positive (healthy + flagged), true negative (healthy + clear), false negative (sick + missed). At the defaults — 0.1% prevalence, 99.0% sensitivity, 95.0% specificity — there is 1 true positive and 50 false positives among 51 total positives, a likelihood ratio of 19.8, and a 2.0% probability you're actually sick: almost certainly a false positive. The number that matters is at the bottom: given a positive test, what are the actual odds?]

Now try this: crank the base rate up to 10% and watch the posterior leap. The same test, the same accuracy — but because the disease is more common, a positive result means something completely different. The test didn't change. The context did. That's Bayes' theorem in one sentence.

· · ·
Chapter 5

The Second Test (Where the Magic Happens)

Your first test came back positive and the probability you're sick is only 2%. So your doctor says: "Let's run it again." You take the same test, and it's positive again.

Now what? This is where Bayes' theorem reveals its most beautiful property: today's posterior becomes tomorrow's prior. The 2% from your first test becomes the starting belief for interpreting the second test. And 2% is a much higher prior than 0.1%.

With a 2% prior, 99% sensitivity, and 95% specificity, the second positive test pushes you to about 29%. A third positive? About 89%. Each test ratchets the probability upward — slowly at first, when the prior is resisting, then faster as the evidence accumulates.
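The ratchet fits in a few lines. The exact figures below differ slightly from the ones in the text because the code never rounds the intermediate posterior:

```python
def update(prior, sensitivity=0.99, specificity=0.95):
    """Fold one positive test result into the prior via the odds form."""
    lr = sensitivity / (1 - specificity)    # likelihood ratio: 19.8 per positive
    odds = prior / (1 - prior) * lr
    return odds / (1 + odds)

belief = 0.001                              # the 1-in-1,000 base rate
for n in range(1, 4):
    belief = update(belief)                 # today's posterior, tomorrow's prior
    print(f"after positive test {n}: {belief:.1%}")
# 1.9%, 28.2%, 88.6% — the unrounded version of 2% → 29% → 89%
```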

[Interactive: Sequential Testing — Watch Belief Accumulate. Start with a base rate and test accuracy, then click "Positive Result" or "Negative Result" to see how each new piece of evidence shifts the probability. Defaults: a 0.1% starting base rate and a 19.8× likelihood ratio per positive result; with no tests yet, the current belief is just the 0.1% prior. This is how rational belief change actually works — not as a dramatic conversion, but as a relentless accumulation of signal.]
Why This Matters

This is how science works — or should work. No single experiment "proves" anything. Each result is a likelihood ratio that nudges belief. A well-designed experiment with a high likelihood ratio moves belief a lot. A sloppy one barely moves it at all. The accumulation of many independent results, each multiplying the odds, is what eventually produces near-certainty. There is no shortcut.

[Figure: the Bayesian ratchet. Prior 0.1% → positive test (~20× update) → 2% → positive test (~20× update) → 29% → positive test → posterior 89%, near-certain. Each posterior becomes the next prior; same test, same evidence strength — belief accumulates relentlessly.]

The Bayesian ratchet: three identical positive tests take you from 0.1% to 89%. No single test is decisive — the accumulation is.

· · ·
Chapter 6

The War That Wouldn't End

For most of the 20th century, statistics had a civil war. On one side: the frequentists, led by the intellectual descendants of Ronald Fisher and Jerzy Neyman. On the other: the Bayesians, a ragtag band who thought the frequentists had made a catastrophic wrong turn somewhere around 1925.

The disagreement sounds technical, but it's philosophical, and it cuts deep. Frequentists say probability is about long-run frequencies. A fair coin means: flip it infinitely many times, half come up heads. Probability is a property of the physical world.

Bayesians say that's lovely but useless. When a climate scientist says there's a 90% chance humans are causing global warming, she doesn't mean that in 90% of parallel Earths we'd see warming. That's gibberish. She means: given her evidence and expertise, she is 90% confident. Probability is a degree of belief.

The frequentist asks: "How often would this procedure get the right answer?" The Bayesian asks: "Given what I've seen, what should I believe right now?"

Fisher despised Bayesian methods. He called inverse probability "a mare's nest of unsupported claims."[4] The Bayesians fired back that Fisher's p-values were a statistical abomination — a number that doesn't answer the question anyone actually wants to ask.

What scientists want to know: "Given this data, is my hypothesis true?" What a p-value tells them: "If my hypothesis were false, how surprising would this data be?" These are not the same question. They are inverses of each other, connected by — you guessed it — Bayes' theorem.

A p-value of 0.05 does not mean there's a 5% chance the null hypothesis is true. That's the posterior probability, which requires a prior. Confusing these two is called the prosecutor's fallacy, and it has sent innocent people to prison.[5]

The war has softened in recent decades. Most working scientists use whatever works — frequentist methods for simple problems, Bayesian methods for complex ones. But the philosophical tension never went away, because it can't. It's a question about what probability means, and that's not the kind of question data can settle.

· · ·
Chapter 7

Spam, Suspects, and the Replication Crisis

If Bayes' theorem were only useful for terrifying people about medical tests, it would be a curiosity. But it quietly runs the modern world.

Your Inbox

Every time Gmail decides whether an email is spam, it's running Bayes' theorem. The prior is the base rate of spam (~45% of all email). The evidence is the words in the message. The likelihood ratio: how much more likely is the phrase "Nigerian prince" in spam than in legitimate mail? Paul Graham's 2002 essay "A Plan for Spam" kickstarted modern Bayesian spam filtering, and the core idea remains unchanged: each word in the email is evidence, each word carries a likelihood ratio, and the ratios multiply together to produce a verdict.[6]
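A toy version of the idea. The per-word probabilities below are invented for illustration, not trained values, and a real filter estimates them from a corpus:

```python
from math import prod

SPAM_PRIOR = 0.45   # base rate: roughly 45% of all email is spam

# (P(word appears | spam), P(word appears | ham)) — invented illustrative values
WORD_PROBS = {
    "prince":   (0.20, 0.001),
    "transfer": (0.30, 0.02),
    "meeting":  (0.01, 0.15),
}

def spam_probability(words):
    """Naive Bayes verdict: each known word contributes a likelihood ratio,
    and — assuming words are independent given the class — the ratios multiply."""
    odds = SPAM_PRIOR / (1 - SPAM_PRIOR)
    odds *= prod(p_spam / p_ham
                 for word, (p_spam, p_ham) in WORD_PROBS.items()
                 if word in words)
    return odds / (1 + odds)

print(f"{spam_probability({'prince', 'transfer'}):.2%}")  # above 99.9%
print(f"{spam_probability({'meeting'}):.2%}")             # about 5%
```

The independence assumption is false in practice — "Nigerian" and "prince" travel together — which is why the classifier is called "naive." It works anyway, because the verdict only has to land on the right side of a threshold.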

The Courtroom

DNA evidence is a Bayesian problem in disguise. A forensic analyst tells the jury: "The probability of a random person matching this DNA profile is 1 in 10 million." The jury hears: "There's a 1 in 10 million chance the defendant is innocent." Those are completely different statements.

To get from the first to the second, you need Bayes' theorem, and you need a prior: how many plausible suspects exist? If the city has 5 million people and no other evidence narrows the field, the expected number of random matches is 0.5. The DNA evidence is strong but not overwhelming. The prosecutor's fallacy — collapsing the likelihood into a posterior by ignoring the prior — is not an abstraction. It is a machine for producing wrongful convictions.[7]
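The courtroom arithmetic as a sketch, assuming the defendant was picked out by the DNA match alone with no other narrowing evidence:

```python
match_prob = 1 / 10_000_000   # P(profile match | innocent random person)
suspects = 5_000_000          # plausible suspect pool

print(f"expected innocent matches: {suspects * match_prob:.1f}")  # 0.5

prior = 1 / suspects          # any one person, before the DNA evidence
# The true perpetrator matches with (near) certainty: P(match | guilty) = 1.
posterior = prior / (prior + (1 - prior) * match_prob)
print(f"P(guilty | match) = {posterior:.0%}")  # 67%, not 99.99999%
```

Strong evidence, genuinely — a likelihood ratio of ten million — but multiplied against 1-in-5-million prior odds, it leaves a one-in-three chance the match is a coincidence.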

The Replication Crisis

The crisis in psychology, medicine, and other fields is, at its heart, a Bayesian crisis. It's the medical test problem again: most hypotheses tested are wrong (that's why we test them), sample sizes are small (the test has low specificity), and the significance threshold is lenient (p < 0.05 means a 5% false positive rate). Run this through Bayes' theorem and you get exactly what Ioannidis found in his landmark 2005 paper: most published research findings are false.[8]

Prior probability hypothesis is true | Power (sensitivity) | α (false positive rate) | Probability a "significant" finding is real
50% (strong theory)                  | 80%                 | 5%                      | 94%
10% (exploratory)                    | 80%                 | 5%                      | 64%
1% (fishing expedition)              | 80%                 | 5%                      | 14%
0.1% (long shot)                     | 80%                 | 5%                      | 1.6%

That bottom row is the medical test problem wearing a lab coat. The "disease" is "true finding." The "test" is "p < 0.05." And the base rate is not in your favor.
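Every row of the table falls out of one small function — the same arithmetic as the diagnosis example:

```python
def ppv(prior, power=0.80, alpha=0.05):
    """P(hypothesis true | p < alpha): the positive predictive value of a
    'significant' result — the medical-test formula wearing a lab coat."""
    true_hits = prior * power          # true effects that reach significance
    false_hits = (1 - prior) * alpha   # null effects that slip under alpha
    return true_hits / (true_hits + false_hits)

for prior in (0.50, 0.10, 0.01, 0.001):
    print(f"prior {prior:7.1%} → P(real | significant) = {ppv(prior):.1%}")
# 94.1%, 64.0%, 13.9%, 1.6% — the table's four rows
```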

· · ·
Chapter 8

How to Actually Change Your Mind

Bayes' theorem isn't just a formula. It's a discipline of thought — maybe the best one we have for navigating a world that throws evidence of varying quality at you all day. Here are the lessons, stripped to bone:

1. Your prior matters. Admit you have one. Everyone walks into every question with existing beliefs. The Bayesian says: write them down. Make them visible. Then update rationally. The alternative — pretending to start from a blank slate — means your priors are hidden, unexamined, and probably wrong.

2. Extraordinary claims require extraordinary evidence. Carl Sagan was being Bayesian. If your prior for "aliens visited Earth" is 0.0001%, you need a likelihood ratio in the millions to budge it. A blurry photo — which is roughly equally likely whether aliens exist or teenagers exist — has a likelihood ratio near 1. It tells you nothing. This isn't closed-mindedness; it's arithmetic.

3. Update incrementally, not dramatically. You don't go from "I believe X" to "I believe not-X" in one step. Each piece of evidence nudges you. This is how rational belief change actually works — not as a dramatic conversion, but as a slow, relentless accumulation of signal. (This is what the sequential calculator above was showing you.)

4. Always ask: "how common is the thing I'm testing for?" Before asking "how good is this test?" ask "how prevalent is the condition?" Before evaluating evidence, establish the base rate. The denominator is not optional.

5. Beware the likelihood ratio of 1. Evidence that's equally likely whether the hypothesis is true or false tells you nothing. Your posterior equals your prior. This eliminates a shocking amount of "evidence" people treat as meaningful. ("He looks like a criminal" — well, what does a criminal look like? If the feature is equally common in criminals and non-criminals, it's noise, not signal.)

6. Seek disconfirming evidence. Confirmation bias is the anti-Bayesian sin. If you only look for evidence that confirms your hypothesis, you're only computing the numerator. You need the denominator — the probability of the evidence under alternative hypotheses — to update correctly. The most informative evidence is the kind that differentiates between hypotheses, not the kind that's consistent with your favorite one.
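Two of these lessons reduce to one-line arithmetic. A sketch, with an illustrative one-in-a-million prior standing in for the extraordinary claim:

```python
def update(prior, likelihood_ratio):
    """Odds-form update: posterior odds = prior odds × likelihood ratio."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

prior = 0.000001   # 0.0001% — "aliens visited Earth"

# Lesson 5: a likelihood ratio of 1 (the blurry photo) changes nothing.
print(update(prior, 1))                  # ≈ 1e-06, the prior unchanged

# Lesson 2: reaching even 50% takes evidence a million times more likely
# under the hypothesis than under its alternatives.
print(f"{update(prior, 1_000_000):.1%}")  # ≈ 50%
```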

To be Bayesian is not to be certain. It is to be precisely uncertain — to know exactly how much you don't know, and to update by exactly the right amount when you learn something new.

Reverend Bayes didn't set out to start a revolution. He was a minister with a math hobby, curious about a puzzle involving chances. But the thing he found — that there is a precise, mechanical, inarguable way to update beliefs in the light of evidence — turns out to be one of the most important ideas humans have ever had. Not because it's complicated. Because it's right. And because we so reliably get it wrong without it.

Notes & References

  1. Bayes, T. (1763). "An Essay towards Solving a Problem in the Doctrine of Chances." Philosophical Transactions of the Royal Society of London, 53, 370–418. Communicated by Richard Price, who added a preface and appendix that arguably contributed as much as the original.
  2. Casscells, W., Schoenberger, A., & Graboys, T.B. (1978). "Interpretation by Physicians of Clinical Laboratory Results." New England Journal of Medicine, 299(18), 999–1001. The study used slightly different numbers but the same structure. It has been replicated many times with similar results.
  3. Gigerenzer, G. & Hoffrage, U. (1995). "How to Improve Bayesian Reasoning Without Instruction: Frequency Formats." Psychological Review, 102(4), 684–704. Gigerenzer's program of research suggests that "cognitive illusions" may be artifacts of how problems are presented, not deep flaws in human reasoning.
  4. Fisher, R.A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh. Fisher's hostility to Bayesian methods was legendary and, some argue, held back statistics by decades.
  5. Thompson, W.C. & Schumann, E.L. (1987). "Interpretation of Statistical Evidence in Criminal Trials: The Prosecutor's Fallacy and the Defense Attorney's Fallacy." Law and Human Behavior, 11(3), 167–187.
  6. Graham, P. (2002). "A Plan for Spam." paulgraham.com/spam.html. Graham showed that a naïve Bayesian classifier using individual word probabilities could catch over 99.5% of spam with near-zero false positives — better than any rule-based system.
  7. Balding, D.J. & Donnelly, P. (1994). "The Prosecutor's Fallacy and DNA Evidence." Criminal Law Review, 711–721. See also the case of Sally Clark, wrongly convicted of murdering her two children based on a catastrophic misuse of probability by an expert witness.
  8. Ioannidis, J.P.A. (2005). "Why Most Published Research Findings Are False." PLoS Medicine, 2(8), e124. The most-accessed article in PLoS history. Its core argument is pure Bayesian reasoning applied to the scientific enterprise.