The Missing Chapter

Base Rate Neglect

Why a 99% accurate test can be wrong 99% of the time

An extension of Jordan Ellenberg's "How Not to Be Wrong"

Chapter 30

The Blue Cab Problem

A cab was involved in a hit-and-run accident at night. Two cab companies operate in the city: the Green company, which owns 85% of the cabs, and the Blue company, which owns the other 15%. A witness identified the cab as Blue. The court tested the witness's ability to distinguish Blue from Green cabs under nighttime conditions and found she was correct 80% of the time.

Quick — what's the probability the cab was actually Blue?

If you said 80%, you're in excellent company. You're also wrong. Most people — and by "most people" I include judges, lawyers, and a disturbing number of statisticians answering in a hurry — give 80%.1 The witness is 80% reliable, so there's an 80% chance she's right. What could be simpler?

The answer is about 41%. And understanding why it's 41% is one of the most important things you can do with your brain.

The mistake has a name: base rate neglect. Your brain heard "80% reliable" and ran with it, completely ignoring the other crucial piece of information — that 85% of the cabs in the city are Green. The base rate, the background prevalence, the thing that was true before any evidence showed up, got left on the cutting room floor of your mental processing.

[Figure: tree diagram of 100 cabs — 85 Green, 15 Blue — split by the witness's call: 68 Greens correctly called Green, 17 Greens wrongly called Blue, 12 Blues correctly called Blue, 3 Blues wrongly called Green. When she says "Blue": 12 right + 17 wrong = 29 total → 12/29 ≈ 41%.]
Natural frequencies make the math visible: of 29 "Blue" identifications, only 12 are correct.

Thinking in People, Not Percentages

Let me show you why the answer is 41%, and I'm going to do it the way the cognitive scientist Gerd Gigerenzer argues we should always do it — with natural frequencies instead of probabilities.2 Our brains evolved to count things, not to manipulate decimals. So let's count.

Imagine 100 cabs. 85 are Green, 15 are Blue. Now the witness looks at each one:

Of the 85 Green cabs, she correctly calls 80% of them Green. That's 68. But she incorrectly calls 20% of them Blue. That's 17 Green cabs she says are Blue.

Of the 15 Blue cabs, she correctly calls 80% of them Blue. That's 12 Blue cabs she says are Blue. She misidentifies the other 3 as Green.

So when she says "Blue," there are 12 + 17 = 29 cabs she'd call Blue. Only 12 of those are actually Blue.

12 out of 29 ≈ 41%.

Notice what happened. There are so many Green cabs that even though she's only wrong 20% of the time about them, those mistakes pile up. The 17 false "Blues" swamp the 12 genuine Blues. The base rate — 85% Green — does the heavy lifting in the final answer, but our brains refuse to give it the weight it deserves.
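The counting argument above fits in a few lines of Python (a sketch using the chapter's numbers; the variable names are mine):

```python
# Natural-frequency count for the cab problem, using the chapter's numbers.
green_cabs, blue_cabs = 85, 15   # base rates in a fleet of 100
accuracy = 0.80                  # witness reliability either way

true_blues = blue_cabs * accuracy           # 12 Blue cabs correctly called Blue
false_blues = green_cabs * (1 - accuracy)   # 17 Green cabs wrongly called Blue

p_blue = true_blues / (true_blues + false_blues)
print(f"P(Blue | witness says Blue) = {p_blue:.1%}")  # prints 41.4%
```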

· · ·

The Mammogram Problem

If the cab problem were only about cabs, we could file it under "interesting cocktail party trivia" and move on. But the same error shows up everywhere, and in some places it's genuinely dangerous.

Here's the version that keeps epidemiologists up at night:

The prevalence of breast cancer in women undergoing routine screening is about 1%. If a woman has breast cancer, a mammogram will correctly detect it 90% of the time (sensitivity). If she does not have cancer, the mammogram will still come back positive about 9% of the time (false positive rate).

A woman gets a positive mammogram. What's the probability she actually has cancer?

When David Eddy posed this question to physicians in 1982, 95 out of 100 estimated the probability at around 75%.3 The actual answer is approximately 9%.

Let's count again, Gigerenzer-style. Take 1,000 women:

Natural Frequencies (1,000 women)
10 have cancer → 9 test positive (true positives)
990 cancer-free → 89 test positive (false positives)
Total positive: 9 + 89 = 98
P(cancer | positive) = 9 / 98 ≈ 9.2%

Nine percent. Not seventy-five. The doctors were off by a factor of eight. And these were physicians making clinical decisions based on exactly this kind of reasoning.
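The same answer drops out of Bayes' theorem applied directly to the probabilities, without counting anyone (a sketch; the parameter names are mine):

```python
# Mammogram problem via Bayes' theorem rather than natural frequencies.
prevalence = 0.01    # P(cancer)
sensitivity = 0.90   # P(positive | cancer)
fpr = 0.09           # P(positive | no cancer)

p_positive = prevalence * sensitivity + (1 - prevalence) * fpr
ppv = (prevalence * sensitivity) / p_positive
print(f"P(cancer | positive) = {ppv:.1%}")  # prints 9.2%
```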

The culprit is the same: base rate neglect. Cancer is rare (1%), so the pool of cancer-free women is enormous. Even a small false positive rate applied to an enormous pool generates a flood of false alarms that drowns out the real signal.

"The base rate is the ocean your evidence swims in. Ignore the ocean, and you'll drown in false positives."

Why This Matters: From Airports to Courtrooms

Scale the rarity up and the problem gets absurd. Suppose the TSA installs a terrorist-detecting scanner that's 99.9% accurate — it catches 99.9% of terrorists and falsely flags only 0.1% of innocent people. Sounds incredible, right?

Now run a million passengers through it. If one of them is a terrorist:4

Airport Security Theater
1 terrorist → flagged (true positive)
999,999 innocent × 0.1% → 1,000 flagged (false positives)
P(terrorist | flagged) = 1/1,001 ≈ 0.1%

Your 99.9% accurate detector produces 1,001 alerts, and 1,000 of them are innocent people missing their flights. The positive predictive value — the chance that a flagged person is actually dangerous — is about one-tenth of one percent.

[Figure: "The Rarer the Condition, the Worse the Test Performs" — positive predictive value (PPV, 0–100%) plotted against prevalence (0.1% to 50%), with curves for a 99%-accurate test and for the 90% sensitivity / 9% false-positive-rate test; the 90/9 curve passes through 9% PPV at 1% prevalence.]
Positive predictive value (PPV) climbs steeply with prevalence. At low prevalence, even excellent tests mostly produce false positives.
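The curve behind that figure takes one formula to regenerate (a sketch; `ppv` is my own helper, evaluated here with the mammogram test's 90% sensitivity and 9% false positive rate):

```python
def ppv(prevalence, sensitivity, fpr):
    """Positive predictive value via Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * fpr
    return true_pos / (true_pos + false_pos)

for prev in (0.001, 0.01, 0.10, 0.50):
    # PPV climbs from about 1% to about 91% across this range
    print(f"prevalence {prev:>5.1%} -> PPV {ppv(prev, 0.90, 0.09):.1%}")
```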
· · ·

The Prosecutor's Fallacy

In courtrooms, base rate neglect wears a suit and tie and calls itself the prosecutor's fallacy. It goes like this: "The probability of this DNA match occurring by chance is one in a million. Therefore, there's only a one-in-a-million chance the defendant is innocent."

Do you see the sleight of hand? The prosecutor has confused P(evidence | innocence) with P(innocence | evidence). These are not the same thing — they're related by Bayes' theorem, and the base rate is what connects them.5

If you tested 10 million people from a city to find your suspect, you'd expect about 10 people to match that one-in-a-million DNA profile. Your suspect is one of ten, not one of one. The probability of innocence is more like 90% — not one in a million. The base rate (how many people were in the pool of possible suspects) matters enormously, and ignoring it has sent innocent people to prison.
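The rough counting behind that claim (my sketch, not a legal calculation, and it assumes the true culprit is somewhere in the searched pool):

```python
# Expected coincidental DNA matches in a large searched pool.
pool = 10_000_000     # people whose profiles were compared
match_prob = 1e-6     # chance an unrelated person matches by coincidence

expected_matches = pool * match_prob      # about 10 matching profiles
p_innocent = 1 - 1 / expected_matches     # suspect is one of ~10 matches
print(f"P(innocent | match) ≈ {p_innocent:.0%}")  # prints 90%
```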

COVID and the Testing Paradox

The pandemic gave everyone a crash course in base rates, whether they wanted one or not. Early in 2020, when COVID prevalence in most areas was below 1%, mass testing produced a flood of false positives.6 A rapid antigen test with 90% sensitivity and 98% specificity sounds great — until you test 100,000 people in a town where only 200 actually have COVID. You get roughly 180 true positives and 1,996 false positives. More than 90% of positive results are wrong.

As the pandemic surged and prevalence climbed to 10%, 20%, 30%, the same test became dramatically more reliable. At 20% prevalence, the positive predictive value jumps to about 92%. The test didn't change. The base rate did.
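Both COVID scenarios fall out of the same arithmetic (a sketch; `covid_ppv` is my own helper, and the 90% sensitivity is inferred from the 180-true-positive figure above):

```python
def covid_ppv(n_infected, n_tested, sensitivity=0.90, specificity=0.98):
    """Count test outcomes for a tested population; return TP, FP, and PPV."""
    true_pos = n_infected * sensitivity
    false_pos = (n_tested - n_infected) * (1 - specificity)
    return true_pos, false_pos, true_pos / (true_pos + false_pos)

tp, fp, p = covid_ppv(200, 100_000)   # early-pandemic town, 0.2% prevalence
print(f"{tp:.0f} true / {fp:.0f} false positives, PPV {p:.0%}")

tp, fp, p = covid_ppv(20_000, 100_000)  # surge, 20% prevalence
print(f"PPV at 20% prevalence: {p:.0%}")  # about 92% -- same test
```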

Workplace drug testing programs face the same arithmetic. If only 2% of employees use drugs and the test is 95% accurate, most positive results will be false positives — a fact that has ended careers unjustly.7

The Core Lesson

The accuracy of a test is not the same as the accuracy of its results. A test's positive predictive value depends on three things: sensitivity (true positive rate), specificity (1 minus false positive rate), and — crucially — the base rate of the condition in the population. Leave out the base rate and you'll be wrong most of the time about rare things.
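In symbols, using the chapter's own quantities:

PPV = (sensitivity × prevalence) / (sensitivity × prevalence + false positive rate × (1 − prevalence))

The numerator counts the true positives; the second term in the denominator counts the false positives. When prevalence is tiny, that second term dominates no matter how good the test is.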

· · ·

Build Your Intuition

The best cure for base rate neglect is practice. The calculator below lets you explore how prevalence, sensitivity, and false positive rate interact to determine the positive predictive value. Play with the sliders. Watch the grid. Develop a gut feeling for when rare events make tests unreliable.

Base Rate Calculator

Adjust the sliders to see how prevalence, sensitivity, and false positive rate affect the positive predictive value.

[Interactive calculator — at the default settings (the mammogram example: 1% prevalence, 90% sensitivity, 9% false positive rate) it shows a positive predictive value of 9.2%: per 1,000 people, 9 true positives, 89 false positives, 901 true negatives, and 1 false negative.]
· · ·

Test Your Bayesian Instincts

Now that you've seen the math, let's see if your gut has caught up. The game below presents scenarios with different base rates and test accuracies. Guess the probability that a positive result is correct, then see how close you were to the Bayesian answer.

The Cab Problem Game

You'll see a scenario. Estimate the probability that a positive result is truly positive. Then see the real answer.

[Interactive game — six scored rounds of estimation scenarios.]
· · ·

Why Your Brain Does This

There's a reason base rate neglect is so persistent: it's baked into our cognitive architecture. Kahneman and Tversky called it the representativeness heuristic — we judge probabilities by how much something "looks like" the category in question.8 The witness said Blue, and Blue looks like Blue, so we go with Blue. The mammogram said cancer, and cancer matches cancer, so we think cancer.

The base rate, on the other hand, is abstract, statistical, and boring. It's the background hum of the world — 85% of cabs are Green, 99% of people are healthy, 999,999 out of a million aren't terrorists. It doesn't tell a story. It doesn't grab attention. And so our brains just… skip it.

Gigerenzer's great insight was that this isn't an unfixable bug in human cognition. It's a formatting problem. When you translate percentages into natural frequencies — "12 out of 29" instead of "multiply the prior by the likelihood ratio and normalize" — people's accuracy jumps dramatically. In his studies, presenting problems in frequency format boosted correct answers from about 10% to about 50%.2

The lesson isn't that humans are hopelessly irrational. It's that we're rational in the right format. Evolution built us to count things we've actually encountered, not to multiply abstract decimals. So whenever you face a question about rare events and imperfect tests, do yourself a favor: stop thinking in percentages. Start counting.

Pick a population. A thousand people, a hundred cabs, a million airline passengers. Run the numbers forward. Count the true positives, count the false positives, and only then ask: of everyone who tested positive, how many actually are?
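That recipe is mechanical enough to write down once and reuse on every example in this chapter (a sketch; the function name is mine):

```python
def count_forward(population, prevalence, sensitivity, fpr):
    """Run a population through an imperfect test and count the outcomes."""
    affected = round(population * prevalence)
    unaffected = population - affected
    true_pos = round(affected * sensitivity)
    false_pos = round(unaffected * fpr)
    return true_pos, false_pos, true_pos / (true_pos + false_pos)

# The chapter's three headline examples:
print(count_forward(100, 0.15, 0.80, 0.20))          # cabs: 12 vs 17, ~41%
print(count_forward(1_000, 0.01, 0.90, 0.09))        # mammograms: 9 vs 89, ~9%
print(count_forward(1_000_000, 1e-6, 0.999, 0.001))  # airport: 1 vs 1000, ~0.1%
```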

You'll be surprised how often the answer is "not many." And you'll never look at a test result the same way again.

Notes & References

  1. Tversky, A. & Kahneman, D. (1982). "Evidential impact of base rates." In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases, pp. 153–160. Cambridge University Press.
  2. Gigerenzer, G. & Hoffrage, U. (1995). "How to improve Bayesian reasoning without instruction: Frequency formats." Psychological Review, 102(4), 684–704.
  3. Eddy, D. M. (1982). "Probabilistic reasoning in clinical medicine: Problems and opportunities." In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty, pp. 249–267. Cambridge University Press.
  4. This example, with its deliberately absurd numbers, is adapted from various Bayesian reasoning textbooks. The base rate of actual terrorists per million passengers is, mercifully, far lower than one in a million in most airports.
  5. Thompson, W. C. & Schumann, E. L. (1987). "Interpretation of statistical evidence in criminal trials: The prosecutor's fallacy and the defense attorney's fallacy." Law and Human Behavior, 11(3), 167–187.
  6. Woloshin, S., Patel, N., & Kesselheim, A. S. (2020). "False negative tests for SARS-CoV-2 infection — challenges and implications." New England Journal of Medicine, 383(6), e38.
  7. Mandatory Guidelines for Federal Workplace Drug Testing Programs, SAMHSA, revised 2017. The guidelines require confirmatory testing precisely because of the false-positive problem at low prevalence.
  8. Kahneman, D. (2011). Thinking, Fast and Slow, Chapters 14–15. Farrar, Straus and Giroux.