The Blue Cab Problem
A cab was involved in a hit-and-run accident at night. Two cab companies operate in the city: the Green company, which owns 85% of the cabs, and the Blue company, which owns the other 15%. A witness identified the cab as Blue. The court tested the witness's ability to distinguish Blue from Green cabs under nighttime conditions and found she was correct 80% of the time.
Quick — what's the probability the cab was actually Blue?
If you said 80%, you're in excellent company. You're also wrong. Most people — and by "most people" I include judges, lawyers, and a disturbing number of statisticians answering in a hurry — give 80%.1 The witness is 80% reliable, so there's an 80% chance she's right. What could be simpler?
The answer is about 41%. And understanding why it's 41% is one of the most important things you can do with your brain.
The mistake has a name: base rate neglect. Your brain heard "80% reliable" and ran with it, completely ignoring the other crucial piece of information — that 85% of the cabs in the city are Green. The base rate, the background prevalence, the thing that was true before any evidence showed up, got left on the cutting room floor of your mental processing.
Thinking in People, Not Percentages
Let me show you why the answer is 41%, and I'm going to do it the way the cognitive scientist Gerd Gigerenzer argues we should always do it — with natural frequencies instead of probabilities.2 Our brains evolved to count things, not to manipulate decimals. So let's count.
Imagine 100 cabs. 85 are Green, 15 are Blue. Now the witness looks at each one:
Of the 85 Green cabs, she correctly calls 80% of them Green. That's 68. But she incorrectly calls 20% of them Blue. That's 17 Green cabs she says are Blue.
Of the 15 Blue cabs, she correctly calls 80% of them Blue. That's 12 Blue cabs she says are Blue. She misidentifies the other 3 as Green.
So across all 100 cabs, she'd call 12 + 17 = 29 of them Blue. Only 12 of those are actually Blue.
12 out of 29 ≈ 41%.
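The count above fits in a few lines of Python; this is just a sketch of the natural-frequency arithmetic, not anything from the book itself:

```python
# Natural-frequency count of the cab problem:
# 100 cabs, 85 Green, 15 Blue, witness correct 80% of the time.
green, blue = 85, 15
accuracy = 0.80

true_blues = blue * accuracy          # Blue cabs she correctly calls Blue: 12
false_blues = green * (1 - accuracy)  # Green cabs she mistakenly calls Blue: 17

p_blue_given_says_blue = true_blues / (true_blues + false_blues)
print(round(p_blue_given_says_blue, 3))  # 0.414, about 41%
```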
Notice what happened. There are so many Green cabs that even though she's only wrong 20% of the time about them, those mistakes pile up. The 17 false "Blues" swamp the 12 genuine Blues. The base rate — 85% Green — does the heavy lifting in the final answer, but our brains refuse to give it the weight it deserves.
The Mammogram Problem
If the cab problem were only about cabs, we could file it under "interesting cocktail party trivia" and move on. But the same error shows up everywhere, and in some places it's genuinely dangerous.
Here's the version that keeps epidemiologists up at night:
The prevalence of breast cancer in women undergoing routine screening is about 1%. If a woman has breast cancer, a mammogram will correctly detect it 90% of the time (sensitivity). If she does not have cancer, the mammogram will still come back positive about 9% of the time (false positive rate).
A woman gets a positive mammogram. What's the probability she actually has cancer?
When David Eddy posed this question to physicians in 1982, 95 out of 100 estimated the probability at around 75%.3 The actual answer is approximately 9%.
Let's count again, Gigerenzer-style. Take 1,000 women. Ten of them have cancer, and the mammogram catches 9 of those. The other 990 are cancer-free, but 9% of them, about 89 women, get a false positive anyway. So 98 women walk out with a positive result, and only 9 of them actually have cancer: 9 out of 98, roughly 9%.
Nine percent. Not seventy-five. The doctors were off by a factor of eight. And these were physicians making clinical decisions based on exactly this kind of reasoning.
The culprit is the same: base rate neglect. Cancer is rare (1%), so the pool of cancer-free women is enormous. Even a small false positive rate applied to an enormous pool generates a flood of false alarms that drowns out the real signal.
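The same counting logic, as a short Python sketch using the numbers from the problem statement:

```python
# Mammogram arithmetic with 1,000 women, using the figures from the text:
# 1% prevalence, 90% sensitivity, 9% false positive rate.
n = 1_000
prevalence, sensitivity, false_pos_rate = 0.01, 0.90, 0.09

with_cancer = n * prevalence                     # 10 women
true_pos = with_cancer * sensitivity             # 9 correct positives
false_pos = (n - with_cancer) * false_pos_rate   # ~89 false alarms

ppv = true_pos / (true_pos + false_pos)
print(round(ppv, 3))  # 0.092, about 9%, not 75%
```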
Why This Matters: From Airports to Courtrooms
Scale the rarity up and the problem gets absurd. Suppose the TSA installs a terrorist-detecting scanner that's 99.9% accurate — it catches 99.9% of terrorists and falsely flags only 0.1% of innocent people. Sounds incredible, right?
Now run a million passengers through it. Suppose exactly one of those million is a terrorist: the scanner almost certainly flags that person, but it also falsely flags 0.1% of the 999,999 innocent passengers, roughly 1,000 people.4
Your 99.9% accurate detector produces 1,001 alerts, and 1,000 of them are innocent people missing their flights. The positive predictive value — the chance that a flagged person is actually dangerous — is about one-tenth of one percent.
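The scanner arithmetic, sketched in Python with the illustrative numbers from this section:

```python
# Terrorist-scanner arithmetic: one guilty passenger in a million,
# 99.9% detection rate, 0.1% false positive rate.
passengers = 1_000_000
terrorists = 1
detection_rate = 0.999
false_positive_rate = 0.001

caught = terrorists * detection_rate                            # ~1 true alert
false_alarms = (passengers - terrorists) * false_positive_rate  # ~1,000 innocents

total_alerts = caught + false_alarms
print(round(total_alerts))              # ~1,001 alerts
print(round(caught / total_alerts, 4))  # ~0.001, a tenth of one percent
```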
The Prosecutor's Fallacy
In courtrooms, base rate neglect wears a suit and tie and calls itself the prosecutor's fallacy. It goes like this: "The probability of this DNA match occurring by chance is one in a million. Therefore, there's only a one-in-a-million chance the defendant is innocent."
Do you see the sleight of hand? The prosecutor has confused P(evidence | innocence) with P(innocence | evidence). These are not the same thing — they're related by Bayes' theorem, and the base rate is what connects them.5
If you tested 10 million people from a city to find your suspect, you'd expect about 10 people to match that one-in-a-million DNA profile. Your suspect is one of ten, not one of one. The probability of innocence is more like 90% — not one in a million. The base rate (how many people were in the pool of possible suspects) matters enormously, and ignoring it has sent innocent people to prison.
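The arithmetic behind that claim is short. A sketch using the text's illustrative pool size and match probability:

```python
# Expected coincidental DNA matches when trawling a large pool of suspects.
# Illustrative numbers from the text: a 10-million-person pool,
# a one-in-a-million chance of a random match.
pool = 10_000_000
match_prob = 1e-6

expected_matches = pool * match_prob  # about 10 people match by pure chance
# With the suspect being just one of those ~10 matches, and no other
# evidence, the chance he is the actual perpetrator is roughly 1 in 10.
p_guilt = 1 / expected_matches
print(round(expected_matches), round(1 - p_guilt, 2))  # ~10 matches, ~0.9 innocent
```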
COVID and the Testing Paradox
The pandemic gave everyone a crash course in base rates, whether they wanted one or not. Early in 2020, when COVID prevalence in most areas was below 1%, mass testing produced a flood of false positives.6 Consider a rapid antigen test with 90% sensitivity and 98% specificity — solid numbers by any standard — and test 100,000 people in a town where only 200 actually have COVID. You get roughly 180 true positives but also 1,996 false positives. More than 90% of positive results are wrong.
As the pandemic surged and prevalence climbed to 10%, 20%, 30%, the same test became dramatically more reliable. At 20% prevalence, the positive predictive value jumps to about 92%. The test didn't change. The base rate did.
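Here is that prevalence effect as a small Python sweep, holding the test's characteristics fixed at the rapid-test figures above:

```python
# Same rapid test, different base rates: PPV as a function of prevalence
# (90% sensitivity, 98% specificity, as in the example above).
def ppv(prevalence, sensitivity=0.90, specificity=0.98):
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

for p in (0.002, 0.01, 0.10, 0.20, 0.30):
    print(f"prevalence {p:6.1%} -> PPV {ppv(p):5.1%}")
# PPV climbs from roughly 8% at 0.2% prevalence to over 90% at 20%.
```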
Workplace drug-testing programs face the same arithmetic. If only 2% of employees use drugs and your test is 95% accurate both ways (95% sensitivity and 95% specificity), most positive results will be false positives. Out of 1,000 employees, 20 use drugs and 19 test positive; 980 are clean but 49 test positive anyway. That's 19 real positives swimming in a pool of 68 total positives, a PPV of just 28%. Careers have ended over odds worse than a coin flip.7
The accuracy of a test is not the same as the accuracy of its results. A test's positive predictive value depends on three things: sensitivity (true positive rate), specificity (1 minus false positive rate), and — crucially — the base rate of the condition in the population. Leave out the base rate and you'll be wrong most of the time about rare things.
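Those three ingredients fit in one small function. Here it is applied to the drug-test numbers above; this is a generic Bayesian PPV calculation, not any particular testing program's method:

```python
# Positive predictive value from its three ingredients:
# prevalence (base rate), sensitivity, and specificity.
def positive_predictive_value(prevalence, sensitivity, specificity):
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Workplace drug-test numbers: 2% prevalence, 95% sensitivity, 95% specificity.
print(f"{positive_predictive_value(0.02, 0.95, 0.95):.0%}")  # 28%
```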
Build Your Intuition
The best cure for base rate neglect is practice. The calculator below lets you explore how prevalence, sensitivity, and false positive rate interact to determine the positive predictive value. Play with the sliders. Watch the grid. Develop a gut feeling for when rare events make tests unreliable.
Base Rate Calculator
Adjust the sliders to see how prevalence, sensitivity, and false positive rate affect the positive predictive value.
Test Your Bayesian Instincts
Now that you've seen the math, let's see if your gut has caught up. The game below presents scenarios with different base rates and test accuracies. Guess the probability that a positive result is correct, then see how close you were to the Bayesian answer.
The Cab Problem Game
You'll see a scenario. Estimate the probability that a positive result is truly positive. Then see the real answer.
Why Your Brain Does This
There's a reason base rate neglect is so persistent: it's baked into our cognitive architecture. Kahneman and Tversky called it the representativeness heuristic — we judge probabilities by how much something "looks like" the category in question.8 The witness said Blue, and Blue looks like Blue, so we go with Blue. The mammogram said cancer, and cancer matches cancer, so we think cancer.
The base rate, on the other hand, is abstract, statistical, and boring. It's the background hum of the world — 85% of cabs are Green, 99% of people are healthy, 999,999 out of a million aren't terrorists. It doesn't tell a story. It doesn't grab attention. And so our brains just… skip it.
Gigerenzer's great insight was that this isn't an unfixable bug in human cognition. It's a formatting problem. When you translate percentages into natural frequencies — "12 out of 29" instead of "multiply the prior by the likelihood ratio and normalize" — people's accuracy jumps dramatically. In his studies, presenting problems in frequency format boosted correct answers from about 10% to about 50%.2
The lesson isn't that humans are hopelessly irrational. It's that we're rational in the right format. Evolution built us to count things we've actually encountered, not to multiply abstract decimals. So whenever you face a question about rare events and imperfect tests, do yourself a favor: stop thinking in percentages. Start counting.
Pick a population. A thousand people, a hundred cabs, a million airline passengers. Run the numbers forward. Count the true positives, count the false positives, and only then ask: of everyone who tested positive, how many actually are?
You'll be surprised how often the answer is "not many." And you'll never look at a test result the same way again.
The Paradox of Better Tests
Here's one more twist that's worth sitting with, because it's genuinely strange. Making a test more accurate doesn't always help as much as you'd think. Suppose you improve your mammogram from 90% sensitivity to 99% sensitivity. That's a huge engineering achievement! But if you keep the same 9% false positive rate and the same 1% prevalence, your PPV only climbs from 9.2% to 10.0%. You spent millions on better detection technology, and you moved the needle by less than one percentage point.
The real leverage, it turns out, is on the other side — reducing false positives. Drop the false positive rate from 9% to 1%, and the PPV jumps from 9.2% to 47.6%. Or do what smart clinicians actually do: don't test everyone. Test people who already have elevated risk — family history, symptoms, age cohorts where prevalence is 5% instead of 1%. Now your 90%-sensitivity mammogram has a PPV of 34%, even with the old false positive rate. You didn't improve the test at all. You improved the population you gave the test to. You raised the base rate.
This is why your doctor doesn't order every test in the catalog for your annual physical. It's not laziness, and it's not cost-cutting (well, not only cost-cutting). It's Bayes' theorem whispering in their ear: if you go looking for rare things in healthy people, you will find trouble — and most of it will be imaginary.
Ellenberg's central thesis in How Not to Be Wrong is that mathematics isn't an abstract game played by professors with chalk dust on their blazers — it's the extension of common sense by other means. Base rate neglect is the perfect case study. The math here isn't hard. It's multiplication and division. The hard part is remembering to do it at all, because your brain is busy telling you a vivid story about a witness who saw a Blue cab, or a test that came back positive, or a scanner that beeped. Stories are powerful. But the base rate is patient. It sits there in the denominator, waiting for you to notice it.
And if you don't notice it? Well — that's how not to be right.