The Blue Cab Problem
A cab was involved in a hit-and-run accident at night. Two cab companies operate in the city: the Green company, which owns 85% of the cabs, and the Blue company, which owns the other 15%. A witness identified the cab as Blue. The court tested the witness's ability to distinguish Blue from Green cabs under nighttime conditions and found she was correct 80% of the time.
Quick — what's the probability the cab was actually Blue?
If you said 80%, you're in excellent company. You're also wrong. Most people — and by "most people" I include judges, lawyers, and a disturbing number of statisticians answering in a hurry — give 80%.1 The witness is 80% reliable, so there's an 80% chance she's right. What could be simpler?
The answer is about 41%. And understanding why it's 41% is one of the most important things you can do with your brain.
The mistake has a name: base rate neglect. Your brain heard "80% reliable" and ran with it, completely ignoring the other crucial piece of information — that 85% of the cabs in the city are Green. The base rate, the background prevalence, the thing that was true before any evidence showed up, got left on the cutting room floor of your mental processing.
Thinking in People, Not Percentages
Let me show you why the answer is 41%, and I'm going to do it the way the cognitive scientist Gerd Gigerenzer argues we should always do it — with natural frequencies instead of probabilities.2 Our brains evolved to count things, not to manipulate decimals. So let's count.
Imagine 100 cabs. 85 are Green, 15 are Blue. Now the witness looks at each one:
Of the 85 Green cabs, she correctly calls 80% of them Green. That's 68. But she incorrectly calls 20% of them Blue. That's 17 Green cabs she says are Blue.
Of the 15 Blue cabs, she correctly calls 80% of them Blue. That's 12 Blue cabs she says are Blue. She misidentifies the other 3 as Green.
So across all 100 cabs, she'd call 12 + 17 = 29 of them Blue. Only 12 of those are actually Blue.
12 out of 29 ≈ 41%.
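The counting above takes only a few lines of Python. This is a minimal sketch using the numbers from the problem; the variable names are mine:

```python
# Natural-frequency count for the cab problem.
total_cabs = 100
blue = 15                      # base rate: 15% of cabs are Blue
green = total_cabs - blue      # the other 85 are Green
accuracy = 0.80                # witness is right 80% of the time

true_blue = blue * accuracy            # 12 Blue cabs she calls "Blue"
false_blue = green * (1 - accuracy)    # 17 Green cabs she calls "Blue"

p_blue_given_said_blue = true_blue / (true_blue + false_blue)
print(round(p_blue_given_said_blue, 2))  # → 0.41
```

Note that the calculation never needs Bayes' theorem by name: counting true and false "Blue" calls and taking the ratio is the theorem.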
Notice what happened. There are so many Green cabs that even though she's only wrong 20% of the time about them, those mistakes pile up. The 17 false "Blues" swamp the 12 genuine Blues. The base rate — 85% Green — does the heavy lifting in the final answer, but our brains refuse to give it the weight it deserves.
The Mammogram Problem
If the cab problem were only about cabs, we could file it under "interesting cocktail party trivia" and move on. But the same error shows up everywhere, and in some places it's genuinely dangerous.
Here's the version that keeps epidemiologists up at night:
The prevalence of breast cancer in women undergoing routine screening is about 1%. If a woman has breast cancer, a mammogram will correctly detect it 90% of the time (sensitivity). If she does not have cancer, the mammogram will still come back positive about 9% of the time (false positive rate).
A woman gets a positive mammogram. What's the probability she actually has cancer?
When David Eddy posed this question to physicians in 1982, 95 out of 100 estimated the probability at around 75%.3 The actual answer is approximately 9%.
Let's count again, Gigerenzer-style. Take 1,000 women:

Of those 1,000 women, 10 have cancer (1%). The mammogram catches 9 of them (90% sensitivity).

The other 990 are cancer-free. But 9% of them, about 89, get a positive result anyway.

So 98 women walk out with a positive mammogram, and only 9 of them actually have cancer. 9 out of 98 ≈ 9%.
Nine percent. Not seventy-five. The doctors were off by a factor of eight. And these were physicians making clinical decisions based on exactly this kind of reasoning.
The culprit is the same: base rate neglect. Cancer is rare (1%), so the pool of cancer-free women is enormous. Even a small false positive rate applied to an enormous pool generates a flood of false alarms that drowns out the real signal.
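The same count in code, using the prevalence, sensitivity, and false positive rate quoted above:

```python
# Counting 1,000 screened women.
n = 1000
with_cancer = n * 0.01         # 10 women (1% prevalence)
without = n - with_cancer      # 990 women

true_pos = with_cancer * 0.90  # 9 detected (90% sensitivity)
false_pos = without * 0.09     # 89.1 false alarms (9% FP rate)

ppv = true_pos / (true_pos + false_pos)
print(f"{ppv:.0%}")  # → 9%
```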
Why This Matters: From Airports to Courtrooms
Scale the rarity up and the problem gets absurd. Suppose the TSA installs a terrorist-detecting scanner that's 99.9% accurate — it catches 99.9% of terrorists and falsely flags only 0.1% of innocent people. Sounds incredible, right?
Now run a million passengers through it. Suppose exactly one of them is a terrorist. The scanner almost certainly flags that one person, but it also falsely flags 0.1% of the other 999,999: roughly 1,000 innocent travelers.4
Your 99.9% accurate detector produces 1,001 alerts, and 1,000 of them are innocent people missing their flights. The positive predictive value — the chance that a flagged person is actually dangerous — is about one-tenth of one percent.
The Prosecutor's Fallacy
In courtrooms, base rate neglect wears a suit and tie and calls itself the prosecutor's fallacy. It goes like this: "The probability of this DNA match occurring by chance is one in a million. Therefore, there's only a one-in-a-million chance the defendant is innocent."
Do you see the sleight of hand? The prosecutor has confused P(evidence | innocence) with P(innocence | evidence). These are not the same thing — they're related by Bayes' theorem, and the base rate is what connects them.5
If you tested 10 million people from a city to find your suspect, you'd expect about 10 people to match that one-in-a-million DNA profile. Your suspect is one of ten, not one of one. The probability of innocence is more like 90% — not one in a million. The base rate (how many people were in the pool of possible suspects) matters enormously, and ignoring it has sent innocent people to prison.
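The database-trawl arithmetic is two multiplications. A sketch, with the pool size and match probability from the example:

```python
# Expected chance matches in a 10-million-person database trawl,
# given a one-in-a-million random match probability.
pool = 10_000_000
match_prob = 1 / 1_000_000

expected_matches = pool * match_prob   # ~10 people match by chance
# Absent any other evidence, the defendant is one of those ~10:
p_guilty = 1 / expected_matches
print(expected_matches, p_guilty)  # → 10.0 0.1
```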
COVID and the Testing Paradox
The pandemic gave everyone a crash course in base rates, whether they wanted one or not. Early in 2020, when COVID prevalence in most areas was below 1%, mass testing produced a flood of false positives.6 A rapid antigen test with 90% sensitivity and 98% specificity sounds great — until you test 100,000 people in a town where only 200 actually have COVID. You get roughly 180 true positives and 1,996 false positives. More than 90% of positive results are wrong.
As the pandemic surged and prevalence climbed to 10%, 20%, 30%, the same test became dramatically more reliable. At 20% prevalence, the positive predictive value jumps to about 92%. The test didn't change. The base rate did.
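A short loop shows the effect across prevalence levels. This sketch assumes the roughly 90% sensitivity implied by the article's figures (180 true positives out of 200 cases) alongside the stated 98% specificity:

```python
# Same test, different base rates: PPV as prevalence climbs.
sens, spec = 0.90, 0.98

for prev in (0.002, 0.01, 0.10, 0.20, 0.30):
    tp = prev * sens               # fraction of the population: true positives
    fp = (1 - prev) * (1 - spec)   # fraction: false positives
    ppv = tp / (tp + fp)
    print(f"prevalence {prev:.1%} -> PPV {ppv:.0%}")
```

At 0.2% prevalence the PPV is under 10%; at 20% it clears 90%. Nothing about the test changed between those two lines.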
Workplace drug testing programs face the same arithmetic. If only 2% of employees use drugs and your test is 95% accurate, most positive results will be false positives — a fact that has ended careers unjustly.7
The accuracy of a test is not the same as the accuracy of its results. A test's positive predictive value depends on three things: sensitivity (true positive rate), specificity (1 minus false positive rate), and — crucially — the base rate of the condition in the population. Leave out the base rate and you'll be wrong most of the time about rare things.
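The three-ingredient recipe can be captured in one small function. The function name and signature are mine; the example figures are the ones used earlier in this article (for the drug test, "95% accurate" is read as 95% sensitivity and 95% specificity):

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value: P(condition | positive test)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

print(ppv(0.01, 0.90, 0.91))  # mammogram: ~0.09
print(ppv(0.02, 0.95, 0.95))  # drug test: ~0.28
```

The drug-test line shows why "95% accurate" is so misleading: at 2% prevalence, barely a quarter of positives are real.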
Build Your Intuition
The best cure for base rate neglect is practice. The calculator below lets you explore how prevalence, sensitivity, and false positive rate interact to determine the positive predictive value. Play with the sliders. Watch the grid. Develop a gut feeling for when rare events make tests unreliable.
Base Rate Calculator
Adjust the sliders to see how prevalence, sensitivity, and false positive rate affect the positive predictive value.
Test Your Bayesian Instincts
Now that you've seen the math, let's see if your gut has caught up. The game below presents scenarios with different base rates and test accuracies. Guess the probability that a positive result is correct, then see how close you were to the Bayesian answer.
The Cab Problem Game
You'll see a scenario. Estimate the probability that a positive result is truly positive. Then see the real answer.
Why Your Brain Does This
There's a reason base rate neglect is so persistent: it's baked into our cognitive architecture. Kahneman and Tversky called it the representativeness heuristic — we judge probabilities by how much something "looks like" the category in question.8 The witness said Blue, and Blue looks like Blue, so we go with Blue. The mammogram said cancer, and cancer matches cancer, so we think cancer.
The base rate, on the other hand, is abstract, statistical, and boring. It's the background hum of the world — 85% of cabs are Green, 99% of people are healthy, 999,999 out of a million aren't terrorists. It doesn't tell a story. It doesn't grab attention. And so our brains just… skip it.
Gigerenzer's great insight was that this isn't an unfixable bug in human cognition. It's a formatting problem. When you translate percentages into natural frequencies — "12 out of 29" instead of "multiply the prior by the likelihood ratio and normalize" — people's accuracy jumps dramatically. In his studies, presenting problems in frequency format boosted correct answers from about 10% to about 50%.2
The lesson isn't that humans are hopelessly irrational. It's that we're rational in the right format. Evolution built us to count things we've actually encountered, not to multiply abstract decimals. So whenever you face a question about rare events and imperfect tests, do yourself a favor: stop thinking in percentages. Start counting.
Pick a population. A thousand people, a hundred cabs, a million airline passengers. Run the numbers forward. Count the true positives, count the false positives, and only then ask: of everyone who tested positive, how many actually are?
You'll be surprised how often the answer is "not many." And you'll never look at a test result the same way again.