
The Missing Chapter

Correlation vs Causation

The most abused phrase in science — and the surprisingly deep mathematics of what actually causes what

An extension of Jordan Ellenberg's "How Not to Be Wrong"

Here's a true fact: between 1999 and 2009, the number of films Nicolas Cage appeared in correlated with the number of people who drowned by falling into a swimming pool; the correlation coefficient was 0.67. Another true fact: U.S. spending on science, space, and technology correlated with suicides by hanging, strangling, and suffocation at r = 0.99.1 Should we defund NASA to save lives?
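What does a coefficient like 0.67 actually measure? Pearson's r is the covariance of two variables scaled by both standard deviations, so it always lands between -1 and +1. A minimal sketch, using invented toy numbers purely to illustrate the formula:

```python
import numpy as np

# Toy data: daily temperature vs. ice cream sales (made-up numbers,
# used only to illustrate the formula, not real measurements).
x = np.array([20.0, 22.0, 25.0, 27.0, 30.0, 31.0, 33.0, 35.0])   # temperature
y = np.array([110.0, 125.0, 130.0, 160.0, 190.0, 185.0, 210.0, 232.0])  # sales

# Pearson's r: covariance of x and y, scaled by both standard deviations.
r_manual = np.cov(x, y, ddof=0)[0, 1] / (x.std() * y.std())

# Sanity check against NumPy's built-in correlation matrix.
r_numpy = np.corrcoef(x, y)[0, 1]

print(round(r_manual, 3), round(r_numpy, 3))
```

On these toy numbers r comes out close to 1, far stronger than the Cage coefficient. And note what r does not measure: nothing in the formula knows whether x drives y, y drives x, or a third variable drives both.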

Tyler Vigen, a Harvard Law student with a sense of humor and a knack for data mining, built a website called Spurious Correlations that's full of these gems. Per capita cheese consumption tracks with the number of people who die tangled in their bedsheets. The age of Miss America correlates with murders by steam, hot vapors, and hot objects. Each one is technically true — the numbers really do move together — and each one is obviously, hilariously meaningless.

Everyone laughs. Everyone nods. "Correlation doesn't imply causation," they say, as if reciting a catechism. And then they go right back to confusing the two in every other context. Because here's the thing about "correlation doesn't imply causation" — it's simultaneously the most quoted and least understood idea in all of statistics.


The Three Suspects

When two things are correlated — let's call them A and B — there are really only three possible explanations. Just three. This is worth committing to memory, because it's the skeleton key to half the bad arguments you'll encounter in your life:

1. A causes B. The obvious one. Smoking causes cancer. Studying causes better grades. Simple, clean, satisfying.

2. B causes A. The sneaky reverse. Maybe it's not that studying causes good grades, but that the confidence from good grades causes more studying. Reverse causation is often the real story hiding behind an "obvious" one.

3. C causes both A and B. The confound. Something you haven't measured — something lurking in the background — is pulling both strings. It's not that ice cream causes drowning; it's that summer causes both ice cream eating and swimming, and more swimming means more drowning. This is the explanation that keeps epidemiologists up at night.

There's technically a fourth option — pure coincidence — which is what's happening with Nicolas Cage and swimming pools. Given enough variables and enough time, some pairs will correlate just by chance. This is the garden of forking paths, and it's exactly what the spurious correlation generator below exploits.


The three possible explanations for any observed correlation between A and B.

• • •

The Deadliest Abuse

In 1950, the British researchers Richard Doll and Austin Bradford Hill published a landmark study in the British Medical Journal.2 They had compared the habits of lung cancer patients with matched controls and found something striking: lung cancer patients were overwhelmingly more likely to be smokers. The correlation was enormous.

The tobacco industry's response was swift and devastating in its cleverness. They didn't deny the correlation. They just kept repeating, with a kind of patient condescension: "Correlation does not imply causation."

And they were technically correct. The data showed that smokers got more cancer. But that alone couldn't prove smoking caused cancer. Maybe sick people smoked more to soothe themselves (B causes A). Maybe some genetic factor made people both crave nicotine and develop cancer (C causes both). These weren't crazy ideas — they were genuine logical possibilities.

Here's where it gets really wild. R.A. Fisher — the same Fisher who essentially invented modern statistics, the analysis of variance, the randomized controlled trial — took the tobacco industry's side.3 Fisher, a pipe smoker himself, argued passionately that the correlation between smoking and cancer might be explained by a shared genetic cause. He wasn't a shill; he genuinely believed it. But he was spectacularly, consequentially wrong.

"Is it possible that lung cancer — that is to say, the pre-cancerous condition which must precede it — is one of the causes of smoking cigarettes? I don't think it can be excluded."
— R.A. Fisher, 1958

Fisher's argument was logically valid but practically insane. The lesson isn't that Fisher was dumb — the man was a genius. The lesson is that "correlation doesn't imply causation" can be wielded as a weapon to block any conclusion from observational data. Taken to its extreme, it's an argument for never believing anything you can't test in a lab. And you can't put ten thousand people in a lab and force half of them to smoke for thirty years.

So how do you figure out what causes what when experiments are impossible?

• • •

Try It Yourself: The Spurious Correlation Machine

Before we answer that question, let's build some intuition for how easy it is to find correlations in random noise. The generator below creates two completely random, unrelated time series — and then hunts for the pair with the highest correlation. Hit "Generate" and watch meaningless patterns emerge.

Spurious Correlation Generator

Two random time series with zero causal connection. Hit generate to find deceptively correlated random walks.

Controls: the number of random series to search (default 100) and the number of data points per series (default 15). Readouts: the best pair (Series A vs. Series B), its correlation (r), its p-value, and the number of pairs searched.

Why this matters

With enough random variables, you will always find strong correlations. This is why data dredging — testing every possible combination and reporting only the hits — produces "findings" that never replicate. The more you search, the more ghosts you find.
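The search itself takes only a few lines. A minimal sketch in plain NumPy, using the generator's defaults as assumptions: create 100 unrelated random walks, test every pair, and keep the strongest correlation.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

n_series, n_points = 100, 15  # the generator's defaults
# 100 independent random walks: cumulative sums of Gaussian noise.
walks = rng.normal(size=(n_series, n_points)).cumsum(axis=1)

# Exhaustively test every pair and keep the strongest correlation found.
best_r, best_pair = 0.0, None
for i, j in combinations(range(n_series), 2):
    r = np.corrcoef(walks[i], walks[j])[0, 1]
    if abs(r) > abs(best_r):
        best_r, best_pair = r, (i, j)

n_pairs = n_series * (n_series - 1) // 2
print(f"searched {n_pairs} pairs; best |r| = {abs(best_r):.2f}")
```

Despite zero causal connection between any two walks, the best of the 4,950 pairs typically correlates well above 0.9 — and the more series you add, the more impressive the best ghost becomes.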

• • •

Bradford Hill's Ladder

Austin Bradford Hill — Doll's collaborator and arguably the father of the clinical trial — knew that "correlation isn't causation" couldn't be the end of the story. In 1965, he delivered a presidential address to the Royal Society of Medicine that laid out nine criteria for moving from correlation to causation.4 These aren't mathematical proofs. They're more like a checklist for building a circumstantial case — the kind of case that might convict in a court of law, even without a confession.

1. Strength. How big is the effect? Smokers were 9–10 times more likely to get lung cancer. That's not a subtle signal.

2. Consistency. Has it been seen repeatedly, by different researchers, in different places? The smoking-cancer link showed up everywhere anyone looked.

3. Specificity. Is the exposure associated with a specific disease? Smoking hit the lungs especially hard — not everything equally.

4. Temporality. Does the cause come before the effect? This is the one non-negotiable criterion. People smoked before they got cancer.

5. Biological gradient. Does more exposure mean more effect? More cigarettes per day meant more cancer. A dose-response curve is powerful evidence.

6. Plausibility. Is there a mechanism that makes biological sense? We now know exactly how tobacco smoke damages DNA.

7. Coherence. Does the causal interpretation fit with everything else we know? The rise in lung cancer tracked the rise in smoking decades earlier.

8. Experiment. Does removing the cause reduce the effect? People who quit smoking saw their cancer risk decline.

9. Analogy. Are there similar cause-effect relationships already established? Other carcinogens were already known.

Notice what Hill didn't do: he didn't demand a randomized controlled trial. He couldn't — you can't ethically randomize people to smoke. Instead, he built such an overwhelming web of evidence that the alternative explanations (Fisher's genetic hypothesis, the tobacco industry's "stress" hypothesis) became untenable. Not impossible, just absurdly unlikely.

This is how science actually works most of the time: not with a single decisive experiment, but with a gradual accumulation of evidence from multiple imperfect sources, each one making the alternatives a little less plausible.
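Hill's criteria can be sketched as a simple scorecard. To be clear about the assumptions: the nine criteria are Hill's, but treating them as a countable checklist with a verdict threshold is this sketch's own simplification — Hill insisted they are aids to judgment, not a formula.

```python
# A toy scorecard for Bradford Hill's nine criteria (Hill, 1965).
# Counting satisfied "rungs" is an illustrative simplification, not
# Hill's actual method.

CRITERIA = [
    "strength", "consistency", "specificity", "temporality",
    "biological_gradient", "plausibility", "coherence",
    "experiment", "analogy",
]

def assess(satisfied):
    """Return (rungs climbed, verdict) for a set of satisfied criteria."""
    if "temporality" not in satisfied:
        return 0, "no case: effect must follow cause"  # the non-negotiable rung
    score = sum(1 for c in CRITERIA if c in satisfied)
    verdict = ("strong circumstantial case" if score >= 7
               else "keep gathering evidence")
    return score, verdict

# Smoking and lung cancer, as Hill argued it in 1965: every rung satisfied.
print(assess(set(CRITERIA)))    # (9, 'strong circumstantial case')

# Nicolas Cage and pool drownings: a bare correlation climbs almost nothing.
print(assess({"temporality"}))  # (1, 'keep gathering evidence')
```

The design choice worth noticing is the early return: temporality acts as a gate, not a point — exactly the asymmetry Hill described.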


Each criterion you satisfy climbs another rung toward causal confidence. No single rung is sufficient — but together they build an overwhelming case.

• • •

The Original Natural Experiment

In 1854, London was dying. Cholera swept through Soho, killing hundreds in days. The prevailing theory was "miasma" — bad air from rotting organic matter. John Snow, a physician with an obsessive attention to mapping, didn't buy it.5

Snow noticed something peculiar. Two water companies served overlapping neighborhoods in South London: the Lambeth Company and the Southwark & Vauxhall Company. Their pipes ran down the same streets. Neighbors — literally people living next door to each other — got their water from different sources. But the Southwark & Vauxhall customers were dying at a rate fourteen times higher.

This was, in effect, a natural experiment. The "treatment" (dirty vs. clean water) had been assigned by the accident of which company serviced your house — essentially at random, at least with respect to the things that might confound the analysis. Snow couldn't randomize people into water groups, but history had done it for him.
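Snow's comparison boils down to a rate ratio. The counts below are illustrative round numbers chosen to match the chapter's "fourteen times" figure, not Snow's exact table:

```python
# Illustrative counts per 10,000 houses, consistent with the "fourteen
# times" figure in the text (not Snow's exact published table).
sv_deaths, sv_houses = 71, 10_000            # Southwark & Vauxhall (dirty intake)
lambeth_deaths, lambeth_houses = 5, 10_000   # Lambeth (clean intake)

sv_rate = sv_deaths / sv_houses
lambeth_rate = lambeth_deaths / lambeth_houses
rate_ratio = sv_rate / lambeth_rate

print(f"deaths per 10,000 houses: {sv_rate * 10_000:.0f} vs {lambeth_rate * 10_000:.0f}")
print(f"rate ratio: {rate_ratio:.1f}x")
```

Because the "treatment" was assigned by the accident of plumbing rather than by anything correlated with cholera risk, this ratio can be read causally — the same arithmetic on a confounded comparison could not.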

Snow's Broad Street investigation is usually told as the story where he removed a pump handle and stopped an epidemic. The real story is subtler and more important: he used the structure of the situation — the natural variation in water sources — to tease apart cause from correlation. He couldn't do an experiment, but he found something almost as good.

• • •

The Causal Revolution

For most of the twentieth century, statistics had a dirty secret: it couldn't really talk about causation. Seriously. The field that everyone assumed was for figuring out what causes what had essentially banned the word "cause" from its vocabulary. Statisticians could tell you that X and Y were "associated" or "correlated," but the machinery of the discipline — regressions, p-values, confidence intervals — couldn't formally distinguish "A causes B" from "B causes A" from "C causes both."6

Judea Pearl changed this. A computer scientist by training (which may be why he wasn't bound by the statistical establishment's taboos), Pearl developed a formal mathematical language for causation.7 His key insight was deceptively simple: draw a picture.


A directed acyclic graph (DAG) for the smoking–cancer debate. The dashed arrow represents Fisher's genetic confounding hypothesis. Pearl's framework lets us test which structure fits the data.

Pearl's directed acyclic graphs (DAGs) are pictures of your causal assumptions. Each arrow means "causes." Once you draw the picture, his do-calculus tells you — mechanically, provably — whether and how you can estimate causal effects from observational data. Sometimes you can. Sometimes you can't. But now you know which one it is, and that's a revolution.

The key operation is the "do" operator. There's a crucial difference between P(cancer | smoking) — the probability of cancer given that we observe someone smoking — and P(cancer | do(smoking)) — the probability of cancer if we were to make someone smoke. The first is a correlation. The second is a causal effect. They're the same only when there's no confounding.8
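The gap between those two quantities can be computed exactly in a toy version of Fisher's DAG. All probabilities below are invented for illustration: a gene G raises both the urge to smoke S and the cancer risk C, and smoking raises risk on its own.

```python
# Toy version of Fisher's DAG: G -> S, G -> C, S -> C.
# All probabilities are invented for illustration.
p_g = {1: 0.3, 0: 0.7}            # P(G): 30% carry the risk gene
p_s_given_g = {1: 0.8, 0: 0.2}    # P(S=1 | G): carriers smoke more

def p_c(s, g):
    """P(C=1 | S=s, G=g): smoking and the gene each raise risk."""
    return 0.05 + 0.30 * s + 0.10 * g

# Interventional: do(S=1) cuts the G -> S arrow, so we average risk over
# the *population* distribution of G (the backdoor adjustment formula).
p_do = sum(p_c(1, g) * p_g[g] for g in (0, 1))

# Observational: conditioning on S=1 tilts the group toward gene carriers,
# so we average over P(G | S=1) instead (Bayes' rule).
p_s1 = sum(p_s_given_g[g] * p_g[g] for g in (0, 1))
p_g_given_s1 = {g: p_s_given_g[g] * p_g[g] / p_s1 for g in (0, 1)}
p_obs = sum(p_c(1, g) * p_g_given_s1[g] for g in (0, 1))

print(f"P(C | S=1)     = {p_obs:.3f}")  # inflated by the confounder
print(f"P(C | do(S=1)) = {p_do:.3f}")   # the true causal risk
```

In this toy world smoking really does cause cancer (the 0.30 term), yet naive conditioning overstates the risk because gene carriers are overrepresented among smokers — exactly Fisher's worry, and exactly what the adjustment corrects.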

Pearl's work connects to a parallel tradition in economics and social science called the "potential outcomes framework," developed by Donald Rubin. The idea is to think about causation in terms of counterfactuals: what would have happened to this specific person if they hadn't smoked? You can never observe both realities for the same person — this is the "fundamental problem of causal inference" — but clever experimental designs and statistical techniques can estimate the average difference across populations.
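A simulation makes the fundamental problem concrete — concrete precisely because in a simulation, unlike in life, we get to see both potential outcomes. All numbers are invented for illustration:

```python
import numpy as np

# Each person has two potential outcomes: y0 (untreated) and y1 (treated).
# In reality we only ever observe one; here we simulate both.
rng = np.random.default_rng(42)
n = 100_000

y0 = rng.normal(50, 10, n)          # outcome if untreated
y1 = y0 + 5 + rng.normal(0, 2, n)   # outcome if treated: effect averages +5

true_ate = (y1 - y0).mean()  # knowable only because we simulated both worlds

# Randomization: a fair coin decides who is treated, so the two observed
# groups are comparable and a simple difference in means recovers the ATE.
treated = rng.random(n) < 0.5
estimate = y1[treated].mean() - y0[~treated].mean()

print(f"true ATE: {true_ate:.2f}, randomized estimate: {estimate:.2f}")
```

Replace the coin flip with self-selection — say, sicker people seeking treatment — and the two groups stop being comparable, the estimate drifts from the truth, and you are back to needing Snow-style natural experiments or Pearl-style adjustment.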

• • •

Test Your Causal Intuition

Here's your chance to practice. For each real-world correlation below, identify the most likely causal story. Then check the Bradford Hill criteria to see how confident we should be.

Causal Reasoning Quiz

For each correlation, pick the most likely explanation.

Bradford Hill Checklist

Pick any claim and mentally evaluate it against Hill's nine criteria: how many rungs does it climb, and is temporality among them?
• • •

Correlation Is a Hint

John Tukey, one of the great statisticians of the twentieth century, put it best: "Correlation doesn't imply causation, but it sure is a hint."

This is the mature position — the one that avoids both naive credulity ("these two things correlate, so one must cause the other!") and nihilistic skepticism ("you can never prove causation from observation, so why bother?"). Correlation is evidence. It's not proof, but it's not nothing. The question isn't whether to take correlation seriously — it's how seriously, and what other evidence you need to build the case.

Sometimes correlation really is your best evidence. We'll never run a randomized trial on whether climate change causes extreme weather events — we've only got one planet, and we can't rerun history with less CO₂. We'll never randomize countries into democracy vs. autocracy to see which produces better outcomes. For the biggest questions — the ones that matter most — we often have to reason from observational data, armed with clever designs, careful thinking, and Bradford Hill's checklist.

The tobacco industry understood this perfectly. They knew that demanding experimental proof for every causal claim was functionally equivalent to denying all causal claims from observational data. "Correlation doesn't imply causation" became a shield behind which they hid for decades while millions died.

The deepest lesson of this chapter isn't a statistical technique. It's a moral one. The question "how do we know what causes what?" isn't just an intellectual puzzle. It's a question with stakes — sometimes life-and-death stakes. Getting the epistemology right matters. And sometimes the most dangerous thing you can say isn't "this causes that" without evidence — it's "you haven't proved it" when the evidence is already overwhelming.

Correlation doesn't imply causation. But causation does imply correlation. And a whole lot of correlation, from a whole lot of angles, with a plausible mechanism and a dose-response curve and the right temporal order? That's not proof. But it's enough to act on. And sometimes, acting is the ethical thing to do.

Notes & References

  1. Tyler Vigen, Spurious Correlations (New York: Hachette Books, 2015). The website tylervigen.com/spurious-correlations contains hundreds of these examples.
  2. Richard Doll and A. Bradford Hill, "Smoking and Carcinoma of the Lung," British Medical Journal 2, no. 4682 (1950): 739–748.
  3. R.A. Fisher, "Lung Cancer and Cigarettes?" Nature 182 (1958): 108. Fisher's contrarian stance is examined in Paul Stolley, "When Genius Errs: R.A. Fisher and the Lung Cancer Controversy," American Journal of Epidemiology 133, no. 5 (1991): 416–425.
  4. Austin Bradford Hill, "The Environment and Disease: Association or Causation?" Proceedings of the Royal Society of Medicine 58, no. 5 (1965): 295–300.
  5. Steven Johnson, The Ghost Map: The Story of London's Most Terrifying Epidemic (New York: Riverhead Books, 2006). For Snow's original work, see John Snow, On the Mode of Communication of Cholera, 2nd ed. (London: John Churchill, 1855).
  6. Judea Pearl recounts this history in The Book of Why: The New Science of Cause and Effect (New York: Basic Books, 2018), particularly Chapter 1.
  7. Judea Pearl, Causality: Models, Reasoning, and Inference, 2nd ed. (Cambridge: Cambridge University Press, 2009).
  8. The do-calculus is formally presented in Pearl (2009), Chapter 3. An accessible introduction is in Pearl and Mackenzie, The Book of Why, Chapters 7–8.