The Aspirin Problem
Here is the hardest problem in all of science, and you encounter it every time you take a pill.
You wake up with a headache. You swallow two aspirin. An hour later, the headache is gone. Did the aspirin cure it?
You want to say yes. Of course it did — that's why you took it. But think for a moment about what you'd need to know to be sure. You'd need to know what would have happened if you hadn't taken the aspirin. Maybe the headache would have gone away on its own. Maybe it was already fading when you reached for the bottle. Maybe the placebo effect of swallowing something did the work. The aspirin gets the credit, but the counterfactual — the world in which you didn't take it — is a world you never get to visit.
This isn't a philosophical quibble. It is, in a precise technical sense, the central obstacle to all of human knowledge about cause and effect. Statisticians call it the fundamental problem of causal inference, and it goes like this: for any individual, you can only ever observe one version of reality. The treated version or the untreated version. Never both.[1]
Potential Outcomes, or: The Ghost Worlds
In the 1970s, the statistician Donald Rubin formalized this intuition into what's now called the potential outcomes framework.[2] The idea is deceptively simple. For every person in a study, there exist two potential outcomes:

- Y(1): your outcome if you receive the treatment (you take the aspirin).
- Y(0): your outcome if you don't.
The causal effect of aspirin on your headache is Y(1) − Y(0): the difference between your pain level with aspirin and your pain level without. Simple, right? Except for the tiny problem that you can never, ever observe both numbers for the same person at the same time. One of them is always a ghost — a counterfactual that exists in theory but not in data.
This is not a problem of measurement. It's not that our instruments aren't good enough. It's a logical impossibility, baked into the structure of time itself. You can't simultaneously take the aspirin and not take the aspirin. The Holland dictum says it plainly: no causation without manipulation — and no individual causal effect without observing both manipulations, which you can't.[3]
The Randomization Trick
So individual causal effects are unknowable. That sounds like it should be the end of the story. But here is where mathematics pulls off one of its best tricks.
We can't know your causal effect. But we can know the average causal effect across a group. The trick is randomization.
Take a thousand people with headaches. Flip a coin for each one: heads, you get aspirin; tails, you get a sugar pill. Because assignment is random, the treated group and the control group are, on average, identical in every way — age, severity, genetics, breakfast choices, everything. The only systematic difference between them is the aspirin.
Now compare the average outcome in each group. The difference is the Average Treatment Effect (ATE):

ATE = E[Y(1)] − E[Y(0)], estimated by (average outcome in the aspirin group) − (average outcome in the placebo group).
This is the miracle of the randomized controlled trial — the RCT. It doesn't solve the fundamental problem for any individual. It sidesteps it by exploiting the law of large numbers. The ghosts are still there, but they cancel out on average.[4]
Try it yourself. The machine below generates a small population where — because we're playing God — you can see both potential outcomes for every person. Real researchers never get this view. Then we randomize, hide the counterfactuals, and see how close the estimated treatment effect gets to the truth.
The Counterfactual Machine
God's view: see both potential outcomes. Researcher's view: see only the observed one. Randomize and compare.
| Person | Y(0) | Y(1) | Effect | Group | Observed |
|---|---|---|---|---|---|
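The machine's logic can be sketched in a few lines of Python. This is a toy model with made-up pain scores, not the interactive's actual data: we generate both potential outcomes for everyone (God's view), then randomize and throw half of them away (the researcher's view).

```python
import random

random.seed(0)

# God's view: both potential outcomes for every person (never observable in real data).
# Assumed toy numbers: pain without aspirin ~ N(60, 10);
# aspirin lowers pain by ~30 points on average.
people = []
for _ in range(1000):
    y0 = random.gauss(60, 10)        # Y(0): pain without aspirin
    y1 = y0 - random.gauss(30, 5)    # Y(1): pain with aspirin
    people.append((y0, y1))

true_ate = sum(y1 - y0 for y0, y1 in people) / len(people)

# Researcher's view: flip a coin, then observe only ONE outcome per person.
treated, control = [], []
for y0, y1 in people:
    if random.random() < 0.5:
        treated.append(y1)           # y0 becomes the unobserved "ghost"
    else:
        control.append(y0)           # y1 becomes the unobserved "ghost"

estimated_ate = sum(treated) / len(treated) - sum(control) / len(control)
print(f"true ATE: {true_ate:.1f}, estimated ATE: {estimated_ate:.1f}")
```

Run it with different seeds: the estimate wobbles around the truth but never strays far, which is exactly the law-of-large-numbers cancellation described above.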
When You Can't Randomize
RCTs are wonderful. They're also frequently impossible.
Want to know if smoking causes cancer? You can't randomly assign half the population to smoke for thirty years. Want to know if going to college increases earnings? You can't randomly send some teenagers to Harvard and others to work at the gas station. Want to know if democracy promotes economic growth? You definitely can't randomly install governments.
For most of the big questions in the social sciences — and many in medicine — randomization is off the table. Ethics forbids it, logistics prevent it, or the treatment is something no researcher controls. What then?
Then you get clever. Over the past fifty years, economists and statisticians have developed an extraordinary toolkit for extracting causal answers from observational data — data where nature, policy, or accident did the randomizing for you. These methods earned Joshua Angrist, David Card, and Guido Imbens the 2021 Nobel Prize in Economics, and they've quietly revolutionized how we think about evidence.[5]
Instrumental Variables: Find something (an "instrument") that affects treatment but has no direct effect on the outcome. The Vietnam draft lottery randomly assigned draft numbers to men — affecting whether they served in the military, but not directly affecting their later earnings. Angrist used this to estimate the causal effect of military service on civilian wages.[6]
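The instrumental-variables logic can be sketched with simulated data. Everything here is assumed for illustration — an effect of −2 on earnings, a confounder standing in for background and patriotism — and the estimator is the simple Wald ratio: the lottery's effect on earnings divided by its effect on service.

```python
import random

random.seed(1)

n = 100_000
z = [random.random() < 0.5 for _ in range(n)]   # draft lottery: the random instrument
u = [random.gauss(0, 1) for _ in range(n)]      # unobserved confounder (background, patriotism...)

# Service depends on the lottery AND the confounder; earnings depend on service AND the confounder.
d = [(0.6 if z[i] else 0.2) + 0.1 * u[i] > random.random() for i in range(n)]
true_effect = -2.0                               # assumed causal effect of service on earnings
y = [30 + true_effect * d[i] + 3 * u[i] + random.gauss(0, 2) for i in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

# A naive served-vs-didn't comparison is biased by the confounder...
naive = mean([y[i] for i in range(n) if d[i]]) - mean([y[i] for i in range(n) if not d[i]])

# ...but the Wald/IV estimator uses only the randomness of the lottery:
reduced_form = mean([y[i] for i in range(n) if z[i]]) - mean([y[i] for i in range(n) if not z[i]])
first_stage = mean([d[i] for i in range(n) if z[i]]) - mean([d[i] for i in range(n) if not z[i]])
iv_estimate = reduced_form / first_stage

print(f"naive: {naive:.2f}, IV: {iv_estimate:.2f} (truth: {true_effect})")
```

With a homogeneous effect like this one, IV recovers the ATE; when effects differ across people, it recovers the effect for those whose service was actually swayed by the lottery — the LATE idea that reappears below.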
Regression Discontinuity: When treatment is assigned by a sharp cutoff — you get the scholarship if your score is above 80, say — people just above and just below the cutoff are essentially identical. Comparing their outcomes gives you a causal effect, because near the threshold, assignment is as-good-as-random.
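The same cutoff comparison, sketched with simulated scores. The numbers are assumed, and a real analysis would fit a local regression on each side of the threshold rather than comparing raw means:

```python
import random

random.seed(2)

n = 50_000
cutoff, true_effect = 80, 5.0

# Outcome rises smoothly with the score; the scholarship adds a jump at the cutoff.
scores = [random.uniform(50, 100) for _ in range(n)]
outcomes = [0.1 * s + (true_effect if s >= cutoff else 0) + random.gauss(0, 2)
            for s in scores]

# Compare students just below vs. just above the cutoff (bandwidth of 2 points).
just_below = [y for s, y in zip(scores, outcomes) if cutoff - 2 <= s < cutoff]
just_above = [y for s, y in zip(scores, outcomes) if cutoff <= s < cutoff + 2]

rdd_estimate = sum(just_above) / len(just_above) - sum(just_below) / len(just_below)
print(f"estimated jump at cutoff: {rdd_estimate:.2f} (truth: {true_effect})")
```

The estimate overshoots slightly because the smooth 0.1-per-point trend isn't subtracted out; shrinking the bandwidth, or fitting a line on each side, removes that residual bias.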
Difference-in-Differences: Compare the change over time in a treated group to the change over time in a control group. David Card used this to study the effect of a minimum wage increase in New Jersey, using Pennsylvania as a control. If both states were trending the same way before the policy change, the divergence afterward is the causal effect.
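The diff-in-diff arithmetic is simple enough to do by hand. Here it is with illustrative employment numbers (assumed for the sketch, not Card and Krueger's actual data):

```python
# Difference-in-differences from the four group means.
# Employment before/after a minimum-wage increase in state A; state B is the control.
before_A, after_A = 20.4, 21.0   # treated state
before_B, after_B = 23.3, 21.2   # control state

change_A = after_A - before_A    # policy effect + shared trend
change_B = after_B - before_B    # shared trend only
did = change_A - change_B        # policy effect, IF the trends were parallel

print(f"diff-in-diff estimate: {did:+.1f}")
```

Everything hangs on the parallel-trends assumption in the last comment: if the two states were already diverging before the policy, the subtraction removes the wrong trend.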
Explore how each method works in the interactive below.
Natural Experiment Toolkit
Three methods for finding causal effects in observational data. Each tab shows a simulated dataset and how the method isolates the treatment effect.
The Vietnam draft lottery randomly assigned draft numbers to young men. Low numbers → likely drafted → military service. We use the lottery number as an instrument to estimate how military service affected later earnings — without confounders like patriotism or socioeconomic background.
Students scoring ≥80 on an exam receive a scholarship. Students at 79 and 81 are nearly identical — but one gets funding and the other doesn't. The jump in outcomes at the cutoff reveals the causal effect of the scholarship.
State A raises its minimum wage; State B doesn't. By comparing the change in employment in State A versus the change in State B, we remove shared trends and isolate the policy's causal effect.
The Language of Causation
Rubin's framework is powerful, but it has a limitation: it thinks in terms of treatments and outcomes, which works perfectly for clinical trials and policy evaluations. But what about the bigger questions? What if you want to reason about entire networks of causes — does education affect health, or does health affect education, or does some third thing (wealth, say) drive both?
Enter Judea Pearl, the computer scientist who decided that what causation needed was its own algebra.[7]
Pearl's insight was that causal relationships can be represented as directed acyclic graphs — DAGs — where arrows point from causes to effects. And once you have the graph, you can use a formal operation called the do-operator to distinguish between seeing and doing.
P(Y | X) is the probability of Y given that you observe X. P(Y | do(X)) is the probability of Y given that you set X by intervention. These are different things. When you see that people who carry lighters have higher rates of lung cancer, P(cancer | lighter) is elevated — but P(cancer | do(lighter)) is not. The lighter doesn't cause cancer; smoking causes both the lighter-carrying and the cancer. Pearl's do-calculus gives you the mathematical machinery to compute the interventional probability from observational data, when the causal graph allows it (see Chapter 64).
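The lighter example can be simulated directly. We hard-code the assumed causal graph — smoking → lighter, smoking → cancer, no lighter → cancer arrow — and compare seeing with doing:

```python
import random

random.seed(3)

n = 200_000
# The assumed graph: smoking causes lighter-carrying and cancer; lighters cause nothing.
smoker = [random.random() < 0.3 for _ in range(n)]
lighter = [random.random() < (0.8 if s else 0.1) for s in smoker]
cancer = [random.random() < (0.15 if s else 0.01) for s in smoker]

def p(event, given):
    """Conditional frequency: P(event | given)."""
    sel = [e for e, g in zip(event, given) if g]
    return sum(sel) / len(sel)

# Seeing: lighter-carriers are mostly smokers, so their cancer rate is elevated.
p_see = p(cancer, lighter)

# Doing: handing everyone a lighter changes nothing downstream of smoking.
# Under this graph, P(cancer | do(lighter)) is just the overall cancer rate.
p_do = sum(cancer) / n

print(f"P(cancer | lighter)     = {p_see:.3f}")
print(f"P(cancer | do(lighter)) = {p_do:.3f}")
```

The observational probability is more than double the interventional one — pure confounding, read straight off the graph.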
Simpson's Paradox Revisited
Perhaps the most vivid demonstration of why you need causal thinking — not just statistical thinking — is Simpson's paradox (see Chapter 10). A treatment can appear to help in every subgroup but hurt overall, or vice versa. The classic case: a drug helps both men and women separately, but when you combine the data, it seems to hurt everyone.
The purely statistical mind is paralyzed. Should you combine or separate? There's no statistical answer. The answer depends on the causal structure. If gender is a confounder (it affects both who gets treatment and the outcome), you should separate. If gender is a mediator (treatment affects outcomes through gender-related pathways), you should combine. Same data, different causal stories, opposite conclusions.[8]
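The reversal is easy to reproduce with illustrative counts (assumed numbers, patterned after the classic kidney-stone example):

```python
# Simpson's paradox in miniature: the drug wins inside each subgroup
# but loses in the pooled data.
#          treated: (recovered, total)    control: (recovered, total)
data = {
    "men":   ((81, 87),   (234, 270)),
    "women": ((192, 263), (55, 80)),
}

def rate(rec_total):
    recovered, total = rec_total
    return recovered / total

for group, (treated, control) in data.items():
    print(group, rate(treated) > rate(control))   # True for both subgroups

pooled_treated = (sum(t[0] for t, _ in data.values()), sum(t[1] for t, _ in data.values()))
pooled_control = (sum(c[0] for _, c in data.values()), sum(c[1] for _, c in data.values()))
print("pooled", rate(pooled_treated) > rate(pooled_control))  # False: the sign flips
```

The trick is in the group sizes: treated men and control women are small groups, so pooling lets the imbalance, not the drug, drive the comparison.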
Data alone cannot tell you what causes what. You need a model — an assumption, a story, a graph — about how the world works. The data then tells you whether your story is consistent with reality. But the story always comes first. This is uncomfortable for people who want to "let the data speak for itself." The data doesn't speak. It answers questions, and the quality of the answer depends entirely on the quality of the question.
The Nobel and the Future
When the Royal Swedish Academy awarded the 2021 Nobel Memorial Prize in Economic Sciences to Card, Angrist, and Imbens, it was a recognition that the "credibility revolution" in economics — the shift from hand-wavy regression to careful causal identification — had fundamentally changed how we learn from data.
Card's work on minimum wages used difference-in-differences to show, contrary to textbook predictions, that modest minimum wage increases didn't necessarily destroy jobs. Angrist used the draft lottery as an instrumental variable to study the returns to military service and, later, to education. Imbens developed the statistical theory that made these methods rigorous — in particular, the Local Average Treatment Effect (LATE), which clarifies exactly whose causal effect you're estimating when you use an instrument.
These aren't just academic advances. They're how we now evaluate whether a policy works, whether a medical intervention helps, whether an educational program succeeds. Every time a government asks "did this program actually cause the improvement we're seeing?" the answer involves one of these methods.
And the frontier keeps moving. Machine learning is being fused with causal inference to estimate heterogeneous treatment effects — not just "does the drug work on average?" but "for whom does it work best?" Researchers are building algorithms that discover causal structure from data, automatically mapping which variables cause which. The dream of a fully automated causal scientist is still far off, but the tools get sharper every year.
The aspirin in your medicine cabinet doesn't know any of this. It just dissolves. But the confidence you have that it works — that it causes your headache to go away, rather than merely being correlated with its departure — that confidence rests on a century of mathematical thought about the hardest question in science. And the next time someone tells you "correlation doesn't imply causation," you can nod wisely and add: "But with the right design, you can get pretty close."