The Missing Chapter

The Will Rogers Phenomenon

How moving patients between groups "cures" cancer — without helping a single person

An extension of Jordan Ellenberg's "How Not to Be Wrong"

Chapter 34

The Comedian's Theorem

Will Rogers, the Oklahoma-born humorist, supposedly quipped: "When the Okies left Oklahoma and moved to California, they raised the average intelligence in both states." It's a good joke. It's also a theorem.

Think about it for a second. How could migration raise the average on both sides? If the people leaving Oklahoma were below Oklahoma's average but above California's, then removing them boosts Oklahoma's average, and adding them boosts California's. The joke lands as a double insult (the migrants were Oklahoma's dimmest, yet still brighter than the average Californian), but the math works for any two groups where the migrants sit between the two means.

Rogers — who was part Cherokee, never finished high school, and became the highest-paid actor in Hollywood — had a genius for dressing up deep truths as throwaway gags. Whether he actually said this particular line is debated; the attribution is apocryphal, which is fitting for a phenomenon all about things that look real but aren't. What matters is the structure of the joke. It's not really about Oklahoma or California. It's about what happens to averages when you move things between groups.

This isn't a paradox. It's arithmetic. And it kills people — or rather, it makes us think we're saving them when we're not.

· · ·

The Setup: Two Buckets and a Borderline Case

Let's make this concrete. Imagine two groups of patients, each scored so that a higher number means a better prognosis. Group A, the "severe" cases, has survival scores of 2, 3, 4, 5, 6. Average: 4.0. Group B, the "mild" cases, has scores of 7, 8, 9, 10, 11. Average: 9.0.

Now suppose you get a better diagnostic test, and it reveals that the patient scoring 7 was misclassified. They're actually severe! You move them from Group B to Group A.

Group A is now {2, 3, 4, 5, 6, 7}. New average: 4.5. Up.

Group B is now {8, 9, 10, 11}. New average: 9.5. Also up.

Both groups improved. Nobody got any healthier. You just shuffled a card from one hand to the other.
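
The arithmetic is easy to check directly. Here is a minimal Python sketch using the two groups exactly as given; the helper names `average` and `reclassify` are invented for illustration.

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def reclassify(source, target, value):
    """Return copies of (source, target) with one occurrence of value moved across."""
    new_source = list(source)
    new_source.remove(value)  # raises ValueError if value is not in source
    new_target = list(target) + [value]
    return new_source, new_target

group_a = [2, 3, 4, 5, 6]
group_b = [7, 8, 9, 10, 11]

new_b, new_a = reclassify(group_b, group_a, 7)

print(average(group_a), "->", average(new_a))  # 4.0 -> 4.5
print(average(group_b), "->", average(new_b))  # 9.0 -> 9.5

# The pooled average over all ten patients is untouched by the move.
print(average(group_a + group_b), average(new_a + new_b))  # 6.5 6.5
```

The last line is the giveaway: the same ten numbers are present before and after, so any statistic computed over everyone cannot change.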

This should bother you. We make decisions — life-and-death decisions, billion-dollar decisions — based on averages all the time. Hospital administrators look at average survival rates to judge whether treatments are working. School boards look at average test scores to judge whether reforms are succeeding. Corporate boards look at average revenue per division to judge whether restructuring paid off. And in every one of these cases, the averages can improve everywhere while the underlying reality doesn't budge.

The trick is always the same: find someone who's the worst member of a strong group but would be the best member of a weak group. Move them. Both averages go up. Nothing else changes. It's a magic trick where the magician doesn't even know they're performing.

[Figure: Group A = {2, 3, 4, 5, 6}, avg = 4.0; Group B = {7, 8, 9, 10, 11}, avg = 9.0; the patient scoring 7 is reclassified from B to A.]
Moving the lowest-scoring member of Group B into Group A raises both averages.
· · ·

When Better Diagnosis Looks Like Better Treatment

In 1985, Alvan Feinstein, David Sosin, and Carolyn Wells published a paper in the New England Journal of Medicine that should have been a bombshell.1 They had noticed something disturbing: over the previous decade, survival rates for nearly every stage of lung cancer had gone up. Wonderful news, except that the overall survival rate — all stages combined — hadn't budged.

How does every category improve while the total stays flat? Stage migration.

Here's what happened. In the 1970s, oncologists got access to CT scans, which could detect tiny metastases that older X-rays missed. Patients who would previously have been classified as Stage III (locally advanced) were now found to have small distant tumors — reclassifying them as Stage IV (metastatic).

Think about what this means from the patient's perspective. You walk into an oncologist's office in 1975. They take an X-ray, see a lung tumor, find no obvious spread. "Stage III," they say. You have a certain prognosis. Now run the tape again. Same patient, same tumor, same biology, but it's 1983. They do a CT scan. They spot a tiny nodule in your liver that the 1975 X-ray would have missed entirely. "Stage IV," they say. Your prognosis hasn't changed (that nodule was always there), but your label has.

And here's the devastating part: this relabeling makes the numbers look better for everyone.

The reclassified patients were the sickest members of Stage III — that's why they had hidden metastases. Moving them out improved Stage III's average survival.

But those same patients were the healthiest members of Stage IV — their metastases were so small they'd previously been invisible. Adding them improved Stage IV's average survival too.

Every stage's numbers went up. Zero patients were helped.
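
The mechanism can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the ten-patient cohort, the survival figures in months, and the `mets` labels are invented, not data from the Feinstein paper. Each patient's outcome is fixed in advance; only the scanner, and therefore the stage label, changes.

```python
from statistics import mean

# Hypothetical cohort. "months" is each patient's survival, fixed in advance.
# "mets" describes distant spread: "none", "small" (seen only on CT),
# or "large" (seen on X-ray too).
patients = [
    {"months": 30, "mets": "none"},
    {"months": 28, "mets": "none"},
    {"months": 26, "mets": "none"},
    {"months": 24, "mets": "none"},
    {"months": 12, "mets": "small"},
    {"months": 10, "mets": "small"},
    {"months": 6, "mets": "large"},
    {"months": 5, "mets": "large"},
    {"months": 4, "mets": "large"},
    {"months": 3, "mets": "large"},
]

def stage(patient, scanner):
    """Stage IV if the scanner can see distant spread, otherwise Stage III."""
    visible = {"xray": {"large"}, "ct": {"small", "large"}}[scanner]
    return "IV" if patient["mets"] in visible else "III"

def mean_survival_by_stage(cohort, scanner):
    """Average survival in months for each stage label under a given scanner."""
    groups = {"III": [], "IV": []}
    for patient in cohort:
        groups[stage(patient, scanner)].append(patient["months"])
    return {label: mean(months) for label, months in groups.items()}

overall = mean(p["months"] for p in patients)
for scanner in ("xray", "ct"):
    print(scanner, mean_survival_by_stage(patients, scanner), "overall:", overall)
```

Under the CT labels both stage averages are higher than under the X-ray labels, while the overall figure is identical in both rows, because no entry in `patients` was altered.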

Feinstein called it stage migration. Others gave it the wittier name: the Will Rogers phenomenon.2 The joke had become an epidemiological hazard.

The cruelest aspect of stage migration is that it feels like progress. Imagine you're a health policy maker in 1984, looking at the survival tables. Stage III lung cancer: five-year survival up from 10% to 15%. Stage IV: up from 2% to 5%. You'd open the champagne. You'd cite these numbers in grant applications and press releases. You'd tell patients and their families that we're winning the war on cancer. And you'd be wrong — not because the numbers are fabricated, but because the groups they describe have silently changed under your feet.

The Formal Condition

The math is disarmingly simple. Take two groups with means μ_A < μ_B. Move an element x from B to A, leaving at least one element behind in B. Both means increase if and only if:

The Will Rogers Condition
μ_A < x < μ_B
If x lies strictly between the two group means, moving it from B to A raises both averages.

That's it. Any element between the two means is a Will Rogers migrant. Move it from the higher-mean group to the lower-mean group, and both averages increase. The proof is two lines of algebra and an inequality. The consequences fill medical journals.
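
For readers who want the two lines of algebra, here is one way to write them out. The group sizes n_A and n_B are notation introduced here, not in the original text, and the derivation assumes n_B ≥ 2 so that B is not emptied by the move.

```latex
% Group A has n_A members with mean \mu_A; group B has n_B \ge 2 members with mean \mu_B.
% Move one element x from B to A and compare the new means with the old.
\begin{align*}
\mu_A' - \mu_A &= \frac{n_A\,\mu_A + x}{n_A + 1} - \mu_A = \frac{x - \mu_A}{n_A + 1} > 0
  &&\iff\quad x > \mu_A, \\[4pt]
\mu_B' - \mu_B &= \frac{n_B\,\mu_B - x}{n_B - 1} - \mu_B = \frac{\mu_B - x}{n_B - 1} > 0
  &&\iff\quad x < \mu_B.
\end{align*}
% The pooled mean is unaffected, since the combined total n_A\mu_A + n_B\mu_B
% and the combined count n_A + n_B are both unchanged by the move.
```

Both inequalities hold together exactly when μ_A < x < μ_B, which is the condition stated above.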

· · ·

Try It Yourself

The simulator below generates two groups of patients with overlapping severity scores. Drag the threshold to reclassify borderline members between groups. Watch what happens to both averages — especially when they both go up at once.

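[Interactive: Stage Migration Simulator. Two groups, Stage III (Less Severe) and Stage IV (More Severe), split by a draggable threshold (initially 50). When a threshold change raises both group averages, the display reads: "Both averages went up! Nobody got healthier."]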
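
The same experiment can be run offline. The sketch below is a rough stand-in, not the page's actual simulator: the cohort of 100 evenly spaced severity scores, the `survival_months` outcome model, and the function names are all assumptions chosen to keep the example deterministic.

```python
from statistics import mean

# Assumed toy cohort: 100 patients with evenly spaced severity scores
# (0.5, 1.5, ..., 99.5) and survival that falls as severity rises.
severities = [i + 0.5 for i in range(100)]

def survival_months(severity):
    """Hypothetical outcome model: fixed per patient, independent of the label."""
    return 100 - severity

def stage_averages(threshold):
    """Mean survival for Stage III (severity < threshold) and Stage IV (the rest).

    The threshold must leave at least one patient on each side.
    """
    stage_3 = [survival_months(s) for s in severities if s < threshold]
    stage_4 = [survival_months(s) for s in severities if s >= threshold]
    return mean(stage_3), mean(stage_4)

overall = mean(survival_months(s) for s in severities)

# Lowering the threshold mimics a more sensitive test: borderline Stage III
# patients are relabeled Stage IV.
for threshold in (50, 45, 40):
    stage_3_avg, stage_4_avg = stage_averages(threshold)
    print(f"threshold={threshold}  III={stage_3_avg:.1f}  "
          f"IV={stage_4_avg:.1f}  overall={overall:.1f}")
```

Each step down in the threshold raises both stage averages while the overall column stays put. Real outcomes are noisier than this straight-line model, so in practice the effect is a tendency rather than a guarantee.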
· · ·

The Family Resemblance: Simpson's Paradox

If the Will Rogers phenomenon sounds familiar, it should. It's a close cousin of Simpson's paradox — the famous statistical trap where a trend that appears in every subgroup reverses when the subgroups are combined. In Simpson's paradox, the problem runs in one direction: the subgroups tell one story, the aggregate tells another. In the Will Rogers phenomenon, the problem runs in a subtly different direction: the subgroups change their composition, so even though each subgroup's average improves, the improvement is entirely an artifact of who got sorted where.

Here's one way to see the kinship. Simpson's paradox says: don't assume that what's true of the parts is true of the whole. The Will Rogers phenomenon says: don't assume that an improvement in every part reflects an improvement in any individual, or in the whole. They're two versions of the same underlying warning: averages are not people, and groups are not fixed.

Consider a concrete parallel. In the famous Berkeley admissions case, the university appeared to discriminate against women in overall admissions, but when you broke the data down by department, women were admitted at rates equal to or higher than men's in most departments. The apparent discrimination arose because women applied disproportionately to the more competitive departments. The groups (departments) were fixed; what differed was the mix of applicants within each one. Simpson's paradox. Now imagine Berkeley had relabeled a few borderline departments from "less competitive" to "competitive." If their admission rates sat between the two category averages, both categories would suddenly show improved average admission rates. Same trick, different costume. That's Will Rogers.

The deeper lesson is that any time you slice data into categories, you're making a choice. And when that choice changes — when diagnostic criteria shift, when boundaries are redrawn, when definitions are updated — the slicing itself generates apparent trends that have nothing to do with reality. Ellenberg has a phrase for this family of errors: the algebra of groups is not the algebra of individuals. The Will Rogers phenomenon is one of the most vivid illustrations of why.

· · ·

Beyond the Hospital

Stage migration is the best-known version, but the Will Rogers phenomenon lurks everywhere averages meet reclassification.

School Redistricting

A city redraws school boundaries. The weakest students from a high-performing school get reassigned to a struggling school across town. The high-performing school's average test scores go up, because the students it lost were below its mean. The struggling school's average goes up too, provided the new arrivals score above its old mean, that is, provided they fall in the Will Rogers zone. When they do (and with overlapping distributions, they often do) both schools "improve."3

The school board holds a press conference. Everyone applauds. No child learned a single additional thing.

This isn't hypothetical. When the No Child Left Behind Act created strong incentives tied to school-level averages, districts discovered that strategic boundary adjustments could move the needle without moving any students' actual knowledge. The metric became the target, and the target became a game. Goodhart's Law and Will Rogers, working hand in hand.

Corporate Restructuring

A company reorganizes, moving the weakest teams from its profitable division into its underperforming one. If those teams earn less than the profitable division's average but more than the underperforming division's, both divisions now have higher average revenue per team. The CEO presents this as a turnaround. The company's actual revenue hasn't changed by a cent.4

If you've worked at a large company, you've probably seen this happen during a "reorganization." One quarter, your team is a laggard in a thriving business unit. The next quarter, after some executive reshuffling, you're a star in a struggling one. The org chart changed. Your work didn't. But the PowerPoint at the next all-hands will show two divisions trending upward, and someone will get a bonus for it.

Immigration

This was Rogers's original joke, of course. If emigrants from Country A are below A's average income but above Country B's average income — which is plausible when A is richer — then migration raises the average in both countries. This has been observed in studies of selective migration patterns, though the real-world picture is obviously more complicated than two buckets of numbers.5

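[Figure: The Will Rogers Zone. A horizontal severity axis with the two group means, μ_A and μ_B, marked; the interval between them is shaded gold and labeled "Will Rogers Zone."]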
Anyone in the gold zone — between the two group means — is a Will Rogers migrant. Move them from B to A and both averages rise.
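
Direction matters as much as position. A small Python sketch with made-up scores shows that a zone member raises both averages only when it moves from the higher-mean group to the lower-mean one; sent the other way, it lowers both. The function names and the numbers are hypothetical.

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def in_will_rogers_zone(x, group_a, group_b):
    """True if x lies strictly between the means of the two groups."""
    low, high = sorted((average(group_a), average(group_b)))
    return low < x < high

def averages_after_move(source, target, x):
    """Averages of (source, target) after moving one copy of x from source to target."""
    remaining = list(source)
    remaining.remove(x)  # raises ValueError if x is not in source
    return average(remaining), average(list(target) + [x])

group_a = [50, 55, 60, 65, 70]   # mean 60.0
group_b = [68, 80, 85, 90, 92]   # mean 83.0

# 68 is in the zone and moves from the higher-mean group (B) to the
# lower-mean group (A): B ends above 83 and A ends above 60.
print(in_will_rogers_zone(68, group_a, group_b))
print(averages_after_move(group_b, group_a, 68))

# 70 is also in the zone, but moving it the other way (A to B)
# pulls both averages down instead.
print(averages_after_move(group_a, group_b, 70))
```

This is a useful sanity check for any Will Rogers story: identify which group had the higher mean, and confirm the migrants left that one.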
· · ·

The Phenomenon That Won't Die

You might think that after Feinstein's 1985 paper, oncologists would have learned the lesson. They did — sort of. The problem is that the lesson has to be relearned every time diagnostic technology takes a leap forward.

In the 1990s, PET scans arrived. They could detect metabolic activity in tumors that CT scans couldn't see. Once again, patients migrated upward in stage. Once again, stage-specific survival rates improved. Once again, overall survival barely moved. The same thing happened with sentinel lymph node biopsy in breast cancer, which caught microscopic spread that older surgical techniques missed.8

In prostate cancer, the story is particularly dramatic. The introduction of PSA screening in the late 1980s created an enormous wave of stage migration. Men who would previously have been diagnosed with advanced prostate cancer — or never diagnosed at all until it killed them — were suddenly caught early, reclassified as "localized." The five-year survival rate for localized prostate cancer is nearly 100%. The survival rate for advanced prostate cancer also went up, because the worst "localized" cases migrated to "advanced." Both stages looked better. The overall death rate from prostate cancer did decline somewhat, but far less than the stage-specific numbers suggested — and much of even that decline may be attributable to lead-time bias, another statistical phantom that haunts cancer screening.9

The pattern is so reliable it's almost a law: any improvement in diagnostic sensitivity that shifts patients across stage boundaries tends to produce a Will Rogers effect. Better tests don't just find disease earlier. They reclassify patients. And reclassification, as we've seen, generates the illusion of progress. The question is never "did stage-specific survival improve?" because that question almost answers itself. The question is "did anyone actually live longer?"

· · ·

The Deeper Lesson

The Will Rogers phenomenon is a special case of a much larger sin: confusing a change in measurement with a change in reality. It belongs to the same family as Simpson's paradox, Berkson's paradox, and the ecological fallacy — situations where aggregating or disaggregating data produces conclusions that vanish when you look at individuals.

The Core Insight

Averages are summaries of groups. When you change who's in the group, you change the summary — even if you don't change any individual. The Will Rogers phenomenon is what happens when people forget that groups are made of choices, not just numbers.

Feinstein's 1985 warning came with a prescription: when comparing survival rates across eras, you must account for changes in diagnostic criteria. If the definition of Stage III changed between 1975 and 1985, then comparing Stage III survival across those years is comparing apples to slightly reclassified oranges. The only honest comparison is overall survival, unstratified — which, in the lung cancer data, showed no improvement at all.6

This advice is now standard in oncology, though violations still appear. Later studies have found stage migration affecting reported outcomes in cancers from breast to bladder.7 Every time imaging technology improves, the boundary between "localized" and "metastatic" shifts, and the averages dance their Will Rogers jig.

So What Do We Do?

Three defenses:

1. Always check the overall. If every subgroup improved but the total didn't, you've got a Will Rogers situation. The subgroup improvements are an artifact of reclassification.

2. Track individuals, not groups. Did any specific patient live longer? Did any specific student learn more? Averages can lie; individual trajectories can't (as easily).

3. Be suspicious of boundary changes. Whenever someone redefines a category — new diagnostic criteria, redistricted schools, reorganized departments — ask what happens to the borderline cases. That's where the magic trick happens.
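
The first two defenses can be turned into a rough automated check. The sketch below is a heuristic under stated assumptions, not a diagnostic standard: it assumes you have individual outcomes for the same set of group labels before and after a change, and the function names and sample figures are invented for illustration.

```python
from statistics import mean

def pooled(groups):
    """All individual outcomes across every group, as one sorted list."""
    return sorted(value for members in groups.values() for value in members)

def reclassification_warning(before, after, tolerance=1e-9):
    """Heuristic check for the Will Rogers signature.

    before/after map the same group labels to lists of individual outcomes.
    Returns True when every group's average rose while the pooled average
    stayed (almost) the same.
    """
    every_group_up = all(mean(after[g]) > mean(before[g]) for g in before)
    overall_flat = abs(mean(pooled(after)) - mean(pooled(before))) <= tolerance
    return every_group_up and overall_flat

# Hypothetical survival in months, before and after a staging change.
before = {"III": [30, 28, 26, 24, 12, 10], "IV": [6, 5, 4, 3]}
after = {"III": [30, 28, 26, 24], "IV": [12, 10, 6, 5, 4, 3]}

print(reclassification_warning(before, after))  # True
print(pooled(before) == pooled(after))          # True: same individuals, same outcomes
```

A `True` from the first check says only that the pattern is consistent with reclassification. The second comparison is the stronger evidence, and it is available only when individual-level records exist.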

Will Rogers was making fun of Oklahomans. But the phenomenon that bears his name makes fools of all of us — every time we celebrate an average going up without asking who moved, and where, and why.

The good news is that the defense is simple, even if it requires discipline: look at the totals. Look at individuals. Look at whether the boundaries moved. And when someone shows you that every subgroup improved, ask the one question that the Will Rogers phenomenon hopes you'll forget: did the whole get better too?

Because if it didn't, then what you're looking at isn't improvement. It's reclassification. And reclassification, no matter how sophisticated the technology behind it, no matter how many significant digits it carries, is not the same thing as making anyone's life better. The Okies moved to California. Both states got "smarter." And not a single person learned a thing.

· · ·

Can You Spot the Trick?

Test your intuition. In each scenario below, decide: is this a real improvement, or just the Will Rogers phenomenon at work?

[Interactive: The Will Rogers Quiz, five questions.]

[Figure: "When the Okies left Oklahoma…" Migrants move from Oklahoma to California; the OK average rises and the CA average rises.]
Both states win. Nobody got smarter. That's the joke, and the theorem.

Notes & References

  1. Feinstein, A. R., Sosin, D. M., & Wells, C. K. (1985). "The Will Rogers phenomenon: Stage migration and new diagnostic techniques as a source of misleading statistics for survival in cancer." New England Journal of Medicine, 312(25), 1604–1608.
  2. The name "Will Rogers phenomenon" was coined in the Feinstein et al. paper itself, referencing the apocryphal quip about Oklahoma and California. Whether Rogers actually said it is debated, but the attribution stuck.
  3. For related evidence on how moving students between schools affects measured outcomes, see Cullen, J. B., Jacob, B. A., & Levitt, S. D. (2006). "The effect of school choice on participants: Evidence from randomized lotteries." Econometrica, 74(5), 1191–1230.
  4. This is a simplified example, but the logic is identical to stage migration. See Bravata, D. M. et al. (2007). "The Will Rogers phenomenon in surgery." Annals of Surgery, 245(5), 840–841, for a discussion of analogous effects in surgical outcomes reporting.
  5. Borjas, G. J. (1987). "Self-selection and the earnings of immigrants." American Economic Review, 77(4), 531–553. Borjas's model of immigrant self-selection is consistent with Will Rogers dynamics when emigrants are intermediate in skill.
  6. Feinstein et al. (1985), ibid. The overall 5-year survival for lung cancer in the SEER database remained essentially flat across the period they studied, even as stage-specific rates improved.
  7. Chee, K. G. et al. (2008). "Stage migration and the Will Rogers phenomenon." In Principles and Practice of Clinical Research, 3rd ed. See also Albertsen, P. C. et al. (2005). "Impact of stage migration on survival in prostate cancer." Urology, 66(5), 1041–1046.
  8. Woodward, W. A. et al. (2003). "Changes in the 2003 American Joint Committee on Cancer staging for breast cancer dramatically affect stage-specific survival." Journal of Clinical Oncology, 21(17), 3244–3248.
  9. Welch, H. G., & Albertsen, P. C. (2009). "Prostate cancer diagnosis and treatment after the introduction of prostate-specific antigen screening: 1986–2005." Journal of the National Cancer Institute, 101(19), 1325–1329.