
Chapter 11 · Hidden Structure

Regression to the Mean

Why the best get worse, the worst get better, and nobody needed to do anything at all.

Part III — Hidden Structure

Chapter 01 · The Curse of the Cover

On August 16, 1954, a young Milwaukee Braves third baseman named Eddie Mathews appeared on the cover of Sports Illustrated. It was the magazine's very first issue. Mathews had come off a monstrous 1953 season: 47 home runs, a batting average over .300, an All-Star selection. He was 22 years old, and the cover photo seemed to anoint him as the future of baseball.

Soon afterward, a hand injury knocked him out of the lineup for several games. He finished 1954 with 40 home runs. He was still good, but he wasn't that good. The whispers started almost immediately.

It happened again. And again. Pete Rose appeared on the cover in 1970 after hitting .348; the next year, .304. The 1987 Cleveland Indians made the cover as preseason favorites; they finished last. Skier Jill Kinmont, quarterback Lamar Jackson, golfer Karrie Webb: the list of athletes who "slumped" after appearing on the cover grew so long that the pattern earned a name, the Sports Illustrated cover jinx.[1]

Sports fans are a superstitious bunch, and the jinx became self-evidently real. The cover was bad luck. Some athletes reportedly refused the honor. Teams hid the magazine from their clubhouses. The explanation was obvious — the pressure of being on the cover, the distraction of fame, the weight of expectations.

Except none of that was the explanation. The explanation was much simpler, much more universal, and much more unsettling. It didn't require any curse at all.

§

Chapter 02 · Galton's Sweet Peas

In the 1870s, Francis Galton (Darwin's cousin, polymath, pioneer of fingerprint identification, and a man with an almost pathological need to measure things) conducted an experiment with sweet pea seeds.[2] He sorted seeds by size: small, medium, large. He planted each group separately. Then he waited.

When the next generation of peas grew, he measured them too. And he found something he didn't expect.

The large seeds did produce larger-than-average offspring — but not as large as themselves. The small seeds produced smaller-than-average offspring — but not as small as themselves. The children of extremes were, on average, less extreme. They had regressed toward the center.


Galton turned to human height next. He measured 205 pairs of parents and their adult children. The pattern was identical. Tall parents had tall children, yes, but those children were, on average, a bit shorter than their parents. Short parents had short children, but those children were a bit taller. Everything seemed to be drifting back toward the average, generation after generation.[3]

This confused him. If tall parents always produce slightly shorter children, wouldn't the whole species converge to a single height over time? Wouldn't variation itself disappear?

No. And the reason it doesn't is the key to understanding everything that follows. Regression describes averages, not individuals. Tall parents have children who are shorter on average, but average and short parents also occasionally produce tall children. Fresh variation in each generation refills the tails as fast as regression empties them, so the overall distribution stays stable. Galton called this phenomenon "regression toward mediocrity." We now call it regression to the mean.
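Here is a minimal Python sketch of why the spread survives. It uses an assumed toy model, not Galton's data: each child's height is the population mean, plus a fraction r of the parent's deviation from that mean, plus fresh random noise sized so that the overall variance is preserved. The mean of 68 inches, the standard deviation of 2.5, and r = 0.5 are illustrative assumptions, and `next_generation` is a hypothetical helper.

```python
import random
import statistics

def next_generation(parents, mean, sd, r, rng):
    """Child = mean + r * (parent's deviation) + fresh noise.

    The noise SD is sd * sqrt(1 - r**2), which keeps the overall
    variance at sd**2 from one generation to the next.
    """
    noise_sd = sd * (1 - r ** 2) ** 0.5
    return [mean + r * (p - mean) + rng.gauss(0, noise_sd) for p in parents]

rng = random.Random(0)
MEAN, SD, R = 68.0, 2.5, 0.5   # illustrative values, not Galton's estimates

parents = [rng.gauss(MEAN, SD) for _ in range(50_000)]
children = next_generation(parents, MEAN, SD, R, rng)

# Look only at the tallest 5% of parents and at their children.
cutoff = sorted(parents)[int(0.95 * len(parents))]
tall_pairs = [(p, c) for p, c in zip(parents, children) if p >= cutoff]
tall_parent_mean = statistics.mean(p for p, _ in tall_pairs)
tall_child_mean = statistics.mean(c for _, c in tall_pairs)

print(f"tall parents average:   {tall_parent_mean:.2f}")
print(f"their children average: {tall_child_mean:.2f}")
print(f"SD of all parents:  {statistics.pstdev(parents):.2f}")
print(f"SD of all children: {statistics.pstdev(children):.2f}")
```

In this sketch the children of the tallest parents come out taller than average but shorter than their parents, while the spread of heights across the whole population barely moves from one generation to the next.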

The Key Insight

Regression to the mean doesn't require any causal mechanism. It's not a force pulling things back to average. It's a statistical inevitability whenever two measurements are imperfectly correlated, which is to say whenever there's any randomness in the system at all.

§

Chapter 03 · The Israeli Air Force and the Illusion of Feedback

In the early 1960s, Daniel Kahneman was a young psychologist lecturing to Israeli Air Force flight instructors. His topic that day was the psychology of training, specifically the well-established finding that positive reinforcement (praising good performance) is more effective at improving behavior than punishment (criticizing bad performance).[4]

One of the senior instructors raised his hand and disagreed.

"I've often praised cadets for clean execution of a complex maneuver," the instructor said. "And the next time they try it, they almost always do worse. On the other hand, when I scream at a cadet for a bad landing, the next landing is usually better. So don't tell me that reward works and punishment doesn't. My experience says the exact opposite."

Kahneman later described this as one of the most illuminating moments of his career. Because the instructor was completely right about the facts — and completely wrong about the explanation.

Think about it. When does a cadet get praised? After an exceptionally good landing. An extreme performance, well above their average ability. And what happens after an extreme performance? It regresses to the mean. The next landing will probably be closer to average — which is to say, worse. The praise didn't cause the decline. The decline was going to happen anyway.

When does a cadet get screamed at? After an exceptionally bad landing. An extreme performance, well below their average ability. And what happens next? Regression to the mean again. The next landing will probably be closer to average — which is to say, better. The punishment didn't cause the improvement.

The instructor had, through decades of experience, trained himself to believe that punishment works and praise backfires, when in reality neither one was producing the pattern he saw. He would have observed the exact same pattern if he'd flipped a coin to decide whether to praise or scold.[5]
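The coin-flip claim is easy to check in a toy simulation. The Python sketch below rests on assumptions the story doesn't specify: each landing is a cadet's fixed skill plus fresh luck, skill and luck have equal spread, and feedback is assigned by coin flip and has no effect at all on the next landing. All names and numbers are illustrative.

```python
import random
import statistics
from collections import defaultdict

rng = random.Random(1)
N_CADETS = 40_000
SKILL_SD, LUCK_SD = 1.0, 1.0   # assumed: equal parts skill and luck

skills = [rng.gauss(0, SKILL_SD) for _ in range(N_CADETS)]
first = [s + rng.gauss(0, LUCK_SD) for s in skills]
# Feedback plays no role in the second landing: same skill, fresh luck.
second = [s + rng.gauss(0, LUCK_SD) for s in skills]

ranked = sorted(first)
bad_cut = ranked[N_CADETS // 10]        # roughly the worst 10% of first landings
good_cut = ranked[-(N_CADETS // 10)]    # roughly the best 10% of first landings

changes = defaultdict(list)   # (landing quality, feedback) -> score changes
for f, s in zip(first, second):
    if f >= good_cut:
        quality = "good"
    elif f <= bad_cut:
        quality = "bad"
    else:
        continue                                   # unremarkable landing, no feedback
    feedback = rng.choice(["praise", "scold"])     # the coin flip
    changes[(quality, feedback)].append(s - f)

avg_change = {key: statistics.mean(vals) for key, vals in changes.items()}
for key in sorted(avg_change):
    print(key, round(avg_change[key], 2))
```

In this setup, scores fall after good landings and rise after bad ones by about the same amount whichever way the coin lands. An instructor who praises good landings and scolds bad ones would see only two of those four cells, and they are exactly the two that make punishment look effective.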


This is the deep trap. Regression to the mean doesn't just fool sports fans. It fools teachers, doctors, coaches, parents, managers, and policy makers. It fools anyone who observes extreme behavior and then watches what happens next.

§

Chapter 04 · The Math: Talent, Luck, and Why Extremes Are Extreme

Here's the simplest way to think about it. Imagine that every performance — every batting season, every flight landing, every exam score — is the sum of two components:

Performance Decomposition

Performance = Skill + Luck

Skill is stable. Luck is random noise with mean zero.

Skill: the person's true, stable ability, consistent from trial to trial.
Luck: random variation, positive or negative, averaging to zero over many trials.

Skill stays the same from season to season (roughly). Luck doesn't. Luck is freshly drawn each time, from a distribution centered at zero.

Now: who ends up at the top of the rankings after Season 1? People with high skill, obviously. But also people who happened to get lucky. To be the very best in any given season, you almost certainly needed both — genuine talent and a favorable roll of the dice.

In Season 2, those same people still have their skill. But the luck resets. Some of them get lucky again, sure. But on average, their luck will be closer to zero. Which means their Season 2 performance will be closer to their true skill — which is lower than their Season 1 performance, because Season 1 had that lucky boost baked in.

The Mathematical Core

Extreme observed performances are extreme partly because of luck. Since luck doesn't persist, the next observation will be less extreme. That's regression to the mean — not a force, just an artifact of randomness.

The same logic works in reverse. The worst performers in Season 1 were unlucky. In Season 2, they'll probably be less unlucky — so they'll improve. Not because of any intervention. Just because extreme bad luck, like extreme good luck, doesn't repeat reliably.
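To see the "lucky boost" directly, here is a small Python sketch that builds performances as skill plus luck and then takes apart the Season 1 extremes. The assumptions are illustrative: skill and luck are independent standard normal draws, and the top and bottom 1% of 10,000 athletes stand in for "the best" and "the worst."

```python
import random
import statistics

rng = random.Random(2)
N = 10_000

# Each athlete has one fixed skill and independent luck in each season.
skill = [rng.gauss(0, 1) for _ in range(N)]
luck1 = [rng.gauss(0, 1) for _ in range(N)]
luck2 = [rng.gauss(0, 1) for _ in range(N)]
season1 = [s + l for s, l in zip(skill, luck1)]
season2 = [s + l for s, l in zip(skill, luck2)]

# Rank athletes by Season 1 performance only.
order = sorted(range(N), key=lambda i: season1[i])
groups = {"bottom 1%": order[:100], "top 1%": order[-100:]}

summary = {}
for name, idx in groups.items():
    summary[name] = {
        "season1": statistics.mean(season1[i] for i in idx),
        "skill": statistics.mean(skill[i] for i in idx),
        "luck1": statistics.mean(luck1[i] for i in idx),
        "season2": statistics.mean(season2[i] for i in idx),
    }
    print(name, {k: round(v, 2) for k, v in summary[name].items()})
```

With equal amounts of skill and luck, roughly half of each group's Season 1 distance from average turns out to be luck. In Season 2 that half disappears, and both groups land near their average skill.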

And the crucial variable is the ratio of luck to skill. In a domain where luck dominates, like a single coin flip or one day's stock returns, regression to the mean is massive. The best performer will look nothing special next time. In a domain where skill dominates, like chess ratings over hundreds of games, regression is minimal. The best player will very likely still be the best.[6]
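One way to make that ratio precise, as a sketch under assumptions the text doesn't state: suppose skill S (with mean μ) and each season's luck L_t (with mean zero) are independent and normally distributed, with variances σ_S² and σ_L². The symbols are introduced here for illustration. Then the best guess for Season 2 shrinks the Season 1 result x toward the mean by a factor ρ, the share of performance variance that comes from skill:

```latex
\begin{aligned}
P_t &= S + L_t, \qquad t = 1, 2 \\
\rho &= \frac{\sigma_S^2}{\sigma_S^2 + \sigma_L^2} \\
\mathbb{E}[P_2 \mid P_1 = x] &= \mu + \rho\,(x - \mu)
\end{aligned}
```

When luck dominates, ρ is near 0 and the forecast collapses to μ no matter how good Season 1 looked. When skill dominates, ρ is near 1 and the forecast stays close to x.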

§

Chapter 05 · The Curse of the Sequel

Once you see regression to the mean, you see it everywhere — and you see all the places where people have invented causal stories to explain a statistical artifact.

The Sports Illustrated jinx. Athletes appear on the cover because they just had an extreme performance. Their next season regresses. The cover didn't cause it.

The sophomore slump. A rookie has a breakout first season and wins Rookie of the Year. The second season is disappointing. We say the league "figured them out." Maybe. Or maybe their first season included some lucky bounces that didn't repeat.

The Madden curse. The cover athlete of Madden NFL allegedly gets injured or declines. Same logic: they made the cover because they peaked.[7]

The curse of the sequel. A movie is a smash hit. The studio rushes a sequel into production. The sequel underperforms. Everyone blames the director, the script, the studio's greed. But the original was probably an extreme positive outlier — a lucky combination of timing, marketing, cultural moment, and quality. The sequel merely regressed.

The "flight to mediocrity" in business. Jim Collins' Good to Great identified companies with extreme performance over a certain period and reverse-engineered what made them special. Many of those companies subsequently underperformed. The cynical explanation: they were never as special as they looked. Their extreme performance contained luck, and the luck ran out.[8]

Imagine 1,000 CEOs each flip a coin ten times. The one who flips the most heads gets a profile in Harvard Business Review about their "winning mindset." The next year, they flip ten more coins. Are they going to get as many heads? Almost certainly not. Their technique didn't fail. Their leadership philosophy didn't expire. Their first-year performance just wasn't entirely about them.
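The thought experiment can be run directly. The Python sketch below repeats it 200 times, assuming fair coins; the repetition count and the variable names are illustrative choices.

```python
import random

rng = random.Random(3)
N_CEOS, N_FLIPS, N_TRIALS = 1_000, 10, 200

def heads(n_flips, rng):
    """Number of heads in n_flips tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))

best_year1 = []       # heads flipped by each trial's top CEO in year 1
winner_year2 = []     # heads flipped by that same CEO in year 2

for _ in range(N_TRIALS):
    year1 = [heads(N_FLIPS, rng) for _ in range(N_CEOS)]
    best_year1.append(max(year1))
    # The coin has no memory, so the winner's year 2 is just ten fresh flips.
    winner_year2.append(heads(N_FLIPS, rng))

avg_best = sum(best_year1) / N_TRIALS
avg_followup = sum(winner_year2) / N_TRIALS
print(f"profiled CEO, year 1: {avg_best:.2f} heads on average")
print(f"same CEO, year 2:     {avg_followup:.2f} heads on average")
```

With 1,000 CEOs, someone almost always flips 9 or 10 heads, so the profile gets written every time. The follow-up year averages about 5, because here performance is pure luck and the regression is total.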

§

Chapter 06 · Try It Yourself

This is one of those ideas that's easier to see than to explain. Below is a simulator that generates 100 random athletes. Each has a fixed "talent" level (drawn once) and a "luck" component (drawn fresh each season). You can adjust the slider to control how much of performance is luck versus skill — and watch regression to the mean appear before your eyes.

[Interactive figure: Regression Simulator. Generates 100 athletes with talent + luck and shows how the Season 1 top performers do in Season 2. Controls: luck share of performance (from pure skill to pure luck, default 50%) and number of athletes (default 100). Legend: top 10 from Season 1, everyone else, regression line.]

Play with the luck slider. When luck is high (say 80–90%), the top performers from Season 1 crater in Season 2 — often dropping all the way back to average. When luck is low (10–20%), the top performers barely budge. That's the whole story of regression to the mean in one slider.
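If you'd rather run the experiment offline, here is a rough Python stand-in for the simulator. It is a sketch, not the code behind the interactive version: `simulate` and `average_over_runs` are hypothetical names, and the weighting by square roots is an assumption chosen so that `luck_share` is the fraction of performance variance due to luck.

```python
import random
import statistics

def simulate(luck_share, n_athletes=100, top_k=10, rng=None):
    """Return the top_k Season 1 performers' average in Season 1 and Season 2."""
    rng = rng or random.Random()
    w_skill = (1 - luck_share) ** 0.5   # weights chosen so that luck_share
    w_luck = luck_share ** 0.5          # is luck's share of total variance
    talent = [rng.gauss(0, 1) for _ in range(n_athletes)]          # drawn once
    season1 = [w_skill * t + w_luck * rng.gauss(0, 1) for t in talent]
    season2 = [w_skill * t + w_luck * rng.gauss(0, 1) for t in talent]
    top = sorted(range(n_athletes), key=lambda i: season1[i], reverse=True)[:top_k]
    return (statistics.mean(season1[i] for i in top),
            statistics.mean(season2[i] for i in top))

def average_over_runs(luck_share, runs=500, seed=0):
    """Average many simulated leagues to smooth out run-to-run noise."""
    rng = random.Random(seed)
    results = [simulate(luck_share, rng=rng) for _ in range(runs)]
    return (statistics.mean(r[0] for r in results),
            statistics.mean(r[1] for r in results))

for share in (0.1, 0.5, 0.9):
    s1, s2 = average_over_runs(share)
    print(f"luck share {share:.0%}: top 10 average {s1:.2f} in Season 1, "
          f"{s2:.2f} in Season 2")
```

Scores are in standard-deviation units with the league average at zero. Averaging over 500 simulated leagues is a design choice to keep the comparison stable; a single league of 100 athletes is noisy, just like a single click on the slider.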

§

Chapter 07 · The Dangerous Implications

If regression to the mean were merely an amusing statistical curiosity, we could note it and move on. But it's not. It contaminates our reasoning about some of the most important questions in medicine, education, and public policy.

The Placebo's Best Friend

When do people seek medical treatment? When they feel their worst. A patient goes to the doctor at the peak of their symptoms, which is an extreme observation. The doctor prescribes something. The patient improves. Was it the treatment, or was the patient going to improve anyway, because extreme symptoms naturally regress?[9]

This is exactly why we need randomized controlled trials with placebo groups. The placebo group also regresses to the mean — they also feel terrible when they enter the trial, and they also get better on average. A treatment only "works" if it outperforms this natural regression. Yet an alarming number of alternative medicine testimonials and even some published studies fail to account for this.
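A toy trial shows how the placebo arm separates the two effects. Everything in this Python sketch is an assumption made for illustration: a symptom score that fluctuates around each patient's usual level, enrollment only on days when the score is 7 or higher, and a hypothetical drug that lowers the score by exactly 0.5 points.

```python
import random
import statistics

rng = random.Random(4)
TRUE_EFFECT = 0.5      # assumed benefit of the hypothetical drug, in severity points
ENROLL_ABOVE = 7.0     # patients only show up when symptoms are at least this bad

treated_change, placebo_change = [], []
while len(treated_change) + len(placebo_change) < 20_000:
    usual = rng.gauss(5.0, 1.0)              # this patient's typical severity
    at_visit = usual + rng.gauss(0, 1.5)     # severity on a given day
    if at_visit < ENROLL_ABOVE:
        continue                             # not bad enough to seek treatment
    followup = usual + rng.gauss(0, 1.5)     # a later day, with fresh fluctuation
    if rng.random() < 0.5:                   # randomized to the drug
        treated_change.append((followup - TRUE_EFFECT) - at_visit)
    else:                                    # randomized to placebo
        placebo_change.append(followup - at_visit)

placebo_mean = statistics.mean(placebo_change)
treated_mean = statistics.mean(treated_change)
estimated_effect = placebo_mean - treated_mean

print(f"placebo group change: {placebo_mean:+.2f}")
print(f"treated group change: {treated_mean:+.2f}")
print(f"difference (estimated drug effect): {estimated_effect:.2f}")
```

Both arms improve substantially, and most of the treated group's improvement is regression it shares with the placebo group. A before-and-after comparison of treated patients alone would credit the drug with all of it; the difference between the arms recovers something close to the assumed 0.5.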

The Educational Intervention Trap

A school district identifies its worst-performing schools. It pours resources into them — new curricula, better teachers, after-school programs. Test scores improve. The intervention is declared a success.

But wait. Those schools were selected because they were at their worst. Some of that poor performance was bad luck — a tough cohort of students, a flu outbreak during testing week, a temporary administrative crisis. Even without the intervention, scores would have regressed toward the district average. The only way to know if the intervention actually helped is to compare against similar schools that didn't receive it.

Speed Cameras and Crime Crackdowns

A dangerous intersection has a spike in accidents. A speed camera is installed. Accidents decrease. Was it the camera? Maybe, but the camera was installed precisely because the intersection just had an extreme run of accidents. Some regression was to be expected with or without it.[10]

The same logic applies to crime crackdowns, new management at struggling companies, and coaching changes in sports. We intervene at the moment of worst performance, observe improvement, and credit the intervention. The universe was going to provide some of that improvement for free.

The Practical Lesson

Whenever you intervene at a moment of extreme performance — good or bad — you must expect regression. To know if your intervention worked, you need a control group. Without one, you're just watching the tide come in and congratulating yourself for summoning the waves.

§

Chapter 08 · Learning to See the Invisible

Kahneman went on to win the Nobel Prize in Economics, and he later wrote that the flight instructor lecture was the moment he first understood something deep about human cognition: we are wired to see causes where there are none.[11] When we observe a pattern (praise followed by decline, punishment followed by improvement), our brains immediately construct a causal story. We can't help it. It's what brains do.

Regression to the mean is invisible. It has no mechanism, no agent, no intention. It's just what happens when you combine a stable signal with random noise and then select on the observed result. It's what happens when the world is partly predictable and partly not, which is to say: always.

The Sports Illustrated editors weren't cursing anyone. Eddie Mathews wasn't doomed by a photograph. The Israeli flight instructors weren't training their cadets through tough love. And the newest wonder drug doesn't always work as well as the first trial suggests.

The mean is patient. It always gets its due.


Notes

  1. The SI cover jinx has been analyzed many times. A thorough statistical debunking appears in Wolff, A., "That Old Black Magic," Sports Illustrated, January 21, 2002. The "jinx" is almost entirely explained by regression to the mean.
  2. Galton, F., "Regression Towards Mediocrity in Hereditary Stature," Journal of the Anthropological Institute, 1886. Yes, he really called it "mediocrity." The man had a way with words.
  3. Galton's original data on 928 adult children from 205 families is reproduced in Stigler, S., The History of Statistics, Harvard University Press, 1986. Stigler notes that Galton initially found the result deeply puzzling.
  4. The flight instructor story is recounted in Kahneman, D., Thinking, Fast and Slow, Farrar, Straus and Giroux, 2011, Chapter 17. Kahneman calls it "one of the most satisfying eureka experiences of my career."
  5. Technically, positive reinforcement does work better than punishment in most learning contexts — the flight instructor's observation was confounded by regression, not evidence against the research on reinforcement. See Skinner, B.F., The Behavior of Organisms, 1938.
  6. Michael Mauboussin develops the skill-luck continuum beautifully in The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing, Harvard Business Review Press, 2012.
  7. The Madden jinx is the video game equivalent of the SI cover jinx, and equally well explained by regression to the mean. See Birnbaum, P., "The Madden Curse," By the Numbers, SABR, 2003.
  8. Denrell, J., "Vicarious Learning, Undersampling of Failure, and the Myths of Management," Organization Science, 2003. The regression critique of Good to Great-style studies is also made forcefully in Rosenzweig, P., The Halo Effect, Free Press, 2007.
  9. Morton, V. and Torgerson, D.J., "Effect of Regression to the Mean on Decision Making in Health Care," BMJ, 2003. They note that "regression to the mean is a ubiquitous phenomenon in clinical practice."
  10. Elvik, R., "The Importance of Confounding in Observational Before-and-After Studies of Road Safety Measures," Accident Analysis & Prevention, 2002. Speed camera effectiveness is real, but about 30–50% of the observed improvement is typically due to regression to the mean.
  11. Kahneman, D., Thinking, Fast and Slow, 2011. "I had one of the most satisfying eureka experiences of my career" is his description of the flight instructor moment, which led to decades of work on cognitive biases.