The Missing Chapter

The Tyranny of the Bell Curve

It's the most famous shape in mathematics — and we trust it far more than we should.

An extension of Jordan Ellenberg's "How Not to Be Wrong"

Chapter 1

The Shape That Named Itself

In 1733, a French mathematician named Abraham de Moivre was trying to solve a gambler's problem. If you flip a fair coin 100 times, how likely is it that heads comes up exactly 60 times? The exact calculation required multiplying enormous factorials — numbers so large that even the most patient human would run out of parchment. So de Moivre did what mathematicians do when exact answers are intractable: he found an approximation. And that approximation traced a curve so graceful, so symmetric, so eerily ubiquitous that it would come to dominate two centuries of science.1

He called it the "curve of errors." We call it the bell curve, the Gaussian distribution, the normal distribution — that last name being perhaps the most dangerous, because it implies that this particular shape is, well, normal. The default. The way things ought to be. And that quiet assumption has led to some spectacular mistakes.
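De Moivre's shortcut is easy to check today. Here's a short sketch (mine, not de Moivre's; the numbers 100 and 60 come from the gambler's problem above) comparing the exact binomial probability with the normal approximation that spared him the giant factorials:

```python
import math

def exact_binom(n, k, p=0.5):
    # Exact binomial probability: C(n, k) * p^k * (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def de_moivre_approx(n, k, p=0.5):
    # De Moivre's idea: evaluate the bell curve with mean np and
    # variance np(1-p) at the point k, instead of computing factorials
    mu, var = n * p, n * p * (1 - p)
    return math.exp(-(k - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

exact = exact_binom(100, 60)        # ~0.0108
approx = de_moivre_approx(100, 60)  # ~0.0108, no factorials required
print(f"exact:  {exact:.6f}")
print(f"approx: {approx:.6f}")
```

The two answers agree to within about half a percent, which is why the approximation caught on.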

The normal distribution is not a fact about the world. It's a consequence of adding up many small, independent things. When that description fits, the bell curve is miraculous. When it doesn't, the bell curve is a trap.

Here's what's genuinely magical about it, though. Measure the heights of a thousand randomly chosen American women. Plot them on a histogram. You'll get a bell curve centered around 5 feet 4 inches, with a standard deviation of about 2.8 inches. Measure the weights of a thousand ball bearings produced by a careful factory. Bell curve. Add up the time it takes a thousand different customers to check out at a grocery store. Bell curve. The scores on a well-designed standardized test. Bell curve. It keeps showing up, over and over, in contexts that seem to have nothing to do with each other.2

This isn't a coincidence. It's a theorem.

The Bell Curve: Symmetry, Mean, and Standard Deviation
The classic bell shape. About 68% of data falls within one standard deviation of the mean — a fact so reliable that entire industries are built on it.
Chapter 2

The Central Limit Theorem, or Why God Seems to Love Averages

The reason the bell curve keeps appearing is one of the most profound results in all of mathematics: the Central Limit Theorem. It says, roughly, that if you take any random process — any random process, with any underlying distribution — and you add up a large number of independent copies of it, the sum will look approximately normal. Not exactly normal. Approximately. But the approximation gets better and better as the number of copies grows.3

Think about what this means. Roll a single die: you get a uniform distribution, flat as a pancake. Roll two dice and add them: you get a triangle, peaked at 7. Roll five dice and add them: the peak sharpens. Roll thirty dice? You can barely tell it apart from a perfect bell curve. The shape of the original distribution — uniform, skewed, bimodal, whatever — gets washed away by the act of adding things up.
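A quick simulation makes the washing-away visible. This sketch (the trial count and seed are arbitrary choices of mine) tallies dice sums and reports the most common total at each stage:

```python
import random
from collections import Counter

random.seed(0)

def dice_sum_histogram(num_dice, trials=100_000):
    # Empirical distribution of the sum of `num_dice` fair dice
    return Counter(sum(random.randint(1, 6) for _ in range(num_dice))
                   for _ in range(trials))

# One die: flat. Two dice: triangle peaked at 7. Thirty dice: nearly Gaussian,
# peaked near the mean of 30 * 3.5 = 105.
for n in (2, 5, 30):
    counts = dice_sum_histogram(n)
    mode = max(counts, key=counts.get)
    print(f"{n:>2} dice: most common sum = {mode}")
```

Plot any of these histograms and the flat single-die shape is nowhere to be seen; only the bell survives the adding.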

This is why human height follows a bell curve. Your height isn't determined by a single gene making a single decision. It's the cumulative result of thousands of genetic and environmental factors, each contributing a little push up or down: this gene adds a millimeter, that nutritional factor subtracts half a millimeter, that hormone adds a centimeter during puberty. You are, quite literally, a sum of small effects. And sums of small effects are normal.4

Here's an intuition pump. Imagine a drunk person standing at a lamppost, taking random steps left or right. After one step, they could be anywhere — one step left or one step right. After two steps, the position 0 (back at the lamppost) is twice as likely as position −2 or +2. After a hundred steps, their position follows an almost perfect bell curve centered at zero, with a standard deviation of 10 (the square root of 100).
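The drunkard's walk is a two-line simulation. This sketch (10,000 walkers of 100 steps each, both arbitrary sample sizes) checks the prediction of a standard deviation of 10:

```python
import random
import statistics

random.seed(1)

# Simulate 10,000 drunkards, each taking 100 unit steps left or right.
# The CLT predicts final positions ~ normal, mean 0, std sqrt(100) = 10.
positions = [sum(random.choice((-1, 1)) for _ in range(100))
             for _ in range(10_000)]

print(f"mean: {statistics.mean(positions):.2f}")   # close to 0
print(f"std:  {statistics.stdev(positions):.2f}")  # close to 10
```

Run it and the empirical standard deviation lands within a hair of 10, exactly as the square-root rule says.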

The Central Limit Theorem is telling you that randomness, when you pile enough of it together, has a shape. And the shape is always the same shape.

The theorem was first glimpsed by de Moivre in 1733, stated more precisely by Laplace in 1812, and finally proven rigorously by Lyapunov in 1901. But its power was understood long before it was proven. Adolphe Quetelet, a Belgian astronomer-turned-sociologist, measured the chest circumferences of 5,738 Scottish soldiers in the 1840s and was delighted to find them normally distributed. He concluded that there must exist an homme moyen — an "average man" — and that every real person was a deviation from this Platonic ideal. The bell curve wasn't just a statistical convenience. It was a window into the mind of God.5

That was a beautiful thought. It was also the beginning of a lot of trouble.

Chapter 3

The Formula That Runs the World

The normal distribution is defined by exactly two numbers: the mean (μ) and the standard deviation (σ). Once you know those two numbers, you know everything. The formula itself looks intimidating:

The Normal Probability Density Function
f(x) = 1 / (σ√(2π)) · e^(−(x−μ)² / (2σ²))
μ = mean (center of the bell), σ = standard deviation (width of the bell), e ≈ 2.718, π ≈ 3.14159

But the intimidation is misleading, because the formula is really saying something simple: the probability of seeing a value decreases exponentially as you move away from the mean. Not linearly — exponentially. One standard deviation away, you're at about 61% of the peak height. Two standard deviations away, you're at 13.5%. Three standard deviations away, you're at 1.1%. The tails of the bell curve are thin. Astonishingly thin. A value six standard deviations from the mean should happen about once in every 500 million observations.
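Each of those numbers falls straight out of the formula. Here's a sketch that recomputes the peak-height ratios and the six-sigma frequency (using the standard error-function identity for the two-sided tail):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # The density above: exp(-(x-mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

peak = normal_pdf(0)
for k in (1, 2, 3):
    print(f"{k} SD out: {normal_pdf(k) / peak:.1%} of peak height")
# → 60.7%, 13.5%, 1.1%

# Two-sided tail beyond 6 standard deviations: about 1 in 500 million
p6 = math.erfc(6 / math.sqrt(2))  # P(|Z| > 6)
print(f"P(|Z| > 6) ≈ 1 in {1 / p6:,.0f}")
```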

This is the property that makes the normal distribution so convenient for engineering and finance: extreme events are so rare that you can essentially ignore them. A bridge engineer who designs for loads up to four standard deviations above the expected maximum can sleep soundly at night. A quality control manager who flags any widget more than three standard deviations from spec will wave through 99.7% of normal production and flag only the rare 0.3% of outliers.

"The normal distribution is the mathematical equivalent of a comfortable lie. Most of the time it's close enough. But 'close enough' and 'always right' are not the same thing."

And here is where the trouble starts. Because finance — specifically the modern theory of financial risk — was built almost entirely on the assumption that returns are normally distributed. And they're not.

Chapter 4

When the Bell Curve Breaks

On October 19, 1987 — Black Monday — the Dow Jones Industrial Average fell 22.6% in a single day. If stock returns were normally distributed with the historical mean and standard deviation, that event was roughly a 25-standard-deviation event. The probability of that happening on any given day is approximately 10⁻¹³⁵. For perspective, there have only been about 10¹⁷ seconds since the Big Bang. If you had been trading since the beginning of the universe, on every planet in every galaxy, Black Monday should not have happened. Not once. Not ever.6

But it did happen. And similar "impossible" events keep happening. The 1998 collapse of Long-Term Capital Management — a hedge fund staffed by Nobel laureates who explicitly relied on normal distribution models — required a $3.6 billion bailout to prevent systemic financial collapse. The 2008 financial crisis involved multiple events that were supposed to be five- or six-sigma outliers. Nassim Taleb has spent a career pointing out that the bell curve's thin tails are a fantasy when applied to markets, pandemics, bestseller lists, and wars.7

The problem is that financial returns don't come from adding up many independent, identically distributed little shocks. They come from a world where panic is contagious, where one person's selling triggers another person's margin call, where feedback loops amplify small disturbances into catastrophes. The Central Limit Theorem's conditions — independence, finite variance — are violated. And when the conditions are violated, the conclusion doesn't follow.

In a normally distributed world, you'd expect a daily move of more than 5 standard deviations about once every 14,000 years. In the actual S&P 500, it happens about once a decade.
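Both figures in that callout are one-line tail calculations under a Gaussian model. Here's a back-of-envelope sketch (252 trading days per year is my assumption; at 25 sigmas the exact astronomically small answer depends on which approximation you use, but every version is absurd):

```python
import math

TRADING_DAYS = 252  # rough trading days per year (an assumption)

def years_between(sigma):
    # Expected years between daily drops worse than `sigma` standard
    # deviations, if returns really were Gaussian (one-sided lower tail)
    p = math.erfc(sigma / math.sqrt(2)) / 2  # P(Z < -sigma)
    return 1 / (p * TRADING_DAYS)

print(f"5-sigma daily drop:  once every {years_between(5):,.0f} years")  # ~14,000
print(f"25-sigma daily drop: once every {years_between(25):.1e} years")
```

The five-sigma number matches the callout; the 25-sigma number is so large that "never" is the only honest summary — which is precisely the model's problem.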

The mathematician Benoît Mandelbrot, who studied cotton prices in the 1960s, was among the first to notice this. He proposed that market returns follow stable distributions with heavier tails — distributions where extreme events are rare but not astronomically rare. The financial industry listened politely, then went right on using the normal distribution because the math was so much easier.8

This is, perhaps, the deepest lesson of the normal distribution: not that it's wrong, but that it's seductive. It makes hard problems tractable. It reduces everything to two numbers. It lets you say things like "there's a 95% chance the answer lies in this interval" and sound precise. The danger isn't in using it — it's in forgetting that you're using it, in mistaking the model for the world.

Thin Tails vs. Fat Tails: Normal (thin tails) vs. a fat-tailed distribution
The red curve has "fatter" tails — extreme events are far more likely than the normal distribution predicts. Financial markets, earthquakes, and city sizes all exhibit this pattern.
Chapter 5

The 68-95-99.7 Rule (And Why It Matters)

If there's one thing to take away from the normal distribution, it's the empirical rule: roughly 68% of data falls within one standard deviation of the mean, 95% within two, and 99.7% within three. This is the skeleton key that unlocks an enormous number of practical problems.

Suppose your doctor tells you that your blood pressure is "two standard deviations above the mean." Without knowing anything about the specific numbers, you know you're in roughly the top 2.5% of the population — because 95% of people fall within two standard deviations, and by symmetry, 2.5% are above two SDs and 2.5% are below. That single piece of information tells you this is worth worrying about.

Or suppose you run a factory that produces bolts with a mean diameter of 10.0 mm and a standard deviation of 0.1 mm. If a bolt needs to fit through a hole that's 10.3 mm wide, you can instantly calculate that only about 0.15% of your bolts (those more than three SDs above the mean) will fail to fit. That's the kind of practical, instant answer that makes the normal distribution so beloved in engineering.
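Both the blood-pressure and the bolt calculation reduce to evaluating a normal tail, which the standard error function handles directly. A sketch:

```python
import math

def fraction_within(k):
    # Fraction of a normal population within k standard deviations of the mean
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
# → 68.3%, 95.4%, 99.7%

# Bolt example: mean 10.0 mm, SD 0.1 mm, hole 10.3 mm wide.
# Bolts that won't fit are those beyond +3 SDs (one-sided tail).
too_wide = math.erfc(3 / math.sqrt(2)) / 2
print(f"bolts that won't fit: {too_wide:.3%}")  # ~0.135%, or ~0.15% from the rounded rule
```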

The Standard Deviation Is Not Optional

An average without a standard deviation is like a map without a scale. It tells you where the center is, but nothing about how spread out the data is. "The average American household has 2.5 children" is a useless fact without knowing that the standard deviation is about 1.2 — which tells you that a household with 5 children is unusual but not rare (about 2 SDs away), while a household with 10 children is extraordinarily rare.

Chapter 6

Galton's Board and the Shape of Chance

In the 1870s, Francis Galton — Charles Darwin's half-cousin and one of history's most complicated scientific figures — built a physical device to demonstrate the Central Limit Theorem. He called it a "quincunx," though today we usually call it a Galton board. You drop a ball in at the top, and it bounces off a series of pegs, going left or right at each peg with equal probability. After passing through many rows, the ball lands in one of the bins at the bottom.9

Drop one ball and it could land anywhere. Drop a hundred balls, and something beautiful happens: they pile up into a bell curve. Each ball's final position is the sum of many independent left-right bounces — exactly the setup the Central Limit Theorem describes. Galton was so entranced by this that he wrote, in a passage that's either poetic or slightly unhinged:

"I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the 'Law of Frequency of Error.' The law would have been personified by the Greeks and deified, if they had known of it."

He was, to put it mildly, a fan. He was also a eugenicist who used the normal distribution to argue for selective breeding of humans, which is a reminder that even beautiful mathematics can be put to horrifying purposes when you forget that your model is not the world.10

Below, you can play with your own Galton board. Watch the balls accumulate and see the bell curve emerge in real time.

Galton Board Simulator (interactive): drop balls through rows of pegs and watch the normal distribution emerge. Each ball bounces left or right at every peg with equal probability.
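If you'd rather see it in code than in hardware, here's a minimal text-mode Galton board (12 rows and 2,000 balls are arbitrary choices of mine):

```python
import random
from collections import Counter

random.seed(42)

def galton(rows=12, balls=2000):
    # Each ball bounces left (0) or right (1) at every peg, so its final
    # bin is just the number of rightward bounces: a Binomial(rows, 0.5) draw.
    return Counter(sum(random.randint(0, 1) for _ in range(rows))
                   for _ in range(balls))

bins = galton()
# Crude text histogram: the piled-up balls trace a bell shape
for slot in range(13):
    print(f"{slot:>2} | {'#' * (bins[slot] // 20)}")
```

The tallest pile lands at or near bin 6, the middle, and the piles fall off symmetrically on either side.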
Chapter 7

The Moral of the Bell Curve

So what's the right way to think about the normal distribution? I'd suggest something like this:

The bell curve is a consequence, not a cause. It appears when many small, roughly independent factors add together — which is a surprisingly common situation, but not a universal one. When those conditions hold, you get thin tails and the 68-95-99.7 rule, and life is mathematically pleasant. When those conditions don't hold — when there are multiplicative effects, feedback loops, contagion, power laws, or heavy-tailed individual components — you get something else entirely, and the bell curve's comfortable predictions can be catastrophically wrong.

The test is not "does this data look like a bell curve?" (Almost anything looks bell-ish if you squint.) The test is "does this data come from a process that is a sum of many small independent things?" If the answer is yes — human heights, measurement errors, standardized test scores — use the normal distribution with confidence. If the answer is no — stock returns, city sizes, earthquake magnitudes, book sales — use it at your peril.11

Ask yourself: in this domain, can a single observation be thousands of times larger than the average? If you're measuring human heights, no — the tallest person in history was less than twice the average. If you're measuring net worths, yes — Jeff Bezos's net worth is about a million times the median. The first world is Gaussian. The second is not. Confusing the two is a recipe for being very, very wrong.
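You can watch that diagnostic separate the two worlds in simulation. This sketch uses a lognormal as a stand-in for a heavy-tailed quantity like wealth (real wealth is often modeled with Pareto tails; the parameters here are purely illustrative):

```python
import random

random.seed(7)

def max_over_mean(samples):
    # The diagnostic: how many times larger than the average is the extreme?
    return max(samples) / (sum(samples) / len(samples))

n = 100_000
# "Heights": Gaussian, mean 64 in, SD 2.8 in. The extreme barely stands out.
heights = [random.gauss(64, 2.8) for _ in range(n)]
# "Wealth": lognormal, a heavy-tailed stand-in. One draw can dwarf the mean.
wealth = [random.lognormvariate(0, 2.5) for _ in range(n)]

print(f"heights: max/mean = {max_over_mean(heights):.2f}")  # barely above 1
print(f"wealth:  max/mean = {max_over_mean(wealth):,.0f}")
```

For heights, the largest of 100,000 draws is only about 20% above the mean. For the heavy-tailed sample, the largest draw is hundreds or thousands of times the mean — the signature of non-Gaussian territory.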

Adolphe Quetelet's "average man" was a lovely idea. But the world is not as average as he hoped. Some things cluster tightly around a mean and fall off with the reliable symmetry of a bell. Other things sprawl across orders of magnitude, with rare-but-devastating extremes that no bell curve can capture. The mathematics of the normal distribution is gorgeous and true. The mistake is thinking it applies everywhere — that because it's called "normal," it must be the norm.

It isn't. It's one shape among many. The trick is knowing when you're in bell curve territory, and when you've wandered into the wild.

• • •
Normal Distribution Explorer (interactive): adjust the mean (μ) and standard deviation (σ) to see how the bell curve changes. The shaded regions show the 68-95-99.7 rule in action: 68.2% within 1σ, 95.4% within 2σ, 99.7% within 3σ.
Where the Bell Curve Works — and Where It Doesn't

✓ Bell Curve Territory (additive, independent, bounded sums of small things): human heights, blood pressure readings, manufacturing tolerances, IQ scores, measurement errors, shoe sizes.

✗ Not Bell Curve Territory (multiplicative / contagious): wealth and income, stock market returns, city populations, book and music sales, earthquake magnitudes, war casualties.
The key question: is the data generated by adding many small independent things? If yes, ring the bell. If no, watch your tails.

Notes & References

  1. Abraham de Moivre, The Doctrine of Chances (1733). De Moivre's approximation to the binomial distribution was the first appearance of the normal curve. See Stigler, The History of Statistics (Harvard, 1986), Chapter 2.
  2. The appearance of the bell curve in so many contexts is not mysterious once you understand the Central Limit Theorem, but it was deeply mysterious for over a century. See Hald, A History of Mathematical Statistics from 1750 to 1930 (Wiley, 1998).
  3. The Central Limit Theorem requires that the random variables have finite variance and be independent (or at least weakly dependent). The classic version is due to Lindeberg (1922) and Lévy (1925). See Billingsley, Probability and Measure (Wiley, 3rd ed., 1995).
  4. Height is influenced by hundreds of genetic variants and numerous environmental factors. See Visscher et al., "From Galton to GWAS: Quantitative Genetics of Human Height," Genetics Research 92 (2010): 371–379.
  5. Adolphe Quetelet, Sur l'homme et le développement de ses facultés (1835). Translated as A Treatise on Man and the Development of His Faculties. The chest circumference data came from the Edinburgh Medical Journal.
  6. The 22.6% single-day decline of the DJIA on October 19, 1987 far exceeded any prediction from a Gaussian model. See Mandelbrot and Hudson, The (Mis)Behavior of Markets (Basic Books, 2004), Chapter 1.
  7. Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable (Random House, 2007). Also see Taleb, Fooled by Randomness (Random House, 2001).
  8. Benoît Mandelbrot, "The Variation of Certain Speculative Prices," The Journal of Business 36, no. 4 (1963): 394–419. This landmark paper proposed stable Paretian distributions for financial returns.
  9. Francis Galton, Natural Inheritance (Macmillan, 1889). The quincunx is described in Chapter 5.
  10. Galton coined the term "eugenics" in 1883. For a critical history, see Kevles, In the Name of Eugenics (Harvard, 1995).
  11. For an excellent treatment of when to use Gaussian vs. heavy-tailed models, see Taleb, Statistical Consequences of Fat Tails (STEM Academic Press, 2020).