
The Missing Chapter

When Average Doesn't Exist

In the world of power laws, one observation can outweigh all the others combined.

An extension of Jordan Ellenberg's "How Not to Be Wrong"

Chapter 1

The Bell Curve Delusion

At 2:46 p.m. on March 11, 2011, the Pacific seafloor lurched. A 400-mile stretch of the Japan Trench broke in a single catastrophic rupture. The Tōhoku earthquake released energy equivalent to roughly 600 million times the Hiroshima bomb. It shifted the Earth's axis. It moved the main island of Japan eight feet to the east.1

Japan averages about 1,500 earthquakes per year. The country has arguably the most sophisticated seismic engineering on the planet. So how do you plan for an event that releases more energy than decades of "normal" earthquakes combined?

You don't. Because if you're thinking in terms of averages, you're thinking about the wrong kind of world.

The average earthquake in Japan is magnitude 3-point-something. Averaging earthquake magnitudes is like averaging the net worth of everyone at a bus stop and concluding that everyone there is a millionaire because Jeff Bezos happened to be waiting for the same bus. The average isn't wrong, exactly. It's just meaningless.

Here's a way to feel this in your bones. Imagine you line up a thousand random Americans and measure their heights. The tallest person might be 6'9", maybe 7 feet. That person doesn't change the average height of the group by more than a fraction of an inch. Now line up a thousand random Americans and measure their net worth. Add Jeff Bezos to the line. Suddenly the "average" net worth of your group is something like $150 million — a number that describes precisely no one in the line except him. The tall person is an outlier in a Gaussian world: notable, but domesticated. Bezos is an outlier in a power-law world: he is the data.

This isn't a quirk of earthquakes or billionaires. It's a property of an entire class of phenomena — city sizes, wealth distributions, book sales, web traffic, word frequencies, casualties in wars, insurance claims, forest fire acreage — where the ordinary rules of "normal" don't apply. These are the domains where a single observation can outweigh all the rest combined, where the tail doesn't just wag the dog but is the dog.

Figure: A skyline shaped by power laws, with a handful of extremes towering over a long tail of the ordinary.

Gaussian World vs. Power-Law World

You were probably taught, implicitly or explicitly, that the world is bell-shaped. Heights, test scores, blood pressure — measure enough of anything, and you get that familiar, reassuring hump. The Central Limit Theorem tells us why: whenever you're adding up a bunch of small, independent, roughly similar contributions, their sum converges to a bell curve. It's one of the most beautiful results in all of mathematics, and it explains a staggering amount of the natural world.

The key word is "adding." Heights are additive: your adult stature is the cumulative result of thousands of small genetic and environmental nudges, each one pulling you slightly taller or shorter. No single gene makes you three feet taller. No single meal adds a foot. The contributions are small, independent, and additive. That's why heights cluster around a mean, and nobody is fifty feet tall.

Now try this: "The average American family's net worth is $1,063,700."2

Something feels off. The median family's net worth is $192,900. The average is being yanked upward by a tiny number of spectacularly wealthy people. Elon Musk's fortune alone, spread across all Americans, would come to roughly $600 a head. Three hundred and thirty million denominators, and he still moves the needle.

Wealth isn't additive. It's multiplicative. Your fortune next year is this year's fortune times some factor: the return on your investments, the growth rate of your business, the appreciation of your house. When returns compound, small early differences explode into chasms. Take two people who start with the same amount, one earning 8% annually and the other 10%. After forty years, the second ends up with about twice as much as the first, all from a gap of two percentage points. Multiply that by random variation, inheritance, network effects, and policy choices, and you get a distribution that doesn't look remotely like a bell curve. You get a long, long, long right tail.
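The compounding arithmetic is easy to check. Below is a minimal sketch that assumes a hypothetical starting stake of $10,000 and constant annual returns with no randomness:

```python
def compound(principal, rate, years):
    """Value of `principal` after `years` of growth at a constant annual `rate`."""
    return principal * (1.0 + rate) ** years


slow = compound(10_000, 0.08, 40)  # about $217,000
fast = compound(10_000, 0.10, 40)  # about $453,000
ratio = fast / slow                # about 2.08
print(f"8%: ${slow:,.0f}   10%: ${fast:,.0f}   ratio: {ratio:.2f}")
```

Random returns widen the spread further, because the lucky compound their luck.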

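Interactive figure (Distribution Explorer): watch how averages behave in two fundamentally different worlds. The explorer tracks the mean, the median, and the max/mean ratio as samples accumulate, and it has an option to inject an outlier.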
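The Distribution Explorer's comparison can be approximated offline. This minimal sketch makes two illustrative assumptions. Heights are Gaussian, with a mean of 170 cm and a standard deviation of 7 cm. A power law with density exponent α = 2.1 stands in for net worth. Neither choice is fitted to data. The sketch reports the statistics the explorer shows (mean, median, max/mean), plus the largest value's share of the total:

```python
import random
import statistics


def pareto_sample(rng, alpha, x_min=1.0):
    """Inverse-transform draw from a power law with density proportional to x**(-alpha)."""
    return x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))


def summarize(values):
    """Mean, median, max/mean ratio, and the largest value's share of the total."""
    mean = statistics.fmean(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "max_over_mean": max(values) / mean,
        "max_share": max(values) / sum(values),
    }


rng = random.Random(42)
n = 10_000
heights = [rng.gauss(170.0, 7.0) for _ in range(n)]           # Gaussian "heights" in cm
fortunes = [pareto_sample(rng, alpha=2.1) for _ in range(n)]  # power-law "fortunes"
results = {"Gaussian": summarize(heights), "Power law": summarize(fortunes)}
for name, s in results.items():
    print(f"{name:9s} mean={s['mean']:.1f} median={s['median']:.1f} "
          f"max/mean={s['max_over_mean']:.1f} max share={s['max_share']:.2%}")
```

In the Gaussian sample, the tallest value sits barely above the mean. In the power-law sample, the largest value is typically hundreds of times the mean, and the mean sits well above the median.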

This is the core distinction. Nassim Nicholas Taleb calls the two regimes "Mediocristan" and "Extremistan."3 In Mediocristan — the Gaussian world — the largest observation you'll ever see is bounded. No single human, however tall, can meaningfully affect the average height of a country. The extremes are tame.

In Extremistan, the power-law world, the largest observation is unbounded in practice, and it dominates the sum. A single earthquake can release more energy than thousands of years of small tremors. One book can outsell thousands of ordinary titles combined. One website can draw more traffic than millions of obscure sites combined.

The eerie thing is how often we confuse one world for the other. We naturally assume things will be "normal" (that's literally the name we gave the bell curve), and then we're blindsided when a single event dwarfs everything else. We build financial models assuming stock returns are Gaussian, then act stunned when "25-sigma events" show up. If daily returns were truly Gaussian, a single 25-sigma day would be expected about once every $10^{135}$ years, roughly $10^{125}$ times the current age of the universe. The fact that such moves keep happening should be a clue.
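The 25-sigma arithmetic is a one-liner with the complementary error function. This minimal sketch assumes one return per trading day (252 a year) and a universe roughly 13.8 billion years old:

```python
import math


def gaussian_upper_tail(k):
    """P(Z > k) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


TRADING_DAYS_PER_YEAR = 252      # assumption: one return per trading day
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate

p = gaussian_upper_tail(25)                        # about 3e-138
years_between = 1.0 / (p * TRADING_DAYS_PER_YEAR)  # expected wait for one such day
universes = years_between / AGE_OF_UNIVERSE_YEARS
print(f"p = {p:.1e}; wait = {years_between:.1e} years = {universes:.1e} ages of the universe")
```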

· · ·
Chapter 2

The Signature on a Log-Log Plot

So how do you tell whether you're in Mediocristan or Extremistan? There's a beautifully simple diagnostic, and it involves the oldest trick in the physicist's playbook: take logarithms and see if a straight line falls out.

Here's the intuition. In a bell-curve world, if you want to find someone twice as tall as the average person, good luck: they don't exist. But in a power-law world, the relationship between "how big" and "how rare" follows a strikingly regular pattern. Double the size of an earthquake, and it becomes some fixed multiple rarer. Double the population of a city, and cities that large become some fixed multiple rarer. That holds whether you start from towns of 10,000 or from metropolises of a million. The multiple is set by the power law's exponent, usually called α (alpha). Doubling the size makes an event 2^α times rarer, which is why α is the single number that characterizes the shape of the entire distribution.
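The doubling rule can be checked numerically. This minimal sketch assumes the density convention p(x) ∝ x^(−α) for x ≥ x_min, which is the convention Clauset, Shalizi, and Newman use (note 4). It uses an illustrative α = 2.5 and includes a standard Gaussian for contrast:

```python
import math


def power_law_density(x, alpha, x_min=1.0):
    """Normalized power-law density (alpha - 1)/x_min * (x/x_min)**(-alpha) for x >= x_min."""
    return (alpha - 1.0) / x_min * (x / x_min) ** (-alpha)


def gaussian_density(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


ALPHA = 2.5  # illustrative exponent

# Power law: doubling x always multiplies the density by the same factor, 2**-ALPHA.
power_ratios = [power_law_density(2 * x, ALPHA) / power_law_density(x, ALPHA)
                for x in (1.0, 10.0, 1000.0)]
# Gaussian: the same doubling costs more and more the further out you go.
gauss_ratios = [gaussian_density(2 * x) / gaussian_density(x) for x in (0.5, 1.0, 2.0)]

print([round(r, 4) for r in power_ratios])  # [0.1768, 0.1768, 0.1768]
print([f"{r:.2e}" for r in gauss_ratios])
```

For the power law, the ratio is 2^(−2.5) ≈ 0.177 at every scale. For the Gaussian, it shrinks from about 0.69 to about 0.0025 as x goes from 0.5 to 2.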

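A power-law distribution has the mathematical form:

$$p(x) \propto x^{-\alpha} \quad \Longleftrightarrow \quad P(X > x) \propto x^{-(\alpha - 1)}, \qquad x \ge x_{\min},\ \alpha > 1$$

The probability density falls as a power of x, and so does the probability of observing a value greater than x.

α: the exponent, which controls how "heavy" the tail is.
α ≤ 2: the mean is infinite; single observations dominate.
2 < α ≤ 3: the mean is finite, but the variance is infinite.
α > 3: the distribution starts behaving more politely.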

The diagnostic trick: plot your data on a log-log scale, with the logarithm of frequency on one axis and the logarithm of size on the other. If the result is a straight line, you may be looking at a power law. The line's slope is −α: it runs downhill, and its steepness is the exponent.4
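The diagnostic can be sketched in a few lines. This minimal sketch assumes synthetic data drawn from a power law with density exponent α = 2.5. The sampler, the doubling-bin scheme, and names like log_log_slope are illustrative choices, not a standard API. The sketch histograms the data in doubling bins, converts counts to densities, and fits a least-squares line to the logs:

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Inverse-transform samples with density proportional to x**(-alpha), x >= x_min."""
    rng = random.Random(seed)
    # P(X > x) = (x / x_min)**(-(alpha - 1)), so X = x_min * U**(-1 / (alpha - 1)).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def log_log_slope(data, x_min=1.0, min_count=50):
    """Least-squares slope of log(density) vs. log(size) over doubling bins."""
    counts = {}
    for x in data:
        k = int(math.floor(math.log2(x / x_min)))  # bin k covers [x_min*2**k, x_min*2**(k+1))
        counts[k] = counts.get(k, 0) + 1
    xs, ys = [], []
    for k, c in sorted(counts.items()):
        if c < min_count:  # sparse tail bins are too noisy to fit
            continue
        left = x_min * 2.0 ** k
        xs.append(math.log(left))
        ys.append(math.log(c / left))  # bin width equals its left edge
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


data = sample_power_law(100_000, alpha=2.5)
slope = log_log_slope(data)
print(f"fitted slope: {slope:.2f}")  # close to -2.5
```

Log-binning keeps the sparse tail from dominating the fit. Even so, the result depends on the bins and on where you stop trusting them, which is one reason a straight-looking line isn't proof.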

This is genuinely magical. Take data that looks like a terrifying hockey stick on a normal plot — a single spike at the left and an endless, seemingly random tail stretching to the right — and just take the log of both axes. Suddenly: a clean, orderly straight line. All that apparent chaos was structure all along, hiding behind a nonlinear scale.

(A word of caution: the fact that something looks like a straight line on a log-log plot doesn't prove it's a power law. As the statistician Cosma Shalizi likes to point out, many distributions look approximately linear on log-log paper over a limited range. Rigorous statistical tests exist, and they matter. But the visual diagnostic remains one of the most productive first moves in data analysis.)
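For a more rigorous first pass, Clauset, Shalizi, and Newman (note 4) recommend maximum likelihood over fitting a line on log-log axes. This is a minimal sketch of the continuous estimator α̂ = 1 + n / Σ ln(xᵢ / x_min), and it assumes x_min is already known. The full method also chooses x_min from the data and runs a goodness-of-fit test; neither step is shown here. The function name mle_alpha is illustrative:

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Inverse-transform samples with density proportional to x**(-alpha), x >= x_min."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def mle_alpha(data, x_min):
    """Continuous maximum-likelihood estimate of alpha for the tail x >= x_min,
    with its approximate standard error (alpha_hat - 1) / sqrt(n)."""
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    alpha_hat = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(n)


data = sample_power_law(100_000, alpha=2.5)
alpha_hat, stderr = mle_alpha(data, x_min=1.0)
print(f"alpha_hat = {alpha_hat:.3f} +/- {stderr:.3f}")
```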

Interactive figure (Log-Log Plotter): see how wildly different phenomena share the same mathematical skeleton, shown side by side on a linear scale and a log-log scale.

Zipf's Law

Figure: Zipf's law made visible, with "the" and "of" at the head of the curve. The most common word dwarfs all others, just as the largest city dwarfs the rest.

In 2020, New York City had 8.8 million people. The second-largest US city, Los Angeles, had 3.9 million, less than half as many. Number 10 (San Jose) had about 1 million. This is Zipf's law: the n-th largest city tends to have a population of roughly 1/n times the largest city's population.5

George Kingsley Zipf, a Harvard linguist, noticed this pattern first in words. The most common English word — "the" — accounts for about 7% of all words in a typical text. The second most common, "of," about 3.5%. The tenth most common, roughly 0.7%. Plot rank versus frequency and you get a near-perfect power law with exponent close to 1. The same relationship appears in the population of cities, the revenue of companies, the audience size of TV shows, and the follower counts on social media.
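Zipf's rule of thumb, share(n) ≈ share(1)/n, reproduces the word shares quoted above from the 7% share of "the". The minimal sketch below also builds a rank-frequency table from raw text. The helper names and the toy sentence are illustrative; a real check needs a long corpus:

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def zipf_share(top_share, rank, s=1.0):
    """Share predicted for `rank` if shares fall as 1/rank**s (s = 1 is classic Zipf)."""
    return top_share / rank ** s


# Using the roughly 7% share of "the" as the rank-1 value:
print(f"{zipf_share(0.07, 2):.3f}", f"{zipf_share(0.07, 10):.3f}")  # 0.035 0.007

sample = "the cat sat on the mat and the dog sat by the door of the house"
print(rank_frequency(sample)[:3])
```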

Word frequencies. City sizes. Earthquake energies. The number of links pointing to web pages. The number of citations a scientific paper receives. The size of craters on the Moon. Over and over, across domains that seem to have nothing to do with each other, the same shape appears. It's as if nature has a limited repertoire of distributional templates, and the power law is one of its favorites.

· · ·
Chapter 3

Where Power Laws Come From

Figure: Three different mechanisms, one universal signature. Preferential attachment ("rich get richer"), multiplicative processes ("random multipliers"), and self-organized criticality ("sandpile avalanches") all produce the same shape: a straight line on a log-log plot.

Preferential attachment. The "rich get richer" mechanism. New webpages link to pages that already have many links. New followers flock to accounts that already have large audiences. New citations go disproportionately to papers already heavily cited. Barabási and Albert showed in 1999 that this simple rule — newcomers preferentially connect to well-connected nodes — is sufficient to produce a power-law distribution of connections.6 The mechanism is ancient. The sociologist Robert Merton called it the "Matthew Effect" in 1968, after the Gospel of Matthew: "For unto every one that hath shall be given, and he shall have abundance."
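This minimal sketch of the rule assumes the simplest variant of the Barabási–Albert model (note 6), in which each newcomer adds exactly one link. The function name is hypothetical. Sampling uniformly from a list of edge endpoints picks each node with probability proportional to its degree:

```python
import random
from collections import Counter


def preferential_attachment_degrees(n_nodes, seed=0):
    """Grow a network one node at a time; each newcomer links to one existing node
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    # Every edge contributes both endpoints to this list, so a uniform pick from it
    # selects a node with probability proportional to its degree.
    endpoints = [0, 1]  # start from a single edge between nodes 0 and 1
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        endpoints.extend((new, target))
    return Counter(endpoints)  # node -> degree


degrees = preferential_attachment_degrees(20_000)
mean_degree = sum(degrees.values()) / len(degrees)
print(f"mean degree {mean_degree:.2f}, max degree {max(degrees.values())}")
```

The mean degree is pinned near 2 by construction, yet a few early nodes end up with degrees far above it. The rich get richer.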

Multiplicative processes. If your wealth tomorrow is today's wealth multiplied by some random factor, then over time the distribution becomes extremely lopsided, and small tweaks to the process turn it into a power law. This is the key insight: additive shocks give you Gaussians; multiplicative shocks give you heavy tails. Think about it this way. When a company grows by 10%, the absolute gain is proportional to its current size. A $10 billion company adds $1 billion; a $100 million company adds only $10 million. The same percentage growth widens the gap in absolute terms. Do this repeatedly with random percentage changes, and you get a log-normal distribution. Add a reflecting lower boundary (a floor below which a firm or fortune cannot shrink) or intermittent restructuring, and you get something even heavier-tailed: a power law.
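The additive-versus-multiplicative contrast is easy to simulate. This minimal sketch uses illustrative parameters. Every agent starts at the same value and takes 100 random steps, either adding a Gaussian shock or multiplying by a random factor. Only the log-normal stage is shown; the reflecting-floor variant that produces a true power law is left out:

```python
import math
import random
import statistics


def simulate(n_agents=5_000, steps=100, seed=0):
    """Return final values under additive and multiplicative random shocks."""
    rng = random.Random(seed)
    additive, multiplicative = [], []
    for _ in range(n_agents):
        a, m = 100.0, 1.0
        for _ in range(steps):
            a += rng.gauss(0.0, 1.0)             # additive: gain or lose a fixed-size amount
            m *= math.exp(rng.gauss(0.0, 0.15))  # multiplicative: a random percentage change
        additive.append(a)
        multiplicative.append(m)
    return additive, multiplicative


additive, multiplicative = simulate()
add_ratio = statistics.fmean(additive) / statistics.median(additive)
mult_ratio = statistics.fmean(multiplicative) / statistics.median(multiplicative)
print(f"additive: mean/median = {add_ratio:.2f}; multiplicative: mean/median = {mult_ratio:.2f}")
```

Additive shocks leave the mean and median nearly equal, as in a bell curve. Multiplicative shocks pull the mean well above the median. For these settings, theory puts the ratio near e^1.125, about 3.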

Self-organized criticality. Per Bak, Tang, and Wiesenfeld's sandpile model:7 slowly dropping grains of sand onto a pile produces avalanches whose sizes follow a power law. The system naturally evolves to a "critical" state where events of all scales are possible. No one grain "causes" a large avalanche — the system is perpetually on the edge, and any small perturbation might cascade. This model has been applied to forest fires, epidemics, power grid blackouts, and even biological evolution. The lesson: you don't need a big cause for a big effect. You just need a system poised at criticality.
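The Bak–Tang–Wiesenfeld rule (note 7) fits in a few lines. This minimal sketch assumes a 20 × 20 grid with open edges, so grains that topple off the edge are lost. Grains are dropped at random sites, and avalanche size is measured as the number of topplings. The grid size and grain count are illustrative:

```python
import random


def sandpile_avalanches(size=20, grains=10_000, seed=0):
    """Drop grains one at a time on a size x size grid; return each avalanche's size
    (number of topplings). A site holding 4 or more grains topples, sending one grain
    to each neighbor; grains pushed past the edge fall off the pile."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    sizes = []
    for _ in range(grains):
        i, j = rng.randrange(size), rng.randrange(size)
        grid[i][j] += 1
        topplings = 0
        unstable = [(i, j)]
        while unstable:
            a, b = unstable.pop()
            if grid[a][b] < 4:  # may have been relaxed already
                continue
            grid[a][b] -= 4
            topplings += 1
            for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                if 0 <= na < size and 0 <= nb < size:
                    grid[na][nb] += 1
                    if grid[na][nb] >= 4:
                        unstable.append((na, nb))
            if grid[a][b] >= 4:  # still overloaded: topple again later
                unstable.append((a, b))
        sizes.append(topplings)
    return sizes


sizes = sandpile_avalanches()
quiet = sizes.count(0)
print(f"no toppling: {quiet} of {len(sizes)} drops; largest avalanche: {max(sizes)} topplings")
```

Most drops trigger nothing, yet a few set off avalanches with hundreds of topplings, and no single grain is special.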

The Pareto Principle

You've heard the "80/20 rule." But 80/20 isn't a universal constant — it's one particular power law with one particular exponent. In global wealth, the richest 1% hold roughly 46%. The real insight: in power-law systems, a small number of items always dominate the total. The specific ratio depends on α.
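For a Pareto-type power law with density exponent α > 2, the share of the total held by the top fraction p works out to p^((α − 2)/(α − 1)). Here is a minimal sketch. The formula assumes an idealized power law over the whole range, which real wealth data only approximates:

```python
import math


def top_share(p, alpha):
    """Fraction of the total held by the top fraction p of a power law with
    density exponent alpha (requires alpha > 2 so the mean is finite)."""
    if alpha <= 2:
        raise ValueError("alpha <= 2: the mean is infinite, so shares are not well defined")
    return p ** ((alpha - 2.0) / (alpha - 1.0))


# Exponent at which the top 20% hold exactly 80%: solve 0.2**((a-2)/(a-1)) = 0.8.
alpha_80_20 = 1.0 + math.log(5) / math.log(4)  # about 2.16
print(f"alpha = {alpha_80_20:.3f}")
print(f"top 20%: {top_share(0.2, alpha_80_20):.2f}")   # 0.80
print(f"top 1%:  {top_share(0.01, alpha_80_20):.2f}")  # 0.53
```

Under this idealization, the exponent that produces exactly 80/20 also hands the top 1% about 53% of the total. The ratio you quote depends on α, and on where you cut.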

· · ·
Chapter 4

Living in Extremistan

For power laws with α ≤ 2, the theoretical mean is infinite. Not "very large." Infinite. You can keep sampling forever, and your running average will never settle down. Every time you think it's stabilizing, along comes another observation that yanks it upward. This means any estimate of the "average" is unstable by nature — it's not converging to some true value, because there is no true value to converge to.

This is profoundly disorienting if you were raised on Gaussian statistics. In a bell-curve world, more data always helps. Sample a thousand heights, and your estimate of the mean is excellent. Sample a million, and it's nearly exact. But sample a thousand incomes from a power-law distribution with α = 1.5, and your sample mean is wildly unreliable. Sample a million, and it's still unreliable — just unreliable with more data points. The instability isn't a bug in your measurement; it's a feature of the distribution itself.
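The difference shows up clearly when you compute the mean of many independent batches. This minimal sketch assumes Gaussian "heights" (mean 170, standard deviation 10). A power law with α = 1.5 stands in for incomes. The batch count and size are arbitrary:

```python
import random
import statistics


def pareto(rng, alpha, x_min=1.0):
    """One draw from a power law with density proportional to x**(-alpha);
    for alpha <= 2 the theoretical mean is infinite."""
    return x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))


def batch_means(draw, n_batches=20, batch_size=10_000, seed=1):
    """Sample means of independent batches, each value produced by draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(batch_size)])
            for _ in range(n_batches)]


heights = batch_means(lambda rng: rng.gauss(170.0, 10.0))
incomes = batch_means(lambda rng: pareto(rng, alpha=1.5))
print(f"Gaussian batch means:  {min(heights):.2f} .. {max(heights):.2f}")
print(f"Power-law batch means: {min(incomes):,.0f} .. {max(incomes):,.0f}")
```

The Gaussian batch means agree to within a fraction of a percent. The power-law batch means disagree wildly, and larger batches don't fix that.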

The Central Limit Theorem, that great workhorse of statistics, fails here. In its classical form, it requires finite variance. Power laws with α ≤ 3 don't have it, and those with α ≤ 2 don't even have a finite mean.

So what do you do when you discover you're living in a power-law world?

Stop averaging. In VC investing, one company may return more than all the others combined. The entire game is about catching the one outlier.

Rethink risk. Japan's nuclear safety guidelines had planned for up to magnitude 8.6. Tōhoku was 9.1, which released roughly five and a half times as much energy. The disaster at Fukushima was a failure of distributional imagination.

Beware of means in policy. "Average income is rising" can be true while most incomes stagnate, if gains concentrate at the top. Reach for the median.

Recognize the pattern. Power laws appear when success breeds success, when systems operate near criticality, when multiplicative processes dominate. Default to assuming heavy-tailed until proven otherwise.
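The magnitude-to-energy arithmetic follows from the Gutenberg–Richter energy relation, in which radiated energy grows as 10^(1.5 M). Each whole magnitude step is therefore about 32 times more energy. Here is a minimal sketch:

```python
def energy_ratio(m_big, m_small):
    """Ratio of radiated seismic energy between two magnitudes, using the
    Gutenberg–Richter energy relation log10(E) = 1.5*M + constant."""
    return 10 ** (1.5 * (m_big - m_small))


print(f"{energy_ratio(9.1, 8.6):.1f}")  # 5.6
print(f"{energy_ratio(9.1, 8.1):.1f}")  # 31.6
```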

· · ·
Chapter 5

Scale Invariance, or Why Power Laws Have No "Typical" Event

There's a mathematical property of power laws that sounds abstract but has enormous practical consequences: scale invariance. A power-law distribution looks the same no matter what scale you examine it at. Zoom in on the tail, and you find a smaller copy of the same distribution. Zoom in on that tail, and there it is again.

Compare this with a Gaussian. A bell curve has a characteristic scale — the mean — and most observations cluster near it. If I tell you the average height of adult men is 5'10", you immediately have a mental picture. Heights of 5'6" and 6'2" are common; heights of 4'0" and 7'6" are extraordinary. The bell curve has a "typical" value.

A power law has no typical value. What's a "typical" city size? A "typical" earthquake? A "typical" fortune? The question barely makes sense. There are vastly more small cities than large ones, vastly more small earthquakes than big ones, vastly more people with modest savings than enormous wealth — but the distribution has no natural center, no characteristic scale that summarizes it. The ratio of the biggest to the smallest isn't bounded by some gentle multiple of sigma. It can be millions-to-one.
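Scale invariance can be checked empirically. Take only the observations above a threshold c, divide them by c, and see whether the rescaled tail looks the same for every c. This minimal sketch assumes a power law with density exponent α = 2.5 and, for contrast, uses the absolute values of standard Gaussian draws. The rescaled median serves as a simple summary. For this power law, it should sit near 2^(2/3) ≈ 1.59 at every threshold:

```python
import random
import statistics

rng = random.Random(7)
N = 200_000
ALPHA = 2.5  # assumed density exponent
power_law = [(1.0 - rng.random()) ** (-1.0 / (ALPHA - 1.0)) for _ in range(N)]
gaussian = [abs(rng.gauss(0.0, 1.0)) for _ in range(N)]


def rescaled_tail_median(data, c):
    """Median of x / c over the observations that exceed the threshold c."""
    return statistics.median([x / c for x in data if x > c])


for c in (1, 10, 100):
    print(f"power law,  c={c:>3}: {rescaled_tail_median(power_law, c):.2f}")  # stays near 1.59
for c in (1, 2, 3):
    print(f"|Gaussian|, c={c:>3}: {rescaled_tail_median(gaussian, c):.2f}")   # drifts toward 1
```

Zooming into the power law's tail gives back the same shape. Zooming into the Gaussian's tail squeezes everything against the threshold, because the Gaussian has a characteristic scale.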

This is why Vilfredo Pareto's original observation was so startling. In the 1890s, this Italian economist noticed that the distribution of wealth in every country he examined followed the same basic pattern: a few people held most of the wealth, and the curve describing this concentration had a specific mathematical form.8 He found the same shape in data from England, Prussia, Saxony, Ireland, and several Italian cities. The exponent varied — some societies were more unequal than others — but the functional form was always the same. It wasn't that these societies had made the same policy choices. It was that something deeper, something structural about how wealth accumulates, was producing this shape regardless of politics or culture.

Power-law systems also tend to exhibit a phenomenon that baffles people trained on Gaussian thinking: the clustering of extreme events. Earthquakes come in clusters. Financial crashes come in clusters. Blackouts cascade. In the textbook Gaussian model, each observation is independent of the last, so one extreme tells you nothing about the next; lightning, as the saying goes, doesn't strike twice. But in many power-law systems, the same conditions that produce one extreme event are exactly the conditions likely to produce another. The system is poised at criticality, and one avalanche can trigger the next.

· · ·
Chapter 6

The World Is Not Normal

Adolphe Quetelet, the 19th-century Belgian who popularized "the average man," was in love with the bell curve. He saw it in chest measurements and heights and concluded that deviations from the average were nature's "errors." The ideal man existed, Quetelet believed, and real men were just noisy approximations of him.

His vision shaped two centuries of science. We build airplane seats, dose medications, and set regulatory standards around averages and standard deviations — tools built for a Gaussian world. We design flood barriers for the "hundred-year flood." We stress-test banks against historical worst cases. We price insurance by computing expected losses. All of these procedures implicitly assume that the past is a reasonable guide to the future — that extreme events have a characteristic size, and we've already seen roughly the worst that can happen.

But many of the systems that actually govern our lives — wealth, information, catastrophe, influence, success — don't live in that world. They live in Extremistan, where the next observation might dwarf everything that came before it. Where the "hundred-year flood" is not a fixed benchmark but an artifact of a too-short data record. Where the worst historical stock market crash is not a ceiling but a data point.

The practical stakes are enormous. Nassim Taleb has argued, convincingly I think, that the 2008 financial crisis was not a failure of prediction but a failure of distributional imagination. Many of the risk models used by banks assumed Gaussian tails, in which the probability of a large loss drops off faster than exponentially. In reality, financial returns have power-law tails, which means large losses are orders of magnitude more likely than Gaussian models suggest. The models said a crisis of that severity was essentially impossible. The world said otherwise.
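To see how much the tail assumption matters, compare the Gaussian with a Student-t distribution with 3 degrees of freedom. The t(3) is a common textbook stand-in for fat-tailed returns, and its tail decays as a power law. Using it here is an illustrative assumption: it is not the model any bank used, and it is not fitted to market data. Both distributions are measured in standard deviations, and the t(3) tail uses its closed-form CDF:

```python
import math


def gaussian_tail(k):
    """P(Z > k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def student_t3_tail_in_sd(k):
    """P(T > k standard deviations) for a Student-t with 3 degrees of freedom.
    T has variance 3, so k standard deviations means t = k*sqrt(3). The CDF is
    F(t) = 1/2 + (theta + sin(theta)*cos(theta))/pi with theta = arctan(t/sqrt(3))."""
    theta = math.atan(k)  # arctan(k*sqrt(3) / sqrt(3))
    return 0.5 - (theta + math.sin(theta) * math.cos(theta)) / math.pi


ratios = {k: student_t3_tail_in_sd(k) / gaussian_tail(k) for k in (3, 5, 10)}
for k, r in ratios.items():
    print(f"{k:>2} sd: Gaussian {gaussian_tail(k):.1e}, t(3) {student_t3_tail_in_sd(k):.1e}, "
          f"ratio {r:.1e}")
```

At 5 standard deviations, the power-law-tailed model makes the move thousands of times more likely than the Gaussian does. At 10, the gap is about nineteen orders of magnitude.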

The next time someone offers you an average, ask yourself: Am I in Mediocristan, or Extremistan? Because if it's the latter, that number isn't a summary. It's a mirage.

Ellenberg's great insight in How Not to Be Wrong is that mathematics isn't a set of rules for calculation — it's a set of tools for thinking clearly about the world. Power laws are one of those tools. They tell you when to trust an average and when to laugh at one. They tell you when the past predicts the future and when it's a comforting fairy tale. They tell you why a handful of cities, companies, songs, earthquakes, and people will always dominate the statistics, not because the game is rigged (though sometimes it is), but because the game has a specific mathematical structure that guarantees this outcome.

The world is not normal. Once you see this — really see it — you can never unsee it. And that's the point.

Notes & References

  1. The Tōhoku earthquake of March 11, 2011 was magnitude 9.1, the most powerful earthquake ever recorded in Japan. National Police Agency of Japan, "Damage Situation and Police Countermeasures," March 2021.
  2. Federal Reserve, "Survey of Consumer Finances," 2022. Mean family net worth: $1,063,700. Median: $192,900.
  3. Taleb, N.N. The Black Swan (Random House, 2007), Chapter 3.
  4. Clauset, A., Shalizi, C.R., and Newman, M.E.J. "Power-Law Distributions in Empirical Data," SIAM Review 51(4), 2009.
  5. Zipf, G.K. Human Behavior and the Principle of Least Effort (1949). See also Gabaix, X. "Zipf's Law for Cities," QJE 114(3), 1999.
  6. Barabási, A.L. and Albert, R. "Emergence of Scaling in Random Networks," Science 286, 1999.
  7. Bak, P., Tang, C., and Wiesenfeld, K. "Self-Organized Criticality," Physical Review Letters 59(4), 1987.
  8. Pareto, V. Cours d'économie politique (1896). For a modern treatment, see Newman, M.E.J. "Power laws, Pareto distributions and Zipf's law," Contemporary Physics 46(5), 2005.