The Missing Chapter

The Law of Large Numbers

The casino doesn't need to win every hand. It needs to play enough hands.

An extension of Jordan Ellenberg's "How Not to Be Wrong"

Chapter 23

The Professor Who Beat Vegas

In 1962, a quiet mathematics professor named Edward Thorp walked into a Las Vegas casino with $10,000 of someone else's money and a head full of formulas. He'd spent months working out, with mathematical precision, that blackjack could be beaten. Not through luck, not through intuition, but through counting—keeping a running tally of which cards had been dealt and adjusting his bets accordingly.1

Thorp won. Over the course of a weekend, he turned a profit that proved his theory wasn't just academic scribbling. The casinos noticed. They changed rules, added decks, shuffled more frequently, and eventually barred him from the tables.

But here's the thing most people miss about this story: the casinos were never really scared of Thorp.

Oh, they were annoyed, certainly. Nobody likes a professor showing up to prove you wrong. But scared? Not particularly. Because while Edward Thorp was grinding out a 1-2% edge at one table, hundreds of other players at hundreds of other tables were happily donating money at the house's customary rate. The casino didn't need to beat Thorp. It needed to play enough hands against everyone else.

This is the law of large numbers in its purest, most lucrative form. The casino's edge in blackjack is roughly 0.5% to 2%, depending on the rules. That's tiny. On any given hand, anything can happen. But across ten thousand hands? A hundred thousand? A million? That little edge compounds into an iron certainty that would make a physicist envious. The casino becomes a machine that converts randomness into reliable profit.

And this isn't just a story about gambling. It's a story about everything.

• • •

What the Law Actually Says

The law of large numbers is one of those mathematical results that sounds obvious until you try to state it precisely, at which point it becomes subtle and strange. Here's the casual version:

The more times you repeat a random experiment, the closer the average result gets to the expected value.

Flip a fair coin ten times and you might get 7 heads—a 70% rate that would alarm any statistician. Flip it ten thousand times and you'll be astonishingly close to 50%. Flip it ten million times and you could set your watch by it.

More formally: suppose you have random variables X₁, X₂, X₃, … that are independent and identically distributed, each with expected value μ. The sample average after N trials is:

Sample Average
X̄N = (X₁ + X₂ + ⋯ + XN) / N
The LLN says: as N → ∞, X̄N → μ

There are actually two versions. The weak law (proved by Jacob Bernoulli around 1713, after twenty years of work2) says the sample average converges in probability: for any tolerance ε you pick, the chance that X̄N is more than ε away from μ shrinks to zero. The strong law (proved by Kolmogorov in 1930) goes further: the sample average converges almost surely, meaning the set of outcomes where it fails to converge has probability literally zero.3

The distinction matters to measure theorists. For the rest of us, the takeaway is the same: averages stabilize. The universe is noisy on any single measurement but astonishingly predictable in aggregate.
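The weak law can be checked directly by simulation: estimate the chance that a coin's sample average strays more than some tolerance ε from ½, and watch that chance shrink as N grows. A quick sketch (the tolerance ε = 0.05 and the trial counts are arbitrary choices for illustration):

```python
import random

random.seed(0)

def tail_probability(n_flips, eps=0.05, trials=500):
    """Estimate P(|sample average - 0.5| > eps) over many repeated experiments."""
    misses = 0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        if abs(heads / n_flips - 0.5) > eps:
            misses += 1
    return misses / trials

for n in (10, 100, 1000):
    print(f"N = {n:>4}: P(|average - 0.5| > 0.05) ≈ {tail_probability(n):.3f}")
```

At N = 10 the average misses the tolerance most of the time; at N = 1000 it almost never does. That shrinking miss probability is exactly what "converges in probability" means.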

The Rate of Convergence: Why √N Is Everything

But how fast does this convergence happen? This is the question that separates people who vaguely know the LLN from people who can actually use it. The answer involves a square root, and it changes everything.

The standard deviation of the sample average is:

Standard Error
σX̄ = σ / √N
σ = standard deviation of a single observation; N = number of observations; σ/√N = standard deviation of the average — it shrinks, but slowly

That √N in the denominator is both a blessing and a curse. It means precision improves with more data—blessing! But it improves agonizingly slowly—curse. To cut your uncertainty in half, you don't need twice as many observations. You need four times as many. To cut it by a factor of ten, you need a hundred times as many.

[Figure: the "funnel of convergence" — sample averages wobble wildly at first, then get trapped near the expected value. The funnel narrows as 1/√N.]

This is why political polls survey about 1,000 people, not 10,000. Going from 1,000 to 10,000 costs ten times as much but only cuts the margin of error by a factor of √10 ≈ 3.16. Diminishing returns are baked into the mathematics.
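The √N rate is easy to see empirically. Roll a fair die (σ ≈ 1.71 for a single roll) and measure how the standard deviation of the sample average shrinks as N grows: quadrupling N should roughly halve it. A sketch, with trial counts chosen only for speed:

```python
import random
import statistics

random.seed(1)

def sd_of_mean(n, trials=500):
    """Empirical standard deviation of the average of n fair-die rolls."""
    means = [statistics.fmean(random.randint(1, 6) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

sigma = statistics.pstdev(range(1, 7))   # sd of a single roll, about 1.708
for n in (100, 400):
    print(f"N = {n}: measured {sd_of_mean(n):.4f}, predicted {sigma / n**0.5:.4f}")
```

The measured values track σ/√N closely, and going from N = 100 to N = 400 cuts the spread roughly in half, as the formula demands.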

• • •

Diluted, Not Corrected

Now we arrive at the most dangerous misunderstanding about the law of large numbers—one that has emptied more wallets than any other mathematical error.

You're at a roulette wheel. Red has come up six times in a row. You think: "Black is due. The law of averages demands it!" So you put your chips on black.

This reasoning is wrong. Completely, catastrophically wrong. The roulette wheel has no memory. The probability of black on the next spin is exactly what it always was: 18/38 ≈ 47.4%. The six reds did not create a debt that the universe needs to repay.

The gambler's fallacy confuses two very different ideas:

What people think the LLN says: "After a streak of unusual results, future results will compensate by going the other way."

What the LLN actually says: "After a streak of unusual results, future results will be typical, and the unusual streak will become a smaller and smaller fraction of the total."

Suppose you flip a fair coin and get 8 heads in a row. You've seen 8 heads and 0 tails—100% heads. The LLN does not predict a compensating run of tails. It predicts that the next thousand flips will be roughly 50-50. After those thousand flips, your running total might be 508 heads and 500 tails. The percentage of heads? About 50.4%. The 8-head streak hasn't been corrected; it's been diluted into irrelevance.4

Bad luck doesn't get balanced. It gets drowned out.
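The dilution story is easy to replay in code: spot yourself 8 heads, flip a fair coin a thousand more times, and look at the resulting percentage. A minimal sketch:

```python
import random

random.seed(2)

heads, flips = 8, 8       # start right after the unlikely streak: 8 heads in 8 flips

for _ in range(1000):     # the coin has no memory; these flips are simply fair
    heads += random.random() < 0.5
    flips += 1

print(f"{heads} heads in {flips} flips = {heads / flips:.1%}")
```

The result lands near 50%, not because tails came along to repay a debt, but because 8 extra heads barely register against a thousand ordinary flips.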

Dilution vs. Correction
N = 10: 8H / 2T = 80% heads
N = 100, "correction" (myth): 50H / 50T ✗
N = 100, dilution (reality): 54H / 46T ✓
The 8 extra heads are still there. They just matter less.
The universe doesn't correct your bad luck — it buries it under new data.
• • •

Watch It Happen

Theory is nice, but nothing beats watching convergence in action. The simulator below flips a coin (or rolls a die) repeatedly and plots the running average. Watch how it wobbles wildly at first, then gradually settles into a narrow band around the expected value—the funnel of convergence, drawn in real time.

[Interactive: Convergence Visualizer. Watch the running average converge to the expected value (0.5 for a fair coin) as N grows.]
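The simulator is easy to reproduce in a few lines of Python, printing checkpoints of the running average instead of drawing the funnel:

```python
import random

random.seed(3)

total = 0
for n in range(1, 100_001):
    total += random.random() < 0.5            # one fair flip: 1 = heads, 0 = tails
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"N = {n:>7,}: running average = {total / n:.4f}")
```

The early checkpoints can sit well away from 0.5; the later ones pin themselves to it, each extra digit of precision costing a hundredfold more flips.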
• • •

The Small-N Trap

If large N produces stable averages, then small N produces wild ones. This isn't just a theoretical curiosity—it's a source of real-world harm.

In the 1990s, researchers studying kidney cancer rates across U.S. counties noticed something striking: the counties with the lowest cancer rates were disproportionately rural, sparsely populated, and located in the Midwest and South. Health officials began speculating about the benefits of clean country living, fresh air, and agricultural lifestyles.

Then someone looked at the counties with the highest cancer rates. They were also disproportionately rural, sparsely populated, and located in the Midwest and South.5


Small populations produce extreme rates—both high and low—not because of anything causal, but because small N means high variance. A county of 100 people where 1 person gets cancer has a rate of 1%. A county of 100 where 3 people get cancer has a rate of 3%. Both are within normal statistical fluctuation, but they look dramatically different on a map.

The same pattern appears everywhere. Small schools show up disproportionately on lists of both the best and worst schools. Small hospitals have both the highest and lowest mortality rates. Small mutual funds have both the best and worst track records. It's not performance. It's arithmetic.
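The county effect is pure sampling noise, and a simulation makes that vivid. Below, every county shares the same hypothetical 2% true rate; only population size differs:

```python
import random

random.seed(4)

TRUE_RATE = 0.02   # hypothetical underlying rate, identical everywhere

def observed_rate(population):
    """Rate actually seen in one county: binomial noise around the true rate."""
    cases = sum(random.random() < TRUE_RATE for _ in range(population))
    return cases / population

small = [observed_rate(100) for _ in range(500)]      # 500 counties of 100 people
large = [observed_rate(10_000) for _ in range(500)]   # 500 counties of 10,000

print(f"small counties: {min(small):.1%} to {max(small):.1%}")
print(f"large counties: {min(large):.1%} to {max(large):.1%}")
```

The small counties span a dramatic range of rates despite having exactly the same underlying risk; the large counties cluster tightly around 2%. Any ranked list built from such data will have small populations at both extremes.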

Insurance: Taming Chaos with N

If small N is dangerous, the solution is obvious: make N bigger. This is essentially the entire business model of insurance.

No insurance company can predict whether you will get in a car accident this year. Your individual risk is genuinely uncertain. But if they insure 500,000 drivers? The law of large numbers transforms uncertainty into near-certainty. The fraction who will crash converges tightly around the historical average, and the company can set premiums accordingly.6

This is why insurance companies care obsessively about their "book of business"—the total number and diversity of policies they hold. A small insurer covering 200 homes in one Florida town is gambling. A large insurer covering 2 million homes across 30 states is operating a math machine. The expected losses are the same per policy, but the variance around those expectations shrinks as 1/√N.
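The same logic shows why scale is the insurer's real edge. Suppose, purely for illustration, that each policy has a 5% chance of a claim in a year and premiums are priced to cover up to a 6% claim rate. How often does the observed claim fraction blow through that buffer?

```python
import random

random.seed(5)

P_CLAIM = 0.05     # assumed per-policy annual claim probability (hypothetical)
PRICED_FOR = 0.06  # premiums cover up to a 6% claim rate

def bad_year_probability(n_policies, trials=300):
    """Chance that the observed claim fraction exceeds the priced-in rate."""
    bad = 0
    for _ in range(trials):
        claims = sum(random.random() < P_CLAIM for _ in range(n_policies))
        if claims / n_policies > PRICED_FOR:
            bad += 1
    return bad / trials

print(f"   200 policies: bad year ≈ {bad_year_probability(200):.0%}")
print(f"20,000 policies: bad year ≈ {bad_year_probability(20_000):.0%}")
```

The small insurer busts its buffer in a meaningful fraction of years; the large one essentially never does. Same edge per policy, radically different variance.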

• • •

How Many People Do You Need to Ask?

Here's a question that baffles most people: How can a poll of just 1,000 Americans tell you anything useful about 330 million? It seems absurd—that's 0.0003% of the population.

But the math doesn't care about the population size (as long as it's much larger than the sample). It cares about the sample size. The margin of error for a proportion is approximately:

Margin of Error
MOE ≈ z × √(p(1−p)/N)
z ≈ 1.96 for 95% confidence; p = proportion being estimated

Notice what's not in that formula: the population size. Whether you're surveying a city of 100,000 or a nation of 330 million, a sample of 1,000 gives you a margin of error around ±3.1%. A sample of 10,000 gets you to ±1%. The square root is doing all the work.7

Try it yourself:

[Interactive: Sample Size Calculator. Example: a ±3.0% margin of error at 95% confidence with p = 50% requires a sample of 1,068 — just 0.0003% of the U.S. population. For 2× precision: about 4,271. For 10× precision: about 106,800.]

Notice the brutal arithmetic: to double your precision (halve the margin of error), you need four times the sample. That √N is relentless.
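That arithmetic is one line of code: solve the margin-of-error formula for N. A sketch (N = z²p(1−p)/MOE², rounded up):

```python
import math

def required_sample(moe, z=1.96, p=0.5):
    """Sample size for a given margin of error: N = z^2 * p * (1-p) / MOE^2."""
    return math.ceil(z**2 * p * (1 - p) / moe**2)

print(required_sample(0.03))     # ±3.0% -> 1068
print(required_sample(0.015))    # halve the MOE -> roughly 4x the sample
print(required_sample(0.003))    # 10x the precision -> roughly 100x the sample
```

Note that population size appears nowhere in the function, exactly as the formula promises.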

• • •

A/B Testing and the Patience Tax

Silicon Valley learned the law of large numbers the expensive way. In the early days of A/B testing—running two versions of a webpage to see which performs better—companies would declare a winner after a few hundred visitors. "Version B has a 3.2% conversion rate versus 2.8% for version A! Ship it!"

The problem: with small samples, random noise easily creates differences of that size. You need enough data for the signal to emerge from the noise, and "enough" is almost always more than your product manager wants to wait for.

A typical A/B test for a change that improves conversion from 2.0% to 2.2%—a 10% relative lift—requires roughly 80,000 visitors per variant at 80% statistical power. That's not a suggestion; that's the square root talking.8
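That 80,000 figure isn't folklore; it falls out of the standard two-proportion power calculation. A sketch using the normal approximation (two-sided α = 0.05, 80% power):

```python
import math

def samples_per_variant(p1, p2, z_alpha=1.959964, z_beta=0.841621):
    """Per-variant sample size to detect p1 -> p2 (normal approximation)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta)**2 * variance / (p1 - p2)**2)

# Detecting a lift from 2.0% to 2.2% conversion:
print(samples_per_variant(0.020, 0.022))   # roughly 80,000 per variant
```

The squared effect size in the denominator is what makes small lifts so expensive: shrink (p₁ − p₂) by half and the required sample quadruples.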

Many of Silicon Valley's most celebrated A/B test "discoveries" in the early 2010s were almost certainly noise. The law of large numbers was trying to help, but nobody gave it enough data to work with.

Samples per variant needed (80% power, α = 0.05, base rate 5%):

50% lift: ~600
20% lift: ~4,000
10% lift: ~16,000
5% lift: ~64,000

Smaller effects demand quadratically more data: halve the effect size, quadruple the sample.
• • •

The Casino's Real Edge

Let's return to where we started. Edward Thorp beat the casino, but the casino survived Thorp and prospered. Why? Because Thorp was one man at one table, and the casino was an entire ecosystem operating at scale.

The genius of the casino business model isn't the house edge. It's the volume. A slot machine might have a house edge of 5%. On a single quarter, that's meaningless. But a busy slot machine sees perhaps 600 plays per hour, 16 hours a day, 365 days a year. That's 3.5 million plays per year. The law of large numbers compresses that 5% edge into a number the casino can take to the bank—literally.
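The slot-machine arithmetic is worth doing explicitly (the $1 average bet is an assumption for illustration):

```python
edge = 0.05              # house edge per play
plays = 600 * 16 * 365   # plays/hour * hours/day * days/year = 3,504,000
avg_bet = 1.00           # assumed average bet per play (hypothetical)

print(f"{plays:,} plays per year")
print(f"expected hold: ${plays * avg_bet * edge:,.0f}")
```

On any single quarter the 5% edge is invisible; across 3.5 million plays it is a six-figure annuity per machine.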

The same principle applies to the insurance company, the polling firm, the tech company running A/B tests, and the public health official interpreting mortality data. In every case, the law of large numbers is the same promise: give me enough observations, and I will turn chaos into certainty.

The catch, always, is "enough." And thanks to that stubborn square root, enough is always more than you'd like.

The Takeaway

The law of large numbers doesn't eliminate randomness. It tames it—slowly, reliably, on its own schedule. The universe has no obligation to balance out your bad luck in the short run. But give it enough tries, and the average will converge to the truth. The casino knows this. The insurance company knows this. Now you do too.

Notes & References

  1. Edward O. Thorp, Beat the Dealer: A Winning Strategy for the Game of Twenty-One (Random House, 1962). Thorp's card-counting system was the first mathematically proven method for beating blackjack.
  2. Jacob Bernoulli, Ars Conjectandi (1713), published posthumously. Bernoulli worked on the proof for over twenty years and considered it his most important contribution to mathematics.
  3. Andrey Kolmogorov, "Sur la loi forte des grands nombres," Comptes Rendus de l'Académie des Sciences 191 (1930): 910–912. The strong law requires only finite expected value; finite variance is sufficient but not necessary.
  4. This distinction—dilution vs. correction—is discussed beautifully in Daniel Kahneman, Thinking, Fast and Slow (Farrar, Straus and Giroux, 2011), Chapter 10.
  5. Howard Wainer, "The Most Dangerous Equation," American Scientist 95, no. 3 (2007): 249–256. Wainer traces multiple policy errors caused by misunderstanding variance in small samples, citing de Moivre's equation σ/√N.
  6. The mathematical foundations of insurance are discussed in Sheldon Ross, A First Course in Probability, 10th ed. (Pearson, 2019), Chapter 8.
  7. Technically, there's a finite population correction factor √((N_pop − n)/(N_pop − 1)) that matters when sampling more than ~5% of the population. For national polls, it's negligible.
  8. Ron Kohavi, Diane Tang, and Ya Xu, Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing (Cambridge University Press, 2020). Chapter 3 covers sample size calculations and common pitfalls.