
The Missing Chapter

The Law of Small Numbers

A statistician counting dead Prussian cavalrymen discovered the hidden clockwork of rare events.

An extension of Jordan Ellenberg's "How Not to Be Wrong"

Chapter 1

Death by Horse

In 1898, a Russian-born statistician named Ladislaus Bortkiewicz published a slim volume with an impossibly dull title: Das Gesetz der kleinen Zahlen — "The Law of Small Numbers."1 Its most famous dataset concerned Prussian cavalry soldiers killed by horse kicks between 1875 and 1894. Fourteen army corps, twenty years, 280 corps-years of data. The question: how many soldiers in a given corps died from horse kicks in a given year?

The answer, in most cases, was zero. Or one. Occasionally two. Rarely three. And in exactly two cases out of 280, the number was four. Nobody ever recorded five.

Bortkiewicz wasn't interested in horses. He was interested in a deeper question: when something happens rarely and randomly, is there a shape to that randomness? The answer turned out to be yes — and the shape was always the same.

That shape is the Poisson distribution, named after the French mathematician Siméon Denis Poisson, who had derived the formula sixty years earlier in a treatise on criminal justice.2 Poisson never applied it to data. He was a theorist, not a counter. It took Bortkiewicz — armed with dead cavalrymen — to show that the formula wasn't just algebra. It was a law.

[Chart: Prussian Cavalry Deaths by Horse Kick (1875–1894). Corps-years with 0, 1, 2, 3, and 4 deaths: observed 144, 91, 32, 11, 2; overlaid with the Poisson prediction for λ = 0.70.]

Bortkiewicz's data fit the Poisson prediction with eerie precision. The green dots are not a fit — they're a prediction from a single number: the average rate.

Look at that chart. The red bars are the actual data — how many corps-years recorded zero deaths, one death, two deaths, and so on. The green dots are what the Poisson formula predicts, armed with nothing but the average: 0.70 deaths per corps per year. The match is almost unsettling. You give the formula one number — the rate — and it tells you everything else. How often nothing happens. How often something happens once, twice, three times. It even correctly predicts the lonely pair of corps-years that saw four deaths.

This isn't curve-fitting. This isn't finding a polynomial that wiggles through your data points. This is a theorem — a mathematical inevitability. And it shows up everywhere.

· · ·
Chapter 2

The Formula That Runs on One Number

Most probability distributions need you to specify multiple parameters. The normal distribution needs a mean and a standard deviation. The beta distribution needs two shape parameters. The Poisson distribution needs exactly one number: λ (lambda), the average rate at which events occur.

Once you know λ, you know everything:

The Poisson Formula
P(k) = e^(−λ) · λ^k / k!

The probability of exactly k events, given an average rate of λ.

λ: the average number of events per time period (the only input)
k: the specific number of events you're asking about (0, 1, 2, …)
e: Euler's number ≈ 2.71828… (the base of natural logarithms)
k!: k factorial, the product 1 × 2 × 3 × … × k

There is something philosophically beautiful about this. In a world where every rare event feels unique — this car accident, that lightning strike, this particular server crash at 3 a.m. — the Poisson distribution says: I don't care about the details. Tell me the rate, and I'll tell you the pattern.

It's the ultimate "zoom out." Up close, every rare event has a story, a cause, a narrative. From far enough away, they're all the same shape.

The Poisson distribution emerges whenever you have a large number of independent opportunities, each with a small probability of success, and the average number of successes is moderate. Flip a coin that lands heads with probability 1/1000, and flip it 1000 times. The number of heads is approximately Poisson with λ = 1. This is a theorem, not an approximation in the hand-wavy sense. As the number of trials goes to infinity and the probability of each goes to zero — holding their product λ constant — the binomial distribution converges to the Poisson, term by term.3
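The limit theorem is easy to verify numerically. This sketch (illustrative, standard library only) compares the binomial distribution for 1000 flips of a 1-in-1000 coin against the Poisson distribution with λ = 1:

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

# 1000 flips of a coin with success probability 1/1000: lambda = n*p = 1
n, p = 1000, 1 / 1000
for k in range(5):
    b = binom_pmf(k, n, p)
    q = poisson_pmf(k, n * p)
    print(f"k={k}: binomial {b:.5f}   poisson {q:.5f}   diff {abs(b - q):.2e}")
```

Already at n = 1000 the two distributions agree to about four decimal places, term by term.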

Think of a Prussian cavalryman's year. Every day presents a tiny chance of being kicked to death. Three hundred and sixty-five days, each one a near-zero-probability lottery ticket for doom. The total number of "winning" tickets follows the Poisson distribution. And the same logic applies to typos per page (many words, each with a small chance of error), goals per soccer match (many possessions, each with a small chance of scoring), mutations per genome per generation (billions of base pairs, each with a minuscule mutation rate), and calls arriving at a switchboard per minute.

This is why the Poisson distribution was the first thing Agner Krarup Erlang reached for when the Copenhagen Telephone Company asked him to model call arrivals in 1909.4 It's why the British statistician R.D. Clarke used it to analyze V-2 rocket strikes on London during World War II.5 And it's why, right now, Amazon is using it to predict how many of a given product will be ordered tomorrow.

· · ·
Chapter 3

The Poisson Surprise Machine

Here's what makes the Poisson distribution genuinely useful, and genuinely dangerous when ignored: it tells you how much variability to expect from rare events, and the answer is always more than you think.

Suppose a city averages 2 terrorist attacks per year. In a Poisson world, the probability of zero attacks in a year is e^(−2) ≈ 13.5%. That means roughly one year in seven will be completely peaceful. And the probability of 5 or more? About 5.3%. So roughly every twenty years, you'll see a year with five or more attacks.
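Both probabilities fall straight out of the formula; a quick illustrative check:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0  # average attacks per year
p_zero = poisson_pmf(0, lam)                                  # a peaceful year
p_five_plus = 1 - sum(poisson_pmf(k, lam) for k in range(5))  # 5 or more
print(f"P(0 attacks)  = {p_zero:.3f}")       # ~0.135
print(f"P(5+ attacks) = {p_five_plus:.3f}")  # ~0.053
```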

Now imagine a politician who doesn't know this. After a peaceful year, they claim their policies are working. After a year with five attacks, they declare a crisis. Both statements might be wrong. The variation is exactly what you'd expect from randomness alone. The Poisson distribution is, in this sense, a machine for distinguishing signal from noise.

The most common mistake in interpreting rare events is to confuse the Poisson noise floor with a meaningful trend.

R.D. Clarke's analysis of V-2 strikes is the canonical illustration. During the final months of World War II, 537 V-2 rockets landed on London's south side. Clarke divided the area into 576 small squares of 0.25 km² each. If the rockets were aimed randomly — if the Germans had no ability to target specific neighborhoods — then the number of hits per square should follow a Poisson distribution with λ = 537/576 ≈ 0.9323.

Clarke computed the predictions and compared them to reality:5

V-2 Rocket Strikes on South London (1944–45)

Hits   Observed   Poisson   Diff
0      229        226.7     +2.3
1      211        211.4     −0.4
2      93         98.5      −5.5
3      35         30.6      +4.4
4      7          7.1       −0.1
5+     1          1.6       −0.6

The rockets were random. No neighborhood was targeted.

Clarke's analysis proved that V-2 rockets fell randomly — Londoners' belief that some areas were "safer" was an illusion.

The fit was astonishing. Clarke's conclusion: the rockets were falling essentially at random. The clustering that terrified Londoners — "the East End keeps getting hit!" — was not evidence of German precision bombing. It was exactly the kind of clustering you'd expect from pure chance. Randomness clumps. It always does. And the Poisson distribution tells you exactly how much clumping to expect.
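Clarke's predicted column can be reproduced from the single rate quoted above, λ ≈ 0.9323, and the 576 squares. A sketch (not Clarke's own calculation, which was done by hand):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

squares, lam = 576, 0.9323   # rate of hits per square
observed = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7}
for k, obs in observed.items():
    pred = squares * poisson_pmf(k, lam)
    print(f"{k} hits: observed {obs:3d}, predicted {pred:6.1f}")
# everything not in 0..4 goes in the 5+ bucket
tail = squares * (1 - sum(poisson_pmf(k, lam) for k in range(5)))
print(f"5+ hits: observed   1, predicted {tail:6.1f}")
```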

The Clumping Principle

Truly random events are not evenly spaced. They cluster. The Poisson distribution quantifies this: when events occur at a rate of λ per period, the variance also equals λ, meaning the standard deviation is √λ. For rare events (small λ), the standard deviation is large relative to the mean — making dramatic clusters inevitable.

This is the part that human intuition gets catastrophically wrong. We expect randomness to look uniform. If a city averages one murder per week, we expect roughly one per week. But the Poisson distribution says: some weeks will have zero, and some will have three or four. And both of those are normal. The three-murder week isn't a crime wave. The zero-murder week isn't a breakthrough. They're both just the Poisson distribution doing what it does.

· · ·
Chapter 4

Play the Poisson

The best way to internalize this is to see it. The interactive below lets you set a rate λ and watch the Poisson distribution unfold. Adjust the slider and notice how the shape changes — always that same characteristic skew to the right, always that long tail of unlikely-but-possible outcomes.

[Interactive: Poisson Explorer. Set the average rate λ and see the probability of each count. At λ = 3.0: mean 3.0, variance 3.0, standard deviation 1.73, P(0) ≈ 5.0%.]

Notice something? When λ is small — say, 1 or 2 — the distribution is sharply skewed: most of the probability is piled up at zero and one, with a long tail trailing to the right. As λ grows, the distribution becomes more symmetric, approaching the bell curve. This isn't a coincidence. When λ is large, the Poisson distribution is well-approximated by a normal distribution with mean λ and variance λ.6

This convergence is one reason the Poisson is sometimes overlooked. For large λ, you can "just use the normal." But for small λ — the regime of rare events — the normal approximation is terrible. It assigns positive probability to negative counts, for one thing. The Poisson distribution lives on the non-negative integers, which is exactly where counts live. It was built for this.
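One way to see where the approximation holds and where it breaks is to compare the two at the mean. This is an illustrative sketch; it uses the normal density as a stand-in for a probability mass, which is exactly the shortcut that fails for small λ:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)

def normal_pdf(x: float, mu: float, var: float) -> float:
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

for lam in (1.0, 50.0):
    k = int(lam)  # compare right at the mean
    p = poisson_pmf(k, lam)
    n = normal_pdf(k, lam, lam)  # normal approximation: mean lam, variance lam
    print(f"lambda={lam:5.1f}: Poisson P({k}) = {p:.4f}, normal density = {n:.4f}")
```

At λ = 1 the two disagree by several percent; at λ = 50 they match to three decimal places.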

· · ·
Chapter 5

Soccer, Earthquakes, and the Wrong Kind of Snow

In 1997, two statisticians at Lancaster University published a paper showing that goals in soccer follow the Poisson distribution with striking fidelity.7 If England averages 1.2 goals per match at home, you can predict the probability of any scoreline. The probability of a 0–0 draw between two evenly-matched teams averaging 1.2 goals each? About 0.301 × 0.301 ≈ 9%. The bookmakers, it turned out, had known this for decades.
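The scoreline arithmetic is a one-liner, assuming the two teams' goal counts are independent Poisson variables at the 1.2 goals-per-match rate quoted for England at home:

```python
import math

# A 0-0 draw requires both independent Poisson goal counts to be zero.
lam_home = lam_away = 1.2
p_nil_nil = math.exp(-lam_home) * math.exp(-lam_away)
print(f"P(0-0) = {p_nil_nil:.3f}")  # ~0.091, about 9%

# The same logic gives any scoreline: P(i-j) = P(i; lam_home) * P(j; lam_away)
def score_prob(i: int, j: int) -> float:
    ph = math.exp(-lam_home) * lam_home**i / math.factorial(i)
    pa = math.exp(-lam_away) * lam_away**j / math.factorial(j)
    return ph * pa
print(f"P(1-0) = {score_prob(1, 0):.3f}")
```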

But the Poisson distribution isn't just for counting events in time. It shows up in space, in text, in genomes, and in the decay of radioactive atoms. Here's a partial list of phenomena that follow — or approximately follow — the Poisson distribution:

In time: Calls to a switchboard per minute. Customers entering a shop per hour. Radioactive decays per second. Goals per soccer match. Earthquakes above magnitude 6 per year. Accidents at an intersection per month.

In space: Stars per unit area of sky. Bacteria colonies per petri dish quadrant. Raisins per cookie (really). Printing errors per page.

In sequence: Mutations per generation. Typos per chapter. Crossover events per chromosome during meiosis.

In networks: Emails received per hour. Website hits per second. 911 calls per shift.

The raisins-per-cookie example deserves a moment. If you're a quality control engineer at a bakery and your recipe calls for an average of 5 raisins per cookie, the Poisson distribution tells you that about 0.7% of cookies will have zero raisins. If you bake ten thousand cookies a day, that's about 67 raisin-free cookies — enough to generate complaints. The solution? Increase the average. At λ = 8, the probability of a zero-raisin cookie drops to 0.034% — about 3 per ten thousand. This is real industrial mathematics, deployed every day in factories that have never heard of Siméon Denis Poisson.8
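The bakery arithmetic, as a sketch (the recipe rates are the ones quoted above; the conclusion is just e^(−λ) scaled up to a day's production):

```python
import math

# Probability of a raisin-free cookie at a given average raisin count,
# assuming raisins end up in cookies independently (Poisson mixing).
for lam in (5, 8):
    p0 = math.exp(-lam)               # P(0 raisins)
    per_10k = 10_000 * p0             # expected raisin-free cookies per day
    print(f"lambda={lam}: P(no raisins) = {p0:.4%}, ~{per_10k:.0f} per 10,000 cookies")
```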

· · ·
Chapter 6

The Poisson Process: Events in Time

The Poisson distribution tells you about counts. But its continuous cousin — the Poisson process — tells you about timing. If events arrive according to a Poisson process at rate λ, then the time between consecutive events follows an exponential distribution with mean 1/λ. And this has a remarkable property: memorylessness.

The exponential distribution doesn't "remember" how long you've been waiting. If buses arrive at a Poisson rate of 4 per hour, and you've already been waiting 10 minutes, your expected remaining wait is still 15 minutes — exactly the same as if you'd just arrived. The bus doesn't know you're there. The universe doesn't owe you punctuality because you've been patient.
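Memorylessness feels like a trick, so it's worth simulating. The sketch below (illustrative; "bus arrivals" are just exponential draws) generates many waits at a rate of 4 per hour and checks that, among the waits that have already lasted 10 minutes, the remaining wait still averages 15 minutes:

```python
import random

random.seed(42)
rate = 4 / 60.0  # 4 buses per hour, expressed per minute

# Exponential inter-arrival times with mean 1/rate = 15 minutes.
waits = [random.expovariate(rate) for _ in range(200_000)]
overall = sum(waits) / len(waits)

# Condition on having already waited 10 minutes; measure the leftover wait.
remaining = [w - 10 for w in waits if w > 10]
conditional = sum(remaining) / len(remaining)

print(f"mean wait from scratch:       {overall:.1f} min")      # ~15
print(f"mean wait after 10 min gone:  {conditional:.1f} min")  # still ~15
```

The 10 minutes you already spent at the stop buy you nothing: the conditional average matches the unconditional one.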

[Interactive: Poisson Process Simulator. Watch events arrive randomly at your chosen rate (here λ = 2.0); each dot is an event on a timeline, with running counts of events, elapsed time, and observed rate.]

This memorylessness is why the Poisson process is the default model for arrivals in queueing theory — the mathematics that designs telephone networks, hospital emergency departments, and the checkout lines at your grocery store. It is also why waiting for a bus feels so frustrating. If arrivals are Poisson, the distribution of your remaining wait never changes, no matter how long you've already waited. The average wait for the next bus is always the same.

There's a dark irony here. When bus companies publish schedules — "every 15 minutes" — they're implicitly promising that arrivals are not Poisson. They're promising regularity. But traffic, breakdowns, and driver variability conspire to Poissonify the process. The schedule is a fiction, and the Poisson process is the truth lurking underneath.

[Diagram: The Memoryless Property. The expected wait for the next event is always 1/λ, no matter how long you've already waited.]

In a Poisson process, the expected wait for the next event doesn't depend on how long you've already waited. The process has no memory.

· · ·
Chapter 7

When Poisson Fails — and What That Tells You

The Poisson distribution is a null hypothesis of sorts. It's the pattern you expect when events are independent and occur at a constant rate. When your data doesn't fit the Poisson, that's not a failure of the model — it's information.

If the variance of your data is significantly larger than the mean, you have overdispersion — a telltale sign that your events are not independent, or that the rate is not constant. Earthquakes in a region, for instance, tend to cluster (aftershocks follow mainshocks), leading to overdispersion. Disease outbreaks cluster because infection is contagious — one case makes the next more likely. In both cases, the excess variance relative to Poisson tells you something important about the underlying mechanism.

Conversely, if the variance is significantly less than the mean — underdispersion — you have events that are more regular than random. The heartbeats of a healthy person at rest are underdispersed relative to Poisson. So are the eruptions of Old Faithful. These are processes with a refractory period or a clock.

The Dispersion Test

For Poisson data, the mean equals the variance (both equal λ). The ratio variance/mean — called the index of dispersion — should be approximately 1. If it's significantly greater than 1, your events are clustering more than randomness predicts. If it's significantly less than 1, something is spacing them out. Either way, you've learned something the Poisson distribution taught you by breaking.
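In practice the dispersion test is a few lines of arithmetic. A minimal sketch, with made-up sample counts for illustration:

```python
def dispersion_index(counts: list[int]) -> float:
    """Variance-to-mean ratio; approximately 1 for Poisson data."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    return var / mean

# Irregular but independent-looking counts vs a perfectly regular process.
poisson_like = [0, 1, 0, 2, 1, 0, 1, 3, 0, 1, 2, 0, 1, 1, 0]
clock_like   = [1] * 15

print(f"poisson-like: {dispersion_index(poisson_like):.2f}")  # near 1
print(f"clock-like:   {dispersion_index(clock_like):.2f}")    # 0 -- underdispersed
```

An index well above 1 points to clustering (contagion, aftershocks); well below 1 points to a clock or a refractory period.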

This is one of the great gifts of the Poisson distribution to practical science. It gives you a precise baseline for "what randomness looks like," so you can detect departures from randomness. Without it, you're just staring at a list of numbers wondering if the pattern is real. With it, you can calculate whether the pattern is real.

Clarke used exactly this logic for the V-2 rockets. If the bombs were targeted at specific neighborhoods, you'd see overdispersion — some squares hit far more than the Poisson predicts, others hit far less. The fact that the data fit the Poisson so well was itself the finding: the Germans weren't targeting. The bombs were falling like rain.

· · ·
Chapter 8

The Modern Poisson

Today the Poisson distribution is so deeply embedded in modern life that it's invisible. Your spam filter uses it. The hospital down the street uses it to staff emergency rooms. Netflix uses it to provision servers. Insurance companies use it to price policies. Epidemiologists used it to detect the early acceleration of COVID-19 cases, when count data in Wuhan began showing overdispersion relative to Poisson — a signal that human-to-human transmission had begun.9

In genetics, the Lander-Waterman model for shotgun DNA sequencing — the technique that powered the Human Genome Project — is built on the Poisson distribution. If you shatter a genome into random fragments and sequence them, the coverage at any given position follows a Poisson distribution. To ensure that 99.9% of the genome is covered at least once, you need about 7× coverage — a result that falls directly out of the Poisson formula: P(0) = e^(−7) ≈ 0.09%.10
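The coverage arithmetic generalizes to any depth. A quick sketch of the Lander-Waterman calculation (the model's zero-coverage term only):

```python
import math

# Fraction of genome positions with zero reads at c-fold coverage,
# per the Poisson model: P(0 reads) = e^(-c).
for c in range(1, 11):
    missed = math.exp(-c)
    print(f"{c:2d}x coverage: {missed:.4%} uncovered, {1 - missed:.4%} covered")
```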

And in machine learning, Poisson regression is the workhorse model for count data: predicting the number of insurance claims, the number of defects in manufacturing, the number of species in an ecological survey. It's the right model whenever your outcome is a count — not continuous, not binary, but 0, 1, 2, 3… — and the Poisson assumption gives you a natural link between your predictors and the thing you're predicting.
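For the curious, here is Poisson regression stripped to its core: a log link between predictor and rate, fit by Newton's method on the likelihood. This is an illustrative sketch with made-up data, not any particular library's implementation:

```python
import math

# Made-up count data: y grows roughly exponentially with x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 2, 2, 5, 7, 12]

# Model: rate mu_i = exp(b0 + b1 * x_i); maximize the Poisson log-likelihood.
b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the overall mean rate
for _ in range(50):
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    # gradient of the log-likelihood
    g0 = sum(yi - mi for yi, mi in zip(y, mu))
    g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
    # Hessian entries (negative definite for this model)
    h00 = -sum(mu)
    h01 = -sum(mi * xi for mi, xi in zip(mu, x))
    h11 = -sum(mi * xi * xi for mi, xi in zip(mu, x))
    det = h00 * h11 - h01 * h01
    # Newton step: b <- b - H^{-1} g
    b0 -= (h11 * g0 - h01 * g1) / det
    b1 -= (h00 * g1 - h01 * g0) / det

print(f"intercept = {b0:.3f}, slope = {b1:.3f}  (rate = exp(b0 + b1*x))")
```

Real packages add multiple predictors, standard errors, and overdispersion checks, but the skeleton is the same: a log link keeps the predicted rate positive, which is exactly what a count demands.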

Bortkiewicz would be amazed. He counted dead cavalrymen and found a universal law. That law now runs through every fiber of our quantitative infrastructure — silent, reliable, and as elegant as the day Poisson first derived it, staring at jury verdicts in a Parisian study.

The lesson, as always with mathematics, is that abstraction is not the enemy of reality. It's the lens. Bortkiewicz didn't need to understand anything about horses to predict their lethality. He just needed to count, and to trust the formula. The Poisson distribution doesn't care what is happening. It only cares how often.

Tell me the rate, and I will tell you the shape of randomness.

Notes & References

  1. Bortkiewicz, L. (1898). Das Gesetz der kleinen Zahlen. Leipzig: B.G. Teubner. The horse-kick data is from Table I, covering 14 Prussian army corps over 20 years (1875–1894), yielding 280 corps-years and 196 total deaths — an average of 0.70 per corps-year.
  2. Poisson, S.D. (1837). Recherches sur la probabilité des jugements en matière criminelle et en matière civile. Paris: Bachelier. The distribution appears in the final chapter as a limiting case of the binomial.
  3. This is the "Poisson limit theorem." See Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed. New York: Wiley, §6.3.
  4. Erlang, A.K. (1909). "The Theory of Probabilities and Telephone Conversations." Nyt Tidsskrift for Matematik B, 20, 33–39. This foundational paper launched queueing theory.
  5. Clarke, R.D. (1946). "An Application of the Poisson Distribution." Journal of the Institute of Actuaries, 72(3), 481. The south London analysis covered 576 squares of 0.25 km² each, with 537 V-2 hits.
  6. By the central limit theorem, since a Poisson(λ) random variable is the sum of λ independent Poisson(1) variables. The approximation is generally good for λ > 20.
  7. Dixon, M.J. and Coles, S.G. (1997). "Modelling Association Football Scores and Inefficiencies in the Football Betting Market." Journal of the Royal Statistical Society: Series C, 46(2), 265–280. Earlier work by Moroney (1951) and Reep & Benjamin (1968) also noted the Poisson pattern.
  8. This is a standard example in quality control textbooks. See Montgomery, D.C. (2019). Introduction to Statistical Quality Control, 8th ed. New York: Wiley.
  9. Li, Q. et al. (2020). "Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus–Infected Pneumonia." New England Journal of Medicine, 382, 1199–1207.
  10. Lander, E.S. and Waterman, M.S. (1988). "Genomic Mapping by Fingerprinting Random Clones." Genomics, 2(3), 231–239.