
Benford's Law

Numbers have fingerprints. And if you fake them, a nineteenth-century astronomer's ghost will catch you.

~20 min read · Interactive essay
Chapter 1

The Dirty Pages

Open a book of logarithm tables — if you can find one, which in 1881 was like saying "open a web browser" — and flip through it. Go ahead. I'll wait.

Simon Newcomb did exactly this in his observatory at the United States Naval Almanac Office, and he noticed something that nobody had any business noticing: the first pages were filthy. Worn, smudged, soft at the corners. The pages listing logarithms of numbers starting with 1 had been thumbed through so often they were practically translucent. The pages for 8 and 9? Crisp as the day they were printed.1

This is the kind of observation most people would file under "huh, weird" and then go back to calculating the orbit of Neptune. But Newcomb was not most people. He was the kind of person who, upon noticing uneven page wear in a reference book, sat down and derived a mathematical law.

The first significant digit is oftener 1 than any other digit, and the frequency diminishes up to 9.

He published a two-page note in the American Journal of Mathematics proposing that the probability of a number's first digit being d was not 1/9 — not equal for all digits — but followed a logarithmic distribution. It was a genuine mathematical insight, published in a prestigious journal, by a serious astronomer.

And then everybody forgot about it for fifty-seven years.

Newcomb's 1881 paper is one of the great what-ifs in the history of mathematics. Two pages. Correct formula. Zero citations for half a century. Science is like that sometimes: you can be right, and early, and published, and still invisible. The idea needed a second discoverer — someone with more data and, frankly, a better name for the law.

Newcomb's log table book, 1881: the pages for numbers starting with 1 were worn and stained from heavy use. The pages for 8 and 9 looked barely touched.
· · ·
Chapter 2

The Physicist Who Counted Everything

Enter Frank Benford, a physicist at General Electric, who in 1938 independently made the same observation Newcomb had — and then did what physicists do when they suspect a pattern: he tested it against everything.2

Benford collected over 20,000 observations from 20 different datasets: areas of rivers, population figures, street addresses, numbers from newspaper front pages, molecular weights, death rates, baseball statistics. An ecumenical survey of numbers as they exist in the wild. He wasn't picky. He was thorough.

The result was almost spooky. About 30.1% of all first digits were 1. About 17.6% were 2. The digit 9 showed up first only 4.6% of the time. This wasn't a quirk of one dataset — it was a pattern woven into the fabric of how numbers behave in nature.

Key Insight

If numbers were "random" the way most people imagine randomness — uniformly distributed — each digit from 1 to 9 would appear as a leading digit about 11.1% of the time. Benford's Law says the real world is nothing like that. The digit 1 leads three times as often as you'd naively expect.

Benford also gave the law its stickiest feature: his name. Newcomb had priority by 57 years, but Benford had data — twenty datasets' worth — and in science, data talks louder than priority. Thus does a physicist at a lightbulb company become the namesake of one of mathematics' most elegant surprises.3

· · ·
Chapter 3

The Formula

The law itself is one line. The probability that the first digit of a number is d:

Benford's Law
P(d) = log10(1 + 1/d)
where d ∈ {1, 2, 3, …, 9}

Plug in the numbers and you get a staircase descending from about 30.1% down to 4.6%:

d=1: 30.1% · d=2: 17.6% · d=3: 12.5% · d=4: 9.7% · d=5: 7.9%
d=6: 6.7% · d=7: 5.8% · d=8: 5.1% · d=9: 4.6%
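The staircase is one line of code to reproduce. A minimal Python sketch (function name is mine, not from any library):

```python
import math

def benford_prob(d: int) -> float:
    """P(leading digit = d) under Benford's Law: log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    print(f"d={d}: {benford_prob(d):.1%}")

# Sanity check: the nine probabilities cover every possible leading digit,
# and the product (2/1)(3/2)...(10/9) telescopes to 10, so they sum to 1.
print(sum(benford_prob(d) for d in range(1, 10)))  # 1.0, up to float rounding
```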

But why logarithms? Here's the intuition that makes everything click. Think about what it takes for a number's first digit to change. To go from a leading 1 to a leading 2, a number has to double — from 1,000 to 2,000, say. That's a 100% increase. But to go from a leading 8 to a leading 9? That's only a 12.5% increase. And from 9 back to 1 (meaning the number crosses from 9,999 to 10,000)? Just 11%.

The digit 1 gets more "territory" on the number line because it takes longer to traverse. On a logarithmic scale — where equal distances represent equal ratios rather than equal differences — the interval from 1 to 2 is genuinely, physically wider than the interval from 8 to 9.

On a linear scale, each digit gets equal space. On a logarithmic scale — where the real world lives — the digit 1 occupies over six times the territory of 9: its "territory" on the log scale is exactly its Benford probability, 30.1% versus 4.6%.
The universe counts logarithmically. We just pretend it counts linearly because it's easier.
· · ·
Chapter 4

Scale Invariance, or Why Numbers Don't Know What Units They're In

Here's a thought experiment that should bother you. Take a table of country populations. The first-digit distribution follows Benford's Law. Now convert every population from people into thousands of people — divide everything by 1,000. Does the first-digit distribution change?

It shouldn't. And it doesn't. If a genuine law about first digits exists, it must be invariant under multiplication by a constant. River lengths don't know whether you're measuring in miles or kilometers. Your bank balance doesn't care whether it's denominated in dollars or yen.

In 1961, the mathematician Roger Pinkham proved that Benford's distribution is the only distribution on first digits that's invariant under scaling.4 Multiply every number in a Benford-distributed set by any constant, and the first digits still follow Benford's Law. This isn't a coincidence — it's a mathematical necessity for any dataset that spans multiple orders of magnitude and doesn't care about units.

Scale Invariance

If first-digit frequencies don't change when you switch from dollars to euros, from miles to kilometers, from kilobytes to megabytes — then those frequencies must follow the Benford distribution. It's the only possibility. The law isn't empirical. It's logical.
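You can watch scale invariance happen numerically. The sketch below (plain Python, no libraries; the specific scale factors are just illustrative "unit conversions") draws numbers that are exactly Benford-distributed, rescales them, and checks that the first-digit frequencies don't move:

```python
import math
import random
from collections import Counter

def first_digit(x: float) -> int:
    """Leading decimal digit of a positive number, read off the mantissa."""
    return int(10 ** (math.log10(x) % 1.0))

random.seed(42)
# 10**U(0,5) has log10 uniform, hence uniform mod 1: exactly Benford-distributed
data = [10 ** random.uniform(0, 5) for _ in range(100_000)]

for scale in (1.0, 1.609, 1000.0):   # e.g. no change, miles to km, unit to kilo-unit
    counts = Counter(first_digit(x * scale) for x in data)
    freqs = {d: counts[d] / len(data) for d in range(1, 10)}
    # freqs[1] stays near 0.301 and freqs[9] near 0.046 at every scale
    print(scale, round(freqs[1], 3), round(freqs[9], 3))
```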

· · ·
Chapter 5

Why It Works: Multiplication All the Way Down

The deepest explanation for Benford's Law comes from thinking about how numbers grow. Most quantities in the real world grow multiplicatively. A city doesn't add 10,000 residents per year — it grows by 2%. A stock doesn't rise $1 a day — it rises 0.5%. River lengths, molecular weights, financial figures — they're all products of many compounding factors.

When you multiply lots of random numbers together, you get a lognormal distribution — bell-curved on a logarithmic scale. And any distribution spread across several orders of magnitude on a log scale will, almost inevitably, produce first digits that follow Benford's Law.5

Start with a penny. Each day, multiply it by some random factor between 0.8 and 1.5. After a hundred days, what's the leading digit? Run this experiment a million times and the distribution of leading digits converges to Benford's Law. The multiplicative process is a random walk on a logarithmic number line, and after enough steps a random walk's position becomes uniformly distributed modulo 1 on the log scale, which is exactly Benford's Law.
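The penny experiment runs in a few lines. This is a sketch of the simulation just described; the 0.8–1.5 daily factor comes from the text, while the run and day counts are arbitrary choices of mine:

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """Leading decimal digit of a positive number."""
    return int(10 ** (math.log10(x) % 1.0))

random.seed(0)
RUNS, DAYS = 50_000, 100
counts = Counter()
for _ in range(RUNS):
    value = 0.01                           # start with a penny
    for _ in range(DAYS):
        value *= random.uniform(0.8, 1.5)  # random daily multiplicative growth
    counts[leading_digit(value)] += 1

for d in range(1, 10):
    print(f"d={d}: simulated {counts[d] / RUNS:.3f}  vs  Benford {math.log10(1 + 1 / d):.3f}")
```

With 50,000 runs the simulated frequencies land within a percentage point of the Benford values; the starting amount is irrelevant, only the compounding matters.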

There's a beautiful theorem by Theodore Hill (1995) that makes this even more general: if you pick probability distributions "at random" and then pick numbers from those distributions, the combined dataset follows Benford's Law.6 It's a central limit theorem for first digits. Mix enough different sources, and Benford emerges like statistical gravity.

· · ·
Chapter 6

Catching the Cheaters

Here's where mathematics meets the crime beat. If naturally occurring numbers follow Benford's Law, then fabricated numbers usually don't. Humans are terrible at faking randomness. When someone invents financial figures off the top of their head, they spread first digits too evenly — too many 5s, 6s, 7s, 8s, and 9s, not enough 1s and 2s. The numbers feel "too random." And that's a red flag.

In 2001, as Enron's fraud unraveled, forensic accountants applied Benford's Law to the company's reported figures. The deviations were glaring — particularly in payments data, which showed clear signs of human fabrication. Benford's Law didn't prove fraud by itself, but it told investigators exactly where to look.7

Today, Benford's Law is a standard tool in forensic accounting. "Digital analysis" of first digits — and second digits, and first-two-digit combinations — screens tax returns, corporate filings, and expense reports. Several countries' tax authorities use it as a first-pass filter for suspicious returns.8
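A first-pass screen of the kind auditors run can be sketched in a few lines. This is an illustration, not a production forensic tool; the function names and the simulated "genuine" and "fabricated" datasets are mine, and the cutoff 15.51 is the standard chi-squared critical value for 8 degrees of freedom at the 5% level:

```python
import math
import random
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x: float) -> int:
    return int(10 ** (math.log10(abs(x)) % 1.0))

def benford_chi2(values) -> float:
    """Chi-squared statistic of observed first digits vs the Benford expectation."""
    obs = Counter(first_digit(v) for v in values if v != 0)
    n = sum(obs.values())
    return sum((obs[d] - n * BENFORD[d]) ** 2 / (n * BENFORD[d]) for d in range(1, 10))

random.seed(7)
# Genuine-looking figures: multiplicative, spanning several orders of magnitude
genuine = [10 ** random.uniform(2, 6) for _ in range(5_000)]
# "Fabricated" figures: first digits spread too evenly, the classic human tell
faked = [random.uniform(100, 999) for _ in range(5_000)]

print(f"genuine chi2 = {benford_chi2(genuine):8.1f}")  # small: passes the screen
print(f"faked   chi2 = {benford_chi2(faked):8.1f}")    # enormous: flag for review
```

A statistic far above the 15.51 threshold doesn't prove fraud, exactly as the Enron example shows; it tells you which pile of documents to read first.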

Election forensics is a more controversial application. After the 2009 Iranian election, statisticians noted deviations from Benford's Law in certain vote tallies. But — and this is important — Benford's Law doesn't apply cleanly to all election data. Precinct-level vote counts are often constrained to a narrow range (a few hundred to a few thousand), which can naturally violate the distribution. The lesson: Benford's Law is a useful diagnostic, not a lie detector. Context matters enormously.9

Natural datasets produce the distinctive Benford staircase. When humans fabricate numbers, they spread first digits too evenly — too flat, too human-made — a telltale sign of fraud.
Benford's Law doesn't catch liars. It catches people who don't know how truthful numbers are supposed to look.
· · ·
Chapter 7

When It Doesn't Work (and Why That Matters Too)

Benford's Law is not magic. Understanding when it fails is just as important as knowing when it works.

The law tends to hold when:

  - the data span several orders of magnitude (river lengths, populations, market capitalizations);
  - the numbers arise from multiplicative or compounding processes;
  - nothing imposes an artificial minimum, maximum, or rounding on the values.

It tends to fail for:

  - assigned numbers that carry no magnitude information: phone numbers, zip codes, account IDs;
  - data confined to a narrow range, like adult heights in centimeters or the precinct-level vote counts of Chapter 6;
  - numbers shaped by human convention, like prices ending in .99.

The Fundamental Statement

A dataset follows Benford's Law if and only if its logarithms are uniformly distributed modulo 1. This is the mathematical core. Everything else — scale invariance, multiplicative processes, power laws — is just a different window into this single underlying truth.
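That core statement is easy to check numerically. A sketch: generate a multiplicative dataset like the one in Chapter 5, reduce each number to the fractional part of its log10, and measure how far that is from uniform on [0, 1):

```python
import math
import random

random.seed(3)
# A multiplicative process (as in Chapter 5), so it should be Benford
xs = []
for _ in range(50_000):
    v = 1.0
    for _ in range(60):
        v *= random.uniform(0.8, 1.5)
    xs.append(v)

fracs = sorted(math.log10(x) % 1.0 for x in xs)
n = len(fracs)

# Kolmogorov-Smirnov distance between the empirical CDF and the uniform CDF
ks = max(max(abs(i / n - f), abs((i + 1) / n - f)) for i, f in enumerate(fracs))
print(f"KS distance to uniform: {ks:.4f}")   # small: logs are uniform mod 1

# Uniformity mod 1 is literally Benford: the fractional part lands in
# [log10(d), log10(d+1)) with probability log10(1 + 1/d)
lead1 = sum(1 for f in fracs if f < math.log10(2)) / n
print(f"fraction with leading digit 1: {lead1:.3f}")   # approx 0.301
```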

· · ·

Benford's Law sits at the intersection of pure mathematics and the grubby empirical world. It says something profound: numbers in nature are not uniformly random. They carry structure — a fingerprint left by the logarithmic geometry of multiplication. And when humans try to invent numbers from scratch, they miss this structure, because our intuition about "randomness" is linear, and the world's randomness is logarithmic.

Simon Newcomb noticed it in 1881. Frank Benford documented it across 20,000 measurements in 1938. Roger Pinkham explained it in 1961. Ted Hill generalized it in 1995. And right now, somewhere in the world, a forensic accountant is running a first-digit analysis on a set of expense reports — and the numbers are about to rat out their author.

The next time you see a table of numbers — financial results, scientific data, census figures — remember: those numbers have a signature. And if the signature is wrong, someone has something to hide.

Notes

  1. Simon Newcomb, "Note on the Frequency of Use of the Different Digits in Natural Numbers," American Journal of Mathematics, Vol. 4, No. 1 (1881), pp. 39–40.
  2. Frank Benford, "The Law of Anomalous Numbers," Proceedings of the American Philosophical Society, Vol. 78, No. 4 (1938), pp. 551–572.
  3. Stigler's Law of Eponymy states that no scientific discovery is named after its original discoverer. Benford's Law is itself a beautiful example of Stigler's Law.
  4. Roger Pinkham, "On the Distribution of First Significant Digits," The Annals of Mathematical Statistics, Vol. 32, No. 4 (1961), pp. 1223–1230.
  5. For an accessible treatment, see Steven J. Miller (ed.), Benford's Law: Theory and Applications, Princeton University Press, 2015.
  6. Theodore P. Hill, "A Statistical Derivation of the Significant-Digit Law," Statistical Science, Vol. 10, No. 4 (1995), pp. 354–363.
  7. Mark Nigrini, Benford's Law: Applications for Forensic Accounting, Auditing, and Fraud Detection, Wiley, 2012.
  8. Durtschi, Hillison, and Pacini, "The Effective Use of Benford's Law to Assist in Detecting Fraud in Accounting Data," Journal of Forensic Accounting, Vol. 5 (2004), pp. 17–34.
  9. Walter Mebane Jr., "Note on the presidential election in Iran, June 2009," working paper, University of Michigan, 2009. For a skeptical view, see Deckert, Myagkov, and Ordeshook, "Benford's Law and the Detection of Election Fraud," Political Analysis, Vol. 19, No. 3 (2011), pp. 245–268.