The Missing Chapter

Preferential Attachment

Why the rich get richer, the famous get more famous, and your airport has too many connections

An extension of Jordan Ellenberg's "How Not to Be Wrong"

Chapter 42

The Gospel According to Matthew (and Merton)

For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken away even that which he hath. — Matthew 25:29

In 1968, the sociologist Robert K. Merton noticed something peculiar about how science works. When two scientists independently discover the same thing, the one who's already famous gets the credit. A Nobel laureate publishes a mediocre paper? It gets cited everywhere. A postdoc at a no-name university publishes a brilliant one? Crickets.1

Merton called this the Matthew Effect, after the ruthlessly capitalist parable in the Gospel of Matthew. The verse is essentially Jesus explaining that the universe operates on a positive feedback loop: the rich get richer, and the poor get poorer. Which, whatever your feelings about the theological implications, turns out to be an astonishingly accurate description of how networks grow.

Think about citations. A paper with 100 citations is sitting in 100 reference lists. Every new researcher who reads any of those 100 papers might follow the citation and find it. A paper with 10 citations has one-tenth the exposure. So the well-cited paper accumulates new citations faster — not because it's necessarily ten times better, but because it's ten times more visible.

This sounds like it might just be an amusing sociological observation. It's not. It's a mathematical law so fundamental that it explains everything from the structure of the internet to the frequency of English words to the reason your flight probably connects through Atlanta.

· · ·

The Urn That Remembers

Before we get to networks, let's start with something simpler: an urn full of colored balls.2

Imagine an urn containing one red ball and one blue ball. You draw a ball at random, note its color, then put it back along with a new ball of the same color. Draw again. Repeat.

[Figure: the Pólya urn. Draw a ball, return it with a twin; early luck compounds.]

This is the Pólya urn model, invented by George Pólya in 1923, and it's the simplest machine that produces the rich-get-richer effect. If you happen to draw red first, there are now 2 reds and 1 blue — red has a ⅔ chance of being drawn next. If red wins again, it's 3 reds to 1 blue. Early luck amplifies.

The beautiful mathematical result: the long-run proportion of red balls converges to some value, but that value is uniformly distributed on [0, 1]. The urn "locks in" to a random ratio, and no amount of future draws can undo the early advantage. The past echoes forward forever.3
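The lock-in is easy to watch for yourself. Here is a minimal Python sketch of the urn; the starting composition of one red and one blue ball matches the description above, while the number of draws and the seeded generator are choices for illustration:

```python
import random

def polya_urn(draws, seed=None):
    """Simulate a Pólya urn starting with one red and one blue ball.

    Each draw picks a ball uniformly at random and returns it to the
    urn together with a new ball of the same color. Returns the final
    proportion of red balls."""
    rng = random.Random(seed)
    red, blue = 1, 1
    for _ in range(draws):
        if rng.random() < red / (red + blue):
            red += 1   # drew red: add a twin red ball
        else:
            blue += 1  # drew blue: add a twin blue ball
    return red / (red + blue)

# Each run locks in to its own random ratio; across many runs the
# limiting proportion is roughly uniform on [0, 1].
finals = [polya_urn(5_000, seed=s) for s in range(200)]
```

Run it with different seeds and the final proportion lands all over the unit interval; run the same seed twice and it lands in the same place. The randomness that matters is concentrated in the early draws.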

The Pólya urn wasn't the only machine of this kind to come out of the 1920s. In 1925, the statistician G. Udny Yule published a model explaining why some biological genera contain vastly more species than others. His "Yule process" was the first formal mathematical treatment of preferential attachment — decades before anyone started thinking about networks.4

· · ·

Scale-Free Networks, or: Why the Internet Has Hubs

Fast-forward to 1999. Albert-László Barabási and Réka Albert were mapping the World Wide Web — then still a somewhat novel thing — and noticed that its link structure looked nothing like what random network theory predicted.5

If websites formed links randomly, you'd expect a bell-curve distribution of connections: most sites would have roughly the same number of links, give or take. Instead, they found a power law: a tiny number of sites (Yahoo, Google, Amazon) had millions of links pointing at them, while the vast majority had almost none.

Power-law degree distribution

P(k) ∝ k^(−γ)

where P(k) is the probability that a node has exactly k connections, and γ is the power-law exponent, typically between 2 and 3.

On a log-log plot, this is a straight line with slope −γ. The hallmark of a scale-free network.

Their explanation was elegantly simple. Networks don't spring into existence fully formed — they grow. And new nodes don't connect randomly. They preferentially attach to nodes that are already well-connected. A new website is more likely to link to Google than to your cousin's blog about ferrets, because Google is more visible, more useful, more there.

Formally: when a new node enters the network, it connects to existing node i with probability proportional to node i's degree. If node A has 1,000 connections and node B has 10, then A is 100 times more likely to attract the newcomer. That's it. That's the whole model. And from this single rule, power-law distributions emerge with mathematical inevitability.
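That single rule translates almost directly into code. The sketch below uses a standard sampling trick: keep a list with one entry per "stub" (edge endpoint), so that a uniform draw from the list is automatically a degree-proportional draw. The triangle seed network and m = 2 edges per newcomer are assumptions for illustration:

```python
import random

def barabasi_albert(n, m=2, seed=None):
    """Grow a network one node at a time; each newcomer attaches m
    edges to existing nodes with probability proportional to degree.

    Returns a dict mapping node id -> degree. The triangle seed
    network is an arbitrary starting point."""
    rng = random.Random(seed)
    degree = {0: 2, 1: 2, 2: 2}
    # One list entry per stub, so a uniform draw from this list
    # is a degree-proportional draw.
    stubs = [0, 0, 1, 1, 2, 2]
    for new in range(3, n):
        targets = set()
        while len(targets) < m:          # m distinct targets
            targets.add(rng.choice(stubs))
        degree[new] = m
        for t in targets:
            degree[t] += 1
            stubs += [new, t]            # each edge adds two stubs
    return degree

deg = barabasi_albert(2_000, m=2, seed=1)
```

After 2,000 nodes, the average degree is about 4, but the biggest hub is typically an order of magnitude larger: the inequality is built in by the rule, not by any property of the nodes.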

[Figure: a random network, where everyone is roughly equal, beside a scale-free network, where a few hubs dominate.]
Random networks are egalitarian. Preferential attachment creates aristocracies.

The same pattern shows up everywhere you look. Airports: a handful of hubs (Atlanta, Chicago O'Hare, Dubai) handle a disproportionate share of flights. Social media: a few influencers have millions of followers while the median account has a handful. Word frequency: "the" appears about 7% of the time in English text, while most words appear almost never — that's Zipf's law, and it's a power law too.6
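Zipf's law is easy to check on any text you have lying around. A quick sketch, with deliberately crude tokenization (lowercase and split on whitespace):

```python
from collections import Counter

def zipf_table(text, top=10):
    """Rank words by frequency. Under Zipf's law, rank x frequency
    should be roughly constant down the table."""
    words = text.lower().split()
    total = len(words)
    ranked = Counter(words).most_common(top)
    return [(rank, word, count / total)
            for rank, (word, count) in enumerate(ranked, start=1)]
```

On a large English corpus, the first row is "the" at roughly 7%, and the proportions fall off close to 1/rank from there.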

The key insight

Preferential attachment doesn't require anyone to be strategic or conspiratorial. It's the default outcome of growth plus visibility. No one decided that Atlanta should be the world's busiest airport. It just started getting connections, and each connection made the next one more likely.

· · ·

Watch It Happen

Theory is lovely, but let's watch preferential attachment in action. The simulation below grows a network one node at a time. Each new node makes two connections. In preferential mode, connections go to high-degree nodes with higher probability. In random mode, every existing node has equal odds. Watch how quickly hubs form in the first case — and how they don't in the second.

[Interactive: Network Growth Simulator. Watch a network grow under different attachment rules (preferential vs. random); node size reflects degree.]
· · ·

Reading the Fingerprint

How do you tell if a network was built by preferential attachment? You look at its degree distribution. Grow a network, count how many nodes have 1 connection, how many have 2, how many have 3, and so on. Then plot this on logarithmic axes — log of degree on the x-axis, log of frequency on the y-axis.

If the network was built by preferential attachment, you'll see something close to a straight line — the signature of a power law. If it was built randomly, you'll see a curve that drops off exponentially: there are no mega-hubs, because there's no mechanism to create them.
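That eyeball test can be sketched in a few lines. This is a rough diagnostic, not a rigorous fit; serious power-law fitting uses maximum likelihood rather than least squares on binned log-log counts:

```python
import math
from collections import Counter

def loglog_points(degrees):
    """Bin a degree sequence into (log k, log count) points: the data
    you would plot to eyeball a power law."""
    counts = Counter(d for d in degrees if d > 0)
    return [(math.log(k), math.log(c)) for k, c in sorted(counts.items())]

def fitted_slope(points):
    """Least-squares slope through the points. For a power law
    P(k) ~ k^(-gamma), this approximates -gamma."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den
```

Feed it the degree sequence of a preferentially grown network and the slope comes out near −γ; feed it a randomly grown one and the points bend away from any straight line.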

[Interactive: Degree Distribution Analyzer. Grow networks of 200 nodes under each mode, then compare their degree distributions on log-log axes.]
· · ·

But Google Wasn't Just Lucky

There's a problem with the pure Barabási-Albert model, and it's an important one: it gives too much credit to first-mover advantage. In the basic model, early nodes have an almost insurmountable head start; on average, the oldest nodes remain the best connected forever.

But that's not how the real world works. Google wasn't the first search engine — it came after AltaVista, Lycos, Excite, and a graveyard of others. Facebook wasn't the first social network (RIP Friendster, MySpace). These late arrivals didn't just succeed despite their tardiness — they dominated.

In 2001, Ginestra Bianconi and Barabási introduced a crucial refinement: fitness.7 In their model, each node gets a random "fitness" parameter η (eta) when it's born. The probability of connecting to node i is now proportional to both its degree and its fitness:

Bianconi-Barabási fitness model

Π(i) = ηᵢ kᵢ / Σⱼ ηⱼ kⱼ

where ηᵢ is the intrinsic fitness of node i, and kᵢ is its current degree.

A high-fitness latecomer can overtake an early low-fitness node. Google beats AltaVista.

This is a much more satisfying model. First-mover advantage is real — getting in early genuinely helps. But it's not destiny. A node with exceptional fitness can overcome its late start because every connection it makes is amplified by that fitness factor. The rich get richer, yes, but the talented rich get richer faster.
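The fitness rule can be sketched alongside the plain model for comparison. The uniform fitness distribution, the triangle seed, and m = 2 are illustrative assumptions; the published model allows any fitness distribution:

```python
import random

def bianconi_barabasi(n, m=2, seed=None):
    """Grow a network where a newcomer attaches to node i with
    probability proportional to fitness_i * degree_i.

    Fitness is drawn uniformly from (0.01, 1.0) at birth (an
    illustrative choice). Returns (degree, fitness) dicts."""
    rng = random.Random(seed)
    fitness = {i: rng.uniform(0.01, 1.0) for i in range(3)}
    degree = {0: 2, 1: 2, 2: 2}   # triangle seed network
    for new in range(3, n):
        nodes = list(degree)
        weights = [fitness[i] * degree[i] for i in nodes]
        targets = set()
        while len(targets) < m:   # m distinct, fitness-weighted targets
            targets.add(rng.choices(nodes, weights=weights)[0])
        fitness[new] = rng.uniform(0.01, 1.0)
        degree[new] = m
        for t in targets:
            degree[t] += 1
    return degree, fitness

deg, fit = bianconi_barabasi(500, m=2, seed=7)
```

Rerun it with different seeds and compare the highest-degree nodes against their fitness values: unlike the plain model, a late arrival with a high η can climb past the founders.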

Bianconi and Barabási made a startling connection: the mathematics of their fitness model are identical to Bose-Einstein condensation in quantum mechanics — the phenomenon where particles crowd into the same quantum state at ultra-low temperatures. In the network analogy, a "Bose-Einstein condensation" event occurs when a single super-fit node captures a finite fraction of all links. A winner-take-all phase transition.8

The network doesn't care about fairness. It cares about visibility times quality — and both compound.
· · ·

What This Means for You

Preferential attachment isn't just a curiosity for network scientists. It's a framework for understanding inequality — and for thinking clearly about what to do about it.

Consider academic hiring. A PhD student at a top-ranked department gets better-known advisors, more conference invitations, more visibility — and therefore more citations, more job offers, more students of their own. The feedback loop locks in the advantage of the initial placement. Is the Harvard-trained scientist genuinely better? Maybe. But the Pólya urn model whispers: maybe they just drew red first.

Or consider the economics of attention online. Every retweet puts a tweet in front of a new audience, each member of which might retweet it again; the more a tweet has already spread, the faster it spreads. The dynamics are pure preferential attachment. Virality isn't a description of content quality — it's a description of network topology.

[Figure: the long tail. A few hubs hold most of the connections; the vast majority of nodes sit in the long tail.]
Power laws produce extreme inequality: a few nodes get almost everything.

The good news, from the fitness model: being late doesn't mean being doomed. Quality matters. Google's PageRank algorithm was genuinely better at search — that's fitness. Fitness can overcome a late start, even in a world rigged toward the early and already-large.

The sobering news: fitness alone isn't enough, because the feedback loops amplify every small advantage. Two products of nearly equal quality will not end up with nearly equal market share. The one that gets a small early edge — through luck, timing, marketing, whatever — can ride the preferential attachment wave to dominance while its equally good competitor stagnates.

This is why antitrust regulators should understand network science. It's why "just build a better product" is necessary but not sufficient advice. And it's why the Pólya urn, a mathematical toy from 1923, has something profound to say about the world we live in: in systems with positive feedback, small initial differences become enormous final ones. Not because anyone planned it. Not because it's fair. Because that's just how the math works.

The urn doesn't care which ball you draw first. But the network remembers forever.

Notes & References

  1. Merton, R. K. (1968). "The Matthew Effect in Science." Science, 159(3810), 56–63. Merton documented how eminent scientists get disproportionate credit for work of comparable quality.
  2. Eggenberger, F. & Pólya, G. (1923). "Über die Statistik verketteter Vorgänge." Zeitschrift für Angewandte Mathematik und Mechanik, 3(4), 279–289. The urn model that launched a thousand dissertations.
  3. The limiting proportion follows a Beta(1,1) distribution — i.e., Uniform[0,1] — when starting with one ball of each color. More generally, starting with a red and b blue yields a Beta(a, b) limit.
  4. Yule, G. U. (1925). "A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S." Philosophical Transactions of the Royal Society B, 213, 21–87.
  5. Barabási, A.-L. & Albert, R. (1999). "Emergence of Scaling in Random Networks." Science, 286(5439), 509–512. The paper that launched network science as a field.
  6. Zipf's law states that the frequency of a word is inversely proportional to its rank. The nth most common word appears roughly 1/n as often as the most common. Herbert Simon showed in 1955 that preferential attachment (he called it "Gibrat's principle") explains this distribution.
  7. Bianconi, G. & Barabási, A.-L. (2001). "Competition and multiscaling in evolving networks." Europhysics Letters, 54(4), 436–442.
  8. The analogy is precise: assign each node an "energy" ε = −ln(η). The partition function of the network maps exactly onto a Bose gas. When the fitness distribution is right, you get condensation — one node captures a macroscopic fraction of all edges.