A Game Nobody Wins
In the summer of 1950, two mathematicians at the RAND Corporation played a little game. The game was simple. The results were devastating.
Merrill Flood and Melvin Dresher weren't trying to change the world. They were trying to test a theory — John Nash's brand-new proof that every finite game has an equilibrium, a stable point where no player can improve by unilaterally switching strategies. The math said it existed. But would real humans actually play the equilibrium?1
They sat two people across from each other and gave them a choice: cooperate or defect. If both cooperate, both get a decent reward. If both defect, both get almost nothing. But if you defect while the other person cooperates? You get the jackpot, and they get the worst possible outcome.
The rational move — the Nash equilibrium — was to defect. Always defect. And yet, in a hundred rounds of play, Flood and Dresher's subjects cooperated far more often than the theory predicted. The humans were being irrational. Or were they being something smarter?
The game didn't have a catchy name yet. That came a few months later, when Albert Tucker — Nash's doctoral advisor at Princeton — was asked to give a talk to some psychologists. Psychologists, Tucker knew, didn't care about payoff matrices. They cared about stories.
Two suspects are arrested and held in separate cells. The prosecutor offers each the same deal: testify against your partner (defect) and go free while they serve three years. But if both testify against each other, both serve two years. And if neither talks (both cooperate), the prosecution can only prove a minor charge — six months each.
You can't talk to your partner. You don't know what they'll do. What do you choose?
Tucker's framing was a stroke of genius. Suddenly everyone could feel the dilemma in their gut. And the name stuck: the Prisoner's Dilemma.2
The Payoff Matrix
Let's put numbers on it. The standard version uses years in prison (lower is better for each prisoner), but game theorists usually flip to rewards (higher is better). Here's the canonical version, with your payoff listed first in each cell:
| | Partner Cooperates | Partner Defects |
|---|---|---|
| You Cooperate | 3, 3 | 0, 5 |
| You Defect | 5, 0 | 1, 1 |
Here's the brutal logic. Whatever your partner does, you are better off defecting. If they cooperate, you get 5 instead of 3 by defecting. If they defect, you get 1 instead of 0 by defecting. Defection dominates. It's the rational choice no matter what.
But when both players follow this impeccable logic, they each get 1 point. If they'd both been "irrational" enough to cooperate, they'd each have gotten 3. The invisible hand, in this case, slaps everyone in the face.
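If you want to see the dominance argument checked mechanically, here's a minimal Python sketch (the dictionary encoding of the matrix is just one convenient representation, nothing canonical):

```python
# Check that defection strictly dominates cooperation
# in the payoff matrix above.

C, D = "cooperate", "defect"

# payoff[(your_move, their_move)] = your score
payoff = {
    (C, C): 3, (C, D): 0,
    (D, C): 5, (D, D): 1,
}

for their_move in (C, D):
    best = max((C, D), key=lambda my_move: payoff[(my_move, their_move)])
    print(f"If partner plays {their_move}: best response is {best}")
# If partner plays cooperate: best response is defect
# If partner plays defect: best response is defect
```

Whichever column your partner picks, the defect row pays more. That, in two loop iterations, is the whole trap.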
This is why the Prisoner's Dilemma matters far beyond game theory classrooms. It's the formal skeleton of every situation where individual incentives pull against collective welfare. Climate change. Arms races. Doping in sports. Antibiotic overuse. The tragedy of the commons wears this exact mathematical outfit.3
Play the Dilemma
Enough theory — let's feel it. Below, you'll play repeated rounds of the Prisoner's Dilemma against an AI opponent. Choose a strategy to face, then cooperate or defect each round. Watch the scores pile up. After 20 rounds, we'll reveal what strategy you were playing against.
The Shadow of the Future
In a one-shot Prisoner's Dilemma, defection is unassailable. But life isn't one-shot. You see your neighbors again. You buy from the same store. Nations negotiate treaty after treaty. When the game repeats, everything changes.
The key insight is what game theorists call the shadow of the future. If I know I'll play against you tomorrow, and the day after, and the day after that, then screwing you today costs me something: your future cooperation. Suddenly the cold calculus shifts. The future casts its shadow backward onto the present, making cooperation rational — not out of altruism, but out of enlightened self-interest.4
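Here's the back-of-envelope version of that shift, using the payoffs from the matrix above (reward R = 3, temptation T = 5, punishment P = 1) and assuming your partner plays the unforgiving "grim trigger" strategy: cooperate until you defect once, then defect forever. If w is the probability the game continues for another round, cooperating forever is worth R/(1 - w), while a single defection earns T now and P every round after, worth T + wP/(1 - w). Cooperation wins when R/(1 - w) ≥ T + wP/(1 - w), which simplifies to w ≥ (T - R)/(T - P) = (5 - 3)/(5 - 1) = 0.5. With these payoffs, a fifty-fifty chance of meeting again is enough to make cooperation the rational choice.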
But which strategy works best in the iterated game? In 1980, the political scientist Robert Axelrod decided to find out in the most dramatic way possible: he held a tournament.
Axelrod's Tournament
Axelrod invited game theorists, mathematicians, economists, psychologists, and computer scientists to submit strategies for the iterated Prisoner's Dilemma. Each strategy would play 200 rounds against every other strategy, and the one with the highest total score would win.
He got 14 entries, ranging from simple to deviously complex. Some tried to detect patterns and exploit them. Some used elaborate probabilistic models. Some were ruthlessly aggressive.
The winner? The simplest strategy in the entire tournament. Four lines of code.
Round 1: Cooperate.
Every round after: Do whatever the opponent did last round.
That's it. Submitted by Anatol Rapoport, a mathematical psychologist at the University of Toronto.
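In Python, the whole strategy fits in even fewer lines (a sketch of the logic, not Rapoport's original submission):

```python
def tit_for_tat(my_history, opponent_history):
    """Cooperate first; afterwards, mirror the opponent's last move."""
    if not opponent_history:        # round 1: open with cooperation
        return "C"
    return opponent_history[-1]     # then copy whatever they just did
```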
Axelrod was stunned. He ran a second tournament, this time with 63 entries; everyone knew Tit-for-Tat had won the first tournament, so they could specifically try to beat it. Tit-for-Tat won again.5
Why? Axelrod identified four properties behind its success:
Nice — it never defects first. It starts with a handshake, not a fist.
Retaliatory — it punishes defection immediately. You can't exploit it without consequences.
Forgiving — one round of punishment, then back to cooperation. It doesn't hold grudges.
Clear — opponents quickly figure out what it's doing. There's no mystery, no mind games. Clarity builds trust.
The complex strategies defeated themselves. The aggressive ones picked fights they couldn't win. The doormat strategies got exploited. Tit-for-Tat threaded the needle: firm but fair, simple but effective. It's the mathematical vindication of the Golden Rule — with teeth.
The Tournament
Now let's watch it happen. The simulator below runs a full round-robin tournament: every strategy plays every other strategy (and itself) over 200 rounds. You can add noise (a probability that any intended move accidentally flips) to see what happens in a world where mistakes are possible. (Spoiler: Tit-for-Tat's one weakness is noise. Two Tit-for-Tat players who hit one accidental defection can spiral into an alternating vendetta. The variant Tit-for-Two-Tats, which retaliates only after two consecutive defections, does better in noisy environments.)
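If you'd like to run the experiment yourself, here's a minimal Python sketch of the same round-robin setup. The four-strategy cast and the noise parameter are illustrative choices; the simulator's actual roster and internals may differ.

```python
import random

C, D = "C", "D"
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

# Strategies take (my_history, their_history) and return a move.
def always_defect(me, them): return D
def always_cooperate(me, them): return C
def tit_for_tat(me, them): return them[-1] if them else C
def tit_for_two_tats(me, them):
    # Retaliate only after two consecutive opponent defections.
    return D if them[-2:] == [D, D] else C

def play_match(s1, s2, rounds=200, noise=0.0):
    """One match; each intended move flips with probability `noise`."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        if random.random() < noise: m1 = D if m1 == C else C
        if random.random() < noise: m2 = D if m2 == C else C
        p1, p2 = PAYOFF[(m1, m2)]
        score1 += p1; score2 += p2
        h1.append(m1); h2.append(m2)
    return score1, score2

def tournament(strategies, noise=0.0):
    # Round robin: every strategy meets every strategy, itself included.
    totals = {s.__name__: 0 for s in strategies}
    for s1 in strategies:
        for s2 in strategies:
            score1, _ = play_match(s1, s2, noise=noise)
            totals[s1.__name__] += score1
    return sorted(totals.items(), key=lambda kv: -kv[1])

entrants = [always_defect, always_cooperate, tit_for_tat, tit_for_two_tats]
print(tournament(entrants))             # noiseless baseline
print(tournament(entrants, noise=0.05)) # watch the vendettas appear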
The Real World Is an Iterated Game
The Prisoner's Dilemma isn't a curiosity locked inside textbooks. It's hiding in plain sight everywhere you look.
The Cold War arms race was the dilemma writ in megatons. Both the US and USSR would have been better off disarming (mutual cooperation). But each feared being the sucker — disarming while the other stockpiled. So both built enough nuclear weapons to destroy the world several times over. Mutual defection. Cost: trillions of dollars and the constant possibility of annihilation.6
Climate negotiations have the same structure. Every country benefits from a stable climate (cooperation), but cutting emissions is costly, and any single country's cuts are swamped by global totals. The temptation to free-ride — to keep burning coal while everyone else sacrifices — is enormous. The Paris Agreement is essentially a mechanism for turning a one-shot dilemma into an iterated game with monitoring and reputation.
Price wars between airlines follow the script precisely. If all airlines keep prices high, everyone profits handsomely. But any single airline can grab market share by slashing prices. When everyone slashes, nobody wins — except the passengers, who are, in this framing, the environment rather than the players.
Open-source software is the beautiful counterexample. Thousands of developers contribute code for free (cooperation) when the "rational" move would be to use everyone else's work without giving back (defection). It works because the open-source community is a deeply iterated game with visible reputations. Your GitHub profile is your cooperation history, and the community remembers.7
The Evolution of Cooperation
The deepest question the Prisoner's Dilemma raises isn't about strategy — it's about existence. In a world where defection is rational, how does cooperation exist at all? Why aren't we all backstabbing loners?
The biologist Martin Nowak spent decades on this question and identified five distinct mechanisms by which natural selection can favor cooperation over defection:8
1. Direct reciprocity — "I'll help you because you'll help me next time." This is the iterated game. It works when the probability of meeting again is high enough.
2. Indirect reciprocity — "I'll help you because others are watching, and helping you builds my reputation." This is why we tip in restaurants we'll never visit again — other people see us.
3. Spatial selection — Cooperators who cluster together can outcompete defectors. In a grid world, cooperators form islands that survive (see the toy grid simulation after this list). Think of neighborhoods, communities, online forums.
4. Group selection — Groups with more cooperators outperform groups with more defectors. Even if defectors win within a group, cooperative groups win between groups.
5. Kin selection — Hamilton's rule: help your relatives because they share your genes. "I would lay down my life for two brothers or eight cousins," as J.B.S. Haldane (maybe) said.
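Here's a toy version of spatial selection, loosely in the spirit of Nowak and May's spatial games: each cell on a wrap-around grid plays the dilemma against its eight neighbors, then imitates the highest scorer nearby. The grid size, step count, and imitation rule are all illustrative assumptions; whether cooperator islands survive depends on how tempting defection is, so try varying T.

```python
import random

SIZE, STEPS = 20, 10        # toy grid, a few update steps
T, R, P, S = 5, 3, 1, 0     # payoffs from the matrix above; try T = 3.5

def neighbors(i, j):
    # The 8 surrounding cells, wrapping at the edges (a torus).
    return [((i + di) % SIZE, (j + dj) % SIZE)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)]

def score(grid, i, j):
    # Total payoff of cell (i, j) against its neighbors; True = cooperate.
    me, total = grid[i][j], 0
    for ni, nj in neighbors(i, j):
        them = grid[ni][nj]
        total += (R if them else S) if me else (T if them else P)
    return total

def step(grid):
    scores = {(i, j): score(grid, i, j)
              for i in range(SIZE) for j in range(SIZE)}
    new = [[False] * SIZE for _ in range(SIZE)]
    for i in range(SIZE):
        for j in range(SIZE):
            # Imitate the best scorer in the neighborhood (self included).
            bi, bj = max([(i, j)] + neighbors(i, j), key=scores.get)
            new[i][j] = grid[bi][bj]
    return new

grid = [[random.random() < 0.5 for _ in range(SIZE)] for _ in range(SIZE)]
for _ in range(STEPS):
    grid = step(grid)
    print(f"cooperator share: {sum(map(sum, grid)) / SIZE ** 2:.0%}")
```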
These five mechanisms are not just theoretical curiosities. They're the scaffolding on which all of human civilization is built. Every institution — from marriage to markets, from churches to constitutions — is a mechanism for sustaining cooperation in the face of the ever-present temptation to defect.
The Prisoner's Dilemma tells us that cooperation is not the default. It's an achievement. It requires structure: repetition, reputation, community, kinship, or institutional enforcement. Remove those scaffolds and cooperation collapses — as anyone who's watched an anonymous online comment section devolve can attest.
Back in that RAND Corporation office in 1950, Flood and Dresher's subjects were, in some sense, wiser than the theory. They cooperated because they felt the iterated structure even in what was supposed to be an isolated experiment. They brought the shadow of the future with them — the habits of reciprocity learned over a lifetime of human interaction.
The math says defect. The math is correct. But the math is modeling the wrong game. Real life almost never presents you with a truly one-shot, truly anonymous dilemma. And in the game that life actually is — repeated, observed, remembered — the math says something different. It says: be nice, be retaliatory, be forgiving, and be clear.
Tit-for-Tat. The simplest program wins the most complex game. There's a lesson in that, and it has nothing to do with prisoners.