The Problem That Won't Go Away
In 1969, the philosopher Robert Nozick did something unusual. He took a thought experiment dreamed up by the physicist William Newcomb, put it in front of the smartest people he knew, and watched them tear each other apart.
The setup is almost childishly simple. There are two boxes in front of you. Box A is transparent and contains $1,000. Box B is opaque. You can either take both boxes, or take only Box B. That's it. That's the whole decision.
The catch — and there's always a catch — is that a supremely accurate predictor called Omega has already made a prediction about what you'll do. If Omega predicted you'd take only Box B, it placed $1,000,000 inside. If Omega predicted you'd take both boxes, it left Box B empty.1
Omega has done this thousands of times before. Its track record is extraordinary — say, 99% accurate. The prediction has already been made. The money is already in the boxes or not. Now you choose.
So: do you take one box, or two?
The setup: Box A always has $1,000. Box B has $1M or nothing — decided before you choose.
The Case for Two Boxes
The two-boxer's argument is crisp and logically airtight. At the moment you choose, the money in Box B is either there or it isn't. Omega made its prediction yesterday. Your choice now can't retroactively change what's in the box. So consider your two scenarios:
If Box B contains $1,000,000: taking both gets you $1,001,000. Taking only B gets you $1,000,000. Both is better by $1,000.
If Box B contains nothing: taking both gets you $1,000. Taking only B gets you $0. Both is better by $1,000.
No matter what, taking both boxes gets you an extra $1,000. This is the dominance principle — one of the bedrock axioms of rational choice theory. If strategy A beats strategy B in every possible state of the world, you pick strategy A. Full stop. You'd have to be crazy to leave free money on the table.2
The two-boxer's argument has the satisfying click of a well-machined lock. It doesn't depend on Omega's accuracy, on your psychology, on anything. It's a logical truth: more is more. Taking both boxes strictly dominates taking one box. Case closed.
The Case for One Box
The one-boxer looks at the two-boxer with pity. "Yes, yes, very clever," they say. "Now look at the scoreboard."
People who take one box walk away with $1,000,000 almost every time. People who take both boxes walk away with $1,000 almost every time. Omega is right 99% of the time, remember? If you're the kind of person who takes both boxes, Omega almost certainly predicted that, and Box B is empty. You get your precious $1,000. Congratulations.
The one-boxer's logic is simple: be the kind of person Omega predicts will take one box. Those people are millionaires. The others are holding a thousand bucks and a philosophy degree.3
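To put numbers on that scoreboard, here is the quick arithmetic at the 99% accuracy figure used above, with the standard payoffs from the setup:

$$
\begin{aligned}
\text{one-boxers average } & 0.99 \times \$1{,}000{,}000 + 0.01 \times \$0 = \$990{,}000,\\
\text{two-boxers average } & 0.99 \times \$1{,}000 + 0.01 \times \$1{,}001{,}000 = \$11{,}000.
\end{aligned}
$$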
Here's the cruelty of the situation. The two-boxer is right that their choice can't change the contents of Box B. The money is there or it isn't. But the one-boxer is also right that one-boxers get rich and two-boxers don't. Both arguments are logically valid, yet they can't both be recommending the best action. Something in our framework is broken, and five decades of brilliant philosophers have not been able to agree on what.
"To almost everyone it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly."
— Robert Nozick, 1969
Play the Game
Theory is nice. Let's get empirical. Below you can play Newcomb's Problem against a simulated predictor. Adjust its accuracy and see what happens over many rounds. If the dominance argument is right, two-boxing should always be better. If the one-boxers are right, the predictor's accuracy should matter — a lot.
🎲 Newcomb's Box Game
Try this
Play 20+ rounds with each strategy. Then slide the accuracy down to 50% (coin-flip predictor) and try again. Notice when the two-boxer starts winning? That's the crux: the argument for one-boxing depends entirely on the predictor being good. The argument for two-boxing doesn't depend on the predictor at all.
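If you'd rather run the experiment than click through it, here is a minimal Python sketch of the same game. This is not the page's actual game code; the function names and round counts are just illustrative.

```python
import random

def play_round(strategy, accuracy):
    # The predictor commits first: with probability `accuracy` it predicts the
    # strategy you will actually use, otherwise it predicts the opposite one.
    other = "two-box" if strategy == "one-box" else "one-box"
    predicted = strategy if random.random() < accuracy else other
    box_b = 1_000_000 if predicted == "one-box" else 0   # filled before you choose
    return box_b + 1_000 if strategy == "two-box" else box_b

def average_payoff(strategy, accuracy, rounds=10_000):
    return sum(play_round(strategy, accuracy) for _ in range(rounds)) / rounds

for acc in (0.99, 0.75, 0.55, 0.50):
    print(f"accuracy {acc:.0%}: one-box avg ${average_payoff('one-box', acc):,.0f}, "
          f"two-box avg ${average_payoff('two-box', acc):,.0f}")
```

At 99% accuracy the one-boxer's average hovers near $990,000 and the two-boxer's near $11,000; at 50% the two-boxer pulls ahead by roughly the $1,000 the dominance argument promises.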
The Schism in Decision Theory
Newcomb's Problem isn't a party trick. It exposed a fault line that runs through the foundations of how we think about rational choice. On one side: causal decision theory (CDT). On the other: evidential decision theory (EDT).
CDT says: evaluate your options by their causal consequences. Your choice right now can't cause the money to appear or disappear in Box B. Omega already decided. So take both boxes — your act has no causal power over the prediction.4
EDT says: evaluate your options by what they tell you about the world. If you're the kind of person who one-boxes, that's evidence that Omega predicted you'd one-box, which is evidence Box B has a million dollars. Choosing one box is correlated with being rich. Choose one box.
This isn't a minor quibble. CDT and EDT are different theories of rationality. They disagree about what it means to make a good decision. And in Newcomb's Problem, they give opposite answers.
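In symbols, the disagreement is about which probabilities go into the expected-value sum. One common way to write the contrast (Pearl's do-notation is just one standard formalization of CDT's "causal consequences"):

$$
V_{\mathrm{EDT}}(a) = \sum_{o} P(o \mid a)\,U(o), \qquad
V_{\mathrm{CDT}}(a) = \sum_{o} P(o \mid \mathrm{do}(a))\,U(o)
$$

In Newcomb's Problem the contents of Box B are causally upstream of your act, so $P(o \mid \mathrm{do}(a))$ ignores the correlation and the extra $1,000 always tips CDT toward two boxes, while $P(o \mid a)$ tracks Omega's accuracy, so EDT treats one-boxing as 99% likely to come with a full box.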
To feel the weight of this, consider: virtually every other puzzle in decision theory gets a consensus answer. The Monty Hall Problem confuses people at first, but once you see the math, it's settled. Expected value calculations for lotteries — settled. Even the St. Petersburg Paradox, which troubled the greatest mathematicians of the 18th century, has widely accepted resolutions. But Newcomb's Problem, more than half a century later, remains genuinely contested among professional philosophers and decision theorists. It is not solved. It may not be solvable — at least, not without giving up something we thought we knew about rationality.
Two theories of rationality, one impossible problem, zero consensus since 1969.
The Predictor Doesn't Need to Be God
Here's what makes people dismiss the problem too quickly: "Perfect predictors don't exist." True. But they don't need to be perfect. They just need to be good enough.
Think about someone who knows you extremely well — a spouse, a parent, a close friend. If they had to predict whether you'd take one box or two, they'd probably be right more than 50% of the time. In many cases, much more. You're predictable. Sorry. We all are.5
And here's the thing: as long as the predictor's accuracy exceeds a certain threshold (about 50.05%, as you can verify in the game above), one-boxing has a higher expected payoff. The dominance argument is logically valid — but it conflicts with the expected-value calculation. Something has to give.
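Here is where that threshold comes from, writing p for the predictor's accuracy and using the standard payoffs:

$$
\begin{aligned}
\mathbb{E}[\text{one-box}] &= p \cdot 1{,}000{,}000,\\
\mathbb{E}[\text{two-box}] &= (1 - p) \cdot 1{,}000{,}000 + 1{,}000.
\end{aligned}
$$

One-boxing has the higher expected payoff whenever $p \cdot 1{,}000{,}000 > (1 - p) \cdot 1{,}000{,}000 + 1{,}000$, which simplifies to $p > 0.5005$.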
Expected Payoff by Predictor Accuracy
One-boxing wins whenever accuracy exceeds 50.05%. Which is always, for any predictor worth calling a predictor.
Where Do You Stand?
Enough theory. Time to commit. Are you a one-boxer or a two-boxer? Philosophers have been polling audiences on this since Nozick first posed the problem, and the results are remarkably consistent: roughly an even split, with lay audiences tending to lean toward one-boxing. Professional philosophers tilt the other way: the 2020 PhilPapers survey found 39.0% favoring two boxes, 31.5% favoring one, and the rest undecided or backing another view, which tells you something about how deeply this puzzle cuts.6
📊 The Great Newcomb's Poll
Choose your side. See how other readers split.
Newcomb's Problem Is Everywhere
You might think this is just philosophers having fun. But Newcomb-like structures show up constantly in real life — wherever your decision is correlated with someone else's prediction of it.
"My single vote won't change the outcome, so why bother?" This is two-boxing logic applied to democracy. Your vote doesn't cause millions of like-minded people to also vote — but it's correlated with them. If you stay home, you're the kind of person whose demographic stays home. If everyone in your reference class reasons the same way (and they tend to), the group outcome depends on what the individual decides. The one-boxer goes to the polls.7
Nuclear deterrence is a Newcomb problem in disguise. If the missiles are already flying, retaliating causes nothing but extra destruction — two-boxing logic says don't push the button. But the whole point of deterrence is being the kind of agent that would retaliate. You need to be a one-boxer — committed to a strategy that looks irrational once the moment arrives — precisely so that the moment never arrives.
Every time you keep a promise when breaking it would benefit you in the moment, you're one-boxing. The payoff of being known as trustworthy — being predicted to cooperate — outweighs the $1,000 you could grab right now by defecting. Two-boxers in business don't get invited back.
The connection to the Prisoner's Dilemma is not a coincidence. When you play against an opponent who thinks like you — a "twin," or just someone from your reference class — defecting is like two-boxing: it dominates in every particular case, but cooperators playing cooperators outperform defectors playing defectors. Douglas Hofstadter called this "superrationality": the recognition that your reasoning process is not unique, and that what you decide is what your rational twin also decides.8
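A compressed way to see Hofstadter's point, as a sketch with illustrative payoff numbers rather than anything from the text: against a twin who provably mirrors your move, the outcomes that make defection dominant are simply unreachable.

```python
# Illustrative Prisoner's Dilemma payoffs (hypothetical numbers): benefit to
# *you*, indexed by (your move, their move). C = cooperate, D = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def payoff_against_twin(my_move):
    # A "twin" runs the same reasoning you do, so their move mirrors yours:
    # the off-diagonal outcomes that make defection dominant never occur.
    return PAYOFF[(my_move, my_move)]

print(payoff_against_twin("D"))  # 1 -- defection still "dominates" case by case
print(payoff_against_twin("C"))  # 3 -- but mirrored cooperators end up better off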
Kavka's Toxin Puzzle
Philosopher Gregory Kavka sharpened the blade further in 1983. A billionaire offers you $1,000,000 if you intend, at midnight tonight, to drink a mildly unpleasant (but not dangerous) toxin tomorrow. You don't actually have to drink it — you just have to genuinely intend to. But here's the thing: once midnight passes and you've won the money, you have no reason to drink. And if you know you won't drink it, you can't genuinely intend to. And if you can't intend to, you don't get the money.9
Kavka's Toxin reveals the same crack as Newcomb's Problem: can a rational agent commit to a strategy that's suboptimal at the moment of execution, because the commitment itself is what generates the payoff? The two-boxer says no — rationality means optimizing at every decision point. The one-boxer says that's exactly the kind of "rationality" that leaves you poor.
Functional Decision Theory: A Way Out?
In 2017, Eliezer Yudkowsky and Nate Soares proposed functional decision theory (FDT), which tries to dissolve the Newcomb stalemate. The idea: don't ask "what should I do now?" but rather "what decision procedure should I be running?"10
FDT says: you should one-box, not because your choice causes Omega to put money in the box, and not merely because one-boxing is evidence the money is there, but because Omega's prediction and your choice are both outputs of the same underlying computation — your decision algorithm. When you choose, you're choosing on behalf of every copy and simulation of your algorithm, including the one Omega ran when making the prediction.
It's a clever move. It respects the causal structure (your choice doesn't retroactively cause the money to appear) while still recommending one-boxing (because your algorithm is what Omega was reading). Whether it fully works is still debated, but it shows that nearly fifty years on, new ideas are still emerging from this deceptively simple puzzle.
Functional Decision Theory: your choice and Omega's prediction are both running the same code.
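A toy way to see the framing, as a sketch under the simplifying assumption of a perfect predictor (the function names are made up for illustration, not anyone's published code): the predictor and the agent share one decision procedure, so "choosing" means choosing what that shared code returns.

```python
def my_decision():
    # The decision procedure you are running. FDT's question: what should this
    # function return, given that Omega fills the boxes by evaluating (a model
    # of) this very same function?
    return "one-box"

def omega_fills_box_b(decision_procedure):
    # A perfect predictor, modeled here as literally running your procedure.
    return 1_000_000 if decision_procedure() == "one-box" else 0

def payoff(decision_procedure):
    box_b = omega_fills_box_b(decision_procedure)   # fixed before you choose
    choice = decision_procedure()                    # your actual choice, same code
    return box_b + 1_000 if choice == "two-box" else box_b

print(payoff(my_decision))   # 1,000,000; change the return value to "two-box"
                             # and the payoff drops to 1,000
```

On FDT's accounting, that is the sense in which your choice "controls" the prediction without causing it: there is only one function to settle, and both the box-filling and the box-taking depend on what it returns.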
What Nozick Knew
Robert Nozick, the man who started all this, never told anyone which side he was on. "I have not," he admitted, "been able to make myself feel sure that I should take one box." He found the arguments for two-boxing compelling. He also found the arguments for one-boxing compelling. He thought anyone who didn't feel the pull of both sides wasn't thinking hard enough.
And maybe that's the real lesson. Newcomb's Problem isn't a bug in decision theory — it's a feature of the world. Some situations genuinely put our principles in conflict. The dominance principle (never leave free money on the table) and the expected value principle (maximize your average payoff) usually agree. Newcomb's Problem is the stress test that shows they don't always.
This is what Ellenberg might call a case where being "right" isn't enough. The two-boxer is right that their act can't change the contents of the box. The one-boxer is right that one-boxers get richer. Being right about the logic doesn't mean you're right about the decision, because "the decision" isn't purely a logical object — it's embedded in a world where other agents are watching you, modeling you, predicting you. Your rationality isn't a private affair. It has consequences that ripple outward, because other minds are computing what you'll do.
That's the deep lesson of Newcomb's Problem, and it's one that matters far beyond philosophy seminars. In a world increasingly full of algorithms that predict our behavior — recommendation engines, credit scoring models, adversarial AI systems — the question of whether to optimize for the act or for the kind of agent you are is not academic. It's the question.
You can pick a side. You probably already have. But if you're not at least a little uncomfortable with your choice, you might want to think again.
The boxes are waiting.