The Bet That Beat the Pundits
On the evening of November 6, 2012, the television pundits were doing what television pundits do best: disagreeing with great confidence. On Fox News, the analyst Karl Rove was insisting that Ohio was still in play for Mitt Romney. On MSNBC, the mood was cautiously optimistic for Obama but hedged with professional uncertainty. Cable news, as always, presented an election that was too close to call.
Meanwhile, a website called Intrade — an online prediction market based in Dublin — was telling a completely different story. Shares of "Barack Obama to win the 2012 presidential election" were trading at 67 cents. In prediction market language, that's a 67% probability. And on the state-by-state level, the market had been even more specific: it correctly priced Obama as the favorite in every single swing state he eventually won.1
This was not a lucky guess. The Iowa Electronic Markets, an academic prediction market run out of the University of Iowa since 1988, had outperformed major polls in presidential elections 74% of the time.2 Not by using better polling data. Not by employing smarter analysts. Just by letting people bet.
Here's the deep question: why should putting money on the table make predictions better? And more importantly, what does this tell us about the nature of knowledge itself?
The answer reaches into the mathematical bones of how information aggregates — and it challenges some of our most fundamental assumptions about expertise, democracy, and truth.
The Price of a Probability
A prediction market is beautifully simple. You create a contract that pays $1 if some event happens and $0 if it doesn't. Then you let people buy and sell that contract. If you think there's a 70% chance of rain tomorrow, you'd be willing to pay up to 70 cents for a contract that pays a dollar if it rains. If I think the chance is only 40%, I'd happily sell you that contract at 65 cents — you'd think you were getting a bargain, and so would I.
The market price that emerges from all this buying and selling is, in effect, the crowd's probability estimate. When the contract trades at 65 cents, the market is "saying" there's a 65% chance of rain.
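The incentive driving this convergence is simple arithmetic. The sketch below (an illustrative helper, not any real market's API) computes the expected profit of buying one contract, given your belief and the market's price:

```python
def expected_profit_per_contract(belief: float, price: float) -> float:
    """Expected profit of buying one $1-payout contract at `price`,
    if the event truly occurs with probability `belief`.

    EV = belief * (win $1 - price) + (1 - belief) * (lose price),
    which simplifies to belief - price: you profit in expectation
    exactly when the market price sits below your belief.
    """
    return belief * (1.0 - price) + (1.0 - belief) * (0.0 - price)

# You believe 70% chance of rain; the contract trades at 65 cents:
# 0.7 * 0.35 - 0.3 * 0.65 = 0.05 expected profit per contract.
print(round(expected_profit_per_contract(0.70, 0.65), 4))
```

Because the expected profit is just belief minus price, every trader with a belief above the price buys and every trader below it sells, and the price settles where the marginal money stops moving.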
Now, this sounds almost too simple. Why should this work better than, say, just asking a bunch of smart people what they think? The answer lies in a mathematical idea that Friedrich Hayek articulated in 1945, though he didn't have the formalism for it: prices aggregate dispersed information.3
Think about it this way. Suppose there's a market on whether a company will meet its earnings target. A factory worker might notice the parking lot has been unusually full — overtime is up. A truck driver might observe that shipments have increased. An industry analyst might know about a new contract. No single person has the complete picture. But if all three are trading in the market, the price will reflect the combined force of all their private signals, even though they've never spoken to each other.
This is more than hand-waving. In the 1980s, the economist Paul Milgrom proved a remarkable theorem: under certain conditions, the price in a market with asymmetrically informed traders converges to the rational expectations equilibrium — the price that would prevail if everyone's private information were pooled and analyzed by a single omniscient being.4 The market doesn't just aggregate opinions. It aggregates information.
The Information Aggregation Theorem
Market prices in a prediction market converge to the true probability of an event — not because any individual trader knows the truth, but because the price-discovery mechanism combines all traders' private information into a single number. The market "knows" more than any of its participants.
Galton's Ox and the Wisdom of Crowds
The idea that groups can outthink individuals has a wonderfully Victorian origin story. In 1906, the polymath Francis Galton visited a county fair in Plymouth, England, where roughly 800 people had each paid sixpence to guess the weight of an ox. Galton — who was, let's be honest, something of a snob when it came to the intellectual capabilities of the general public — expected the crowd's guesses to be wildly off. Instead, the median guess was 1,207 pounds. The actual weight was 1,198 pounds. The crowd had been off by less than 1%.5
This is the wisdom of crowds in its purest form, and the mathematics behind it are surprisingly clean. Suppose each person's guess is the true value plus some random error: guessi = truth + errori. If the errors are independent and have zero mean (people are wrong in all directions, not systematically biased), then the average of all the guesses converges to the truth by the law of large numbers. The errors cancel. The signal remains.
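Galton's ox is easy to simulate. In the minimal sketch below (illustrative numbers; the error size is an assumption), each of 800 guessers is off by an independent, zero-mean error, yet the crowd average lands within a few pounds of the truth:

```python
import random

def crowd_estimate(truth: float, n_guessers: int, error_sd: float,
                   seed: int = 0) -> float:
    """Average of n independent noisy guesses: guess_i = truth + error_i,
    with errors drawn from a zero-mean normal distribution."""
    rng = random.Random(seed)
    guesses = [truth + rng.gauss(0.0, error_sd) for _ in range(n_guessers)]
    return sum(guesses) / len(guesses)

# 800 fairgoers, each individually off by ~100 lbs, ox truly 1198 lbs.
# The standard error of the mean is 100 / sqrt(800) ≈ 3.5 lbs.
estimate = crowd_estimate(truth=1198, n_guessers=800, error_sd=100)
print(round(estimate))  # within a few pounds of 1198
```

Individual errors of a hundred pounds shrink to an aggregate error of a few pounds: the law of large numbers at county-fair scale.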
But here's what makes prediction markets better than a simple poll or survey: they don't just weight everyone equally. They weight people by how much they're willing to bet — which turns out to be a proxy for how much they know. The factory worker who sees the overtime schedule might bet $500. The person who vaguely feels the economy is doing well might bet $10. The market automatically gives more weight to better-informed participants.
There's a darker implication too. The wisdom of crowds depends critically on the independence of errors. When people start copying each other — when they watch the same news, follow the same pundits, panic together — the errors stop cancelling. They pile up. This is how bubbles form, and prediction markets are not immune. During the 2016 U.S. presidential election, markets priced Hillary Clinton's chances at around 80-85%, heavily influenced by the polling aggregators everyone was watching.6 The traders weren't bringing diverse private information — they were all reading FiveThirtyEight.
Scoring the Forecasters
If we're going to claim that prediction markets are accurate, we need a rigorous way to measure accuracy. This is trickier than it sounds. Suppose I say there's a 70% chance of rain and it does rain. Was I right? Sort of. But if I'd said 95% and it rained, I would have been more right. And if I'd said 70% and it didn't rain, I wasn't necessarily wrong — I said there was a 30% chance of no rain, and that's what happened.
The solution is the Brier score, invented by Glenn Brier in 1950 for evaluating weather forecasters.7 It's delightfully simple:
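For $N$ probability forecasts $f_i$ and outcomes $o_i$ (coded 1 if the event happened, 0 if not):

```latex
\mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} \left( f_i - o_i \right)^2
```

A 70% forecast on a day it rains contributes $(0.7 - 1)^2 = 0.09$; a flat 50% guess contributes $(0.5 - 1)^2 = (0.5 - 0)^2 = 0.25$ no matter what happens.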
Lower is better. A perfect forecaster scores 0. Random guessing (always saying 50%) scores 0.25.
The beauty of the Brier score is that it's a proper scoring rule — meaning you minimize your expected score by reporting your true belief. If you genuinely think there's a 70% chance of rain, you can't improve your expected score by saying 60% or 80%. Honesty is the mathematically optimal strategy.8
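The propriety claim is easy to verify numerically. The sketch below (illustrative, assuming a single binary event) computes the expected Brier score for any reported probability and confirms that shading your report away from your true belief only hurts you:

```python
def expected_brier(reported: float, true_belief: float) -> float:
    """Expected Brier score of reporting `reported` when the event
    truly occurs with probability `true_belief`."""
    return (true_belief * (reported - 1.0) ** 2
            + (1.0 - true_belief) * (reported - 0.0) ** 2)

# True belief 70%: honesty beats shading in either direction.
honest = expected_brier(0.70, 0.70)   # ≈ 0.21
low    = expected_brier(0.60, 0.70)   # ≈ 0.22
high   = expected_brier(0.80, 0.70)   # ≈ 0.22
print(honest < low and honest < high)  # True

# And always saying 50% scores the advertised 0.25.
print(round(expected_brier(0.5, 0.5), 2))  # 0.25
```

Differentiating the expression with respect to the reported probability gives a minimum exactly at the true belief, which is what "proper scoring rule" means.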
This is exactly what prediction markets enforce through a different mechanism. If you think the true probability is 70% and the market says 60%, you can buy and expect to make money. If you lie — buying as if the probability is 80% — you'll overpay and lose. The market doesn't just incentivize honesty; it incentivizes precision.
[Interactive: Brier Score Calculator. See how your probability forecast scores against what actually happened.]
The Challenger Disaster and the Market's Verdict
Perhaps the most striking example of prediction market magic happened without anyone designing a prediction market at all. On January 28, 1986, the Space Shuttle Challenger broke apart 73 seconds after launch, killing all seven crew members. Within minutes, investors began selling shares of the four major contractors involved in the shuttle program: Rockwell International, Lockheed, Martin Marietta, and Morton Thiokol.9
By the end of the trading day, Morton Thiokol's stock had fallen 12%. The other three companies lost only 2-3%. The stock market, within hours, had identified Morton Thiokol as the company most responsible for the disaster — a conclusion the Presidential Commission, led by physicist Richard Feynman, would take five months to reach. The O-ring failure was indeed a Morton Thiokol problem.
Economists Michael Maloney and Harold Mulherin later analyzed this episode in detail and found something remarkable: there was no insider trading, no leaked information, no single trader who "knew."10 The market simply aggregated the dispersed knowledge of thousands of people — engineers who knew about O-ring concerns, industry watchers who understood the contracts, analysts who could read the supply chains — and synthesized it into a verdict faster and more accurately than any investigation could.
"The curious task of economics is to demonstrate to men how little they really know about what they imagine they can design."
— Friedrich Hayek, The Fatal Conceit
Where Markets Fail
If you've been nodding along thinking prediction markets are some sort of epistemological silver bullet, I have some bad news. They fail, and they fail in instructive ways.
The first failure mode is thin markets. The wisdom of crowds requires, well, a crowd. When only a handful of people are trading, a single wealthy contrarian can move the price far from the truth. In 2012, a single trader on Intrade — later identified as a Romney supporter — spent hundreds of thousands of dollars buying up Romney shares in the final weeks of the campaign, briefly pushing the price to suggest a near-coin-flip race that the fundamentals didn't support.11
The second failure mode is correlated information. Remember that the wisdom of crowds depends on independent errors cancelling out. When all traders are looking at the same polls, the same models, the same Twitter feeds, their errors become correlated. The market doesn't aggregate diverse information anymore — it amplifies a single source's errors. This is the mathematical phenomenon known as informational cascades, formalized by Bikhchandani, Hirshleifer, and Welch in 1992.12
The third failure mode is my favorite because it's the most mathematically subtle: the favorite-longshot bias. In prediction markets (and in horse racing, and in sports betting, and in option markets), longshots are systematically overpriced and favorites are systematically underpriced. A contract trading at 5 cents (implying a 5% chance) typically corresponds to events that happen only 2-3% of the time. A contract at 95 cents usually corresponds to events that happen 97-98% of the time.13
Why? Several theories compete. Risk-loving bettors overpay for lottery tickets. Transaction costs eat into the returns on near-sure-things. People have trouble distinguishing between "very unlikely" and "slightly less very unlikely." Whatever the cause, it means that raw prediction market prices need calibration — the same way a thermometer might read consistently two degrees high.
In 2003, the U.S. Department of Defense proposed a prediction market called the Policy Analysis Market, where traders could bet on geopolitical events — including the likelihood of terrorist attacks and assassinations. The idea was sound: if prediction markets aggregate information effectively, why not use them for national security intelligence?
The outrage was immediate. Senators Ron Wyden and Byron Dorgan called it a "terrorism futures market" and declared it "morally repugnant." The program was killed within 24 hours, and its director, John Poindexter, resigned.14
The irony is thick. Intelligence agencies already maintain internal prediction systems — they just call them "estimates" instead of "markets," and they perform worse. A DARPA study later showed that prediction markets consistently outperformed professional intelligence analysts in forecasting geopolitical events.15
The Superforecasters
In 2011, the Intelligence Advanced Research Projects Activity (IARPA) launched the most ambitious forecasting tournament in history. They posed hundreds of questions about geopolitical events — "Will North Korea test a nuclear device before the end of 2013?" "Will the president of Tunisia remain in power through March 2014?" — and asked thousands of volunteers to provide probability estimates.16
The psychologist Philip Tetlock, who had spent decades documenting how badly experts predict the future, entered a team called the Good Judgment Project. His team didn't just win — they beat the intelligence community's own analysts, who had access to classified information, by 30%.17
But here's the mathematically interesting part. Within Tetlock's team, a small group — about 2% of forecasters — were spectacularly better than everyone else. Tetlock called them "superforecasters," and they shared a distinctive cognitive style: they updated their beliefs frequently, in small increments, in response to new information. They thought in probabilities rather than certainties. They were, in a sense, human prediction markets — constantly revising their internal prices.
The connection to Bayesian reasoning is explicit. A good forecaster starts with a prior probability and updates it as evidence arrives:
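In its standard form, with prior $P(H)$ for a hypothesis $H$ and new evidence $E$:

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E \mid H)\, P(H) + P(E \mid \neg H)\, P(\neg H)}
```

The posterior $P(H \mid E)$ then becomes the prior for the next piece of evidence, and the cycle repeats.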
Superforecasters do this intuitively — adjusting beliefs in small increments as new evidence arrives, rather than clinging to their original estimate or swinging wildly.
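That incremental style can be sketched as repeated Bayesian updates in odds form (a toy illustration; the likelihood ratios here are made-up numbers, not any real forecasting question):

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Update a probability given evidence with likelihood ratio
    P(evidence | H) / P(evidence | not-H), via Bayes' rule in odds form:
    posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Start at 30%; three modest pieces of supporting evidence (LR = 1.5 each)
# nudge the forecast up in small increments, superforecaster-style.
p = 0.30
for _ in range(3):
    p = bayes_update(p, 1.5)
print(round(p, 3))  # 0.591
```

Each update moves the estimate a little; no single piece of evidence swings it wildly, but the cumulative drift is substantial. Uninformative evidence (a likelihood ratio of 1) leaves the forecast unchanged, exactly as it should.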
What Tetlock found was that the best superforecasters were not domain experts. They were people who were good at thinking about thinking. They knew how to weight evidence, resist cognitive biases, and distinguish signal from noise. A retired computer programmer in Nebraska outperformed CIA analysts with Top Secret clearances — not because he knew more, but because he processed what he knew better.
This is, at its heart, the same insight behind prediction markets. The market doesn't need any single participant to be an expert. It needs participants who are willing to put their money where their beliefs are and update when they're wrong. The invisible hand of the market does the rest.
Are You Well-Calibrated?
Here's a game that will teach you more about your own forecasting abilities than any textbook. For each question below, give a probability estimate. Then we'll see how you did.
[Interactive: The Calibration Game. For each question, set the probability you think "Yes" is correct. After 6 questions, we'll check if you're well-calibrated.]
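Checking calibration after the fact takes only a few lines. The sketch below (a hypothetical helper, run here on toy data) buckets forecasts by their stated probability and compares each bucket to the observed hit rate; a well-calibrated forecaster's buckets match their labels:

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """For each forecast value (rounded to one decimal), the observed
    frequency with which the event actually happened."""
    buckets = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        buckets[round(f, 1)].append(o)
    return {k: round(sum(v) / len(v), 2)
            for k, v in sorted(buckets.items())}

# Ten toy forecasts: the "70%" calls came true 3 times out of 4, etc.
forecasts = [0.7, 0.7, 0.7, 0.7, 0.2, 0.2, 0.2, 0.9, 0.9, 0.9]
outcomes  = [1,   1,   1,   0,   0,   0,   1,   1,   1,   1]
print(calibration_table(forecasts, outcomes))
# → {0.2: 0.33, 0.7: 0.75, 0.9: 1.0}
```

Here the "70%" bucket landed at 75% and the "20%" bucket at 33%: roughly calibrated, though a real check needs far more than ten forecasts per bucket for the frequencies to be meaningful.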
The Mathematics of Skin in the Game
There's a philosophical thread running through all of this that goes beyond the mathematics, but the mathematics illuminates it beautifully. The philosopher Nassim Nicholas Taleb has popularized the phrase "skin in the game" — the idea that people make better decisions when they bear the consequences of being wrong.18
Prediction markets are the purest formalization of this intuition. When a pundit on television says "I'm 90% sure the economy will improve," there is zero cost to being wrong. Next week, no one will remember. No one is keeping score. But when a trader in a prediction market buys at 90 cents, they stand to lose 90 cents if the event doesn't happen. That asymmetry — between costless assertion and costly commitment — is the entire difference between cheap talk and real information.
Robin Hanson, the economist who has done more than anyone to advocate for prediction markets, has built a research program on this principle; his fellow economist Alex Tabarrok put it most memorably: "A bet is a tax on bullshit."19 And the mathematical structure makes this precise. In a prediction market with a proper scoring rule — whether explicit like the Brier score, or implicit like the buy/sell mechanism — the expected payoff for a truthful reporter always exceeds the expected payoff for a dishonest one. Truth-telling is not just virtuous. It's profit-maximizing.
This has radical implications. It means that whenever we care about accurate forecasts — in politics, in corporate planning, in intelligence analysis, in pandemic preparedness — we should be deeply suspicious of any system that doesn't impose costs on inaccuracy. Expert panels, committee reports, and editorial predictions all suffer from the same flaw: they reward confidence and eloquence, not calibration and precision.
The mathematician in me wants to be careful here. Prediction markets are not oracles. They are subject to manipulation, thin trading, regulatory restriction, and the same cognitive biases that afflict all human endeavors. A market is only as good as the information and incentives flowing through it. But at their best, prediction markets represent something mathematically profound: a decentralized algorithm for converting private beliefs into public knowledge, one trade at a time.
The lesson of prediction markets isn't really about markets at all. It's about the conditions under which collective intelligence emerges: diverse perspectives, independent thinking, a mechanism for weighting by quality, and consequences for being wrong. Whenever you find a system with all four ingredients, you find surprisingly good predictions. Whenever one is missing, you find overconfident experts, groupthinking committees, and pundits who are never held accountable for their errors.
So the next time someone tells you they're "absolutely certain" about the future — ask them if they'd like to bet on it. The pause before they answer will tell you everything you need to know.