
The Missing Chapter

The Order of Things

What your messy bookshelf reveals about the deep structure of computation

An extension of Jordan Ellenberg's "How Not to Be Wrong"

Chapter 60

The Pile on Your Desk

You've just collected 150 exams from your Intro to Statistics class. They need to be in alphabetical order by last name before you can enter the grades. You pick up the first exam — Williams — and set it down. You pick up the next — Chen — and put it in front of Williams. The next is Nakamura, which goes between them. You keep going, each time sliding the new exam into its rightful place among the ones you've already organized.

Congratulations. You've just independently invented one of the oldest algorithms in computer science: insertion sort. And if you've ever sorted a hand of playing cards, you've done the exact same thing. Pick up a card, find where it goes among the cards you're already holding, slide it in.1
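That hand-of-cards procedure translates almost line for line into code. A minimal Python sketch (the function name is mine):

```python
def insertion_sort(items):
    """Sort a list in place, the way you'd sort a hand of cards."""
    for i in range(1, len(items)):
        card = items[i]              # pick up the next card
        j = i - 1
        while j >= 0 and items[j] > card:
            items[j + 1] = items[j]  # slide larger cards one slot right
            j -= 1
        items[j + 1] = card          # drop the card into its gap
    return items
```

Run it on the exam pile and `insertion_sort(["Williams", "Chen", "Nakamura"])` gives `["Chen", "Nakamura", "Williams"]`. Note the inner while loop: that sliding is where all the work (and, as we'll see, all the slowness) hides.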

Here's the thing, though. Insertion sort works. It's intuitive. It's what humans naturally do. But if I gave you ten thousand exams instead of a hundred and fifty, you'd be there until the heat death of the universe. Well — not quite. But you'd be very, very unhappy. Because insertion sort is slow, and the story of exactly how slow it is, and why, and what you can do about it, turns out to be one of the most beautiful stories in mathematics.

It's a story about the deep question: what is the least amount of work you must do to put things in order? And it has a precise, surprising answer.

· · ·
Chapter 60.1

The Slowness of the Obvious

Let's start with the simplest possible sorting strategy — one so simple it barely deserves to be called a strategy at all.

Bubble sort works like this: walk through your list, comparing each pair of neighbors. If they're out of order, swap them. When you reach the end, go back to the beginning and do it again. Keep going until you make a complete pass with no swaps. You're done.

[Figure: one bubble-sort pass over 5 3 8 1 4, swapping 5 and 3 and carrying 8 to the end, leaves 3 5 1 4 8 with the largest element in place.]
Each pass of bubble sort pushes the largest unsorted element to its final position — like bubbles rising to the surface.

It's called bubble sort because the larger elements gradually "bubble up" to the end of the list, like air bubbles rising through water. Charming name. Terrible algorithm.
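The whole algorithm fits in a few lines of Python; a sketch, with a swapped flag implementing the stop-when-clean rule:

```python
def bubble_sort(items):
    """Repeatedly swap out-of-order neighbors until a full pass makes no swaps."""
    n = len(items)
    while True:
        swapped = False
        for i in range(n - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:          # a clean pass: the list is sorted
            return items
```

One small mercy: if the list is already sorted, the first pass finds nothing to swap and the loop exits after n comparisons. That best case is where the O(n) entry in the table later in this chapter comes from.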

Why terrible? Think about the worst case. Suppose your list is completely backwards — [5, 4, 3, 2, 1]. The element 1 is all the way at the end, but it needs to be at the beginning. Each pass moves it only one position to the left. So you need n passes, and each pass looks at up to n pairs. That's roughly n × n = n² operations.

Bubble Sort Complexity
T(n) = O(n²)

In the worst case, every element must cross the entire array.

Computer scientists write this as O(n²) — "order n squared" — which is a way of saying that if you double the size of the list, the work roughly quadruples. Sort 100 exams and it takes, say, 10,000 comparisons. Sort 200 exams and it takes 40,000. Sort 10,000 and you're looking at 100 million. This is the tyranny of quadratic growth, and it should be familiar to anyone who's watched a small-town traffic problem become a nightmare when the population doubles.2

But here's the question that makes this mathematics and not just bookkeeping: can we do better?

· · ·
Chapter 60.2

Divide and Conquer

The breakthrough idea is one of the oldest tricks in the strategist's handbook: divide and conquer.

Imagine you're faced with that pile of 150 exams again. Instead of laboriously inserting them one by one, you split the pile in half — roughly A through M in one pile, N through Z in the other. Now sort each half separately. Then merge the two sorted halves by interleaving them, always picking whichever name comes first alphabetically.

This is merge sort, invented by John von Neumann in 1945 — one of the first algorithms ever written for a stored-program computer.3 And it's dramatically faster than bubble sort.

Step 1: If your list has one element, it's already sorted. Done.
Step 2: Split the list in half.
Step 3: Recursively sort each half.
Step 4: Merge the two sorted halves into one sorted list.

The merging step is the key insight: combining two sorted lists into one sorted list takes only n comparisons. You just walk along both lists simultaneously, always picking the smaller element.
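The four steps above translate almost verbatim into Python; a sketch (helper names mine):

```python
def merge_sort(items):
    if len(items) <= 1:                # Step 1: one element is already sorted
        return items
    mid = len(items) // 2              # Step 2: split the list in half
    left = merge_sort(items[:mid])     # Step 3: recursively sort each half
    right = merge_sort(items[mid:])
    return merge(left, right)          # Step 4: merge the sorted halves

def merge(left, right):
    """Interleave two sorted lists, always taking the smaller front element."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])            # one side ran out; append the rest
    merged.extend(right[j:])
    return merged
```

The merge helper is the n-comparison interleaving step described above: two fingers, one on each list, always advancing the one pointing at the smaller element.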

Why is this faster? Think about it recursively. You have n items. You split into two groups of n/2 and sort each, then merge in n steps. Each of those halves splits into groups of n/4, and so on. You keep halving until you reach single elements — and that takes log₂(n) levels of splitting. Each level does n total work for the merging. So the total work is n × log₂(n).

Merge Sort Complexity
T(n) = O(n log n)

log₂(150) ≈ 7, so merge sort does roughly 150 × 7 ≈ 1,050 comparisons. Bubble sort: up to 150² = 22,500.

The difference between n² and n log n doesn't just matter — it's the difference between possible and impossible. For a million items, n² is a trillion operations. n log n is about 20 million. That's the difference between a computation that takes weeks and one that finishes before you've lifted your finger off the Enter key.
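The arithmetic is worth checking for yourself:

```python
import math

n = 1_000_000
print(f"{n * n:,}")                    # 1,000,000,000,000: a trillion operations
print(f"{round(n * math.log2(n)):,}")  # 19,931,569: about 20 million
```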

[Figure: operations versus array size for O(n²), O(n log n), and O(n); the gap grows fast.]
The gap between O(n²) and O(n log n) is the difference between "takes a while" and "takes forever."
· · ·
Chapter 60.3

The Gambler's Algorithm

In 1960, Tony Hoare — a young British computer scientist working on machine translation at Moscow State University, of all places — came up with an algorithm that, on paper, looks worse than merge sort. But in practice, it's faster. Much faster.4

The algorithm is quicksort, and it works like this. Pick an element — any element — and call it the pivot. Now partition the array: everything less than the pivot goes to the left, everything greater goes to the right. Then recursively sort the left and right partitions.

That's it. No merging step. Just partition and recurse.
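A readable (though deliberately not in-place) Python sketch; production implementations partition within the array rather than building new lists:

```python
def quicksort(items):
    """Pick a pivot, partition around it, recurse on each side. No merge step."""
    if len(items) <= 1:
        return items
    pivot = items[0]                                  # naive pivot choice
    less = [x for x in items[1:] if x < pivot]        # everything below the pivot
    greater = [x for x in items[1:] if x >= pivot]    # everything at or above it
    return quicksort(less) + [pivot] + quicksort(greater)
```

Feed this an already-sorted list and the first-element pivot produces exactly the lopsided partitions described below: one side empty, the other holding everything else.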

On average, quicksort runs in O(n log n) time — the same as merge sort. But here's the catch: in the worst case, it degrades to O(n²). If you're unlucky with your pivot choices — say, you always pick the smallest element — you end up with one partition of size 0 and one of size n − 1, and you've basically reinvented selection sort.

So why would anyone prefer an algorithm with a worse worst case? Because computers are not made of math. They're made of silicon and copper and cache memory. Quicksort accesses memory in a sequential, cache-friendly pattern. Merge sort, with its constant splitting and merging into separate arrays, thrashes the cache. In practice, quicksort's lower constant factor wins by a mile. It's the algorithm your computer actually uses when you call .sort().5

"The best algorithm isn't the one with the prettiest theorem. It's the one that finishes first on real hardware."

There's a beautiful fix for quicksort's worst case, by the way: randomize your pivot. If you pick the pivot randomly, the probability of consistently picking terrible pivots drops exponentially. The expected running time is O(n log n), and the probability of significantly worse performance is vanishingly small. Randomness, it turns out, is an algorithm designer's best friend — an idea we explored in the Kelly criterion chapter.
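The randomized fix is essentially a one-line change, sketched here with a three-way partition so that elements equal to the pivot drop out of the recursion:

```python
import random

def randomized_quicksort(items):
    """Quicksort with a random pivot: expected O(n log n) on any input."""
    if len(items) <= 1:
        return items
    pivot = random.choice(items)                   # the one-line fix
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]       # settled; no further recursion
    greater = [x for x in items if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)
```

No adversary can hand this version a bad input in advance, because the bad case now depends on the random choices, not on the data.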

[Interactive: Sorting Visualizer. Watch how different sorting algorithms rearrange an array; each bar's height represents its value, and highlighted bars are being compared or swapped.]
· · ·
Chapter 60.4

The Information-Theoretic Wall

Here's a question that sounds philosophical but is actually mathematical: what is the absolute minimum number of comparisons needed to sort a list?

Not "what's the fastest algorithm we've found." What's the fastest algorithm that could possibly exist?

The answer comes from information theory. Think of it this way: before you start sorting, the n elements could be in any of n! possible orderings. (n! — n factorial — is the number of permutations.) After sorting, there's only one correct ordering. Each comparison you make is a yes-or-no question that, at best, cuts the remaining possibilities in half. So you need at least log₂(n!) comparisons to narrow down from n! possibilities to one.6

Comparison Sort Lower Bound
comparisons ≥ log₂(n!) ≈ n log₂ n

By Stirling's approximation: log₂(n!) ≈ n log₂ n − n log₂ e. The dominant term is n log n.
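You can compute the wall for the exam pile directly; a quick check in Python:

```python
import math

n = 150
possible_orderings = math.factorial(n)       # n! ways the pile could arrive
lower_bound = math.log2(possible_orderings)  # bits needed = minimum comparisons
print(round(lower_bound))                    # 873: no comparison sort beats this
print(round(n * math.log2(n)))               # 1084: merge sort's budget is close
```

For 150 exams, any comparison-based method needs at least 873 comparisons in the worst case, and merge sort's roughly 1,050 is already within shouting distance of that floor.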

This is one of those results that feels like it shouldn't work — like you're pulling a rabbit out of a hat made of pure logic. We didn't analyze any particular algorithm. We just counted possibilities. And yet we proved, irrefutably, that no comparison-based sorting algorithm can ever beat n log n. Merge sort and quicksort aren't just good — they're optimal, up to constant factors. They're pressing against a wall built by mathematics itself.

The Deep Insight

The lower bound proof is really about information. Sorting isn't about moving data — it's about acquiring information about the relative order of elements. Each comparison gives you one bit of information. You need log₂(n!) bits to determine the permutation. That's the wall.

But wait — I said "comparison-based." What if you cheat?

· · ·
Chapter 60.5

Cheating the Wall

Radix sort breaks through the n log n barrier by refusing to play the comparison game. Instead of asking "is A less than B?", it looks at the digits of each number directly.

Sort all the numbers by their last digit. Then, keeping that order stable, sort by the second-to-last digit. Then the third. After processing all k digits, the list is sorted. Total work: O(n × k). If your numbers have a fixed number of digits, that's just O(n) — linear time.7
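A least-significant-digit radix sort for non-negative integers, sketched in Python with the digit count k as a parameter:

```python
def radix_sort(nums, digits):
    """LSD radix sort: bucket by each digit in turn, least significant first."""
    for d in range(digits):
        buckets = [[] for _ in range(10)]         # one bucket per digit value 0-9
        for x in nums:
            digit = (x // 10**d) % 10             # extract the d-th digit
            buckets[digit].append(x)              # stable: earlier order preserved
        nums = [x for bucket in buckets for x in bucket]
    return nums
```

The stability of the buckets is what makes this work: when we sort by the tens digit, numbers with equal tens digits stay in the order the ones-digit pass left them in.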

This feels like a free lunch, but it's not. Radix sort doesn't violate the lower bound because the lower bound applies only to comparison-based sorts. Radix sort uses a different kind of information — the structure of the keys themselves. It's like the difference between finding a book in a library by comparing titles (slow) versus walking to the right shelf because you know the Dewey Decimal number (fast). You're not doing less work; you're exploiting extra structure.

And then there's the other end of the spectrum.

Bogosort: shuffle the array randomly. Check if it's sorted. If not, shuffle again. Expected running time: O(n × n!). For 10 elements, that's roughly 3.6 million expected shuffles. For 20 elements, you're looking at more shuffles than there are seconds since the Big Bang. Nobody actually uses bogosort, of course — it exists as a joke, a cautionary tale, and a surprisingly useful pedagogical tool. It makes the O(n²) of bubble sort look positively blazing by comparison.8
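For completeness, and because it is only a few lines long, here is bogosort in Python (do not run it on more than a handful of elements):

```python
import random

def bogosort(items):
    """Shuffle until sorted. For entertainment purposes only."""
    while any(items[i] > items[i + 1] for i in range(len(items) - 1)):
        random.shuffle(items)
    return items
```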

Algorithm        Best         Average      Worst        Trick
Bubble sort      O(n)         O(n²)        O(n²)        None — just stubbornness
Insertion sort   O(n)         O(n²)        O(n²)        Great on nearly-sorted data
Merge sort       O(n log n)   O(n log n)   O(n log n)   Divide and conquer
Quicksort        O(n log n)   O(n log n)   O(n²)        Random pivots save the day
Radix sort       O(nk)        O(nk)        O(nk)        Don't compare — use digits
Bogosort         O(n)         O(n·n!)      unbounded    Hope
· · ·
Chapter 60.6

The Race

Numbers on a page can only tell you so much. To really feel the difference between O(n²) and O(n log n), you need to watch them race.

The interactive below generates a random array and sorts it with four algorithms simultaneously. Try it with 20 elements — the algorithms finish so close together you can barely tell them apart. Now try 200. The gap is a chasm. At 500, the quadratic algorithms are still grinding away while merge sort and quicksort finished ages ago.

[Interactive: Algorithm Race. Race bubble sort, insertion sort, merge sort, and quicksort head-to-head, with a live operation count for each; increase the array size to see O(n²) vs O(n log n) diverge dramatically.]
· · ·
Chapter 60.7

What Your Computer Actually Does

In 2002, Tim Peters — a prolific contributor to the Python programming language — designed a sorting algorithm specifically for the kind of data that humans actually have. Not random arrays (those are a theoretician's fantasy) but data with runs: sequences that are already partly sorted. Log files where entries are mostly in chronological order. Spreadsheets where someone already sorted by one column.

Timsort is a hybrid of merge sort and insertion sort. It scans the array for existing runs of sorted (or reverse-sorted) data, extends short runs to a minimum length using insertion sort, then merges runs together using a clever merge strategy. On random data, it's O(n log n). On already-sorted data, it's O(n). On nearly-sorted data — which is what most real-world data looks like — it's closer to O(n) than O(n log n).
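You can watch Timsort exploit existing order from Python itself. The wrapper class below is my own instrumentation, not part of the standard library; it counts how many comparisons the built-in sorted(), which CPython implements with Timsort, actually performs:

```python
import random

class Counting:
    """Wraps a value and counts every < comparison the sort makes."""
    count = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counting.count += 1
        return self.value < other.value

def comparisons_to_sort(values):
    Counting.count = 0
    sorted(Counting(v) for v in values)
    return Counting.count

n = 1000
random_data = [random.random() for _ in range(n)]
sorted_data = sorted(random_data)
print(comparisons_to_sort(sorted_data))  # 999: one pass detects a single run
print(comparisons_to_sort(random_data))  # roughly n log2 n, in the thousands
```

On the already-sorted list, Timsort makes exactly n − 1 comparisons: one pass to confirm the whole array is a single run, and nothing left to merge. On the shuffled list it pays the full n log n toll.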

Timsort is the default sort in Python, Java, Android, Swift, and Rust. When you sort a list in almost any modern language, you're using Tim Peters' algorithm. It's a beautiful reminder that the best solutions come from understanding not just the theory but the territory — the real, messy, partially-organized data that people actually care about sorting.

The Lesson

Mathematics tells us we can't do better than O(n log n) for comparison-based sorting. But it doesn't say we can't be clever about the constant factors, about cache behavior, about exploiting the structure that real data already has. The theoretically optimal and the practically optimal are not always the same thing — and the gap between them is where engineering lives.

· · ·
Chapter 60.8

Order Is Expensive

Here's what sorting algorithms really teach us, if we're willing to listen.

Putting things in order is hard. Not hard like calculus is hard — hard like a fundamental law of the universe. The information-theoretic lower bound isn't just a fact about algorithms. It's a fact about information itself. To determine the correct ordering of n items, you must gather at least n log n bits of information. There is no shortcut. There is no trick. There is only the wall.

And yet — within those constraints — there is enormous room for cleverness. Merge sort and quicksort both hit the O(n log n) bound, but they do it in completely different ways, with completely different practical characteristics. Radix sort sidesteps the bound entirely by exploiting structure. Timsort hugs the bound while being exquisitely adapted to the data it actually encounters.

The next time you sort your bookshelf, or alphabetize a stack of papers, or arrange your spice rack, think about this: you are performing a computation. Your brain is an algorithm. And the algorithm you instinctively use — insertion sort — is one that computer scientists spent decades trying to improve upon. Not because it's wrong, but because at scale, the difference between n² and n log n is the difference between a problem you can solve and a problem that solves you.

[Figure: decision tree for sorting [a, b, c]. The root asks a<b?, the next level asks b<c? or a<c?, and the 3! = 6 leaves are the possible orderings, so at least ⌈log₂ 6⌉ = 3 comparisons are needed.]
Every comparison-based sort traces a path through a binary decision tree. The tree must have at least n! leaves — one for each possible ordering — so its height (the number of comparisons) must be at least log₂(n!).

Order is expensive. But knowing how expensive — and knowing, precisely, why — that's the kind of knowledge that turns a good programmer into a great one, and a casual thinker into a mathematical one.

Notes & References

  1. The naturalness of insertion sort for humans is discussed in Donald Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching (Addison-Wesley, 1973), Section 5.2.1.
  2. For a gentle introduction to asymptotic notation, see Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms, 4th ed. (MIT Press, 2022), Chapter 3.
  3. Von Neumann described merge sort in his 1945 report on the EDVAC computer. See Knuth, TAOCP Vol. 3, Section 5.2.4, for the historical account.
  4. C. A. R. Hoare, "Quicksort," The Computer Journal 5, no. 1 (1962): 10–16. Hoare developed the algorithm in 1959–1960 while working on a machine translation project in Moscow.
  5. Modern language runtimes use variants of quicksort (e.g., dual-pivot quicksort in Java's Arrays.sort() for primitives) or Timsort (for objects). See Jon Bentley and M. Douglas McIlroy, "Engineering a Sort Function," Software: Practice and Experience 23, no. 11 (1993): 1249–1265.
  6. The information-theoretic lower bound for comparison sorting is proven via decision trees. See Cormen et al., Introduction to Algorithms, Chapter 8.1.
  7. Radix sort dates back to Herman Hollerith's tabulating machines used in the 1890 U.S. Census. See Knuth, TAOCP Vol. 3, Section 5.2.5.
  8. Bogosort (also called "stupid sort" or "monkey sort") is attributed to various sources. Its analysis as a randomized algorithm is straightforward: the expected number of shuffles is n!, giving O(n · n!) expected comparisons. See H. Gruber, M. Holzer, and O. Ruepp, "Sorting the Slow Way," Fun with Algorithms, Lecture Notes in Computer Science 4475 (2007): 183–197.