Let's say a friend has the world's biggest jar and fills it with 300 million marbles, a mix of red and blue, and wants you to guess the proportion of each. How many would you need to count to be pretty sure you knew the answer? Believe it or not, only 800 or so would do the trick — if you plucked them at random from throughout the jar. That's how the math of probability works. And that is the basis behind modern presidential polling.
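To see why roughly 800 marbles is enough, here is a minimal sketch of the standard margin-of-error arithmetic for a simple random sample. The 95 percent confidence level (z = 1.96), the worst-case 50-50 split and the function names are assumptions made for illustration; the article itself does not specify them.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample.

    share=0.5 is the worst case (an even red/blue split); z=1.96 is the
    usual multiplier for 95% confidence. Both are illustrative assumptions.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)

def margin_with_population(sample_size, population, share=0.5, z=1.96):
    """Same calculation with the finite-population correction applied."""
    correction = math.sqrt((population - sample_size) / (population - 1))
    return margin_of_error(sample_size, share, z) * correction

moe = margin_of_error(800)
moe_jar = margin_with_population(800, 300_000_000)

print(f"800 marbles, any large jar: +/- {moe * 100:.1f} points")
print(f"800 marbles, 300 million jar: +/- {moe_jar * 100:.1f} points")
```

Under these assumptions, 800 random marbles give an answer good to within about 3.5 percentage points, and the size of the jar hardly matters once it is much larger than the sample.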
It gets trickier when you're dealing with people instead of marbles, of course, and when it's the population of the United States. You still need only 800, but how do you reach them (by landline, by cellphone, text message, carrier pigeon?), and how do you know you've collected a representative sample? How likely is it that a person who is registered to vote is actually going to vote? And what of those who refuse to answer? When Pew Research called up people 15 years ago, nearly four in 10 answered. This year? Not even one in 10.
None of this has changed the laws of probability. Rather, it has just made them harder to put into practice. And it makes modern polling both art and science. The lesson of this year's election is that polls actually worked pretty well as a group, but that it was dangerous, and the rules of probability back this up, to make too much of any one poll. As statisticians put it, it's important to separate the signal from the noise. And any one poll, no matter how well designed, could be "noisy," because that's just how statistics work. Better to look at a group of polls over time and not to make too much of any one.
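The point about noisy single polls can be illustrated with a small simulation. This is a sketch under stated assumptions, not anyone's actual method: the "true" level of support (51 percent), the 20 polls of 800 people each and the fixed random seed are all hypothetical values chosen for the example.

```python
import random
import statistics

def simulate_poll(true_share, sample_size, rng):
    """Poll sample_size random voters; return the share backing the candidate."""
    hits = sum(1 for _ in range(sample_size) if rng.random() < true_share)
    return hits / sample_size

rng = random.Random(2012)   # fixed seed so the run is repeatable
TRUE_SHARE = 0.51           # hypothetical true support, not a real figure
NUM_POLLS = 20              # hypothetical number of polls
SAMPLE_SIZE = 800           # same sample size the article mentions

polls = [simulate_poll(TRUE_SHARE, SAMPLE_SIZE, rng) for _ in range(NUM_POLLS)]
average = statistics.mean(polls)

print(f"lowest single poll:  {min(polls) * 100:.1f}")
print(f"highest single poll: {max(polls) * 100:.1f}")
print(f"average of all {NUM_POLLS}:  {average * 100:.1f}")
```

Individual simulated polls scatter around the true figure, while their average pools 16,000 responses and so tends to land much closer to it. Real polls add complications this sketch ignores, such as nonresponse and likely-voter screens.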
By now, you've no doubt heard of Nate Silver, at right, and his FiveThirtyEight blog at the New York Times. (His work has been featured in our Saturday "Reading File.") A former professional poker player, he is a nerd who made a name for himself by devising a statistical system to analyze professional baseball, and then by working out a system to assess political races. He was right in 2008, and he was dead on again this year. All campaign season, and against the pushback of pundits, Silver's number-crunching of Big Data said that President Barack Obama was the favorite. He didn't say Obama would win the vote by a huge amount or that Mitt Romney had no chance, just that the probability of an Obama victory was high. He took all of the polls, particularly state polls, plugged them and other data into his algorithm (a formula) and let his computer chug away. He didn't rely on his gut instinct but on pure data. This chart shows his calculations of popular vote for president throughout the campaign season. Read him at fivethirtyeight.blogs.nytimes.com. His book is The Signal and the Noise.
After both political conventions but before the first debate (the president's debacle in Denver), Silver's numbers, below, showed Obama beginning October at 51.5 percent to Romney's 47.4. After Romney's strong debate performance on Oct. 3, Silver's numbers on Oct. 12 showed the race as tight as it ever would be, with Obama at 49.9 to Romney's 49.1. Afterward, pundits kept talking about Romney's building momentum right up to and even after Hurricane Sandy. Silver's formula (in other words, the numbers) simply wasn't showing it. Romney would never again come so close.
The numbers guys: Although Nate Silver has been dubbed king of "the Quants," so called because they quantify terabytes of information on the polls to arrive at their conclusions, there are other practitioners, among them Drew Linzer at Emory University (Votamatic.org) and Sam Wang at the Princeton (University) Election Consortium (election.princeton.edu).
Different samples, different results
Here are samples of some national and Florida polls. Note that as different as they are — most obviously when they disagreed on who was ahead, including our own that oddly pointed to a big Romney lead in Florida — they all reveal similar trends: that Obama held a lead in September, that Romney's percentages improved after the first debate, and that Obama's numbers improved just before the election.
Sept. 5-11: Obama 50, Romney 43
Oct. 3-9: Obama 48, Romney 48
Oct. 11-17: Romney 52, Obama 45
Nov. 1-4: Romney 49, Obama 48
Sept. 12-16: Obama 51, Romney 43
Oct. 4-7: Romney 49, Obama 45
Oct. 24-28: Romney 47, Obama 47
Oct. 31-Nov. 3: Obama 48, Romney 45
Wall Street Journal/NBC News
Sept. 18: Obama 50, Romney 45
Oct. 2: Obama 49, Romney 46
Oct. 20: Obama 47, Romney 47
Nov. 4: Obama 48, Romney 47
Times/Bay News 9/Herald
Sept. 17-19: Obama 48, Romney 47
Oct. 8-10: Romney 51, Obama 44
Oct. 30-Nov. 1: Romney 51, Obama 45
Sept. 12: Obama 48, Romney 46
Oct. 11: Romney 51, Obama 47
Oct. 26: Romney 50, Obama 48
Sept. 13: Obama 49, Romney 44
Oct. 9: Obama 48, Romney 47
Nov. 1: Obama 49, Romney 47