Column: How data failed us in calling an election

Published Nov. 11, 2016

Tuesday was a rough night for number crunchers. And for the faith that people in every field — business, politics, sports and academia — have increasingly placed in the power of data.

Donald Trump's victory ran counter to almost every major forecast — undercutting the belief that analyzing reams of data can accurately predict events. Voters demonstrated how much predictive analytics, and election forecasting in particular, remains a young science: Some people may have been misled into thinking Hillary Clinton's win was assured because some of the forecasts lacked context explaining potentially wide margins of error.

"It's the overselling of precision," said Dr. Pradeep Mutalik, a research scientist at the Yale Center for Medical Informatics, who had calculated that some of the vote models could be off by 15 percent to 20 percent.

Virtually all the major vote forecasters, including Nate Silver's FiveThirtyEight site, the New York Times' Upshot and the Princeton Election Consortium, put Clinton's chances of winning in the 70 percent to 99 percent range. The election prediction business is one small aspect of a far-reaching change across industries that have increasingly become obsessed with data, the value of it and the potential to mine it for cost-saving and profitmaking insights. It is a behind-the-scenes technology that quietly drives everything from the ads that people see online to billion-dollar acquisition deals.

Examples stretch from Silicon Valley to the industrial heartland. But data science is a technology advance with tradeoffs. It can see things as never before, but also can be a blunt instrument, missing context and nuance. All kinds of companies and institutions use data quietly and behind the scenes to make predictions about human behavior. But only occasionally — as with Tuesday's election results — do consumers get a glimpse of how these formulas work and the extent to which they can go wrong.

The failed election predictions suggest that the rush to exploit data may have outstripped the ability to recognize its limits.

"State polls were off in a way that has not been seen in previous presidential election years," said Sam Wang, a neuroscience professor at Princeton University who is a co-founder of the Princeton Election Consortium.

He speculated that polls may have failed to capture Republican loyalists who initially vowed not to vote for Trump, but changed their minds in the voting booth.

Beyond election night, there are broader lessons that raise questions about the rush to embrace data-driven decisionmaking across the economy and society. Big-data decisionmaking is increasingly being embraced in every industry, including for higher-stakes decisions that crucially affect people's lives, such as medical diagnoses, hiring choices and loan approvals.

The danger, data experts say, lies in trusting the data analysis too much without grasping its limitations and the potentially flawed assumptions of the people who build predictive models.

The technology can be, and is, enormously useful. "But the key thing to understand is that data science is a tool that is not necessarily going to give you answers, but probabilities," said Erik Brynjolfsson, a professor at the Sloan School of Management at the Massachusetts Institute of Technology.

Brynjolfsson said that people often do not understand that if the chance that something will happen is 70 percent, that means there is a 30 percent chance it will not occur. The election performance, he said, is "not really a shock to data science and statistics. It's how it works."
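Brynjolfsson's point can be checked with a small simulation. The sketch below is illustrative only: the function name, the trial count and the random seed are assumptions made for this example, and it is not any forecaster's actual method. It treats a forecast as a weighted coin and counts how often the favored outcome fails.

```python
import random

def simulate_upsets(win_probability, trials, seed=0):
    """Return the share of independent trials in which the favorite loses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    upsets = sum(1 for _ in range(trials) if rng.random() >= win_probability)
    return upsets / trials

# A 70 percent favorite, replayed 100,000 times (hypothetical setup).
upset_rate = simulate_upsets(0.70, 100_000)
print(f"Favored outcome failed in {upset_rate:.1%} of simulated elections")
```

The failure rate lands close to 30 percent, which is the other side of a 70 percent forecast: an upset is an outcome the forecast explicitly allowed for.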

So, what happened with the election data and algorithms? The answer, it seems, is a combination of the shortcomings of polling, analysis and interpretation, perhaps both in how the numbers were presented and how they were understood by the public.

In addition to the polling errors, data scientists said the inherent weakness of election models might have caused some forecasting errors.

Before an election, forecasters use a combination of historical polls and recent polling data to predict a candidate's chance of winning. Some may also factor in other variables, such as giving higher weight to a candidate who is an incumbent.
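How much a forecast depends on the assumed size of polling error can be shown with a rough sketch. Everything in it is an assumption for illustration: the 3-point lead, the two error sizes and the bell-curve (normal) error model are hypothetical, and none of them is drawn from the forecasters named in this column.

```python
import math

def win_probability(lead_points, error_sd_points):
    """P(true margin > 0) if the true margin is normally distributed
    around the polled lead with the given standard deviation."""
    z = lead_points / error_sd_points
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical inputs: the same 3-point polling lead, two assumed error sizes.
lead = 3.0
p_narrow = win_probability(lead, error_sd_points=1.3)  # polls assumed very accurate
p_wide = win_probability(lead, error_sd_points=5.0)    # polls assumed much noisier

print(f"Small assumed error: {p_narrow:.0%} chance of winning")
print(f"Large assumed error: {p_wide:.0%} chance of winning")
```

The same lead produces a near-certain win under one assumption and roughly a 7-in-10 chance under the other, so the headline number reflects the modeler's view of polling error as much as the polls themselves.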

But even with decades of polls to analyze, it is difficult for forecasters to predict accurately a candidate's chance of winning the presidency months or even weeks ahead of time. Mutalik of Yale compared election modeling to weather forecasting.

"Even with the best models, it is difficult to predict the weather more than 10 days out because there are so many small changes that can cause big changes," Mutalik said. "In mathematics, this is known as chaos."
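The sensitivity Mutalik describes can be seen in a textbook example. The sketch below uses the logistic map, a standard classroom model of chaos, as a stand-in. It is not a weather model or an election model, and the starting values and step count are assumptions chosen for illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    values = [x0]
    for _ in range(steps):
        values.append(r * values[-1] * (1.0 - values[-1]))
    return values

# Two starting points that differ by one part in ten billion (hypothetical).
a = logistic_trajectory(0.2, 80)
b = logistic_trajectory(0.2 + 1e-10, 80)
gaps = [abs(x - y) for x, y in zip(a, b)]

print(f"Gap after 10 steps: {gaps[10]:.2e}")
print(f"Largest gap within 80 steps: {max(gaps):.2f}")
```

The two runs are nearly indistinguishable at first, then drift apart until they bear no resemblance to each other. That is the sense in which small changes can cause big changes.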

© 2016 New York Times