The feud between Sen. Bernie Sanders and Democratic leaders last week was set off by a breach of computer records most voters never think about, the party's voter file.
Until a deal was struck late Friday, the Sanders campaign, which briefly gained access to proprietary information from Hillary Clinton's campaign, had been told by the Democratic National Committee that it would be locked out of the file as punishment.
The stakes were high; the voter file is at the heart of modern campaigning. Both the data that the Sanders team was said to have gained access to and the data that it was temporarily unable to see are important to a campaign.
The file starts with a state's official voter registration data: all the information provided by voters when they registered to vote, like their name, address, sex, age, party affiliation and, in a few states, race. Election administrators also record whether you cast a ballot (but not whom you voted for), and this information is attached as well. You might think this is private, but it's not. All of this is usually publicly available, sometimes free online but more often at a cost.
The parties, campaigns and a handful of nonpartisan voter-file vendors obtain this data and then enrich it. They remove people who have moved or died, using change of address notifications or Social Security data, and add information like telephone numbers, personal consumer information, campaign contributions, or publicly available information on election results or an area's demographics. They even add people who are not registered to vote.
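The enrichment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `VoterRecord` fields, the `enrich` function and the sample records are all hypothetical, and no real vendor's data format or process is implied.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class VoterRecord:
    # Hypothetical, simplified record; real voter files carry many more fields.
    voter_id: str
    name: str
    address: str
    phone: Optional[str] = None


def enrich(records: List[VoterRecord],
           stale_ids: Set[str],
           phones: Dict[str, str]) -> List[VoterRecord]:
    """Drop records flagged as moved or deceased, then attach phone numbers where known."""
    current = [r for r in records if r.voter_id not in stale_ids]
    return [replace(r, phone=phones.get(r.voter_id)) for r in current]


# Made-up sample data for illustration only.
records = [
    VoterRecord("A1", "Pat Example", "1 Main St"),
    VoterRecord("A2", "Sam Sample", "2 Oak Ave"),
    VoterRecord("A3", "Lee Placeholder", "3 Elm Rd"),
]
stale_ids = {"A2"}             # e.g. flagged by a change-of-address notice
phones = {"A1": "555-0100"}    # e.g. matched from a commercial phone list

enriched = enrich(records, stale_ids, phones)
```

Here the record flagged as stale is removed, one remaining record gains a phone number, and the other is kept with the phone field left empty.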
Campaigns, parties and other firms use this information to build statistical models that estimate other characteristics. For instance, they can use your name and where you live to estimate your race with solid accuracy. They can incorporate survey research to build more specific models, such as the likelihood that you're a Democrat, that you'll support a certain candidate or whether you're likely to be persuadable.
The campaigns use this information for just about everything. They use it to know who needs to be mobilized to turn out, or to make guesses about what kind of message might be most convincing to a voter who seems likely to be on the fence. They even use it for polling.
The Democratic Party has invested millions in its voter file, and it makes it available to campaigns through an online interface built and maintained by NGP VAN, a private company that helps progressive campaigns. The interface allows campaigns to upload their own information on voters they have contacted. It also allows organizers all the way down at the local level to download lists of voters and contact them based on their characteristics, including the modeled data on their likelihood to support a given candidate.
In a primary, the information gathered by the campaigns becomes more important than much of the information in the voter file, and that's why a potential breach by another campaign could be so important. The factors that usually predict whether individuals will vote for a Democrat or a Republican are far less powerful when every candidate is from the same party. It's the data gathered by the campaigns, not the often-hyped consumer information (like what magazines a person might buy), that does the most work in campaign models.
In this context, the proprietary data held by the campaigns is a big potential advantage. The Clinton campaign, for instance, knows a lot about her supporters from the 2008 and 2016 cycles: the people who contributed to her, who volunteered, who attended her events and who signed up for her emails. This information is far more useful than any statistical model, and it helps her campaign build stronger models, for good measure.
The Clinton campaign has also invested millions in survey research; it does not seem the Sanders campaign has done so. Survey data is the fuel for strong modeled estimates: Without it, you're less likely to know the voter characteristics that tend to predict who will support a given candidate.
If the Sanders campaign had been able to save the information necessary to know whom the Clinton team considered its strongest supporters, it would have been quite helpful: The campaign could then have stayed away from the voters who have supported Clinton, and it would have gained an even better idea of its own supporters.
This is what Sanders campaign officials tried to do, and, according to Bloomberg News, they succeeded. Bloomberg reported that Sanders officials searched for and saved lists of voters who were modeled by the Clinton campaign to be among its likeliest supporters. For instance, they searched and saved a list titled "HFA Support 50-100," which would include anyone deemed by the Clinton campaign to have a greater than 50 percent chance of supporting Clinton. These support scores are informed by the powerful and proprietary data — on Clinton volunteers, past supporters, the Clinton campaign's polling — that the Sanders campaign would not otherwise be able to gain access to.
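To make the idea of a support-score list concrete, here is a minimal Python sketch of how a list like "50-100" could be built from modeled scores. The `support_list` function, the 0-to-100 score scale and the sample scores are assumptions for illustration; this is not NGP VAN's interface or either campaign's actual model.

```python
from typing import Dict, List


def support_list(scores: Dict[str, float], low: float, high: float) -> List[str]:
    """Return voter IDs whose modeled support score is above `low` and at most `high`."""
    return sorted(vid for vid, score in scores.items() if low < score <= high)


# Made-up modeled support scores, on an assumed 0-to-100 scale.
scores = {"A1": 82.0, "A2": 35.5, "A3": 50.0, "A4": 67.3}

# Voters modeled as having a greater than 50 percent chance of support.
likely_supporters = support_list(scores, 50, 100)
```

In this sketch a voter scored at exactly 50 is left out, matching the "greater than 50 percent" description; a real system might draw the boundary differently.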
But it is not clear whether the Sanders campaign was able to retain this information. In a statement, NGP VAN said unauthorized users were not able to export, save or act on unauthorized information. Instead, the Sanders campaign saved a one-page summary, according to the company. As I interpret it, the Sanders campaign was not able to save the valuable individual-level records that could have significantly improved its targeting efforts.