In July of 2017, as Raffi Krikorian settled into his new office at the Democratic National Committee, Hillary Clinton’s words were still ringing in his ears. Just a few months before, the former secretary of state and recently defeated Democratic nominee for president had sat on stage at Recode’s technology conference and mercilessly bashed the Democratic Party’s data infrastructure.
“I get the nomination. So I’m now the nominee of the Democratic Party. I inherit nothing from the Democratic Party,” Clinton explained. “I mean it was bankrupt, it was on the verge of insolvency, its data was mediocre to poor, nonexistent, wrong.”
Clinton's withering criticism struck some in the party as blame shifting and stung the DNC data minds who had tried to get her elected, including the party’s former director of data science, who called her comments “fucking bullshit” in a since-deleted tweet. As the DNC’s new chief technology officer, Krikorian had to figure out what exactly Clinton meant and, more importantly, what could be done about it.
Krikorian was a political neophyte, having recently left a job leading Uber’s self-driving-car efforts after building his career at Twitter, but he quickly realized that the data issues Clinton was referring to, while multifaceted and layered, all had one thing in common: a system called Vertica.
Since 2011, Vertica has been the Democratic Party's central repository for data: a place to store every state's voter file, every door knock and phone call organizers make, and every bit of commercially available data that campaigns collect. It played an important role in President Obama's successful bid for reelection in 2012, establishing the need for a strong data operation as central to modern-day campaigning. After just a few years, however, the system was already showing its age, and many Democrats feared that the lack of a strong data operation could handicap their candidates in 2020 and beyond.
Krikorian started hearing what he calls “war stories” about Vertica almost immediately, as he interviewed former campaign staffers like Robby Mook, Clinton’s campaign manager, and Stephanie Hannon, a former Googler and Clinton’s chief technology officer. The system was famous for crashing for 16 hours at a time. One data director in North Carolina told him she used to nap in her car just waiting for Vertica to come back online. Mook, Krikorian recalls, likened Vertica to Beirut: when the system got overloaded, as it almost always did, it would just shut down until the shelling stopped.
“It's not the system's fault it wasn't working,” Mook tells WIRED. “It wasn't built to last a long time or have the number of users it ended up having.”
For Krikorian, Vertica seemed like the main impediment to technological progress within the party. “I came in with a whole set of lofty goals of things we wanted to achieve at the party,” Krikorian says. “Once I peeled the onion, it all sort of came down to, well, we can’t do Interesting Thing X until Vertica’s fixed.”
So, in the months before the 2018 midterms, a make-or-break election for Democrats, he made the risky bet to divide his 40-person tech staff into two teams. One team would need to keep Vertica alive through Election Day; the other would be in charge of building whatever came next.
Now, Krikorian’s team is preparing to pull the plug on Vertica and stand up a new, more powerful system called, simply, the Data Warehouse. It will be backed by BigQuery, Google’s cloud-based analytics platform, which is capable of handling massive data sets at the scale and speed necessary for an organization the size of the Democratic Party.
“One of my top priorities has been to overhaul the party’s tech and data infrastructure and make sure we put the 2020 nominee and all of our candidates in the best possible position to take on the GOP and win,” DNC chair Tom Perez told WIRED in a statement. “The DNC’s Data Warehouse is the centerpiece of our tech efforts and will allow campaigns and committees to better store, access, and analyze their data.”
The shift comes as Krikorian is stepping down from his position at the DNC to move back to California with his wife and two kids. Lindsey Schuh Cortés, Krikorian’s deputy and former CEO of the Democratic data firm BlueLabs, is taking on the role of acting CTO until the party hires someone new.
Democrats are also preparing to set up a much-anticipated data exchange that will allow the party and outside political groups to share their data for the first time, without running afoul of campaign finance laws. The exchange, which will be chaired by former DNC chairman Howard Dean, is modeled on the GOP Data Trust, a similar organization on the Republican side. Some Democrats believe that organization gave President Trump a major advantage in 2016, because it opened up a world of data to his campaign, beyond what the RNC could collect on its own. But building such a complex data-trading post would have been impossible for Democrats under the Vertica system.
When the DNC first purchased the Vertica technology in preparation for President Obama’s 2012 reelection campaign, the idea that one central repository could hold all of the party’s data seemed revolutionary. Before, this valuable information was housed in disparate databases, making it difficult if not impossible for campaigns to blend it all together and get the full picture of who voters were and what they cared about most. Having all that data in one place enabled the Obama campaign to sift through it to target people with voter outreach and advertising at an individual level, instead of simply sorting them into broad categories like, say, soccer moms or Nascar dads.
But Vertica's flaws soon became apparent. For one thing, its interface proved impenetrable to political newcomers and smaller campaigns with limited data analytics experience. “It was just columns of tables, with all these numbers, and maybe the column was named, ‘This is the right one 2014 Booker,’” Schuh Cortés says. “You had to know or be on a prior campaign to understand what boxes were useful and where the really good data was.”
Vertica also predated many of the cloud-based systems that exist today. Instead, it required the DNC to set up servers, which were never really meant to withstand the stress of terabytes of data flowing into them or thousands of data analysts trying to access data in the final days before an election. “If you were to tell me that same tech tool was going to be used in 2018, I would have called you crazy,” says Josh Hendler, who served as the DNC's CTO between 2009 and 2011.
After 2012, a lack of maintenance and an overabundance of data caused Vertica to fall into a state of disrepair, even as demand for more data-driven campaigning grew. Heading into the 2016 election, Clinton’s team, which included top engineering talent from Silicon Valley, struggled with the system they inherited. “It was a shit show from the moment I started there,” says Gerard Niemira, who was the campaign's director of product. He remembers Vertica crashing for 72 hours on his first day, after one new analyst wrote a simple query.
According to Schuh Cortés, whose BlueLabs colleague Elan Kriegel was Clinton’s head of data analytics, the Clinton team had to have dozens of engineers on call at all hours of the day and night, ready to restart the system each time it crashed.
“What their campaign was reacting to was a failure over the course of the Obama years to effectively keep up the pace of modernization inside the DNC,” says Michael Slaby, who worked as chief innovation officer for President Obama’s 2012 campaign and whose former company, The Groundwork, worked with the Clinton campaign in 2016. “Technology doesn’t sit still for 10 years.”
When Krikorian joined the DNC in 2017, he saw firsthand just how many resources the party was wasting trying to keep the system up and running, even in an off cycle. In 2018, during a high-intensity midterm, it took constant care from the DNC's engineers to keep Vertica alive, and it still managed to go down for one 10-hour stretch overnight. After Democrats took back the House on November 6, the plan to replace it went into full effect. Schuh Cortés says, “It was all hands on deck.”
One major goal was finding a more stable platform, one that didn't require Democrats to maintain their own servers. The DNC sought out new tools from a variety of companies, and it settled on Google’s BigQuery because, as Jennifer Kane, the DNC’s product management lead, says, “We don’t have to wake up at 3 am to take care of Google’s BigQuery. They’ve got that covered.”
The party raised $5 million from donors explicitly for this project, and the team spent the next several months setting up the new Data Warehouse, which they hope will be more reliable, and also more intuitive for smaller campaigns. In February the DNC began onboarding its first presidential campaign and recently held training sessions for Democratic data staffers from across the country.
“We get to hand the keys to a data Lamborghini to state legislative campaigns, and that never would have been possible before,” Schuh Cortés says. Vertica, she adds, will be officially laid to rest this summer.
The new warehouse is foundational to achieving one of the party's primary goals before 2020: the creation of a data exchange.
The Federal Election Commission prohibits coordination between campaigns and outside groups, which has traditionally meant that the candidate's campaign and its super PAC couldn't compare or intermingle the data each was collecting. But the GOP found a way around that rule, by creating a third-party organization called the Data Trust in 2011. This company sits outside the party and acts as a sort of data clearinghouse. A variety of Republican groups license their data to the Trust, which enables other groups to purchase access to it without violating FEC regulations.
After the 2016 election, Schuh Cortés says, Democrats began viewing this model as a competitive advantage for Republicans. “When you only have access to half of the picture of all of the voter file contact happening, you make decisions based on what you know,” she says. “Having the data exchange in place opens up the whole other half of all the investments being made in voter contact.”
Niemira hopes that will help with, as Clinton put it, the “mediocre to poor, nonexistent, wrong” quality of the DNC's data. “Yes, the data’s bad. If you've talked to anybody who’s phone banked for any kind of Democratic candidate at any point in time, you will see that,” he says, noting that Democrats have a habit of knocking on dead people's doors. “That kind of issue will be solved by the exchange, because it's about better information moving back and forth between sides.”
But state party officials, who manage their states' voter files, were initially reluctant to give up control of the party's most precious resource. In the end, the party struck a compromise: The data itself would be housed within the DNC. The data exchange would merely track who's giving and taking what information and build the pipes that connect the data sets. The creation of a new Data Warehouse, Schuh Cortés says, means those pipes have something stable to plug into. “That inflow of data would have crashed [Vertica] on day one,” she says.
In February, the DNC announced that former Vermont governor Howard Dean would chair the exchange's board, with former Obama aide Jen O'Malley Dillon running the company day-to-day. “The agreement to exchange data will mean that one door knock in Iowa can benefit the entire ecosystem,” Dean told WIRED in a statement.
Just last week, however, Dillon announced she was taking on the role of campaign manager for Texas congressman Beto O'Rourke's presidential bid. Now, questions remain about what progress is being made on the exchange and, most importantly, who will run it going forward. Dillon didn't respond to WIRED's requests for comment.
There’s little question among Democratic operatives that the party's fundamental data infrastructure was in desperate need of an upgrade. But having a data repository that's merely functional is hardly enough. For one thing, there's a growing awareness that Democrats also need to rethink their approach to digital advertising.
In 2016 the Trump campaign drastically outspent the Clinton campaign on platforms like Facebook. Trump’s then-digital director and current campaign manager Brad Parscale has argued that Facebook was the reason Trump won. Now, according to recently compiled numbers on presidential digital advertising, history is threatening to repeat itself. “They’re spending more money than the Democratic field combined, which should scare everybody on the left,” Slaby says of the Trump campaign. Not only that, the Trump campaign has a substantial head start, having appointed Parscale as campaign manager just one year after President Trump took office.
Krikorian uses a dashboard to track those same advertising numbers. “Everyone else is being dwarfed by the amount of spending he’s been doing,” he says. “That definitely has me worried.”
Niemira also has concerns about whether the new data warehouse will be accessible enough to campaign staffers who don't know how to code, given that it still requires at least some programming skills. That could be a significant hurdle, he says, for the many local data directors who are essentially efficient field staffers who got a promotion. Given how limited technical capacity is within the party, Niemira believes it's critical for Democrats to build tools that the average field staffer can access easily. (His company, Acronym, is building one such tool.) “In order to be successful in 2020 and beyond, we have to figure out a way to get low-skill users to be able to pull this data around the ecosystem,” he says.
Then there’s the fact that the party still isn’t collecting much digital data in any sort of standard way. To figure out whose door to knock on or who to target with an ad, Democrats have relied for years on the voter file, as well as information on past voter contacts that flows from a field organizing tool called VoteBuilder. But the past few years have seen a wave of new digital products emerge for Democrats and a cloud of what Mook calls “digital exhaust” spewing forth from social media platforms. It's not clear whether either party fully knows what to do with it yet.
“Both parties are wrestling right now with figuring out how to take the enormous amount of data that exists in the ecosystem, what people are posting on social media and other social, digital breadcrumbs … and use it to get better insights about who might support which candidates,” Mook says. “The problem for the Democratic Party right now is that until we get this basic infrastructure completed, we’re not even in a position to advance in that competition.”
Krikorian has spent the past year and a half trying to get that infrastructure in place and working to make sure that the 2020 Democratic nominee doesn't have the same gripes about the party's data that Clinton had. He's recruited a diverse team of technologists from the private sector, and he says the candidates being interviewed to replace him come from “really large tech companies.” He admits that whoever that person is will have their work cut out for them, but at least now, he says, “I really feel like it’s actually all possible.”
Updated 4-2-19, 9:03 am EDT: This story has been updated to correct Jennifer Kane's job title. She is the DNC’s product management lead, not project management lead.