Can the same data used by brands to discover new customers and drive growth also help political candidates discover new supporters and drive voter turnout? In our series Dstillery Predicts, we intend to find out.
In our first episode, Peter Lenz, Senior Geospatial Analyst, and Peter Ibarra, Senior Analyst, Data Science and Analytics team, boldly predicted the outcome of the PA-18 Special Election that happened on Tuesday, March 13th. In episode two of Dstillery Predicts, Caroline Allen, Marketing Manager, and Peter Ibarra are joined by John Pham, the Analyst who built the voter enthusiasm model. How was the machine able to make such an accurate prediction? What role did messaging play in the Democratic victory? Listen to our podcast to find out!
CAROLINE ALLEN: Welcome to the second podcast of our Dstillery Predicts series. I’m Caroline Allen, Marketing Manager here at Dstillery, a predictive marketing intelligence company. Each episode, I’ll be bringing you direct access to our team of AI, Machine Learning and media experts. In the last episode we talked with Peter Lenz and Peter Ibarra from our Data Science and Analytics team about the PA-18 Special Election and whether or not they could accurately predict the outcome using the data that brands use to discover new customers and drive growth. They boldly predicted that Democratic candidate Conor Lamb would win by a very narrow margin. If you’ve watched any news channel or signed into Facebook or Twitter in the past two days, you know that they were right. Today, Senior Data Science Analyst Peter Ibarra is joined by John Pham, the Analyst who actually built the enthusiasm model that led to their accurate prediction, and he’s going to break down exactly how he did it. First of all, congratulations guys. It was an amazing prediction that you guys made and you were right! So how’s it feel?
JOHN PHAM: Relieved, mostly, but a lot of excitement as well for the opportunities moving forward.
CAROLINE: You actually built the model? So can you explain to our audience how that idea even comes about?
JOHN: The model that we built to predict the election is very similar to the process that we use to create the crafted audiences that are based on consumers’ behavior. The process starts with us taking seed websites that are indicative of a specific behavior. So let’s say you are a new car buyer: we would take seed websites that include car websites, car comparison websites and dealerships, and that’s what we use to create a new car buying audience. But for this election, we chose websites that are indicative of being a Democrat or a Republican. So let’s say those could be campaign websites, candidate websites, donor websites, and that’s what we used to initialize creating our models. After that, we need training data to create the model, and to do that, we take devices that we see every day. We filter them through the set of seed websites and label them: if you visited a Democratic website, we label you as a Democrat, and if you visited one of the Republican seed websites, we label you as a Republican. After we get all of this training data, hundreds of thousands and sometimes millions of devices, we can start the process of actually building the model. So the machine will learn the patterns of behavior for each of the labels that we set, so Democrats, Republicans, and it will come to the final output, which is the model. Once we have the final model we will be able to predict on new data that we haven’t seen before. At Dstillery, we’re able to geolocate these devices back to where we presume their owners reside. We score them against the model and the final output will give us the probability that someone is a Republican or Democrat from zero to a hundred, zero being more Republican and one hundred being more Democratic. After we do that for all of the devices in PA-18, we see a distribution of scores, and once we get that distribution, we are able to predict the final outcome.
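[Editor's note] The labeling, training and scoring steps John describes can be sketched roughly as follows. This is a minimal illustration, not Dstillery's implementation: the seed and feature site names are made up, and the tiny logistic regression is an assumed stand-in, since the transcript does not say which learning algorithm is used. Seed sites are excluded from the features so that the model has to learn from the surrounding browsing behavior.

```python
import math

# Hypothetical seed sites; the transcript does not name the real ones.
DEM_SEEDS = {"dem-campaign.example"}
REP_SEEDS = {"rep-campaign.example"}

def label_device(visited):
    """Return 1 (Democrat), 0 (Republican), or None if there is no clear seed match."""
    dem = bool(visited & DEM_SEEDS)
    rep = bool(visited & REP_SEEDS)
    if dem == rep:  # visited neither kind of seed site, or both
        return None
    return 1 if dem else 0

def features(visited):
    """Keep only non-seed sites as model inputs."""
    return visited - DEM_SEEDS - REP_SEEDS

def train(devices, epochs=200, lr=0.5):
    """Fit a small logistic regression (one weight per site) by gradient descent."""
    weights, bias = {}, 0.0
    labeled = [(features(v), label_device(v)) for v in devices]
    labeled = [(f, y) for f, y in labeled if y is not None]
    for _ in range(epochs):
        for feats, y in labeled:
            z = bias + sum(weights.get(s, 0.0) for s in feats)
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to z
            bias -= lr * grad
            for s in feats:
                weights[s] = weights.get(s, 0.0) - lr * grad
    return weights, bias

def score(visited, weights, bias):
    """Score a device from 0 (more Republican) to 100 (more Democratic)."""
    z = bias + sum(weights.get(s, 0.0) for s in features(visited))
    return 100.0 / (1.0 + math.exp(-z))

# Toy training data: each device is the set of sites it was seen on.
devices = [
    {"dem-campaign.example", "union-news.example"},
    {"dem-campaign.example", "union-news.example", "local-sports.example"},
    {"rep-campaign.example", "hunting-gear.example"},
    {"rep-campaign.example", "hunting-gear.example", "local-sports.example"},
]
weights, bias = train(devices)
dem_like = score({"union-news.example"}, weights, bias)
rep_like = score({"hunting-gear.example"}, weights, bias)
```

In this toy setup, a new device seen only on the hypothetical `union-news.example` scores above 50 and one seen only on `hunting-gear.example` scores below 50, even though neither visited a seed site.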
CAROLINE: Pete, you actually were able to then take this data and use it to predict. Can you give us some background on that process?
PETER: Yeah. Once John built out all the stuff that he was talking about and looked at each of the devices, we’re able to break out the level of support for each party into quintiles. We start really understanding the layers of potential supporters that can come out and vote. Like who are the ones that are the most likely to come out for the Republicans or the ones that are the most likely to come out for Dems. But, then we can start scaling that down. As you get closer towards that middle of people that don’t have a preference one way or another, we can start seeing who are the slight Republicans and who are the slight Dems. People that we felt would make the difference in this election and start understanding those voters to know the types of issues that they would care about and whether or not … kinda doing our own research I think on the side as well as … were the candidates speaking to the issues that these people were caring about through their digital behavior. Kind of measuring that and understanding whether they are going to do enough to motivate these people to come out and vote. Really what this comes down to is the Dems’ and the Republicans’ hard core supporters are going to come out and vote, right? It’s about how many people can you get out in those lower levels and how many people can you motivate to come out and cast that, as we saw in this election, all important vote. It was looking at the likely supporters on each side and understanding how well each candidate was speaking to them and whether or not we felt it was going to be enough to get them out.
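[Editor's note] As a rough illustration of breaking scored devices into quintiles, the sketch below splits a list of 0 to 100 scores into five equal-sized groups. The scores and the band names ("strong R", "slight R" and so on) are assumptions made for this example; the transcript only says that support was broken out into quintiles, with low scores being more Republican.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

# Hypothetical band names, ordered from most Republican to most Democratic.
LABELS = ["strong R", "slight R", "middle", "slight D", "strong D"]

def quintile_labels(scores):
    """Assign each score to one of five equal-count bands."""
    cuts = quantiles(scores, n=5)  # four cut points between the five bands
    return [LABELS[bisect_right(cuts, s)] for s in scores]

# Made-up scores for fifteen devices.
scores = [5, 12, 18, 25, 33, 41, 48, 52, 59, 66, 72, 80, 88, 93, 97]
labels = quintile_labels(scores)
counts = Counter(labels)  # three devices land in each band here
```

The "slight R", "middle" and "slight D" bands are the groups Peter describes as the ones that would make the difference in the election.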
CAROLINE: In our last conversation, we talked about how if you were wrong, it wouldn’t necessarily be a bad thing because then you would have more information to feed to these algorithms and become more accurate the next time. But, as we know, you were correct. What is the next step? Are there more steps you can take to make the models smarter? Or, is there a different approach that you’re thinking of looking at when looking to the next election?
PETER: I think, and you know John can put in his opinion here, too. I think what we want to start really understanding is … We feel pretty good about what this enthusiasm model was showing. But how can we start layering on a voter turnout model? How can we start building on how likely people are to come out and actually cast the vote like we were predicting? That’s kind of the next step of what we want to do, and that could be distributing online surveys asking, “How likely are you to vote?” Start layering on that, just like we did here, but for each of the levels of voters. How likely are the hardcore supporters compared to the slight supporters compared to the people in the middle? That will allow us to take this enthusiasm model and really become smart about what percent in each quintile is going to come out and cast that vote. Basically, I think our goal is to say: we think that, for this candidate to win, he’s going to need to hit this benchmark in this quintile. Hopefully, our voter turnout model will help get us to the point where we can say, “We think he’s going to be able to hit that benchmark” or “We think he’s not”. He or she, excuse me.
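[Editor's note] One simple way to picture layering a turnout model on top of the enthusiasm quintiles is to weight each quintile by an assumed turnout rate before adding up support. The function and every number below are hypothetical and only show the arithmetic; the transcript describes this turnout model as a future step, not something already built.

```python
def projected_dem_share(quintiles):
    """Project the Democratic share of votes cast.

    quintiles: list of (device_count, turnout_rate, dem_support_rate) tuples,
    one per quintile, all values assumed for illustration.
    """
    dem_votes = sum(n * turnout * dem for n, turnout, dem in quintiles)
    total_votes = sum(n * turnout for n, turnout, _ in quintiles)
    return dem_votes / total_votes

# Made-up, symmetric district: equal-sized quintiles, mirrored turnout and support.
baseline = [
    (100, 0.70, 0.05),  # strong Republican
    (100, 0.50, 0.30),  # slight Republican
    (100, 0.30, 0.50),  # middle
    (100, 0.50, 0.70),  # slight Democrat
    (100, 0.70, 0.95),  # strong Democrat
]
even_share = projected_dem_share(baseline)

# Same district, but the slight-Democrat quintile turns out at 70% instead of 50%.
motivated = list(baseline)
motivated[3] = (100, 0.70, 0.70)
boosted_share = projected_dem_share(motivated)
```

With the symmetric inputs the projected share is an even split, and raising turnout in just the slight-Democrat quintile tips it above half, which is the kind of per-quintile benchmark Peter is describing.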
JOHN: I think what was exciting for me personally is, you know, not only did we correctly predict the election, but we were also able to understand what issues actually mattered. I think that’s the really exciting part. Because we had accurate insights about guns being a concern, healthcare being a concern, and now we’re trying to see, from these learnings and from understanding this quintile, whether those voters actually cared about these issues. I think we’re gonna be more informed in terms of providing more insight into not only who’s going to win, but why they are going to win. I think that’s really exciting for me, the qualitative side to this as well.
PETER: In addition to that, those two issues identified, the fact that we were kind of seeing early on, as of a week before the election, that the tax reform stuff was not having the impact that the GOP felt like it would. There’s been articles that have come out talking about how they’re not seeing that as well. But, it was really exciting to see that our data was showing that earlier than it started to come out in the news. Not only identifying the stuff that’s going to be important, but also identifying the issues that aren’t going to matter as much either.
CAROLINE: Something they’re talking a lot about on the news right now is the importance of tailoring your message to your constituents. One thing they saw with the Conor Lamb/Rick Saccone race was that Conor Lamb really abandoned the national Democratic message. He really made sure that the messages he was providing were going to hit home with his constituents who, as we mentioned in the last episode, overwhelmingly voted for Trump and were on the conservative end of the spectrum. Another takeaway from this data is that not only can it help drive turnout on election day, but it can really help candidates in specific areas personalize their message and make sure that they are hitting their constituents with the messages that are most important to them. Like we do, once again, for brands every day.
PETER: I mean, all politics is local. I think Conor Lamb really showed that, that he was a bit of a … he’s that conservative Democratic candidate in an area that tends to lean conservative. The fact that he was able to get his constituency to come out in the areas that he needed to, in the 70% range, where Saccone only got it in the 50-55% range. That really shows that Lamb was able to get to those motivations that drove that enthusiasm, where Saccone really just relied on his pro-Trump stance and as we saw it just wasn’t enough. Being able to understand that on a localized level and using our data to help inform, that can make a big difference.
CAROLINE: We look forward to you guys coming back in hopefully a few weeks. I know that politics is not the only thing that’s predictive and there’s a lot of other projects that you guys are working on that you probably can’t mention right now. But…
PETER: Top Secret, that’s right. Classified.
CAROLINE: Right, exactly, all of the classified projects we’re working on here at Dstillery. But, as you guys are coming across more findings we want you guys to come back and to talk about your next predictions. Whether it’s from the travel industry or sports, or whatever it is.
PETER: Yeah, definitely.
CAROLINE: If you guys want more information on the work that we’re doing here at Dstillery, feel free to check our website, Dstillery.com. We have a full blog. You’ll find all of our political articles but also a lot of articles that can really help marketers break down this whole idea of AI and machine learning and what it means in 2018. You can also follow us on Facebook, Facebook.com/Dstillery.intelligence, or on Twitter and Instagram @Dstillery. Don’t forget, that’s Dstillery without the first I, d-s-t-i-l-l-e-r-y. Thanks guys. See you next time.
Don’t close your browser just yet. We couldn’t end our podcast on the 2018 Election and our prediction of PA-18 without addressing one of the trending topics from this week in the news: privacy. At Dstillery we take consumer privacy very seriously. As a data science company, we are interested in patterns of web browser IDs and mobile application IDs in order to understand consumer behaviors. We don’t capture personally identifiable information like name, address or username. We don’t know who you are. And, to be honest, we don’t care. We analyze patterns of behavior, not people. We assign a unique ID to each web browser. This number represents you, but it isn’t you. Based on where you go online, we try to predict what brands, or candidates, you’d be most interested in. It is possible to provide actionable political and brand intelligence while maintaining a high standard of privacy and transparency. The result for any given user is more relevant advertising. And for all users, a robust, free Internet.