2020 Vindicated Predictive Modeling

This article was co-written by Alan Zhang and Sophia Weng.

The day after the 2016 presidential election, the editors of Sabato’s Crystal Ball declared, “The Crystal Ball is shattered.” The nation watched in disbelief as Donald Trump was elected president in an upset that political commentators had assured their audiences was highly unlikely to occur.

Four years later, the Crystal Ball is on track to correctly predict the presidential winner in 49 out of 50 states, as well as the winners of both Maine’s and Nebraska’s 2nd Congressional Districts. It missed only North Carolina, which the Crystal Ball predicted would vote for Joe Biden. Other models by The Economist, the Cook Political Report, and FiveThirtyEight are doing almost as well.

Although polls are far from perfect, they still provide valuable insight into the state of a race. Moreover, election models can consider the context of each race, providing a more comprehensive evaluation of where a race stands.

The Polls

Even though polling overestimated Biden’s lead in crucial swing states, it is important to recognize that the polls still foretold the major narratives of the presidential race. For instance, polling indicated Democratic underperformance among Latino voters in the Sun Belt, a trend that ultimately appeared in the final results. A shift in the Rust Belt states of Wisconsin, Michigan, and Pennsylvania toward Biden, relative to Clinton’s 2016 performance, was also evident in pre-election data. Also in the Midwest, a final pre-election poll conducted by the polling firm Selzer & Co. found Trump with a 7% lead over Biden in Iowa. As of this writing, Trump leads Biden by 8.2% in Iowa according to unofficial election results.

On the other hand, polls also had their misses. In Maine’s U.S. Senate election, incumbent Susan Collins defeated her challenger, Democrat Sara Gideon, by roughly eight percentage points despite virtually all public polls indicating that Gideon was leading Collins.

This election highlights how essential it is to consider the inherent limitations of polls when interpreting polling data. Polls are a snapshot of where a race stands at a moment in time and cannot capture shifts that happen afterward, such as late-deciding voters or major news events that upend the state of the race. In addition, polls are traditionally weighted by demographic factors like gender, age, and education to ensure that the polling sample reflects the expected demographic composition of the electorate. In a year with historically high voter turnout, however, traditional turnout models did not account for many low-propensity voters, who may have shifted the election in ways that are difficult to predict.
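
To make the weighting idea concrete, here is a minimal Python sketch of demographic weighting using invented numbers. It adjusts for a single variable (education), whereas real pollsters weight on many variables at once and, crucially, on assumptions about who will actually turn out.

```python
# Minimal illustration of demographic weighting with made-up numbers; real
# pollsters weight on many variables (gender, age, education, region, etc.).

# Hypothetical raw sample: (number of respondents, share supporting Candidate A).
sample = {
    "college":     (600, 0.60),
    "non_college": (400, 0.45),
}

# Hypothetical target electorate: the share of voters in each group.
target_share = {"college": 0.40, "non_college": 0.60}

total_respondents = sum(n for n, _ in sample.values())

weighted_support = 0.0
for group, (n, support) in sample.items():
    sample_share = n / total_respondents          # group's share of the raw sample
    weight = target_share[group] / sample_share   # up- or down-weight to match the electorate
    weighted_support += support * sample_share * weight

raw_support = sum(n * s for n, s in sample.values()) / total_respondents
print(f"Unweighted support: {raw_support:.1%}")      # 54.0%
print(f"Weighted support:   {weighted_support:.1%}")  # 51.0%
```

If the assumed electorate is wrong, say because low-propensity voters turn out at unexpected rates, the weighted number is wrong too, no matter how carefully the arithmetic is done.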

Indeed, Maine saw one of the highest voter turnout rates in the country, which could have factored into the massive polling error for its Senate race. Ann Selzer of Selzer & Co., meanwhile, collects rigorous turnout data for each election, which could have contributed to the accuracy of her polls this year even though Iowa also experienced one of the highest turnout rates in the nation. Selzer & Co. excels where other pollsters fail, especially in Selzer’s home state of Iowa, possibly because she “wipes the slate clean every time” when modeling each election’s turnout. In 2008, her firm predicted the record-breaking caucus turnout that launched President Obama to the Democratic nomination, a factor that other pollsters failed to anticipate.

A common paradigm of electoral analysis is that higher turnout benefits Democrats at the polls, but with turnout reaching levels not seen in more than a hundred years, it is no surprise that traditional assumptions faltered. Analysts have also suggested that Trump’s unique populist appeal may have encouraged higher turnout among his supporters than other candidates would have been able to generate. With that in mind, historically high turnout could have benefited Trump as well.

Polls depend on a variety of assumptions drawn from prior data, and in such a unique and historic election, voter behavior can be challenging to predict. Inherent limitations of polling, including disparities in respondents’ willingness or availability, also mean that polls can sometimes undersample or oversample certain groups of voters. Nevertheless, polls foresaw several key trends in the election and accurately indicated that Biden would win the presidency by carrying the key Rust Belt states.

The Models

In terms of predictive models, Sabato’s Crystal Ball correctly forecast almost every electoral vote in the Electoral College, while the Cook Political Report’s presidential ratings were correct for every electoral vote it called in either direction (all votes not rated “tossup”). The model created by The Economist correctly predicted which candidate was favored in 48 out of 50 states, missing only Florida and North Carolina. FiveThirtyEight’s presidential model was also quite accurate: the actual Electoral College scenario was even one of the 100 representative outcomes displayed on its model’s front page.

Election forecasts consider a variety of factors in addition to polling data, including fundraising, incumbency, scandals, demographic trends, and uncertainty from factors like turnout. They also account for the quality of polls when factoring them in. Forecasters like the Crystal Ball and the Cook Political Report aggregate this information in a qualitative sense and provide ratings such as “Lean Democratic” or “Solid Republican,” while FiveThirtyEight and The Economist use mathematical algorithms to generate scenarios and probabilities of victory for each candidate in every state.
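
As a rough illustration of the quantitative approach, the sketch below simulates many hypothetical elections around a single polling margin with an assumed error distribution and reports how often the leading candidate wins. The margin, error size, and trial count are invented for illustration; actual models such as FiveThirtyEight’s and The Economist’s additionally incorporate poll quality, fundamentals, and correlated errors across states.

```python
# Toy win-probability simulation; parameters are illustrative, not any
# forecaster's actual inputs.
import random

def simulate_win_probability(poll_margin, polling_error_sd=4.0, trials=10_000):
    """Return the share of simulated elections won by the polling leader,
    assuming the true margin is normally distributed around the poll margin."""
    wins = 0
    for _ in range(trials):
        simulated_margin = random.gauss(poll_margin, polling_error_sd)
        if simulated_margin > 0:
            wins += 1
    return wins / trials

# A 3-point polling lead with a 4-point error is a favorite, not a lock.
print(simulate_win_probability(3.0))  # roughly 0.77
```

This is also why a candidate with a modest polling lead can lose without the forecast itself having been “wrong”: a probability short of certainty leaves room for the trailing candidate to win.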

At the congressional level, a handful of House races, such as Iowa’s 1st Congressional District and Texas’ 23rd Congressional District, saw upsets that the major forecasters deemed unlikely. These misses tended to reflect broader trends, such as the erosion of Democratic support among Latino voters along the Texas border, where TX-23 is located, and overall did not represent a major blunder on the part of the forecasts, since ticket-splitting (voting for candidates from different parties across multiple offices) and turnout are difficult to predict precisely. With 435 House races and dozens of Senate seats up for election, at least a few upsets should be expected in close races.

Combining qualitative forecasts with quantitative models is a nuanced and effective way to predict the result of an election. This year, the four major forecasters (the Cook Political Report, The Economist, FiveThirtyEight, and Sabato’s Crystal Ball) correctly predicted the outcomes of almost every state in the Electoral College. Even accounting for the misses in state and local races, forecasts and models succeeded in predicting the 2020 election.

Conclusion

Polls and election models are used by candidates and organizations alike to determine where to allocate resources and how to fine-tune messaging, and they inform the public’s expectations. Even though they are prone to error and bias, like any statistical estimate based on incomplete data, they nevertheless remain useful in electoral analysis. While crystal balls may not exist in real life, Larry Sabato’s editorial team might be the next best thing.

Image Credit: Photo by Clay Banks on Unsplash
