Last week we had the presidential election between Hillary Clinton and Donald Trump. The most surprising thing about the outcome is that the major poll-based forecast models were wrong: they all gave Clinton a greater than 71% chance of winning. To be fair, none of these models gave Trump a 0% chance of becoming president. This is key, because it means that Trump always had a chance of succeeding, the same way anyone can win a casino game even when the odds are unfavorable. Similar surprises occurred earlier this year with the Brexit vote and the vote on the peace agreement between the Colombian government and the FARC.

Outcomes that defy polls are not new. In the 1948 election between Thomas Dewey and Harry Truman, Gallup incorrectly predicted that Dewey would be the next president. The Chicago Tribune was so confident in the poll that it published the wrong outcome on its front page. The pollsters' mistake was using quota sampling instead of random sampling!

So, why were the predictions wrong? And what can be done to improve this?

To answer these questions, we need to understand how these predictions work. To make a prediction we need two things: data and a model. Data is like the ingredients used to bake a cake, and the model is the recipe. The cake can turn out great or poor depending on the recipe used. Nevertheless, if the ingredients are spoiled, no recipe will be able to produce a great dessert.

In predicting elections, the data comes mainly from polls, that is, data collected by asking individuals who they will vote for. The idea is that if we ask a small sample of the population what they think, we can extrapolate their answers to the entire population. Many companies are in the polling business, and each has its own methodology, which is why different polls produce different results.

In the ideal case, when conducting polls, individuals are sampled at random, and they provide information about who they intend to vote for. A few issues that limit the polls' accuracy include:

Flip-Flopping: the polls can capture what the individual would do today, but not what he/she will do in the future.
Undecided: we can estimate how many voters are in this category, but not what their final decision will be.
Low Turnout: it is possible to capture the opinion of likely voters, but not if they will show up to vote.
Nonresponse Bias: A very small percentage of individuals respond to polls. There can be bias if the people responding do not represent all demographics.
Technology Bias: Not all voters have access to technology (especially in rural areas). Therefore, phone or internet based polls may not reach a sector of the population. Consequently, their opinions are not properly recorded.
Voice Recorder versus Live Person: Responses given to a recorded voice do not always match those given to a live person. According to an interview with Susquehanna Polling & Research Inc. “Trump did better when voters were sharing their voting intention with a recorded voice rather than a live one”.
Hidden Votes: Supporting a candidate can come with a stigma, so many people may not share their true intentions. A way to avoid this is to ask indirectly who they support, such as asking them to speculate on who they think their neighbors would support. This line of questioning assumes that the intent of an individual agrees with that of their neighbors. Per the Trafalgar Group, there were big differences between asking individuals who they would support and asking who their neighbors would support, with the latter being more consistent with the outcome.
Small Sample Size: the sampling error of a poll shrinks as the sample grows, roughly in proportion to one over the square root of the sample size. The more people sampled, the more accurate the poll will be, but with diminishing returns.
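
To put rough numbers on the sample size issue, here is a minimal sketch using the standard textbook formula for the margin of error of a simple random sample. The function name and the sample sizes are illustrative choices, and the formula covers sampling error only; it says nothing about the other issues in the list above.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    z=1.96 corresponds to a 95% confidence level, and proportion=0.5
    is the worst case (it gives the widest margin).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Illustrative sample sizes: each one is four times the previous one.
for n in (100, 400, 1600):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.2f} points")
```

Quadrupling the sample only halves the margin of error, which is why very large polls are expensive relative to the accuracy they buy.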

A solution that has been popularized recently is to create a model that aggregates all polls into a single prediction. This is sort of like averaging all the polls together. The challenge is that some polls have historically been more accurate than others, so they should be given different levels of importance. The models differ in how they decide:

Which polls to include.
The weight or importance of each poll.
How to mix information from national versus state polls.
How to mix polls that appear with different frequencies (some polls appear weekly while others monthly).
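
As a rough illustration of the weighting idea, the sketch below computes a weighted average of polls. The poll numbers and weights are made up for the example, and real aggregators use far more elaborate schemes than this; it only shows how giving polls different importance changes the result.

```python
def aggregate_polls(polls):
    """Weighted average of a candidate's share across polls.

    Each poll is a (share, weight) pair, where a larger weight
    means the poll is trusted more.
    """
    total_weight = sum(weight for _, weight in polls)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(share * weight for share, weight in polls) / total_weight

# Hypothetical polls: (candidate share in percent, weight).
example_polls = [(48.0, 3.0), (45.0, 1.0), (50.0, 2.0)]

weighted = aggregate_polls(example_polls)
# Setting every weight to 1 reduces this to a plain average.
unweighted = aggregate_polls([(share, 1.0) for share, _ in example_polls])

print(f"weighted: {weighted:.2f}, unweighted: {unweighted:.2f}")
```

Here the weighted estimate is pulled toward the polls that were given more importance, so the choice of weights is itself a modeling decision.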

The US election adds another complication: the winner is not whoever gets the most votes (known as the popular vote), but whoever wins a majority of the Electoral Votes (EV). There are 538 EV, distributed unevenly across the states and the District of Columbia. Whoever gets at least 270 EV wins.
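
To see how the popular vote and the Electoral Votes can disagree, here is a small sketch with three made-up states. The state names, vote counts, and EV values are hypothetical, and the code assumes every state is winner-take-all, which is a simplification.

```python
def tally(states):
    """Return (popular, electoral) totals per candidate, winner-take-all."""
    popular = {}
    electoral = {}
    for state in states:
        votes = state["votes"]
        for candidate, count in votes.items():
            popular[candidate] = popular.get(candidate, 0) + count
            electoral.setdefault(candidate, 0)
        # The candidate with the most votes takes all of the state's EV.
        winner = max(votes, key=votes.get)
        electoral[winner] += state["ev"]
    return popular, electoral

# Hypothetical toy country: 25 EV in total, so 13 are needed to win.
toy_states = [
    {"name": "State 1", "ev": 10, "votes": {"A": 900_000, "B": 300_000}},
    {"name": "State 2", "ev": 8, "votes": {"A": 480_000, "B": 520_000}},
    {"name": "State 3", "ev": 7, "votes": {"A": 390_000, "B": 410_000}},
]

popular, electoral = tally(toy_states)
print("popular vote:", popular)
print("electoral votes:", electoral)
```

Candidate A wins the popular vote by a wide margin, but candidate B takes 15 of the 25 EV by narrowly winning two states. This is why a forecast has to predict each state, not just the national vote share.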

The way the models are created is more of an art than a science. Still, these models can be validated by testing them against previous elections, which gives us an estimate of their accuracy. A main issue with this type of validation is that elections happen at different points in time. The conditions of an election held this year are very different from those of previous years. So, it is a bit of an apples and oranges comparison.

As George Box said, “all models are wrong but some are useful”. The fact that all models had errors in the same direction (i.e. predicting a Clinton win) makes us wonder if the models were poorly designed. There is always room for improvement. For example, the Princeton Election Consortium acknowledged that they “did not correctly estimate the size of the correlated error – by a factor of five”. This is a clear modification that they can account for next time. Although this would have given Clinton a lower chance of winning, she would have still been the more likely candidate to become president. Yet, there is not much that can be done if the underlying poll data is flawed. Garbage in, garbage out!
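
To illustrate why the size of the correlated error matters, here is a toy Monte Carlo sketch. It is not the Princeton Election Consortium's model; the five equal states, the 2-point poll lead, and the error sizes are all assumptions chosen for the example. Both scenarios give each state the same total polling error, and the only difference is how much of that error is shared across states.

```python
import random

def win_probability(leads, ev, shared_sd, state_sd, trials=20_000, seed=0):
    """Estimate the chance that the poll leader wins a majority of EV.

    shared_sd: size of a polling error common to every state.
    state_sd:  size of an independent polling error in each state.
    """
    rng = random.Random(seed)
    needed = sum(ev) // 2 + 1
    wins = 0
    for _ in range(trials):
        shared_error = rng.gauss(0, shared_sd)
        total = 0
        for lead, votes in zip(leads, ev):
            margin = lead + shared_error + rng.gauss(0, state_sd)
            if margin > 0:
                total += votes
        if total >= needed:
            wins += 1
    return wins / trials

# Hypothetical setup: five states, 10 EV each, a 2-point poll lead in each.
leads = [2.0] * 5
ev = [10] * 5

# Same total error per state (standard deviation 3) in both scenarios.
independent = win_probability(leads, ev, shared_sd=0.0, state_sd=3.0)
correlated = win_probability(leads, ev, shared_sd=2.4, state_sd=1.8)

print(f"independent errors: {independent:.2f}")
print(f"correlated errors:  {correlated:.2f}")
```

When the errors are independent, a miss in one state tends to be offset by the others, so the leader looks safer. When most of the error is shared, the polls can be wrong everywhere at once, and the leader's chances drop even though each individual poll is no less accurate.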

In conclusion, creating procedures that predict elections is not an easy feat. It is important that the models and poll methodologies improve over time. These forecasts will be useful for identifying which policies are supported or understood by the same individuals who will be affected by them. The hope is that these systems will allow politicians and their constituents to better understand each other.