Some comments on election polls

By OTAVIANO HELENE*

Considering data from different research agencies has the advantage of increasing the amount of information, but increases fluctuation

Surveys of the population's electoral preferences indicate what could happen if the election were held at the time of the survey. However, to evaluate poll results more generally, especially with regard to any trends they may reveal, some precautions must be taken. Let us illustrate this with the preferences for the four main candidates in the presidential election polls conducted by various organizations in February and March 2022,[1] that is, before the party switches, the definition of the slates and the consolidation of candidacies (although not yet definitive) that took place in early April.

The data to be analyzed appear in Figure 1. The ordinate (y axis) of each point in Figure 1 gives the preference expressed by the people consulted in a survey conducted on the date shown on the abscissa (x axis).[2] When a single survey considered more than one set of candidates (what the media usually call a scenario), all of its results were plotted at the same abscissa.

Figure 1 – Results of electoral polls; most preferred candidates.

Poll results are usually presented with a rough estimate of the “margin of error”, typically on the order of 2% or 3%. This “margin of error” means that, if the survey were repeated on the same day, with the same procedures and with people of the same profiles (age, gender, education, income, geographic region, religion, etc.), the results would usually differ by no more than that amount. These “margins of error” are purely statistical: they do not include uncertainties due to the methodology adopted or to assumptions about the profile of the voting population. Furthermore, they apply to candidates with high preference rates; for candidates with lower rates, the “margins of error” are smaller, as discussed later.
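The 2% to 3% figure can be reproduced with the textbook formula for a proportion estimated from a simple random sample. The sketch below assumes simple random sampling and a 95% confidence level (z = 1.96); real polls use quota or stratified designs, so their actual margins may differ, and the sample sizes shown are illustrative, not taken from the polls in Figure 1.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval for a share p estimated from a
    simple random sample of n respondents (z = 1.96 gives about 95%).
    Assumes simple random sampling, which real polls only approximate."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Worst case p = 0.5, for a few illustrative sample sizes.
for n in (1000, 2000, 3000):
    print(f"n = {n}: +/- {margin_of_error(0.5, n):.1%}")
```

With 1,000 interviews this gives about ±3.1% and with 2,000 about ±2.2%, consistent with the usual 2% to 3% under these assumptions.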

Therefore, variations smaller than the margin of error between two successive surveys by the same company do not allow us to conclude with confidence that the electorate's preference actually changed. That conclusion would only be possible if at least one of the following conditions were met: the variation exceeds the “margin of error”; a variation in the same direction recurs in a new survey by the same company; or an equivalent variation appears in surveys conducted by another company on the same dates.
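One way to make the first condition more precise is to compare the change with the combined sampling uncertainty of both surveys. The sketch below assumes that the two surveys are independent simple random samples and uses an unpooled two-proportion z statistic; the shares and sample sizes are hypothetical, not values from Figure 1.

```python
import math


def change_z_score(p1: float, n1: int, p2: float, n2: int) -> float:
    """z statistic for the change p2 - p1 between two independent surveys,
    each assumed to be a simple random sample."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) / se


def is_significant(p1: float, n1: int, p2: float, n2: int,
                   z_crit: float = 1.96) -> bool:
    """True if |z| exceeds the critical value (95% two-sided by default)."""
    return abs(change_z_score(p1, n1, p2, n2)) > z_crit


# Hypothetical: a candidate goes from 40% to 42%, or from 40% to 45%,
# between two surveys of 2,000 respondents each.
print(round(change_z_score(0.40, 2000, 0.42, 2000), 2))
print(round(change_z_score(0.40, 2000, 0.45, 2000), 2))
```

The 2-point change gives z of about 1.3, compatible with chance, while the 5-point change gives about 3.2. Because both surveys contribute sampling error, the standard error of the change is larger than that of either survey alone.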

The usual 2% or 3% is only a rough estimate of the margin of error, which in fact depends on the candidate's level of preference. For example, as Figure 1 shows, the preferences for the two best-rated potential candidates (Lula and Bolsonaro) varied between their maximum and minimum values within ranges of 13% and 12%, respectively, over the period, while those for Ciro and Moro varied within much smaller ranges, 5% and 4%, respectively. The margin of error of 2% to 3% applies to candidates with high preference rates.

As a rule, the closer to 50% the preference for a candidacy, the greater this “margin of error”. Thus, a 3% variation in preference for a candidacy, from 50% to 53%, for example, may mean nothing, while the same variation in a low-rated candidacy, say, from 5% to 8%, may be quite significant.
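The dependence on the preference level comes from the factor p(1 − p) in the sampling variance, which is largest at p = 50%. A minimal sketch, again assuming simple random sampling, a 95% confidence level and a hypothetical sample of 1,000 interviews, compares the margin at 50% and at 5% with the 3-point change in the example:

```python
import math

N = 1000  # hypothetical sample size
Z = 1.96  # 95% confidence, assuming simple random sampling


def margin(p: float, n: int = N) -> float:
    """Sampling margin of error for a share p among n respondents."""
    return Z * math.sqrt(p * (1 - p) / n)


for p in (0.50, 0.05):
    m = margin(p)
    # How many margins of error a 3-point change represents at this level.
    print(f"p = {p:.0%}: margin +/- {m:.1%}, 3-point change = {0.03 / m:.1f} margins")
```

At 50% the margin is about 3.1%, so a move to 53% stays within it; at 5% it is about 1.4%, and the same 3-point move to 8% is more than twice the margin.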

When comparing surveys by different companies, the ranges (“margins of error”) can be larger, as discussed below.

In addition to purely random variations, there may also be differences in results due to the adoption of different methodologies and different assumptions about the profile of the population that will vote (its distribution by income, schooling, age, region of the country, etc.).

For example, the data shown in Figure 1 include two types of surveys: in-person and telephone. The differences between the results of the two types are quite significant. In telephone polls, candidate Lula had, on average, around 2% to 3% fewer votes than in in-person polls. For the second-placed candidate the situation is reversed, with more votes in telephone polls than in face-to-face ones.

Facts like this must be considered when comparing results published by different research companies.
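A simple way to see why mixing telephone and in-person polls widens the spread: if each method has its own systematic offset, pooling them adds the spread of the offsets to the sampling variance. The sketch below assumes equal numbers of polls of each type, a hypothetical 2.5-point gap between methods (in line with the 2% to 3% mentioned above), a hypothetical share of 41% and a hypothetical 2,000 respondents per poll, with simple random sampling inside each poll.

```python
import math


def pooled_sd(p: float, n: int, mode_gap: float) -> float:
    """Standard deviation of a candidate's share across a pool made of equal
    numbers of polls from two methods whose means differ by mode_gap.

    Sampling variance p(1-p)/n (simple random sampling assumed) plus the
    variance of a 50/50 mixture of offsets +/- mode_gap/2, i.e. gap^2/4.
    """
    sampling_var = p * (1 - p) / n
    mode_var = (mode_gap / 2) ** 2
    return math.sqrt(sampling_var + mode_var)


p, n = 0.41, 2000                 # hypothetical share and sample size
single = pooled_sd(p, n, 0.0)     # polls from one method only
mixed = pooled_sd(p, n, 0.025)    # hypothetical 2.5-point gap between methods
print(f"one method: {single:.2%}, mixed methods: {mixed:.2%}")
```

Under these assumptions the poll-to-poll spread grows from about 1.1% to about 1.7% without any change in the electorate.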

The data in Figure 1, released over February and March, do not show any trend over time in the preferences for the four potential candidates.[3] Had the election taken place during that period, with the candidates included in the polls, the results would indicate a vote of between 39% and 42% for Lula and between 26% and 29% for the second-placed candidate. In terms of valid votes, Lula would have between 45% and 49%.
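Valid votes exclude blank and null votes (and, as pollsters typically compute them, undecided respondents), so a candidate's share of valid votes is the share among all respondents divided by the fraction who chose some candidate. The excluded fraction is not given in the text; the 14% below is a hypothetical value chosen only to illustrate the arithmetic.

```python
def valid_vote_share(total_share: float, excluded: float) -> float:
    """Share of valid votes, given the share among all respondents and the
    fraction that is blank, null or undecided (excluded from valid votes)."""
    if not 0 <= excluded < 1:
        raise ValueError("excluded must be in [0, 1)")
    return total_share / (1 - excluded)


EXCLUDED = 0.14  # hypothetical blank/null/undecided fraction
for total in (0.39, 0.42):
    print(f"{total:.0%} of all answers -> "
          f"{valid_vote_share(total, EXCLUDED):.1%} of valid votes")
```

With this assumed 14%, 39% and 42% of all answers become about 45% and 49% of valid votes, roughly the range quoted above.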

It should be stressed that the absence of evidence of significant variation over the two months considered does not prove that there was no variation: absence of evidence of an effect is not evidence of its absence.

In addition to the random variations typical of sampling and the differences between polling companies due to different methodologies and assumptions about voter profiles, there is a correlation between the results for different candidates that can lead to wrong conclusions.

To understand this, consider a race with only two candidates. A thousand people are interviewed and, say, 600 prefer A and 400 prefer B: 60% and 40%, respectively. A few days later, on the same street, another 1,000 people are interviewed (by the same company, with the same methodology, assumptions, etc.). Even if the electorate's preference has not changed, the number of people who prefer A may be a little higher or a little lower than 600 just by chance. Say it is 630 (63%). Then the number who prefer B will necessarily be smaller, 370 (37%). This could give the impression that the electorate's preference changed: the preference for one candidacy increased and, “confirming this change in the electorate's position”, the preference for the other decreased; the difference between them grew by 6%, well beyond typical margins of error.

But the data do not support that conclusion, and the phrase in quotes above is wrong. The fall in B, rather than “confirming the change”, merely reflects the fact that the sum of the percentage preferences for the two candidacies is fixed at 100%: if one grows, the other necessarily falls.
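The constraint can be checked with a small simulation. The sketch below assumes a hypothetical electorate whose true preference stays fixed at 60% for A and 40% for B, draws repeated samples of 1,000 interviews, and measures how much A's share and the A-minus-B gap fluctuate. Because B = 1000 − A in every sample, the gap equals 2A/1000 − 1 and therefore moves exactly twice as much as A's share.

```python
import random
import statistics


def simulate(n_people: int = 1000, p_a: float = 0.60,
             n_surveys: int = 2000, seed: int = 1):
    """Repeat a two-candidate survey with a fixed true preference and return
    the sample shares of A and the A-minus-B differences."""
    rng = random.Random(seed)
    shares_a, gaps = [], []
    for _ in range(n_surveys):
        a = sum(rng.random() < p_a for _ in range(n_people))  # answers for A
        b = n_people - a                                      # the rest go to B
        shares_a.append(a / n_people)
        gaps.append((a - b) / n_people)
    return shares_a, gaps


shares_a, gaps = simulate()
sd_a = statistics.stdev(shares_a)
sd_gap = statistics.stdev(gaps)
print(f"sd of A's share: {sd_a:.2%}, sd of the gap: {sd_gap:.2%}, "
      f"ratio {sd_gap / sd_a:.2f}")
```

With a true 60/40 split, chance alone moves A's share by about 1.5% (one standard deviation) and the gap by about 3%, so a 3-point swing in A's share, as in the 600-to-630 example, shifts the gap by 6 points with no change in the electorate.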

When there are more than two candidates, this effect is less marked; however, when two of them hold a large proportion of the total votes, as in the data shown in Figure 1, the effect is significant. Because of it, while the preferences for each of the two main candidates varied within ranges of 12% and 13%, the difference between them varied over a range of 22% in the same period.
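With more than two candidates, the shares follow a multinomial distribution, and the sampling covariance between two shares is −p_A·p_B/n. The variance of their difference is therefore [p_A(1 − p_A) + p_B(1 − p_B) + 2·p_A·p_B]/n, larger than it would be if the two shares were independent. The sketch below assumes simple random sampling, hypothetical shares of 40% and 28% (close to the levels quoted for Figure 1) and a hypothetical 2,000 interviews.

```python
import math


def sd_of_difference(p_a: float, p_b: float, n: int,
                     correlated: bool = True) -> float:
    """Sampling standard deviation of the difference between two candidates'
    shares in one multinomial sample of n respondents.

    With correlated=True the covariance -p_a*p_b/n is included; with False
    the shares are (incorrectly) treated as independent, for comparison.
    """
    var = p_a * (1 - p_a) + p_b * (1 - p_b)
    if correlated:
        var += 2 * p_a * p_b  # subtracting twice the (negative) covariance
    return math.sqrt(var / n)


p_a, p_b, n = 0.40, 0.28, 2000  # hypothetical values
with_cov = sd_of_difference(p_a, p_b, n)
naive = sd_of_difference(p_a, p_b, n, correlated=False)
print(f"sd of the gap: {with_cov:.2%} (vs {naive:.2%} if independent)")
print(f"95% range for the gap: +/- {1.96 * with_cov:.1%}")
```

Under these assumptions the 95% range for the gap is about ±3.6%, against about ±2.1% for the leading candidate's share alone. For two candidates only (p_B = 1 − p_A), the formula reduces to exactly twice the standard deviation of a single share, the two-candidate case described above.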

A combination of all the effects described must have occurred in the polls of the period considered. For example, in the highlighted region on the left of Figure 2, Lula's candidacy seems to have fallen sharply, almost to the point of a reversal of positions, over the first half of February.

However, this may simply be a combination of the effects discussed. First, variations on the order of 3% in the apparent preference for a candidate are not significant. Second, because of the effect discussed above, a purely random increase in the share of one of the two leading candidates most likely implies a decrease in the share of the other, so that variations of about twice that value, 6%, in the difference between them may not be statistically significant either. Third, telephone surveys are included at the end of the period highlighted in Figure 2, and in the period analyzed telephone polls gave, on average, fewer votes to Lula and more to Bolsonaro.

Therefore, the data do not allow us to conclude that there was a systematic variation during that period.

Figure 2 – Same as figure 1, but only for the two candidates with the highest preferences. The highlighted region could erroneously suggest a trend: a drop in preference for one candidacy and growth for the other.

 

Conclusion

We can analyze the results of electoral polls by looking only at the results of a single company over time, which prevents the observed variations from being affected by different assumptions about the socioeconomic profile of the voting population and by different methodologies (telephone or face-to-face, for example). However, doing so limits the amount of information available for analysis.

Considering data from different research agencies has the advantage of increasing the amount of information, but it increases fluctuation because of the different assumptions and methodologies adopted.

Whatever the analysis option, it is necessary to avoid hasty conclusions. It is also important to remember that electoral preferences vary slowly over time, unless very striking facts or news, true or false, emerge.

*Otaviano Helene is a senior professor at the Institute of Physics at USP.

 

Notes


[1] Surveys carried out by the following agencies: Quaest, Ipespe, Datafolha, Paraná Pesquisas, MDA, Ideia, Futura, PoderData, Gerp.

[2] The dates correspond to those when the surveys were carried out, not those when the results were published.

[3] This does not mean that voter preferences did not vary at all; there is simply no evidence that they did.
