Data-Jockeying the Polls

A range, not a number

Now, I realize that many of you think public opinion polling is a silly thing to do in an increasingly authoritarian country like ours. And I grant that the massive disparities between polls we’re seeing – Capriles is leading by a couple of points and Chávez is ahead by thirty! – can only inspire a healthy dose of skepticism.

My feeling is that if I’m sailing in thick fog, of course I’d prefer a state-of-the-art GPS… but if I can’t get one, I’ll gladly take a rusty old compass. It may not tell me precisely where I am or exactly where I’m going, but in difficult circumstances, it’s better than nothing.

There’s a lot of magical thinking about polls. No matter how often pollsters say so, people find it hard to accept that a poll is necessarily backward-looking – it can estimate where public opinion was when the question was asked, which, by definition, is in the past. Pollsters are not clairvoyant: basing a forward-looking prediction on a backward-looking study is an irreducibly fraught exercise.

Pollsters are keenly aware of this – poll readers, much less so.

Another key word that tends to get lost is “estimate”. A poll is an estimate. You take a random sample from a population and you apply certain statistical techniques to infer something about the behaviour of the population as a whole from the behaviour of the sample. Modern statistical techniques allow you to precisely calculate the odds that your estimate does or does not match the population as a whole, but we’re still talking about odds. The headline number pollsters report is just the center of the distribution of likely real results – the best guess about the characteristics of a population that can be made when you’ve talked to only a part of that population.

In other words, when a pollster tells you Chávez is at 44% with a margin of error of plus or minus 3 points, there isn’t anything “magical” about the number “44”. What it really means is that there’s a 95% chance that Chávez’s support in the population as a whole at the moment the poll was taken was higher than 41% and lower than 47%. It also means, of course, that there’s a 5% chance it was outside that range, which is something else that’s too seldom appreciated: it follows from the math that, on average, about one out of 20 well-conducted polls will be off by more than the margin of error.
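For the numerically inclined, here is a rough sketch of where a number like "plus or minus 3" comes from, and of what "one out of 20" looks like in practice. It uses the textbook normal-approximation formula for a simple random sample. The sample size of 1,050 is my assumption (the pollsters' actual sample sizes and methods are not given here), and the function names are made up for illustration.

```python
import math
import random

Z_95 = 1.96  # normal critical value for a 95% confidence level


def margin_of_error(p, n, z=Z_95):
    """Half-width of the normal-approximation interval for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def miss_rate(true_p, n, polls, seed=0):
    """Fraction of simulated polls whose estimate lands outside true_p +/- margin."""
    rng = random.Random(seed)
    moe = margin_of_error(true_p, n)
    misses = 0
    for _ in range(polls):
        # Each respondent independently supports the candidate with probability true_p.
        supporters = sum(rng.random() < true_p for _ in range(n))
        if abs(supporters / n - true_p) > moe:
            misses += 1
    return misses / polls


SAMPLE_SIZE = 1050  # assumed; no sample size is quoted in the post

moe = margin_of_error(0.44, SAMPLE_SIZE)
print(f"margin of error: +/- {100 * moe:.1f} points")
print(f"interval: {100 * (0.44 - moe):.1f}% to {100 * (0.44 + moe):.1f}%")

# Pretend true support really is 44% and run many perfectly conducted polls.
share_missed = miss_rate(0.44, SAMPLE_SIZE, polls=2000)
print(f"simulated polls off by more than the margin: {100 * share_missed:.1f}%")
```

With that assumed sample size the margin comes out at roughly 3 points, and the share of simulated polls that miss should hover around 5%, even though every one of those imaginary polls was run flawlessly.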

And those are just the inherent limitations of polling as a research mechanism, before we even get to the specific difficulties of trying it in Venezuela. In an environment as challenging as ours, you need some quantitative sophistication to beat the polling we have into presentable form. And nobody’s been administering those beatings with more gusto than Iñaki Sagarzazu, currently at the University of Glasgow and – more relevantly, for my purposes – of YVPolis.

Iñaki’s gone to more trouble than most to identify the Bias Profile of each Venezuelan pollster, and uses it to “correct” the results of their latest polls, giving a kind of synthetic poll of polls – a range of estimates corrected for each pollster’s past bias.
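To make the idea concrete, here is a toy version of a bias correction. To be clear, this is not Iñaki's actual method, which isn't spelled out here. It is the simplest thing I could think of: take each pollster's average miss in past elections and subtract it from their current number. The pollster names and every figure below are invented for illustration.

```python
from statistics import mean

# Hypothetical track record: (final poll, actual result) pairs for one
# candidate, in percentage points. Invented numbers, not real polling data.
PAST = {
    "Pollster A": [(52.0, 48.0), (55.0, 50.0)],
    "Pollster B": [(45.0, 48.0), (49.0, 50.0)],
}

# Hypothetical current readings for the same candidate.
CURRENT = {"Pollster A": 50.0, "Pollster B": 42.0}


def house_bias(history):
    """Average amount by which a pollster overshot (+) or undershot (-)."""
    return mean(poll - actual for poll, actual in history)


def corrected(current, past):
    """Subtract each pollster's past average bias from its current reading."""
    return {name: value - house_bias(past[name]) for name, value in current.items()}


adjusted = corrected(CURRENT, PAST)
low, high = min(adjusted.values()), max(adjusted.values())
center = mean(adjusted.values())

for name, value in adjusted.items():
    print(f"{name}: raw {CURRENT[name]:.1f} -> corrected {value:.1f}")
print(f"corrected range: {low:.1f} to {high:.1f}, average {center:.2f}")
```

In this made-up case the raw numbers are eight points apart, but once each pollster's habitual lean is removed they land within a point and a half of each other. What you report at the end is a range plus an average rather than a single figure.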

His result, at this point?

Chávez is somewhere in the range between 39 and 49, with an average of 46%. Capriles is in the range between 27 and 43, with an average of 34. These ranges have 5 points of overlap, which means that this election has not yet been decided, especially if we consider that most of these polls were conducted before the campaign officially began and before people started paying attention to the election.
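The overlap is easy to check for yourself. The quick sketch below uses only the two ranges quoted above; the helper name is mine. One caveat: the shared stretch runs from 39 to 43, which is 4 points wide, or 5 if you count each whole percentage point from 39 through 43. My guess is that the second count is the one being used, but that is an assumption on my part.

```python
def overlap(a, b):
    """Return the shared (low, high) stretch of two ranges, or None if disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


chavez = (39, 49)    # corrected range quoted for Chávez
capriles = (27, 43)  # corrected range quoted for Capriles

shared = overlap(chavez, capriles)
width = shared[1] - shared[0]             # distance between the endpoints
whole_points = shared[1] - shared[0] + 1  # counting 39, 40, 41, 42, 43

print(f"shared range: {shared[0]} to {shared[1]}")
print(f"width: {width} points; whole points covered: {whole_points}")
```

Either way you count it, the two ranges share a stretch where the candidates could be level, which is the whole point.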

One last note. One pattern that’s been clear in the last several election cycles is that while the early polling is all over the place, poll results do tend to converge around the real outcome as the election draws near. You can see that in Iñaki’s slides, which are based on the final public poll before each election, and where you can see that many pollsters tend to do OK in most elections.

We’re still more than two months out from October 7th, and so we still have to consider the polls we have now “early polls”. We’re just now getting to the period when polling becomes really useful as a guide to what’s about to happen. So watch this space.