Polling

From Forecasting Wiki
This article is a work in progress. The author is still working on it, and it is not yet ready for review.

[TODO: everybody knows what a poll is, but it would be good to have a wiki-like sentence introducing it more or less formally; maybe a less convoluted version of "a poll is a random approximation of some statistical property that is infeasible to obtain exactly"]

Ideal assumptions

For a poll to be meaningful, we usually implicitly make some assumptions about its methodology. Hardly any poll will ever satisfy all of them, but a good forecaster weighs polls according to how closely they satisfy them (amongst other criteria); i.e. the more a poll is expected to violate the following assumptions, the less it should cause us to update our forecasts.

  • Representativeness. The sample reflects the population being asked about: there is no selection bias, and respondents are chosen independently of one another, apart from the requirement that nobody is selected twice. Typical violations: online polls overrepresent younger demographics, polls on a political party's website overrepresent that party's voters, etc.
    • Note that there are exceptions where cleverly introducing dependency between respondents makes results more accurate (stratified sampling is one example).
    • Moreover, if the introduced sampling bias is uncorrelated with the polling question at hand, it has little impact. For example, phone polls are biased towards the subset of the population that owns a phone, but owning a phone is probably not strongly correlated with whether you eat meat or not.
  • Fixed poll size. The number of respondents is decided in advance. We do not keep polling until we get a desired result and then stop.
  • Honest replies. Respondents answer truthfully. Social desirability bias violates this, e.g. don't trust polls that ask voters embarrassing questions while filming them.
  • Honest reporting. Results are reported accurately and independently of the outcome. (No polls where undesired outcomes are hidden or lied about.)
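
To illustrate why the fixed poll size assumption matters, here is a minimal simulation sketch. It assumes a true support of exactly 50% and a hypothetical pollster who wants to report a majority. The function names, the one-sided threshold of 1.645 and the check interval of 50 respondents are illustrative choices, not part of any standard methodology.

```python
import math
import random

def z_score(yes, n, p0=0.5):
    """z statistic of the observed 'yes' share against the null share p0."""
    return (yes / n - p0) / math.sqrt(p0 * (1 - p0) / n)

def run_poll(rng, max_n=1000, check_every=50, z_crit=1.645, peek=False):
    """Return True if the poll 'finds' a majority although true support is 50%."""
    yes = 0
    for i in range(1, max_n + 1):
        yes += rng.random() < 0.5  # each respondent says yes with probability 0.5
        # With peeking, stop as soon as an interim result looks significant.
        if peek and i % check_every == 0 and z_score(yes, i) > z_crit:
            return True
    # Without peeking, only the final result at max_n respondents counts.
    return z_score(yes, max_n) > z_crit

def false_positive_rate(peek, trials=2000, seed=0):
    """Share of simulated polls that wrongly report a significant majority."""
    rng = random.Random(seed)
    return sum(run_poll(rng, peek=peek) for _ in range(trials)) / trials

if __name__ == "__main__":
    print("fixed size:", false_positive_rate(peek=False))
    print("stop when it looks good:", false_positive_rate(peek=True))
```

With a fixed size, the false positive rate should stay near the nominal 5%. With repeated peeking it is noticeably higher, because every interim check is another chance to stop on a lucky streak.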

Accuracy

Under ideal assumptions, a (perhaps surprisingly) small number of respondents is needed to get very representative results with high probability. For example, when asking 1000 voters a binary question, the result lies within about ±3 percentage points of the true population proportion with 95% probability.[1] (This statement ignores Bayesian priors or assumes that you have none.) Note that this bound does not depend on the population size, as long as the population is much larger than the sample, so we don't need to poll more voters in big countries than in small countries.
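
The ±3% figure can be reproduced with the usual normal approximation for a proportion, where the margin of error is z·sqrt(p(1−p)/n). The sketch below assumes the worst case p = 0.5 and z = 1.96 for 95% confidence. The function names are illustrative, and the finite population correction is included only to show how little the population size matters.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def margin_of_error_finite(n, population, p=0.5, z=1.96):
    """Same margin with the finite population correction (sampling without replacement)."""
    fpc = math.sqrt((population - n) / (population - 1))
    return margin_of_error(n, p, z) * fpc

if __name__ == "__main__":
    print("infinite population:", round(margin_of_error(1000), 4))
    for population in (100_000, 10_000_000, 1_000_000_000):
        print(population, round(margin_of_error_finite(1000, population), 4))
```

For 1000 respondents the margin comes out at roughly 0.031 in every case. The correction only becomes noticeable once the sample is a sizeable fraction of the population.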