Statistics and polling error

Jayjay · Dec 14, 2020

So I was reading Wikipedia, and stumbled upon this:

https://en.wikipedia.org/wiki/Margin_of_error#Specific_margins_of_error

(Sorry, easier to screen capture than convert the equation symbols.)

I don't get the last sentence and the formula. If the reported percentage for some sample size N is 0% or 100%, it seems to me that there could still be an error: the true probability of an answer might just be too small to show up in N samples. Let's say the probability of an event is one in a million. If I take a hundred samples, the probability that any of them match is still only about 1 in 10,000, so the poll result for that event would almost certainly be 0%. Shouldn't the margin of error reflect that somehow?

In an extreme case, I take one sample. Margin of error is always zero?

I'm trying to think of this in terms of the margins of error that are reported in opinion polls. Obviously, there are many other factors that impact the accuracy of polls, but here I'm just focusing on the statistical error. So assume that the sampling is perfectly random, the population size is infinite, and the polling company logo is a perfect circle.
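
A quick numerical check of the one-in-a-million example, as a minimal Python sketch. It assumes the formula in question is the usual normal-approximation margin of error, z*sqrt(p*(1-p)/n), evaluated at the observed proportion; the function name `wald_margin` and the default z = 1.96 (95% confidence) are illustrative choices, not something taken from the thread.

```python
import math

def wald_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for an observed proportion p_hat."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

p_true = 1e-6   # true probability of the event (the one-in-a-million example)
n = 100         # number of samples taken

p_no_hits = (1.0 - p_true) ** n   # every one of the n samples misses
p_any_hit = 1.0 - p_no_hits       # at least one sample hits

print(f"P(no hits in {n} samples) = {p_no_hits:.6f}")
print(f"P(at least one hit)       = {p_any_hit:.2e}")
print(f"margin at p_hat = 0       = {wald_margin(0.0, n)}")
print(f"margin at p_hat = 0.5     = {wald_margin(0.5, n):.3f}")
```

The poll almost always reports 0% here, and the formula then returns a margin of exactly zero, which is the situation the question is about.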

barbos · Dec 14, 2020

These formulas assume that the total number of people who voted for any particular choice is a large number.
And "large" means more than 10 in practice.

Jayjay · Dec 15, 2020

barbos said:
These formulas assume that the total number of people who voted for any particular choice is a large number.
And "large" means more than 10 in practice.

I'm trying to understand what the intuition is behind the formula. Why is the error margin highest for choices that are close to 50%, but lowest for choices that are close to 0% or 100%?

barbos · Dec 15, 2020

Jayjay said:
I'm trying to understand what the intuition is behind the formula. Why is the error margin highest for choices that are close to 50%, but lowest for choices that are close to 0% or 100%?

Well, it's a binomial distribution. Its width is simply sqrt(n*(N-n)/N), where N is the sample size and n is the number of people who picked the choice. (All numbers must be large.)

But if you want intuition, start with the normal distribution (Gaussian).
Then the Poisson distribution, which for large numbers becomes Gaussian with dispersion D=sqrt(Mu) (Mu>>1).
Then the binomial, which becomes Poisson for n/N<<1.

The binomial has to be symmetrical with respect to p -> (1-p), or n -> (N-n), and have the same dispersion (width) as Poisson for n/N<<1. That means D=sqrt(Mu*(N-Mu)/N).

With smaller statistics, the distribution stops being normal. But more importantly, the dispersion becomes so large relative to the count that you can't really use the observed number as the expectation. That's why these formulas don't work.
I mean, they still kinda work down to Mu>1, but you have to use actual expectations, not observations.
For Mu<=1 you really have to treat your distribution as Poisson, not Gaussian.
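
The relationship between the two widths can be checked numerically. This is a minimal Python sketch, with N = 1000 and the helper names `binomial_sd` and `poisson_sd` chosen only for illustration: the binomial width sqrt(n*(N-n)/N) is close to the Poisson width sqrt(Mu) when n/N is small, and it is unchanged when n is swapped for N-n.

```python
import math

def binomial_sd(n_expected: float, N: int) -> float:
    """Binomial width sqrt(n*(N-n)/N), with n the expected count out of N."""
    return math.sqrt(n_expected * (N - n_expected) / N)

def poisson_sd(mu: float) -> float:
    """Poisson width sqrt(mu) for a count with mean mu."""
    return math.sqrt(mu)

N = 1000  # sample size (arbitrary illustrative value)
for n in (10, 100, 500, 900, 990):
    print(f"n={n:4d}  binomial={binomial_sd(n, N):7.3f}  "
          f"poisson={poisson_sd(n):7.3f}  "
          f"mirrored (N-n)={binomial_sd(N - n, N):7.3f}")
```

The binomial column peaks at n = N/2 and falls off toward both ends, which is the shape the original question asks about.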

Swammerdami · Dec 15, 2020

Jayjay said:
I'm trying to understand what the intuition is behind the formula. Why is the error margin highest for choices that are close to 50%, but lowest for choices that are close to 0% or 100%?

It's easier to think of the population as fixed, and the sample variable. That's the opposite problem from the question you ask, but the required intuition is similar.

Suppose you draw 100 times from an urn with a huge number of balls, some red, some blue. If 50% of the balls in the urn are red, you figure to pull out 50 reds and 50 blues, but the standard deviation will be about 5, i.e. √(50*50/100). About 13.6% of the time you'll get "lucky" and draw more than 55 blues; you have the same chance of drawing more than 55 reds.

But now suppose that only 4% of the balls in the urn are blue. You'll draw 4 blues on average with a standard deviation of 1.96, i.e. √(4*96/100). With 100 draws, that is 1.96% of the sample versus 5% in the 50/50 case. 1.96% is less than 5%, answering the question, but what about the intuition?

Doesn't it seem that with 50% blue balls it's more likely to draw an extra 5 blues, than when only 4% of the balls are blue?

Better yet, reverse it! With 50% red balls you're 13.6% to draw more than 55 reds. But if 96% of the balls are red, you are VERY unlikely to get more than 101 reds with only 100 draws, yet that's what you'd need to get the "same" dispersion as at 50%.
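
The urn numbers can be reproduced with exact binomial sums. This is a minimal Python sketch using only the standard library; the helper names `binom_pmf`, `tail_above` and `count_sd` are made up for illustration.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def tail_above(threshold: int, n: int, p: float) -> float:
    """P(X > threshold) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(threshold + 1, n + 1))

def count_sd(n: int, p: float) -> float:
    """Standard deviation of the number of hits in n draws."""
    return math.sqrt(n * p * (1.0 - p))

draws = 100
print("sd at 50% blue:      ", count_sd(draws, 0.5))
print("sd at 4% blue:       ", count_sd(draws, 0.04))
print("P(>55 blue) at 50%:  ", tail_above(55, draws, 0.5))
print("P(>9 blue) at 4%:    ", tail_above(9, draws, 0.04))    # mean 4, plus 5
print("P(>100 red) at 96%:  ", tail_above(100, draws, 0.96))  # empty sum
```

The last line sums over no outcomes at all, since 100 draws cannot produce more than 100 reds, which is the point of the reversed example.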
