• Welcome to the new Internet Infidels Discussion Board, formerly Talk Freethought.

Statistics and polling error

Jayjay
Contributor · Joined Apr 7, 2002 · Messages: 7,173 · Location: Finland · Basic Beliefs: An accurate worldview or philosophy

So I was reading Wikipedia, and stumbled upon this:

https://en.wikipedia.org/wiki/Margin_of_error#Specific_margins_of_error

[Image: screen capture of the text and formula from the "Specific margins of error" section of the Wikipedia article]

(Sorry, easier to screen capture than convert the equation symbols.)

I don't get the last sentence and the formula. If the reported percentage for some N is 0% or 100%, it seems to me that there could still be an error: the true proportion might just be too small to show up in a sample of size N. Let's say the probability of an event is one in a million. If I take a hundred samples, the probability that at least one of them matches is still only about 1 in 10,000. So the poll would most likely report 0% for that event. Shouldn't the margin of error reflect that somehow?
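The arithmetic in the one-in-a-million example can be checked with a few lines of Python. This is a minimal sketch: `prob_no_hits` is a hypothetical helper name, and the draws are assumed to be independent, as they are with an infinite population.

```python
def prob_no_hits(p: float, n: int) -> float:
    """Probability that none of n independent draws hit an event of probability p."""
    return (1.0 - p) ** n


p = 1e-6  # true probability of the rare event (from the example)
n = 100   # number of samples

none = prob_no_hits(p, n)
print(f"P(no hits)      = {none:.6f}")      # 0.999900
print(f"P(at least one) = {1 - none:.2e}")  # 1.00e-04, i.e. about 1 in 10,000
```

So a 0% result is by far the most likely outcome here, even though the true share is not zero.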

In the extreme case, I take one sample. Is the margin of error then always zero?
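The formula in the screenshot is not reproduced as text here; assuming it is the usual normal-approximation (Wald) margin z·sqrt(p(1-p)/n), the sketch below shows why it collapses to zero whenever the observed share is 0% or 100%, including the one-sample case. `wald_margin` is a hypothetical name, and z = 1.96 (roughly 95% confidence) is an assumed default.

```python
import math


def wald_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error: z * sqrt(p_hat * (1 - p_hat) / n).

    Assumption: this is the textbook formula; z = 1.96 gives roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# One respondent: the observed share can only be 0% or 100%.
for p_hat in (0.0, 1.0):
    print(p_hat, wald_margin(p_hat, n=1))  # margin is 0.0 both times

# Any sample size gives zero margin once the observed share is 0% or 100%.
print(wald_margin(0.0, n=100))  # 0.0
print(wald_margin(0.5, n=100))  # about 0.098, i.e. +/- 9.8 percentage points
```

The formula plugs the observed share in as if it were the true share, so it has no way to register a small proportion that simply did not show up in the sample.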

I'm trying to think of this in terms of the margins of error reported in opinion polls. Obviously, many other factors affect the accuracy of polls, but here I'm focusing only on the statistical error. So assume that the sampling is perfectly random, the population size is infinite, and the polling company logo is a perfect circle.
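Under exactly those idealized assumptions (perfectly random sampling from an infinite population), a poll is just n independent yes/no draws. A small simulation sketch, with a hypothetical `simulate_polls` helper and arbitrarily chosen parameters, compares the observed spread of poll results with sqrt(p(1-p)/n):

```python
import math
import random
import statistics


def simulate_polls(p: float, n: int, trials: int, seed: int = 1) -> list[float]:
    """Run `trials` ideal polls of n respondents; return the observed share in each."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]


p, n, trials = 0.5, 1000, 1000
shares = simulate_polls(p, n, trials)

print("empirical std of poll results:", statistics.pstdev(shares))
print("formula sqrt(p(1-p)/n):       ", math.sqrt(p * (1 - p) / n))  # about 0.0158
```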
 
These formulas assume that the total number of people who voted for any particular choice is a large number.
In practice, "large" means more than 10.
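The "more than 10" rule of thumb can be written as a quick check. This is a hypothetical helper; also requiring the "everything else" count (n - count) to be large is an assumption, following the usual form of the rule and the symmetry between a choice and its complement.

```python
def normal_approx_ok(count: int, n: int, threshold: int = 10) -> bool:
    """Rule-of-thumb check before using the normal-approximation margin of error.

    count: respondents who picked the choice; n: total respondents.
    Both the choice and "everything else" should have more than `threshold` answers.
    """
    return count > threshold and (n - count) > threshold


print(normal_approx_ok(500, 1000))  # True
print(normal_approx_ok(0, 1000))    # False: a 0% result is exactly the problem case
print(normal_approx_ok(1, 1))       # False: a single respondent
```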
 
I'm trying to understand what the intuition is behind the formula. Why is the error margin highest for choices that are close to 50%, but lowest for choices that are close to 0% or 100%?
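One way to see the shape is to tabulate the margin of error across shares. A sketch assuming the normal-approximation formula z·sqrt(p(1-p)/n) with z = 1.96 and a hypothetical poll of 1000 respondents:

```python
import math

N_RESPONDENTS = 1000  # hypothetical poll size
Z = 1.96              # roughly 95% confidence (assumed)


def margin_pct(p: float, n: int = N_RESPONDENTS, z: float = Z) -> float:
    """Margin of error in percentage points when the true share is p."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


for p in (0.01, 0.04, 0.10, 0.25, 0.50, 0.75, 0.90, 0.96, 0.99):
    print(f"p = {p:4.0%}: +/- {margin_pct(p):.1f} points")
```

The factor p(1-p) peaks at p = 0.5 and falls to zero at both ends, which is why the margin is largest near 50% and smallest near 0% or 100%.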
 
Well, it's a binomial distribution. Its width is what it is: sqrt(n*(N-n)/N), where n is the number of people picking a choice and N is the sample size. (All numbers must be large.)
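In that notation n counts the respondents picking a choice and N is the sample size. A minimal sketch (hypothetical function names) showing that the width in counts, sqrt(n*(N-n)/N), is the same quantity as the share-based sqrt(p(1-p)/N) once it is divided by N, with p = n/N:

```python
import math


def width_counts(n: int, N: int) -> float:
    """Binomial standard deviation in counts: sqrt(n * (N - n) / N)."""
    return math.sqrt(n * (N - n) / N)


def width_share(p: float, N: int) -> float:
    """The same width expressed as a share of the sample: sqrt(p * (1 - p) / N)."""
    return math.sqrt(p * (1 - p) / N)


n, N = 400, 1000
print(width_counts(n, N))      # about 15.49 respondents
print(width_counts(n, N) / N)  # about 0.01549 ...
print(width_share(n / N, N))   # ... the same as the share-based formula
```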


But if you want intuition, start with the normal (Gaussian) distribution.
Then the Poisson distribution, which for large numbers (Mu >> 1) becomes a Gaussian with dispersion D = sqrt(Mu).
Then the binomial, which becomes Poisson for n/N << 1.
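The binomial-to-Poisson step can be checked numerically. A sketch with arbitrary example values (N = 1000 draws, p = 0.003, so the expectation is 3); the function names are hypothetical, and only this one step of the chain is covered:

```python
import math


def binom_pmf(k: int, N: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(N, p)."""
    return math.comb(N, k) * p**k * (1 - p) ** (N - k)


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)


N, p = 1000, 0.003
mu = N * p
for k in range(8):
    # The two columns agree to within a few parts in ten thousand.
    print(k, round(binom_pmf(k, N, p), 4), round(poisson_pmf(k, mu), 4))
```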

The binomial has to be symmetric under p -> 1-p, i.e. n -> N-n, and it has to have the same dispersion (width) as the Poisson for n/N << 1. That means D = sqrt(Mu*(N-Mu)/N).
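A quick numerical check of that formula, as a sketch with hypothetical names (`dispersion`, `exact_binomial_sd`): it is symmetric under Mu -> N-Mu, shrinks to the Poisson sqrt(Mu) when Mu << N, and matches the standard deviation computed directly from the binomial probabilities.

```python
import math


def dispersion(mu: float, N: int) -> float:
    """Width from the argument above: D = sqrt(mu * (N - mu) / N)."""
    return math.sqrt(mu * (N - mu) / N)


def exact_binomial_sd(N: int, p: float) -> float:
    """Standard deviation computed directly from the binomial pmf."""
    pmf = [math.comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(N + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return math.sqrt(var)


N = 100
print(dispersion(4, N), dispersion(96, N))             # equal: symmetric under mu -> N - mu
print(dispersion(4, 10_000), math.sqrt(4))             # mu << N: close to Poisson sqrt(mu)
print(dispersion(30, N), exact_binomial_sd(N, 0.30))   # both about 4.58
```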

With smaller counts, the distribution stops being normal. More importantly, the dispersion becomes so large relative to the count that you can't really use the observed number as the expectation. That's why these formulas don't work.
I mean, they still kinda work down to Mu > 1, but you have to use the actual expectations, not the observations.
For Mu <= 1 you really have to treat your distribution as Poisson, not Gaussian.
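A sketch of why the Gaussian recipe breaks down for small counts, using an assumed expectation of Mu = 1: the Poisson distribution is strongly skewed, the Gaussian band "Mu +/- 1.96*sqrt(Mu)" dips below zero (which no count can do), and plugging in an observed count of 0 instead of the expectation gives a band of zero width.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """Probability of observing exactly k events when the expectation is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


mu = 1.0
print("P(0 observed) =", round(poisson_pmf(0, mu), 3))  # 0.368
print("P(1 observed) =", round(poisson_pmf(1, mu), 3))  # 0.368

# Gaussian-style band around the expectation: the lower edge is negative.
low, high = mu - 1.96 * math.sqrt(mu), mu + 1.96 * math.sqrt(mu)
print(f"Gaussian band: [{low:.2f}, {high:.2f}]")  # [-0.96, 2.96]

# Using the observation instead of the expectation: zero width.
observed = 0
print("plug-in width from an observed 0:", 1.96 * math.sqrt(observed))  # 0.0
```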
 
I'm trying to understand what the intuition is behind the formula. Why is the error margin highest for choices that are close to 50%, but lowest for choices that are close to 0% or 100%?

It's easier to think of the population as fixed, and the sample variable. That's the opposite problem from the question you ask, but the required intuition is similar.

Suppose you draw 100 times from an urn with a huge number of balls, some red, some blue. If 50% of the balls in the urn are red, you figure to pull out 50 reds and 50 blues, but the standard deviation will be about 5, i.e. √(50*50/100). About 13.6% of the time you'll get "lucky" and draw more than 55 blues; you have the same chance of drawing more than 55 reds.
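The 13.6% figure can be reproduced from the exact binomial distribution. A sketch with a hypothetical `binom_tail` helper, summing the probabilities of 56 or more blues out of 100 draws at 50%:

```python
import math


def binom_tail(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


print("P(more than 55 blues):", round(binom_tail(56, 100, 0.5), 4))  # about 0.136
```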

But now suppose that only 4% of the balls in the urn are blue. You'll draw 4 blues on average, with a standard deviation of about 1.96, i.e. √(4*96/100). A standard deviation of 1.96 balls (1.96% of the draws) is less than 5 balls (5%), which answers the question, but what about the intuition?
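Both standard deviations, in numbers of balls out of 100 draws, come from sqrt(n*p*(1-p)). A minimal sketch with a hypothetical `draw_sd` helper:

```python
import math


def draw_sd(p: float, n: int = 100) -> float:
    """Standard deviation of the number of blue balls in n draws when a share p is blue."""
    return math.sqrt(n * p * (1 - p))


print(draw_sd(0.50))  # 5.0
print(draw_sd(0.04))  # about 1.96
```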

Doesn't it seem that with 50% blue balls it's more likely to draw an extra 5 blues, than when only 4% of the balls are blue?

Better yet, reverse it! With 50% red balls you're 13.6% to draw more than 55 reds. But if 96% of the balls are red, you are VERY unlikely (in fact, unable) to get more than 101 reds with only 100 draws, yet that's what you'd need to get the "same" dispersion as at 50%.
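The asymmetry can be made concrete. At 96% red the count can rise at most 4 above its mean of 96, and even that requires every single draw to be red. A sketch using a hypothetical `binom_pmf` helper:

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n = 100
# 50% red: exceeding the mean by more than 5 happens fairly often.
p_half = sum(binom_pmf(k, n, 0.50) for k in range(56, n + 1))
# 96% red: 101 or more reds is impossible in 100 draws, and even 100 reds is rare.
p_all_red = binom_pmf(100, n, 0.96)

print(round(p_half, 3))     # about 0.136
print(round(p_all_red, 4))  # about 0.0169
```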
 