steve_bank
Diabetic retinopathy and poor eyesight. Typos ...
A simple statistical simulation.
There are two general forms of random sampling: sampling with replacement and sampling without replacement.
Put 80 red and 20 blue balls in a box. Shake the box, pick a ball, put it back in, shake again, and repeat. This is sampling with replacement: the probability of picking red or blue does not change from trial to trial.
If the ball is not returned to the box, the probability of picking blue or red changes on the next trial. This is sampling without replacement.
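A minimal sketch of the difference, using only Python's standard library and assuming the 80 red / 20 blue box above. It computes the exact probability that the second pick is red given that the first pick was red: with replacement it stays at 80/100, without replacement it drops to 79/99. The function name is hypothetical, chosen for illustration.

```python
from fractions import Fraction

RED, BLUE = 80, 20  # box contents from the example above


def p_second_red_given_first_red(replace: bool) -> Fraction:
    """Exact probability that pick 2 is red, given pick 1 was red."""
    red, total = RED, RED + BLUE
    if not replace:
        # The red ball from pick 1 stays out of the box.
        red, total = red - 1, total - 1
    return Fraction(red, total)


with_repl = p_second_red_given_first_red(replace=True)
without_repl = p_second_red_given_first_red(replace=False)
print("with replacement:   ", with_repl, float(with_repl))
print("without replacement:", without_repl, float(without_repl))
```

With 100 balls the change is small; it matters more when the sample is a large fraction of the population.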
A shipment of 100,000 widgets is delivered to your factory. The colors are nominally 80% red and 20% blue. You need to estimate the distribution of colors before accepting the shipment.
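A minimal sketch of the inspection step, assuming the shipment is modeled as a list of 100,000 color labels in a nominal 80/20 mix and that one sample of 200 widgets is pulled without replacement with `random.sample`. The names `build_shipment` and `inspect` are hypothetical, chosen for illustration; the seeds only make the run repeatable.

```python
import random


def build_shipment(n: int = 100_000, p_red: float = 0.8, seed: int = 1) -> list[str]:
    """Hypothetical shipment: each widget is red with probability p_red."""
    rng = random.Random(seed)
    return ["red" if rng.random() < p_red else "blue" for _ in range(n)]


def inspect(shipment: list[str], nsamp: int = 200, seed: int = 2) -> float:
    """Pull nsamp widgets without replacement and return the fraction red."""
    rng = random.Random(seed)
    sample = rng.sample(shipment, nsamp)
    return sample.count("red") / nsamp


shipment = build_shipment()
estimate = inspect(shipment)
true_fraction = shipment.count("red") / len(shipment)
print(f"sample estimate {estimate:.3f}, whole shipment {true_fraction:.3f}")
```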
If you run the code, the cumulative distribution plots of the sample means show a normal distribution. This is a consequence of the Central Limit Theorem (CLT), which allows normal statistics to be used to estimate the true parameters of a population from a sample, even if the underlying distribution is not normal. In this case the population is not normal: each ball is either red or blue. There may be exceptions to the CLT, but I never saw one in my work.
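A minimal sketch of the CLT claim using the standard library instead of plots. Each pick is a 0/1 indicator (1 = red), about as far from normal as a distribution gets, yet the 500 sample means for n = 200 should behave roughly like a normal distribution with mean 0.8 and standard deviation sqrt(0.8*0.2/200), about 0.028. Drawing each pick directly with probability 0.8, rather than from a stored population, is a simplifying assumption that matches the sampling-with-replacement case.

```python
import math
import random
import statistics

P_RED, NSAMP, NITER = 0.8, 200, 500
rng = random.Random(42)

# Each sample mean is the fraction red in NSAMP independent picks.
means = [sum(rng.random() < P_RED for _ in range(NSAMP)) / NSAMP for _ in range(NITER)]

mu = statistics.mean(means)
sd = statistics.stdev(means)
predicted_sd = math.sqrt(P_RED * (1 - P_RED) / NSAMP)

# For a normal distribution about 68% of values lie within 1 sd
# and about 95% within 2 sd of the mean.
within1 = sum(abs(m - mu) <= sd for m in means) / NITER
within2 = sum(abs(m - mu) <= 2 * sd for m in means) / NITER
print(f"mean {mu:.4f}  sd {sd:.4f}  predicted sd {predicted_sd:.4f}")
print(f"within 1 sd: {within1:.2f}  within 2 sd: {within2:.2f}")
```

The fractions are discrete (multiples of 1/200), so the percentages will only approximately match 68% and 95%.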
The cumulative distribution is a plot of the data sorted in ascending order versus the percentage of points at or below each value. Data can have a ragged histogram but still a clear CDF. The CDF is the integral of the PDF, so it acts as a smoothing function.
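The sorting-and-percentage step pulled out of the main program into a small function (the name `empirical_cdf` is hypothetical). It builds the same x/percent pairs as the program's `sorted()` calls and `cd[i] = 100*(i+1)/niter`, so the result can be passed straight to `plt.plot`.

```python
def empirical_cdf(data):
    """Return (xs, pct): data sorted ascending, and for the i-th sorted
    value the percentage 100*(i+1)/n, as in cd[i] = 100*(i+1)/niter."""
    xs = sorted(data)
    n = len(xs)
    pct = [100 * (i + 1) / n for i in range(n)]
    return xs, pct


xs, pct = empirical_cdf([158, 163, 155, 160, 171])
print(xs)   # [155, 158, 160, 163, 171]
print(pct)  # [20.0, 40.0, 60.0, 80.0, 100.0]
```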
Intuitively, we average repeated measurements to estimate the true value. If I remember right, maximum likelihood estimation makes this precise: write the probability of the observed data as a function of the unknown parameter (the likelihood), take the derivative with respect to that parameter, set it to zero, and solve for the parameter value that maximizes the likelihood.
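A minimal numerical check of that recipe for the red/blue case, assuming one sample of n = 200 picks with k = 160 red (illustrative numbers, not program output). The log-likelihood of a red fraction p is k*ln(p) + (n - k)*ln(1 - p); setting its derivative k/p - (n - k)/(1 - p) to zero gives p = k/n, and a brute-force search over a grid of p values should land on the same answer.

```python
import math


def log_likelihood(p: float, k: int, n: int) -> float:
    """Bernoulli log-likelihood of k red out of n picks for red fraction p."""
    return k * math.log(p) + (n - k) * math.log(1 - p)


k, n = 160, 200  # illustrative sample: 160 red out of 200
grid = [i / 1000 for i in range(1, 1000)]  # p = 0.001 ... 0.999
p_hat = max(grid, key=lambda p: log_likelihood(p, k, n))
print(p_hat, k / n)  # 0.8 0.8
```

So for red/blue sampling, the maximum likelihood estimate is just the observed fraction red, which is what the program's `pcred` holds for each sample.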
For a normal distribution, whose expected value is also its most probable value, the maximum likelihood estimator of that value is the arithmetic mean.
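The same recipe written out for n measurements x_1, ..., x_n from a normal distribution with mean μ and standard deviation σ (σ held fixed): the log-likelihood is maximized where its derivative with respect to μ is zero, which gives the arithmetic mean. The second derivative, -n/σ², is negative, so this is a maximum.

```latex
\begin{aligned}
\ell(\mu) &= -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \\
\frac{d\ell}{d\mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
\;\Rightarrow\; \sum_{i=1}^{n}x_i = n\mu \\
\hat{\mu} &= \frac{1}{n}\sum_{i=1}^{n}x_i = \bar{x}
\end{aligned}
```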
Run the code for increasing sample sizes (nsamp) and the sample estimates converge on 80% red.
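A minimal sketch of that experiment without the plots, assuming each pick is red with probability 0.8 (sampling with replacement). For each sample size it repeats the sampling 500 times and prints the mean and standard deviation of the fraction red, alongside the textbook standard error sqrt(p(1-p)/n). The mean should stay near 0.8 while the spread shrinks roughly as 1/sqrt(n).

```python
import math
import random
import statistics

P_RED, NITER = 0.8, 500
rng = random.Random(7)

results = {}
for nsamp in (50, 200, 800, 3200):
    # NITER repeated samples of size nsamp; record the fraction red in each
    fracs = [sum(rng.random() < P_RED for _ in range(nsamp)) / nsamp
             for _ in range(NITER)]
    results[nsamp] = (statistics.mean(fracs), statistics.stdev(fracs))
    theory = math.sqrt(P_RED * (1 - P_RED) / nsamp)
    mean, sd = results[nsamp]
    print(f"n={nsamp:5d}  mean {mean:.4f}  sd {sd:.4f}  sqrt(p(1-p)/n) {theory:.4f}")
```

Quadrupling the sample size should roughly halve the spread.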
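For random sampling from a given population, the spread of the estimate depends only on the sample size; it shrinks in proportion to 1/sqrt(n). A confidence interval is a range computed from the sample such that, over many repeated samples, a stated percentage of such ranges (for example 95%) would contain the true value of the parameter.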
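A minimal sketch of a 95% confidence interval for the fraction red, using the normal approximation that the CLT justifies (the Wald interval, p_hat ± z*sqrt(p_hat(1 - p_hat)/n)). Other interval formulas exist and behave better near 0% or 100%. The count of 160 red out of 200 is an illustrative assumption, not program output; `statistics.NormalDist` (Python 3.8+) supplies z.

```python
import math
from statistics import NormalDist


def proportion_ci(k: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) interval for a proportion k/n."""
    p_hat = k / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(160, 200)
print(f"95% CI for fraction red: {low:.3f} to {high:.3f}")  # 0.745 to 0.855
```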
The sample size needed for a given confidence interval can be calculated; there are online calculators.
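The usual textbook formula behind such calculators, for estimating a proportion to within ±E at a given confidence, is n = z²·p(1 - p)/E², rounded up. A minimal sketch, assuming the normal approximation and ignoring the finite-population correction (reasonable when the sample is a small fraction of a 100,000-widget shipment); `sample_size` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def sample_size(margin: float, p: float = 0.5, confidence: float = 0.95) -> int:
    """Sample size to estimate a proportion to within +/- margin.
    p = 0.5 is the worst case when nothing is known about the mix."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)


print(sample_size(0.05))         # 385 (worst case, p = 0.5)
print(sample_size(0.05, p=0.8))  # 246 (nominal 80% red)
```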
There is plenty of information, with examples, on the net.
Confidence interval - Wikipedia (en.wikipedia.org)
The code takes a while to run, so be patient.
Code:
import math as ma
import array as ar
import statistics as st
import random as rn
import matplotlib.pyplot as plt
# Population: integers 1-100 drawn at random
# 1-80 = red ball, 81-100 = blue ball (nominally 80% red, 20% blue)
npop = 100000                 # population size (the shipment)
nsamp = 200                   # sample size (balls picked per trial)
niter = ma.floor(npop/nsamp)  # number of repeated samples
population = ar.array("i", npop*[0])  # population
red = ar.array("i", niter*[0])        # red count in each sample
blue = ar.array("i", niter*[0])       # blue count in each sample
pcred = ar.array("d", niter*[0])      # fraction red in each sample
cd = ar.array("d", niter*[0])         # cumulative percentage for the CDF plots
for i in range(npop): population[i] = rn.randint(1, 100)
print("Iterations ", niter)
print("nsamp ", nsamp)
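for i in range(niter):
    bluecnt = 0
    redcnt = 0
    # Shake the box. Not strictly needed, since the picks below use random
    # indices, but it mirrors the physical experiment (and is the slow part).
    rn.shuffle(population)
    for j in range(nsamp):
        s = rn.randint(0, npop-1)  # pick a ball and put it back (with replacement)
        if population[s] > 80:
            bluecnt = bluecnt + 1
        else:
            redcnt = redcnt + 1
    red[i] = redcnt
    blue[i] = bluecnt
    pcred[i] = redcnt/nsamp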
minblue = min(blue)
maxblue = max(blue)
minred = min(red)
maxred = max(red)
meanred = st.mean(pcred)  # mean of the per-sample fractions red
stdred = st.stdev(pcred)  # spread of the per-sample fractions red
print("counts min red %d max red %d" % (minred, maxred))
print("counts min blue %d max blue %d" % (minblue, maxblue))
print(" Mean Fraction Red %.5f Standard Deviation %.5f" % (meanred, stdred))
# Sort ascending for the cumulative distribution plots
red = sorted(red)
blue = sorted(blue)
pcred = sorted(pcred)
for i in range(niter): cd[i] = 100*(i+1)/niter  # cumulative distribution, percent
##fname = "c:\\python\\data\\samp.txt"
##f = open(fname,"w");
##for i in range(niter):
## s = ""
## s = s + repr(cd[i])+"\t"+repr(red[i])+"\t"+repr(blue[i])+"\n"
## f.write(s)
##f.close()
#for i in range(niter):print("%f\t %d\t %d\t %d" %(cd[i],red[i],blue[i],red[i]+blue[i]))
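# CDF of red count per sample
plt.grid(which='major', color='k', linestyle='-', linewidth=0.8)
plt.plot(red, cd, linewidth=2.0, color="k")
plt.xlabel("red balls per sample")
plt.ylabel("cumulative percent")
plt.show()
# CDF of blue count per sample
plt.grid(which='major', color='k', linestyle='-', linewidth=0.8)
plt.plot(blue, cd, linewidth=2.0, color="k")
plt.xlabel("blue balls per sample")
plt.ylabel("cumulative percent")
plt.show()
# CDF of fraction red per sample (the sample means)
plt.grid(which='major', color='k', linestyle='-', linewidth=0.8)
plt.plot(pcred, cd, linewidth=2.0, color="k")
plt.xlabel("fraction red per sample")
plt.ylabel("cumulative percent")
plt.show()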