Dear Jokodo,
There is no doubt in my mind you're a very intelligent person, and thank you for all that work. But I am only a monkey with a very tiny brain. Could you just dumb it down, dumb it down, dumb it down...
Is the probability of drawing a red ball from the 20 million bucket 384,615:1?
Which probability are we now talking about? The probability of drawing a red ball, the probability of drawing two red balls, or the probability of drawing two red balls of the same shade/two individuals with the same first initial, or even the probability that they'd both have one specific first initial? And if the latter, which one is it? Not all initials are equally frequent.
And is your frequency of 53 out of 20 million an actual empirical distribution in an actual population of 20 million, and you're looking for the probability of drawing two of those 53, without replacement? Or is it just a way to represent the frequency in a potentially much larger population, i.e. the prior probability that any individual would have that surname/any ball would be red, without knowing the actual empirical distribution? That also makes a difference. And finally, is your claim that the initials are all equally probable based on a count of those 53 individuals with surname X and their initials, or are those the priors you assume without knowing the actual distribution?
Just a yes or a no. And if no, could you just give the correct probability in the form xxxxxxxx:1
We can't be expected to answer your question when you don't seem to know what the question actually is...
The correct probability of drawing at least one red ball in 20,000 draws, when 53/20,000,000 is just a prior while the actual population is (for all practical purposes) infinite and/or the actual empirical distribution is unknown, is 1 - (19,999,947/20,000,000)^20,000, which comes out as 0.05162. To explain: 19,999,947/20,000,000 is the probability of drawing no red ball on a single draw, (19,999,947/20,000,000)^20,000 is the probability of not drawing a single red ball in 20,000 consecutive draws, and one minus that number is thus the probability of drawing at least one red ball.
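If you want to check that number yourself, here is a minimal Python sketch of the with-replacement calculation. It assumes, as above, 20,000 independent draws with a per-draw red probability of 53/20,000,000; the variable names are just for illustration.

```python
# Prior / "infinite population" reading: every draw is independent,
# with the same chance of a red ball each time.
p_red = 53 / 20_000_000   # per-draw probability of a red ball
draws = 20_000            # number of draws

p_no_red = (1 - p_red) ** draws   # no red ball in any of the 20,000 draws
p_at_least_one = 1 - p_no_red     # complement: at least one red ball

print(f"{p_at_least_one:.5f}")    # prints 0.05162
```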
The correct probability of drawing at least one red ball when 53/20,000,000 is an actual empirical distribution in a population that has exactly 20,000,000 members is 1 - (19,999,947/20,000,000 * 19,999,946/19,999,999 * 19,999,945/19,999,998 * ... * 19,979,948/19,980,001), with one factor per draw, 20,000 in all. The factors 19,999,946/19,999,999 etc. are the probability of drawing yet another non-red ball from a population where some (blue) individuals are already missing. This probability is marginally larger, at 0.0516452: as the number of blues goes down, your probability of fetching a red on each individual draw grows.
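The same check for the exact-population (without replacement) reading, as a minimal Python sketch. It multiplies out the 20,000 factors described above; the cross-check at the end is my own addition and uses the equivalent counting form "all 53 reds end up among the 19,980,000 balls that were never drawn", computed exactly with Python's `math.comb`.

```python
# Exact-population reading: exactly 53 red balls among exactly 20,000,000,
# drawn 20,000 times without replacement.
from math import comb

total = 20_000_000
red = 53
draws = 20_000

# After k non-red draws there are (total - red - k) non-red balls left
# out of (total - k) balls, so multiply those chances for k = 0..19,999.
p_no_red = 1.0
for k in range(draws):
    p_no_red *= (total - red - k) / (total - k)

p_at_least_one = 1 - p_no_red
print(f"{p_at_least_one:.7f}")    # prints 0.0516452

# Cross-check with exact integer arithmetic: P(no red drawn) equals
# C(total - draws, red) / C(total, red).
p_check = 1 - comb(total - draws, red) / comb(total, red)
```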
For the probability of drawing two, it matters even more whether those 53/20,000,000 are just your priors or an actual count in the population. All of this has been explained to you, with numbers, on page one of this discussion.
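To see why it matters more for two than for one, here is a rough Python sketch comparing the probability of exactly k red balls in 20,000 draws under both readings: binomial if 53/20,000,000 is only a prior, hypergeometric if it is an exact count. The function names are hypothetical. The hypergeometric version is written in its equivalent "swapped" form C(draws, k) * C(total - draws, red - k) / C(total, red), which keeps the integers small enough for `math.comb` to handle quickly.

```python
# Probability of exactly k red balls in 20,000 draws under the two readings.
from math import comb

total = 20_000_000
red = 53
draws = 20_000


def p_exactly_prior(k: int) -> float:
    """Binomial: 53/20,000,000 is only a per-draw prior, draws independent."""
    p = red / total
    return comb(draws, k) * p**k * (1 - p) ** (draws - k)


def p_exactly_population(k: int) -> float:
    """Hypergeometric: exactly 53 reds among exactly 20,000,000 balls,
    drawn without replacement (swapped but equivalent form)."""
    return comb(draws, k) * comb(total - draws, red - k) / comb(total, red)


for k in (1, 2, 3):
    prior = p_exactly_prior(k)
    pop = p_exactly_population(k)
    # columns: k, binomial, hypergeometric, relative difference
    print(k, f"{prior:.4e}", f"{pop:.4e}", f"{pop / prior - 1:+.2%}")
```

The relative gap between the two readings is tiny for one red ball but grows to a couple of percent for two and more for three.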
If you want an exact number, you first need to clarify what it is you actually mean to ask. Then you multiply the probability of getting exactly two surnames X by the probability that two random individuals have the same initial, the probability of getting exactly three surnames X by the probability that at least one pair in a group of three has matching initials, the probability of exactly four same surnames by the probability of at least one pair of matching initials in a set of four, and so on. When the terms have become small enough to ignore, you add up what you've got so far.
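Purely as an illustration of that sum, here is a Python sketch under loudly stated assumptions: the first factor uses the binomial (prior) reading with 20,000 draws, and the second factor assumes 26 equally likely first initials (the equal-probability claim questioned above; `n_initials = 26` is hypothetical). If 53/20,000,000 is an actual count, you would swap in the hypergeometric probabilities instead.

```python
# Sum over k of P(exactly k people with surname X) * P(some pair among them shares an initial).
from math import comb

draws = 20_000
p_surname = 53 / 20_000_000   # ASSUMPTION: prior reading, per-draw chance of surname X
n_initials = 26               # ASSUMPTION: 26 equally likely first initials


def p_exactly(k: int) -> float:
    """First factor: binomial probability of exactly k people with surname X."""
    return comb(draws, k) * p_surname**k * (1 - p_surname) ** (draws - k)


def p_shared_initial(k: int) -> float:
    """Second factor: chance that at least two of k people share a first initial
    (birthday-problem formula, equally likely initials)."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= max(n_initials - i, 0) / n_initials
    return 1 - p_all_distinct


total = 0.0
k = 2
while True:
    term = p_exactly(k) * p_shared_initial(k)
    total += term
    if term < 1e-15:   # remaining terms are too small to matter
        break
    k += 1

print(f"{total:.3e}  (about 1 in {1 / total:,.0f})")
```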
Swammerdami has given you the tools to calculate the first factor; I gave you the tools to calculate the second factor in my previous post. There's nothing more we can do at this point. The only thing left for you to do is a simple addition.