• Welcome to the new Internet Infidels Discussion Board, formerly Talk Freethought.

Zipf's Law and statistics

excreationist

....a bizarre pattern emerges. The second most used word will appear about half as often as the most used. The third one third as often. The fourth one fourth as often. The fifth one fifth as often. The sixth one sixth as often, and so on all the way down...


I found the video to be interesting... and mysterious...
 
The mathematics of Zipf's law is interesting. It means that the item with rank n, when items are ordered by size, will have probability
\( p(n) = \frac{p_0}{n} \)

where p0 is a normalization constant. Adding up over all n up to some maximum value N gives us
\( 1 = \sum_{n=1}^N p(n) = p_0 \sum_{n=1}^N \frac{1}{n} = p_0 (\log N + \gamma + O(1/N)) \)

in the limit of large N, where the logarithm is the natural one (base e = 2.7182818...) and \( \gamma \) is the Euler–Mascheroni constant = 0.57721...
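
To see how good the large-N approximation is, here is a rough Python sketch that compares the exact normalization constant (one over the harmonic sum) with the logarithmic approximation. The function names `zipf_p0_exact` and `zipf_p0_approx` are just made up for illustration, and the O(1/N) term is simply dropped.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def zipf_p0_exact(n_max):
    """Exact p0 = 1 / (1 + 1/2 + ... + 1/N) for a maximum rank N = n_max."""
    return 1.0 / sum(1.0 / n for n in range(1, n_max + 1))


def zipf_p0_approx(n_max):
    """Large-N approximation p0 ~ 1 / (ln N + gamma), ignoring the O(1/N) term."""
    return 1.0 / (math.log(n_max) + EULER_GAMMA)


if __name__ == "__main__":
    # Print exact and approximate p0 side by side for a few vocabulary sizes.
    for n_max in (10, 1000, 100000):
        print(n_max, zipf_p0_exact(n_max), zipf_p0_approx(n_max))
```

The two values should get closer as N grows, since the neglected term shrinks like 1/N.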

The Wikipedia article has a graph of word counts for several Wikipedias, and there is a slight downward bend at a rank of about 10,000, where the exponent goes from roughly -1 to a somewhat more negative value.
 
Word and letter probabilities were part of early code-breaking techniques.

In a coded document or communication, you match the cipher symbols to letters and words based on how often they would be expected to occur.
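
A minimal sketch of that idea in Python, assuming a simple letter-for-letter substitution cipher. The `ENGLISH_ORDER` string is a commonly quoted approximate frequency ordering of English letters, not an exact one, and `guess_substitution` is a made-up helper name. It only gives a first guess that a code breaker would then refine by hand, for example using common words.

```python
from collections import Counter
import string

# Assumed ordering of English letters from most to least frequent (approximate).
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def guess_substitution(ciphertext):
    """Pair each cipher letter with a plaintext letter by frequency rank."""
    counts = Counter(
        ch for ch in ciphertext.lower() if ch in string.ascii_lowercase
    )
    # most_common() sorts by count, highest first; ties keep first-seen order.
    ranked = [ch for ch, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))


if __name__ == "__main__":
    # Toy ciphertext: 'x' is most frequent, so it is guessed to be 'e'.
    print(guess_substitution("xxxyyz"))
```

On short messages the ranking is noisy, so the guessed mapping will usually be wrong for the rarer letters.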
 