
Probability Distributions

There's only one thing anyone wants to hear from you, steve, and that is an explanation for the result you claimed:


Execution times C++ Mersenne Twister vs LG.

for 10^10 iterations C++ took 68 seconds, URAND LG took .016 seconds.

Nothing did 10^10 iterations in 16 milliseconds. Steve is apparently unable to find the bug. Let's try to help him.

One possibility is that his loop looked like
for (iter = 0; iter < (10^10); iter++)
or
for (iter = 0; iter < 10,000,000,000; iter++)
Although to a casual non-programmer these might appear to do ten billion iterations, in fact they do zero. In C and C++ the caret is bitwise XOR, not exponentiation, so (10^10) evaluates to zero; and the commas in 10,000,000,000 are comma operators, so the condition's value is its last operand, 000, which is also zero. Both conditions are false on the very first test, despite looking like synonyms for ten billion.
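Both failure modes are easy to verify. A minimal sketch (my own illustration of the suspected bugs, not steve's actual code):

```cpp
// Two loops that look like ten billion iterations but run zero times.

// 10 ^ 10 is bitwise XOR: 10 XOR 10 == 0, so the condition is iter < 0.
int loop_with_xor(void) {
    int count = 0;
    for (long long iter = 0; iter < (10 ^ 10); iter++)
        count++;
    return count;
}

// The commas make the condition a comma expression: (iter < 10, 000, 000, 000).
// Its value is the last operand, the octal literal 000, i.e. zero (false).
int loop_with_commas(void) {
    int count = 0;
    for (long long iter = 0; iter < 10,000,000,000; iter++)
        count++;
    return count;
}
```

Compiling the second one with -Wall produces a "left operand of comma operator has no effect" warning, which is exactly the tell discussed below.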

But I don't think that's where steve went wrong. Presumably he had already coded a successful ten-billion-iteration loop for his Mersenne Twister result. Having "ironed its bugs out," would he have "struck out for a new frontier" with a fresh loop?

There's another reason the second bug seems unlikely. The compiler will warn about those pointless comma expressions when an option like "-Wall" is present. It seems safe to assume steve uses that option a lot.

Another possibility is that steve uses a smart optimizing compiler, and that that compiler could tell the ten billion iterations would have no observable side effect ... and simply elided the whole loop! I suppose it's possible that steve had never heard of such a thing, and is still wondering what happened.
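Dead-code elimination like that is easy to demonstrate. A minimal sketch (my own illustration, assuming a typical GCC or Clang build at -O2): the first function's loop gets folded to a closed form or removed outright, while routing the accumulator through a volatile object keeps the work observable, which is one standard way to defeat the optimizer when benchmarking.

```cpp
#include <cstdint>

// No observable side effect beyond the return value: at -O2 the compiler
// is free to replace this loop with the closed form n*(n-1)/2, so "timing"
// it tells you nothing about loop throughput.
uint64_t sum_to(uint64_t n) {
    uint64_t s = 0;
    for (uint64_t i = 0; i < n; i++) s += i;
    return s;
}

// The volatile accumulator forces a real store on every iteration,
// so the loop actually executes n times when benchmarked.
uint64_t sum_to_observed(uint64_t n) {
    volatile uint64_t s = 0;
    for (uint64_t i = 0; i < n; i++) s = s + i;
    return s;
}
```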

But what is especially remarkable -- regardless of what steve's underlying bug was -- is that he has failed to acknowledge that there was a bug. What could he be thinking? Is it possible that steve couldn't even grasp the absurdity of the timing he claimed? The result of the bug is plainly visible, since nothing does 10^10 iterations of anything in 16 milliseconds. Is he just hoping the bug will be forgotten? I've written over a million lines of code and am not ashamed to admit I couldn't count all the bugs on my digits even if I took my shoes off!

- - - - -- - - - - - -

I've thought of synopsizing some of my credentials in a Lounge thread. They're more a matter of shame for me than pride: With the genes I was dealt I should have been a math professor or start-up multimillionaire. But the genes were not enough, and the idiot part of my idiot-savant pathology took over. Still, there were two different V.P.s of Engineering who called me "the best microprogrammer in Silicon Valley." And two people called me the best circuit designer they'd ever met. (One of these had headed an IBM Research Lab, founded two companies, and ... was the best circuit designer I had ever met.) I have over 30 U.S. patents, have written books and a newspaper column, and have published several peer-reviewed papers.

So I should just laugh at steve's pathetic efforts to insult me. But -- call me childish -- I do get irritated, so I set steve to ignore. If he ever explains the peculiar result quoted above, or manages to say something interesting or useful I hope someone will alert me.
 
Testing humungus.

As the number of digits is increased, the number of iterations has to increase to get statistical significance. I tested at 10^8 iterations, which results in 10,000 counts per number in 0-9999. Under these conditions the averages vary by a small amount.

Above 10^8 iterations, execution time starts to get long.
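As a rough sanity check on "varies by a small amount" (my own back-of-envelope, not part of the test): with n iterations spread over b equally likely bins, each bin expects m = n/b counts with Poisson-like fluctuations of about sqrt(m), i.e. a relative spread of roughly 1/sqrt(m).

```cpp
#include <cmath>

// Expected fractional bin-to-bin fluctuation for a perfect uniform source:
// m = n/bins counts per bin, standard deviation ~ sqrt(m),
// relative spread ~ sqrt(m)/m = 1/sqrt(m).
double expected_relative_spread(double n, double bins) {
    double m = n / bins;        // expected counts per bin
    return 1.0 / std::sqrt(m);
}
// For n = 1e8 and 10000 bins, m = 10000, so the spread is about 1%.
```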

For the C++ mt19937 generator the observed probabilities were within an average of about 0.08% of a perfect uniform distribution.

The probability for each digit is 1/10. For n independent digits the probabilities multiply, giving (1/10)^n, or just 1/(total number of possible numbers).

For 4 digits the probability of a number from 0-9999 is 1/10000.

The rand() generator varies from about 0.2% to 0.8%, not as good as the C++ built-in, but it can be acceptable depending on what you need: 0.x% vs 0.0x%.

Code:
#include <chrono>
#include <random>
#include <cmath>
#include <bits/stdc++.h>
using namespace std;


long long unif_long_long(int ndigits){
    // builds a number from ndigits uniform random digits, 0 - 999999999999999999
    // max long long digits 18, max int digits 9
    if(ndigits > 18 || ndigits < 1) return -1;
    typedef mt19937 random_engine;
    unsigned seed = chrono::system_clock::now().time_since_epoch().count();
    random_engine generator (seed);
    uniform_int_distribution<int> dist (0,9);
    long long digit, x = 0, place = 1;
    for(int i = 0; i < ndigits; i++){
        digit = dist(generator);
        x += digit * place;    // integer place value avoids pow() rounding
        place *= 10;
    }
    return x;
}

void humonous_test1(void){
    typedef mt19937 random_engine;
    unsigned seed = chrono::system_clock::now().time_since_epoch().count();
    random_engine generator (seed);
    uniform_int_distribution<int> dist (0,9);
    srand(time(NULL));
    //---------------------------------------------------
    long long x,k,digit,n = pow(10,8);
    static long long cnt[10000];
    static double pc[10000];
    long long i,ndigits = 4;
    for(k=0;k<10000;k++)cnt[k] = 0;
    for(k=0;k<n;k++){
        x = 0;
        long long place = 1;
        for(i=0;i<ndigits;i++){
            //digit = rand()%10;
            digit = dist(generator);
            x += digit*place;
            place *= 10;
        }//i
        cnt[x]++;
    }
     long long s=0;
     for(i=0;i<10000;i++)s += cnt[i];
     printf("n %lld total counts   %lld\n",n,s);
     long long  m = n/10000;    // expected count per bin
     double pcavg = 0;
     for(i=0;i<10000;i++){
        pc[i] = 100.*double(m-cnt[i])/double(m);  // percent deviation from expected
        pcavg += fabs(pc[i]);
        }
    pcavg = pcavg/10000;
    printf("average deviation from theoretical %.5f\n",pcavg);
    for(k=0;k<100;k++)printf("%lld\t %lld\t  %+.5f \n",k,cnt[k],pc[k]);

}
 
Applying the Chi Squared Distribution.

From reading Knuth's chapter on random numbers, and from sources on the net, one of the general tests for a random number generator is the frequency (chi-squared) test.

X^2 compares expected to observed. In this case the expected is a set of bins with an equal number of counts from a flat uniform distribution, and the observed is the output of the random number generator being evaluated.

X^2 yields a number. If the number is less than the critical value, the hypothesis is not rejected. The critical value is a percentage point on the cumulative chi-squared distribution; a significance level of 0.05 is a common choice.

If Chi^2 < critical value, the uniformity hypothesis is accepted. Here 0.05 means accepting a 5% risk of rejecting a generator that really is uniform, not "within 5% of a perfect uniform distribution."
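The statistic itself takes only a few lines. A sketch (the helper name is mine, not from the code below): X^2 = sum over bins of (observed - expected)^2 / expected.

```cpp
#include <cstddef>

// Pearson chi-squared statistic for equally likely bins:
// X^2 = sum over bins of (O - E)^2 / E, where O is the observed count
// and E the expected count per bin.
double chi_squared_stat(const long long *obs, std::size_t bins, double expected) {
    double x2 = 0;
    for (std::size_t i = 0; i < bins; i++) {
        double d = double(obs[i]) - expected;
        x2 += d * d / expected;
    }
    return x2;
}
// A perfectly flat sample gives X^2 = 0; compare the result against the
// critical value for (bins - 1) degrees of freedom.
```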

It can also be applied to a normal distribution; the expected counts for each bin would then come from a normal histogram.

Search on 'chi squared tests' and 'random number generation tests'.

I could not get the C++ X^2 function to give the critical value, so I used Python. There are tables and calculators online.

import scipy.stats as stats
critical_value = stats.chi2.ppf(1 - 0.05, df=99)  # 123.225
The critical value is not hard and fast. It provides a risk assessment.

A more formal comparison of rand(), URAND, and C++ mt19937.

At 10^4 iterations, X^2 for rand(), C++, and URAND is around 100. As iterations increase, URAND and C++ stay around 100 while rand() gets worse. I expect it has to do with the period of rand().

X^2 at 10^9 iterations:
rand() 2154.2
C++ mt19937 103.7
URAND 85.1


Code:
void chi_sq_test1(void){
    const int num = 100;
    //--------------------------------------------------
    unsigned long long rand_num = time(NULL);
    unsigned long long rand_max = 0x7fffffff;
    unsigned long long k1 = 951722585092923;
    unsigned long long k2 = 1768559438007117;
    //------------------------------------------------------
    typedef mt19937 random_engine;
    unsigned seed = chrono::system_clock::now().time_since_epoch().count();
    random_engine generator (seed);
    uniform_int_distribution<int> dist (0,num-1);
    srand(time(NULL));
    //---------------------------------------------------
    long long ntrials = pow(10,9);
    long long i,cntc[num],cntcpp[num],cntu[num];
    long long m,x1,x2,x3;
    for(i=0;i<num;i++){cntc[i] = 0;cntcpp[i]=0;cntu[i]=0;}
    double Xc = 0, Xcpp = 0, Xu = 0;   // must start at zero before accumulating
   //count occurrences of 0-99
    for(i=0;i<ntrials;i++){
        x1 = rand()%num;
        x2 = dist(generator);
        rand_num = rand_num*k2 + k1;   // URAND-style linear congruential step
        x3 = (rand_num%rand_max)%num;
        cntc[x1]++;
        cntcpp[x2]++;
        cntu[x3]++;
    }//i

     m = ntrials/num;                  // expected count per bin
     printf("m %lld\n",m);
    // the X^2 test: sum of (observed-expected)^2/expected
     for(i=0;i<num;i++){
        Xc += double(pow(cntc[i]-m,2))/m;
        Xcpp += double(pow(cntcpp[i]-m,2))/m;
        Xu += double(pow(cntu[i]-m,2))/m;
    }
    printf("rand()  %.5f  cpp %.5f urand  %.5f\n",Xc,Xcpp,Xu);

}
 
Not much interest on the forum for applied math, so moving on to the chi-squared distribution, one of the most widely used in everyday statistics.

Using the C++ chi_squared_distribution function to sample the distribution, the cumulative distribution is calculated.

The parameter is degrees of freedom. For a table df = (rows-1)*(cols-1). For a 2 column 100 row data set df = 99.

Results checked against tables.



Gnuplot script
set yrange[*:1]
set xrange[0:*]
plot 'chi.dat' using 1:2 with lines ls 4 lt -1 lw 3
set grid


Code:
#include <chrono>
#include <random>
#include <algorithm>
#include <bits/stdc++.h>
using namespace std;

typedef mt19937 random_engine;
unsigned seed = chrono::system_clock::now().time_since_epoch().count();
random_engine generator (seed);

void chi_square_dist(int n,int df,double *y){
        //df degrees of freedom (rows-1)*(cols-1)
        int i;
        chi_squared_distribution<double> dist (df);
        for(i=0;i<n;i++)y[i] =dist(generator);
}


void chi_test(void){
    const int n = 1000;
    int i,flag = 0;
    double y[n],cd[n],p1 = 0,p2 = 0,p3 = 0;
    int df = 25;
     chi_square_dist(n,df,y);
     // complementary cumulative probability for each sorted sample
     for(i=0;i<n;i++)cd[i] = 1-double(i+1)/double(n);
     sort(&y[0],&y[n]);

     // pick off the 0.9, 0.1, and 0.05 upper-tail percentage points
     for(i=0;i<n;i++){
        if(flag == 0 && cd[i] <= 0.9){p1 = y[i];flag = 1;}
        if(flag == 1 && cd[i] <= 0.1){p2 = y[i];flag = 2;}
        if(flag == 2 && cd[i] <= 0.05){p3 = y[i];flag = 4;}
        }
    printf("0.9  %f  0.1  %f  0.05  %f\n",p1,p2,p3);
    FILE *p = fopen("chi.dat","w");
    for(i=0;i<n;i++)fprintf(p,"%f %f\n",y[i],cd[i]);
    fclose(p);
    system("cd.plt");   // runs the gnuplot script

}
 