The Programming Thread

Swammerdami · Jul 30, 2024

Binary search is very useful and probably in much more common use than hash tables. BUT I cannot condone the ANSI standard bsearch() function.

When the key is not found in the searched table, it's not unusual to want the nearest element instead. Or even to create an entry for that new key. I don't insist that bsearch() be capable of creating a new entry. BUT it's already done the main task: locating where the new entry should go. However (to keep the interface very simple) bsearch() returns only a null pointer when no exact match is found. Insertion will require that you then do the equivalent of what bsearch() just did but you'll need to "roll your own." Rolling your own anyway, what good was bsearch()?

bsearch() itself is only a tiny bit of C source code, although it allows the same generality as qsort(). But you don't need that; start with a VERY trivial template and derive your own code for any particular table.

At least ANSI was smart enough to reject BSD's hsearch() which has a horrid interface.

Related to a sorted table for binary search (which uses time log_2(N) for look-up ) is a digital search tree (e.g. trie) where log_5(N) look-up time suffices if each node has fan-out of 5.

AND one of the most elegant (and amazing?) data structures I have ever seen (and which is NOT found in Knuth) is a memory-saving method to store a digital search tree in a hash table. (If anyone gets this far and is seized by excitement in anticipation of the possible amazement, I may share some details via PM.)

steve_bank · Aug 1, 2024

After some more reading I am not done with random numbers yet.

I posted one of the tests, the Chi Squared test. In a uniform random sequence the number of occurrences of each number are counted ad compared against theoretical counts.

That does not tell anything about how the numbers are distributed in the sequence. The occurrences of each number may be the same or close, but the sequence may not be random.

The sequential test looks at occurrences of sub sequences, looking for data clumping.

At 200 million samples a sequence of up to 7 numbers can be tested for a uniform distribution 0-10.

Above that run time is a problem on my PC.

Both the c++ mt19937 and the URAND function have max ints 0x7fffffff. To test over the full integer range at once would be difficult.

The full test would require calculating the theoretical probabilities of a sub sequence and do a Chi Squared test. I just compared URAND to C++ mt19937.

So far URAND algorithm seems to be a good replacement for rand() if you are using C. With C++ the random functions are preferred.

There is one more test, the spectral te

See Knuth Vol 2 Semi Numeral Algorithms chapter 3, empirical tests section.. PDFs online.

Code:

#include <chrono>
#include <random>

void sequential_test(void){
    cout<<"sequential testt"<<endl;
    unsigned seed = chrono::system_clock::now().time_since_epoch().count();
    mt19937 rand_gen(seed);
     uniform_int_distribution<int> dist (0,10);;
    unsigned long long rand_num = seed;
    unsigned long long urand_max = 0x7fffffff;
    unsigned long long m = 0x000fffffffffffff;
    unsigned long long mh = m/2;
    unsigned long long k1 = 2*ceil((mh)*(.5 - sqrt(3)/6.)) + 1;
    unsigned long long k2 = 8*ceil((mh)*atan(1.)/8.) + 5;

   int j,i, n = 200000000,urandcnt = 0,cppcnt = 0;
   int nseq = 3;
   int seq[8] = {0,1,2,3,4,5,6,7};
   int cppflag,urandflag;
   int *cpp = new int[n];
   int *urand = new int[n];

   for(i=0;i<n;i++){
        urand[i] = rand_num%11; //0-10
        rand_num = rand_num*k2 + k1;
        cpp[i] = dist(rand_gen);
    }
    for(i=0;i<n-nseq;i++){
        cppflag = 0;
        urandflag = 0;
        for(j=0;j<nseq;j++){
            if(seq[j] != urand[i+j])urandflag = 1;
            if(seq[j] != cpp[i+j])cppflag = 1;
        }
        if(cppflag == 0)cppcnt++;
        if(urandflag == 0)urandcnt++;
    }
    cout<<"cpp  "<<cppcnt<<endl;
    cout<<"urand  "<<urandcnt<<endl;
}

steve_bank · Aug 6, 2024

Knuth's Shuffle

Fisher–Yates shuffle - Wikipedia

en.wikipedia.org

The Fisher–Yates shuffle is named after Ronald Fisher and Frank Yates, who first described it. It is also known as the Knuth shuffle after Donald Knuth.[2] A variant of the Fisher–Yates shuffle, known as Sattolo's algorithm, may be used to generate random cyclic permutations of length n instead of random permutations.

The modern version of the Fisher–Yates shuffle, designed for computer use, was introduced by Richard Durstenfeld in 1964[4] and popularized by Donald E. Knuth in The Art of Computer Programming as "Algorithm P (Shuffling)".[5] Neither Durstenfeld's article nor Knuth's first edition of The Art of Computer Programming acknowledged the work of Fisher and Yates; they may not have been aware of it.[citation needed] Subsequent editions of Knuth's The Art of Computer Programming mention Fisher and Yates' contribution.[6]
The algorithm described by Durstenfeld is more efficient than that given by Fisher and Yates: whereas a naïve computer implementation of Fisher and Yates' method would spend needless time counting the remaining numbers in step 3 above, Durstenfeld's solution is to move the "struck" numbers to the end of the list by swapping them with the last unstruck number at each iteration. This reduces the algorithm's time complexity to O ( n ) compared to O ( n 2 ) for the naïve implementation.[7] This change gives the following algorithm (for a zero-based array).

Outside of new areas like AI it is hard to find a new ways of doing something that is better than what has been done.

Fisher was born in 1913.

Ronald Fisher - Wikipedia

en.wikipedia.org

Code:

void shuffle(int n,int *x){
    //Knuth's Shuffle in place
    int i,r,swap;
    for(i=n;i>=1;i--){
        r = rand()%i;  //rand() not the best choice
        swap = x[i-1];
        x[i-1] = x[r];
        x[r] = swap;
        }
}


int main(){

   int i,n=10;
    int x[n];
    for(i=0;i<n;i++)x[i] =i;
    shuffle(n,x);
    for(i=0;i<n;i++)cout<<i<<"  "<<x[i]<<endl;
    return 0;
}

steve_bank · 2024-08-16T14:43:19-0700

The finish of my foray into random numbers. A reference to URAND in Scilab documentation lled me to the URAND paper and then testing random number generators or more generally random sequences of any source.

If an electromagnet phenomena is detected from space seemingly a series of integers is it random?

Three of the test methods are frequency, sequential, and gap tests.

The frequency test counts the number occurrences of each number and a Chi Squared test compares the counts to what they should be for a perfect uniform distribution.

The frequency test provides no information on randomness of the sequences. For example a sequence 000111222...999 will pass the frequency test but is not random.

That is where sequential and gap tests come in. The sequential test as written counts the occurrences of each non overlapping pair. It looks for non random clumping of numbers. It can be extended to three or more groupings.

The gap test evaluates the distance between occurrences of each single number in the sequence, It looks for a non random bias.

Tests are applied to the CPP mt19937 generator. A benchmark for a quick check of other generators. More comprehensive testing requires a lot of insertions and time.

The C rand() function fails the fervency test as iterations increases, but looks ok with the gap and sequential tests, it takes all three tests.

URAND has results similar to mt19937.

Code:

#include <random>
unsigned Seed = time(NULL);
mt19937 gen(Seed);

void int_array_cpp(int n,int lo,int hi,int *y){
   int i;
   uniform_int_distribution<int> dist (lo,hi);
   for(i=0;i<n;i++)y[i] = dist(gen);
}
void frenquency_test(void){
    //ntrials power of 10  num power of 10
    cout<<"Frencuency Test "<<endl;
    int i,m,num =100,ntrials = pow(10,8);
    int *y = new int[ntrials];
    int_array_cpp(ntrials,0,num-1,y);
    int counts[num];
   for(i=0;i<num;i++)counts[i] = 0;
   double X=0;
   /// count occurnces
   for(i=0;i <ntrials;i++)counts[y[i]]++;
   m = ntrials/num;
   // chi squared test
   for(i=0;i<num;i++)X += double(pow(counts[i]-m,2))/m;
   FILE *p = fopen("freq.dat","w");
   for(i=0;i<num;i++)fprintf(p,"%10d\n",counts[i]);
   fclose(p);
   printf("Chi Squared  %.3f \n",X);
   cout<<"ntrials  "<<ntrials<<endl;
   cout<<"m  "<<m<<endl;
}

void sequential_test(void){
   cout<<"sequential test"<<endl;\
   int k,j,i, z,n = 100000000;
   int *yr = new int[n];
   int_array_cpp(n,0,9,yr);
   int counts[100];
   for(i=0;i<100;i++)counts[i] = 0;
   int x[10]  = {0,1,2,3,4,5,6,7,8,9};
    int y[10] = {0,1,2,3,4,5,6,7,8,9};

    for(i=0;i<n-2;i+=2){
        z = 0;
        for(j=0;j<10;j++){
                for(k=0;k<10;k++){
                    if(yr[i] == x[j] && yr[i+1] == y[k])counts[z]++;
                    z++;
                }
        }
    }
    double mx = n/200,X=0;
    for(i=0;i<100;i++)X += double(pow(counts[i]-mx,2))/mx;
    printf("chi squre .05 critical value 123.225  X  %.3f\n",X);
    FILE *p = fopen("seq.dat","w");
    for(i=0;i<100;i++)fprintf(p,"%10d\n",counts[i]);
    fclose(p);
}

void gap_test(void){
    // change num 0 to 9
   cout<<"gap test"<<endl;
   int i,n = 100000000;
   double aver;
   int *y = new int[n];
   int_array_cpp(n,0,9,y);
   int gap0,gap1,gap2,s = 0,k=0;
   int delta,max = -1,min = 100000;
    int num =0;
    for(i=0;i<n;i++)if(y[i] ==num){gap0 = i;break;}
    gap1 = gap0;
    for(i=gap0+1;i<n;i++){
        if(y[i] == num){
            delta = i-gap1;
            s += delta;
            if(delta > max)max = delta;
            if(delta<min)min = delta;
            gap1 = i;
            k++;
            }
    }
    aver = double(s)/double(k);
    printf("num %d  min %d  max  %d aver gap  %.5f\n",num,min,max,aver);
}

int main(){
    gap_test();
    sequential_test();
    frenquency_test();
    return 0;
}

The Programming Thread

Swammerdami

Squadron Leader

steve_bank

Diabetic retinopathy and poor eyesight. Typos ...

steve_bank

Diabetic retinopathy and poor eyesight. Typos ...

Fisher–Yates shuffle - Wikipedia

Ronald Fisher - Wikipedia

steve_bank

Diabetic retinopathy and poor eyesight. Typos ...