The Programming Thread

Swammerdami · Jul 30, 2024

Binary search is very useful and probably in much more common use than hash tables. BUT I cannot condone the ANSI standard bsearch() function.

When the key is not found in the searched table, it's not unusual to want the nearest element instead. Or even to create an entry for that new key. I don't insist that bsearch() be capable of creating a new entry. BUT it's already done the main task: locating where the new entry should go. However (to keep the interface very simple) bsearch() returns only a null pointer when no exact match is found. Insertion will require that you then do the equivalent of what bsearch() just did but you'll need to "roll your own." Rolling your own anyway, what good was bsearch()?

bsearch() itself is only a tiny bit of C source code, although it allows the same generality as qsort(). But you don't need that; start with a VERY trivial template and derive your own code for any particular table.

At least ANSI was smart enough to reject BSD's hsearch() which has a horrid interface.

Related to a sorted table for binary search (which uses time log_2(N) for look-up ) is a digital search tree (e.g. trie) where log_5(N) look-up time suffices if each node has fan-out of 5.

AND one of the most elegant (and amazing?) data structures I have ever seen (and which is NOT found in Knuth) is a memory-saving method to store a digital search tree in a hash table. (If anyone gets this far and is seized by excitement in anticipation of the possible amazement, I may share some details via PM.)

steve_bank · Aug 1, 2024

After some more reading I am not done with random numbers yet.

I posted one of the tests, the Chi Squared test. In a uniform random sequence the number of occurrences of each number are counted ad compared against theoretical counts.

That does not tell anything about how the numbers are distributed in the sequence. The occurrences of each number may be the same or close, but the sequence may not be random.

The sequential test looks at occurrences of sub sequences, looking for data clumping.

At 200 million samples a sequence of up to 7 numbers can be tested for a uniform distribution 0-10.

Above that run time is a problem on my PC.

Both the c++ mt19937 and the URAND function have max ints 0x7fffffff. To test over the full integer range at once would be difficult.

The full test would require calculating the theoretical probabilities of a sub sequence and do a Chi Squared test. I just compared URAND to C++ mt19937.

So far URAND algorithm seems to be a good replacement for rand() if you are using C. With C++ the random functions are preferred.

There is one more test, the spectral te

See Knuth Vol 2 Semi Numeral Algorithms chapter 3, empirical tests section.. PDFs online.

Code:

#include <chrono>
#include <random>

void sequential_test(void){
    cout<<"sequential testt"<<endl;
    unsigned seed = chrono::system_clock::now().time_since_epoch().count();
    mt19937 rand_gen(seed);
     uniform_int_distribution<int> dist (0,10);;
    unsigned long long rand_num = seed;
    unsigned long long urand_max = 0x7fffffff;
    unsigned long long m = 0x000fffffffffffff;
    unsigned long long mh = m/2;
    unsigned long long k1 = 2*ceil((mh)*(.5 - sqrt(3)/6.)) + 1;
    unsigned long long k2 = 8*ceil((mh)*atan(1.)/8.) + 5;

   int j,i, n = 200000000,urandcnt = 0,cppcnt = 0;
   int nseq = 3;
   int seq[8] = {0,1,2,3,4,5,6,7};
   int cppflag,urandflag;
   int *cpp = new int[n];
   int *urand = new int[n];

   for(i=0;i<n;i++){
        urand[i] = rand_num%11; //0-10
        rand_num = rand_num*k2 + k1;
        cpp[i] = dist(rand_gen);
    }
    for(i=0;i<n-nseq;i++){
        cppflag = 0;
        urandflag = 0;
        for(j=0;j<nseq;j++){
            if(seq[j] != urand[i+j])urandflag = 1;
            if(seq[j] != cpp[i+j])cppflag = 1;
        }
        if(cppflag == 0)cppcnt++;
        if(urandflag == 0)urandcnt++;
    }
    cout<<"cpp  "<<cppcnt<<endl;
    cout<<"urand  "<<urandcnt<<endl;
}

steve_bank · Aug 6, 2024

Knuth's Shuffle

Fisher–Yates shuffle - Wikipedia

en.wikipedia.org

The Fisher–Yates shuffle is named after Ronald Fisher and Frank Yates, who first described it. It is also known as the Knuth shuffle after Donald Knuth.[2] A variant of the Fisher–Yates shuffle, known as Sattolo's algorithm, may be used to generate random cyclic permutations of length n instead of random permutations.

The modern version of the Fisher–Yates shuffle, designed for computer use, was introduced by Richard Durstenfeld in 1964[4] and popularized by Donald E. Knuth in The Art of Computer Programming as "Algorithm P (Shuffling)".[5] Neither Durstenfeld's article nor Knuth's first edition of The Art of Computer Programming acknowledged the work of Fisher and Yates; they may not have been aware of it.[citation needed] Subsequent editions of Knuth's The Art of Computer Programming mention Fisher and Yates' contribution.[6]
The algorithm described by Durstenfeld is more efficient than that given by Fisher and Yates: whereas a naïve computer implementation of Fisher and Yates' method would spend needless time counting the remaining numbers in step 3 above, Durstenfeld's solution is to move the "struck" numbers to the end of the list by swapping them with the last unstruck number at each iteration. This reduces the algorithm's time complexity to O ( n ) compared to O ( n 2 ) for the naïve implementation.[7] This change gives the following algorithm (for a zero-based array).

Outside of new areas like AI it is hard to find a new ways of doing something that is better than what has been done.

Fisher was born in 1913.

Ronald Fisher - Wikipedia

en.wikipedia.org

Code:

void shuffle(int n,int *x){
    //Knuth's Shuffle in place
    int i,r,swap;
    for(i=n;i>=1;i--){
        r = rand()%i;  //rand() not the best choice
        swap = x[i-1];
        x[i-1] = x[r];
        x[r] = swap;
        }
}


int main(){

   int i,n=10;
    int x[n];
    for(i=0;i<n;i++)x[i] =i;
    shuffle(n,x);
    for(i=0;i<n;i++)cout<<i<<"  "<<x[i]<<endl;
    return 0;
}

steve_bank · Aug 16, 2024

The finish of my foray into random numbers. A reference to URAND in Scilab documentation lled me to the URAND paper and then testing random number generators or more generally random sequences of any source.

If an electromagnet phenomena is detected from space seemingly a series of integers is it random?

Three of the test methods are frequency, sequential, and gap tests.

The frequency test counts the number occurrences of each number and a Chi Squared test compares the counts to what they should be for a perfect uniform distribution.

The frequency test provides no information on randomness of the sequences. For example a sequence 000111222...999 will pass the frequency test but is not random.

That is where sequential and gap tests come in. The sequential test as written counts the occurrences of each non overlapping pair. It looks for non random clumping of numbers. It can be extended to three or more groupings.

The gap test evaluates the distance between occurrences of each single number in the sequence, It looks for a non random bias.

Tests are applied to the CPP mt19937 generator. A benchmark for a quick check of other generators. More comprehensive testing requires a lot of insertions and time.

The C rand() function fails the fervency test as iterations increases, but looks ok with the gap and sequential tests, it takes all three tests.

URAND has results similar to mt19937.

Code:

#include <random>
unsigned Seed = time(NULL);
mt19937 gen(Seed);

void int_array_cpp(int n,int lo,int hi,int *y){
   int i;
   uniform_int_distribution<int> dist (lo,hi);
   for(i=0;i<n;i++)y[i] = dist(gen);
}
void frenquency_test(void){
    //ntrials power of 10  num power of 10
    cout<<"Frencuency Test "<<endl;
    int i,m,num =100,ntrials = pow(10,8);
    int *y = new int[ntrials];
    int_array_cpp(ntrials,0,num-1,y);
    int counts[num];
   for(i=0;i<num;i++)counts[i] = 0;
   double X=0;
   /// count occurnces
   for(i=0;i <ntrials;i++)counts[y[i]]++;
   m = ntrials/num;
   // chi squared test
   for(i=0;i<num;i++)X += double(pow(counts[i]-m,2))/m;
   FILE *p = fopen("freq.dat","w");
   for(i=0;i<num;i++)fprintf(p,"%10d\n",counts[i]);
   fclose(p);
   printf("Chi Squared  %.3f \n",X);
   cout<<"ntrials  "<<ntrials<<endl;
   cout<<"m  "<<m<<endl;
}

void sequential_test(void){
   cout<<"sequential test"<<endl;\
   int k,j,i, z,n = 100000000;
   int *yr = new int[n];
   int_array_cpp(n,0,9,yr);
   int counts[100];
   for(i=0;i<100;i++)counts[i] = 0;
   int x[10]  = {0,1,2,3,4,5,6,7,8,9};
    int y[10] = {0,1,2,3,4,5,6,7,8,9};

    for(i=0;i<n-2;i+=2){
        z = 0;
        for(j=0;j<10;j++){
                for(k=0;k<10;k++){
                    if(yr[i] == x[j] && yr[i+1] == y[k])counts[z]++;
                    z++;
                }
        }
    }
    double mx = n/200,X=0;
    for(i=0;i<100;i++)X += double(pow(counts[i]-mx,2))/mx;
    printf("chi squre .05 critical value 123.225  X  %.3f\n",X);
    FILE *p = fopen("seq.dat","w");
    for(i=0;i<100;i++)fprintf(p,"%10d\n",counts[i]);
    fclose(p);
}

void gap_test(void){
    // change num 0 to 9
   cout<<"gap test"<<endl;
   int i,n = 100000000;
   double aver;
   int *y = new int[n];
   int_array_cpp(n,0,9,y);
   int gap0,gap1,gap2,s = 0,k=0;
   int delta,max = -1,min = 100000;
    int num =0;
    for(i=0;i<n;i++)if(y[i] ==num){gap0 = i;break;}
    gap1 = gap0;
    for(i=gap0+1;i<n;i++){
        if(y[i] == num){
            delta = i-gap1;
            s += delta;
            if(delta > max)max = delta;
            if(delta<min)min = delta;
            gap1 = i;
            k++;
            }
    }
    aver = double(s)/double(k);
    printf("num %d  min %d  max  %d aver gap  %.5f\n",num,min,max,aver);
}

int main(){
    gap_test();
    sequential_test();
    frenquency_test();
    return 0;
}

steve_bank · Oct 8, 2024

From what I see on the net this is the way to use C++ random number generators. Making the statements static should prevent reseeding on successive calls.

If you are using C++ the rand() function should not be used.

The mt19937 generator has a virtually infinite period for practical purposes.

Code:

#include <random>

int unif_int(int lo,int hi){
    static mt19937 rand_gen(time(NULL));
    static uniform_int_distribution<int> dist (lo,hi);
    return dist(rand_gen);
}

void unif_int_array(int n,int lo,int hi,int *y){
    int i;
    static mt19937 rand_gen(time(NULL));
    static uniform_int_distribution<int> dist (lo,hi);
    for(i=0;i<n;i++)y[i] = dist(rand_gen);
}

double unif_float(int lo,int hi){
    static mt19937 rand_gen(time(NULL));
    static uniform_real_distribution<double> dist (lo,hi);
    return dist(rand_gen);
}

void unif_float_array(int n,int lo,int hi,double *y){
    int i;
    static mt19937 rand_gen(time(NULL));
    static uniform_real_distribution<double> dist (lo,hi);
    for(i=0;i<n;i++)y[i] = dist(rand_gen);
}



double unif_dec(void){
    //0 <= x <= 0.999999999767169
    static mt19937 rand_gen(time(NULL));
    return generate_canonical<double,1>(rand_gen);
}

void unif_dec_array(int n,double *y){
    // 0 <= x <= 0.999999999767169
    int i;
    static mt19937 rand_gen(time(NULL));
    for(i=0;i<n;i++)
    y[i] =generate_canonical<double,1>(rand_gen);
}

steve_bank · 2024-10-16T22:01:33-0700

The mt19937 Mersenne Twister random number generator coded directly from the link. Thanks to whoever wrote it. I added the function to make it work.

It is based on an FSR feedback shift register.The technique is used in digital hardware to generate arbitrary sequences. A finite state machine or finite automata in commuter science.

The major benefit is the extremely long period, infinite for all practical purposes,

Mersenne Twister - Wikipedia

en.wikipedia.org

The Association Of Computing Machinery is a place to look for algorithms. It gas been around for a long time.

Collected Algorithms of the ACM

ACM Books

dl.acm.org

Code:

#include <stdint.h>

#define n 624
#define m 397
#define w 32
#define r 31
#define UMASK (0xffffffffUL << r)
#define LMASK (0xffffffffUL >> (w-r))
#define a 0x9908b0dfUL
#define u 11
#define s 7
#define t 15
#define l 18
#define b 0x9d2c5680UL
#define c 0xefc60000UL
#define f 1812433253UL

typedef struct
{
    uint32_t state_array[n];         // the array for the state vector
    int state_index;                 // index into state vector array, 0 <= state_index <= n-1   always
} mt_state;

mt_state mts;


void initialize_state(mt_state* state, uint32_t seed)
{
    uint32_t* state_array = &(state->state_array[0]);

    state_array[0] = seed;                          // suggested initial seed = 19650218UL

    for (int i=1; i<n; i++)
    {
        seed = f * (seed ^ (seed >> (w-2))) + i;    // Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier.
        state_array[i] = seed;
    }

    state->state_index = 0;
}

uint32_t random_uint32(mt_state* state)
{
    uint32_t* state_array = &(state->state_array[0]);

    int k = state->state_index;      // point to current state location
                                     // 0 <= state_index <= n-1   always

//  int k = k - n;                   // point to state n iterations before
//  if (k < 0) k += n;               // modulo n circular indexing
                                     // the previous 2 lines actually do nothing
                                     //  for illustration only

    int j = k - (n-1);               // point to state n-1 iterations before
    if (j < 0) j += n;               // modulo n circular indexing

    uint32_t x = (state_array[k] & UMASK) | (state_array[j] & LMASK);

    uint32_t xA = x >> 1;
    if (x & 0x00000001UL) xA ^= a;

    j = k - (n-m);                   // point to state n-m iterations before
    if (j < 0) j += n;               // modulo n circular indexing

    x = state_array[j] ^ xA;         // compute next value in the state
    state_array[k++] = x;            // update new state value

    if (k >= n) k = 0;               // modulo n circular indexing
    state->state_index = k;

    uint32_t y = x ^ (x >> u);       // tempering
             y = y ^ ((y << s) & b);
             y = y ^ ((y << t) & c);
    uint32_t z = y ^ (y >> l);

    return z;
}

int mt19937(void){
    // 0 <= x <=  2,147,483,646
    static int init = 1;
    if(init){
        initialize_state(&mts, time(NULL));
        init = 0;
        }
    return random_uint32(&mts)%0x7fffffff;
}

steve_bank · 2024-10-17T13:53:13-0700

Having nothing better to do I combined the two functions into one and replaced the structure with static variables for a clean function. The constants could be hard coded into the code.

The constants appear to be standard for the 32 bit version.

Code:

int mt19937(void){
    static int init = 1;
    static  uint32_t state_array[n];
    static int state_index = 0;
    if(init){
        unsigned int seed = time(NULL);
        state_array[0] = seed;     // suggested initial seed = 19650218UL
        for (int i=1; i<n; i++){
            seed = f * (seed ^ (seed >> (w-2))) + i;    // Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier.
            state_array[i] = seed;
        }
        init = 0;
    }
    int k = state_index;  // point to current state location
                                     // 0 <= state_index <= n-1   always
//  int k = k - n;              // point to state n iterations before
//  if (k < 0) k += n;        // modulo n circular indexing
                                     // the previous 2 lines actually do nothing
                                     //  for illustration only

    int j = k - (n-1);         // point to state n-1 iterations before
    if (j < 0) j += n;         // modulo n circular indexing
    uint32_t x = (state_array[k] & UMASK) | (state_array[j] & LMASK);
    uint32_t xA = x >> 1;
    if (x & 0x00000001UL) xA ^= a;
    j = k - (n-m);                           // point to state n-m iterations before
    if (j < 0) j += n;                       // modulo n circular indexing
    x = state_array[j] ^ xA;         // compute next value in the state
    state_array[k++] = x;            // update new state value
    if (k >= n) k = 0;                     // modulo n circular indexing
    state_index = k;
    uint32_t y = x ^ (x >> u);       // tempering
             y = y ^ ((y << s) & b);
             y = y ^ ((y << t) & c);
    uint32_t z = y ^ (y >> l);
    return z%0x7fffffff;

}

The Programming Thread

Swammerdami

Squadron Leader

steve_bank

Diabetic retinopathy and poor eyesight. Typos ...

steve_bank

Diabetic retinopathy and poor eyesight. Typos ...

Fisher–Yates shuffle - Wikipedia

Ronald Fisher - Wikipedia

steve_bank

Diabetic retinopathy and poor eyesight. Typos ...

steve_bank

Diabetic retinopathy and poor eyesight. Typos ...

steve_bank

Diabetic retinopathy and poor eyesight. Typos ...

Mersenne Twister - Wikipedia

ACM Books

steve_bank

Diabetic retinopathy and poor eyesight. Typos ...