• Welcome to the new Internet Infidels Discussion Board, formerly Talk Freethought.

The Programming Thread

Swami baby,

I posted two alternatives to your RUFF() which I showed to be statistically better uniform distributions on (0, 1).

I don't think the mean value is a good estimator of how it compares to a true uniform distribution.
 
Code:
When writing code for the workplace my functions would look like this. Depending on complexity I would describe it in pseudo code at the top.

I learned structured programming in an 80s Pascal class. It is the way I coded. My rule of thumb was that if a function got to be more than two or three screen scrolls, I broke it up.

1. Break problem down into manageable functions.
2. Self documenting code, use meaningful variable and function names.
3. No global variables.
4. Don't hard-code constants. Use top-level defines.
5. Limit nested function calls.
6. No GOTOs or jumps, IOW no spaghetti code.
7. Top down execution, no jumping around the main code.


A simple example, but that is the idea. This kind of format was required at places I worked for code that was under revision control. The days of the lone coder ended in the 90s.

double add(double a, double b, double c){
    //function to add three double numbers
    // Inputs
    // a double
    // b double
    // c double
    // Outputs
    // return a + b + c
    //Revision history
    // Rev - x/x/x initial release
    // Rev A x/x/x added a third input variable c
    return (a + b + c);
}
 
Swami baby,

I posted two alternatives to your RUFF() which I showed to be statistically better uniform distributions on (0, 1).

I don't think the mean value is a good estimator of how it compares to a true uniform distribution.

Steve Darling -- This seems unlikely, but I'm happy to be open-minded about it.

I did notice some code by you which purported to do ... something. But I stopped reading when I saw a spurious sqrt inside my ruf() function. If you now post the actual code you ran, including the "bin counting", I will take a look.

Having a mean of 0.5 is a necessary but insufficient condition for uniformity.
 
Macros vs. functions? A macro does not have function-call overhead, and it may be optimized with reference to the surrounding code. So a macro can be faster, and the inline feature of C++ is intended to get the performance of a macro while being type-safe and parse-safe.
 
This is getting me back into a work frame of mind.

The three functions all have 0.5 for a mean. Standard deviation is .2886. Median is .707, which is 1/sqrt(2). Haven't tracked that down as to significance.

I also compared the Scilab uniform distribution and got the same results.

You can plot the cumulative distributions and compare.

The only difference is that RUFF() has a higher average deviation across a 10-bin histogram. ruf2 is lower than RUFF and ruf1 is less than ruf2. A relative difference. The statistical differences will probably not affect computations using the numbers.

The C qsort() does not appear to work with numbers less than 1 so I had to write a sort routine.


Code:
void ruf1(long long n,double *x){
  cout<<"ruf1"<<endl;
  long long i,MASK = (RAND_MAX >> 2);
  for(i=0;i<n;i++)x[i] = double(rand() & MASK | 1) / (double)(MASK+1);
}

void ruf2(long long n,double *x){
    //make lsb always 1
    cout<<"ruf2"<<endl;
    long long i;
    for(i=0;i<n;i++)
      x[i] = double(rand()|1)/double(RAND_MAX+1);
}

void ruf3(long long n,double *x){
    //skip zero random numbers
    cout<<"ruf3"<<endl;
    unsigned int y;
    long long i;
    for(i=0;i<n;i++){
      while(1){
        if((y = rand())){  //retry until rand() returns nonzero
          x[i] = double(y)/double(RAND_MAX+1);
          break;
        }
      }
    }
}//ruf3()

void cum_dist(long long n,double *x,double *m,double *cd){
     //finds the median and cumulative distribution of a data set
     //data must be sorted in increasing order
    cout<<"CUM DIST"<<endl;
    long long i;
    int flag = 0;
    double pc,stot = 0.,s = 0.;
    for(i=0;i<n;i++)stot += x[i]; //sum of the distribution
    for(i=0;i<n;i++){
        s += x[i];         //running sum
        pc = 100.*s/stot;  //percent of running sum vs total
        cd[i] = pc;
        if(pc >= 50. && !flag){flag = 1;*m = x[i];}
    }
}//cum_dist()

void bubble_sort(long long n,double *x){
     // brute force sort and slow
    cout<<"SORTING"<<endl;
    int flag = 0;
    long long i;
    double temp = 0.;
    while(1){
        flag = 0;
        for(i=0;i<n-1;i++){
            if(x[i] > x[i+1]){
                temp = x[i];
                x[i] = x[i+1];
                x[i+1] = temp;
                flag = 1;
            }//if
        }//i
        if(!flag) break;
    }//while

}//bubble_sort()


void mean_std(long long  n,double *x, double *u,double *sigma){
    cout<<"MEAN"<<endl;
    double sum = 0,mean,std ;
    long long i;
    for(i=0;i<n;i++)sum += x[i];
    mean = sum/double(n);
    sum = 0;
    for(i=0;i<n;i++)sum += pow(mean-x[i],2);
    std = sqrt(sum/double(n));
    *u = mean;
    *sigma = std;
}

int main()
{

    srand(1);
    double ave,med,std;
    long long i,N = 1LL << 31;  //2^31 samples
    double *x = new double[N];
    double *cd = new double[N]; //cumulatve distribution
    ruf1(N,x);
    //ruf2(N,x);
    //ruf3(N,x);

    mean_std(N,x,&ave,&std);
    bubble_sort(N,x);
    cum_dist(N,x,&med,cd);
    printf(" mean  %f  std  %f  median %f\n",ave,std,med);

    FILE *ptr = fopen("data.txt","w");
    fprintf(ptr,"%lld\n",N);
    for(i=0;i<N;i++)fprintf(ptr,"%.4f\n",cd[i]);
    fclose(ptr);

return 0;
}
 
Substitute this for bubble_sort() and for 2^31 iterations it takes about 2.5 minutes to run.

for(i=0;i<N;i++)x[i] = x[i] * 1000.;
cout<<"SORTING"<<endl;
qsort(x,N,sizeof(double),comp_doub_ascend);
for(i=0;i<N;i++)x[i] = x[i] / 1000.;

And add the function

int comp_doub_ascend (const void * a, const void * b) {
    double da = *(const double*)a, db = *(const double*)b;
    return (da > db) - (da < db); //a raw (int)(da - db) truncates gaps below 1 to zero
}
 
This conversation started because I thought it interesting how easy it is to convert a uniform random variate into the variate of a different distribution, e.g. exponential or Cauchy. It seemed especially interesting that a 2D Gaussian variate is easier to derive than a 1D Gaussian variate.

But the conversation quickly mutated into a discussion of Swammi's coding style or lack thereof. That's fine; let me talk about that.

First of all, I use tools. For example, if a.out is a program which produces a list of floating-point numbers, I'll run the pipe (a.out | stats) to show the mean, standard deviation, skew, kurtosis, minimum, maximum and population count of that set of numbers. (I've never had reason to look at kurtosis, but it's easy to compute and provides a little check, since Wikipedia knows the kurtosis of various distributions.) In addition to executables like stats, I link to library routines I've written.

The program stats does NOT show the median -- I didn't want to spend time doing a sort. When I want the median of, say, 1 million numbers I simply type (a.out | sort -n | head -500000 | tail -1)

I wrote stats myself even though there's probably lots of downloadables that are similar. Almost the ONLY programs I run are tools I've written myself and antique Unix programs like sort, head, tail, awk, sed etc. Do I "suffer" from NIH and DIY syndromes? When it comes to software, you betcha! I won't defend this as "appropriate": I program for fun and my personality and preferences do not fit a typical model.

I write C code in three different styles:
  • Fast and dirty. This is code I whip out, often to solve a puzzle, or to analyze an Internet dataset, perhaps relevant to some message-board discussion. Generally I have no intention of ever re-using the code. Although horrid in terms of organization, comments, and choice of identifiers, it does follow Spencer's One True Style, if only out of habit. Sometimes I do end up re-using some of this horrid code, and am then reminded of why horrid code is horrible! :cool: Oh well. This will not make my Top 100 List of Life's Biggest Regrets.
  • Perfectionistic. This is for the code I deliver to paying customers, or have taken enough pride in to invite others to download. It is also written in Spencer's One True Style. Unfortunately even this code suffers to some extent from minimal comments and overly-short uninformative identifier names. Much of the time spent "perfecting" the code, perfects it in ways that appeal to me but are irrelevant to a paying customer. Fortunately, I am fast and reliable enough to more than please any paying customer despite such "wasted" effort.
  • Snippets for posting. I have posted several snippets in this thread, even before the present conversation. Draw your own conclusions. This code generally conforms to One True Style. Obviously the "one-liners" to convert random variates are an exception since I wanted to stress their alleged one-linedness.

My very first programming language was Fortran in 1965. I programmed in a few other "high-level" languages from the Age of Dinosaurs. And in at least a dozen different machine or assembly languages. I liked machine languages, and sometimes programmed in them even when the usual reasons for doing so were missing. When I discovered C it was love at first sight: it combined what I loved about machine languages with the expressive power of higher-level languages. I am not interested in other modern languages, especially C++ which has a philosophy OPPOSITE to C's, even though the syntax is largely identical. I was especially taken aback to learn Python uses indentation for block boundaries -- I think C's approach to white space is perfect and only perverts would want something different. Does php really object to certain spaces in COMMENTS? I can only shake my head in dismay.

I DO write Javascript code out of necessity -- I do some nifty things with it at my website if I do say so myself.

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Getting down to the specifics of this conversation, I have my own library to generate and manipulate random numbers. (The actual PRNGs themselves are taken from the public-domain source of experts; I do not try to reinvent EVERY wheel.)

The one-liner I posted showing the conversion of a random integer to a random double is NOT taken from my personal library; it was just a throw-away snippet for here. Instead of using 31 random bits, it uses 29 random bits (or 2 fewer than however many bits random() supplies). I did this to minimize the risk that someone here would compile the snippets and run up against a limit. My actual library functions are perfectionistic and allow the caller to optionally specify how many random bits to spend when forming a random float.

Let's be clear on the effect of discarding those two random bits. When you call my ad-hoc ruf() to get a random float you might get, by chance
.0186264459
or you might get
.0186264496
But you will never get a number in between those two. Please. Look at those two numbers again. The ad hoc ruf() that I presented above cannot return a number larger than .0186264459 but smaller than .0186264496. The granularity imposed by the ">>2" is just too coarse. (Obviously it's the exact same granularity throughout the 0 < x < 1 range. No cherry-picking here.)

Look at the two numbers again. Raise your hand if you think this coarse granularity is likely to be a problem in any of 99.999% of applications.

This is getting me back into a work frame of mind.

Is that good or bad? :cool: Color me perverse, but I write code for fun when others might be out hang-gliding or trying to pick up women in a bar! (My life isn't completely devoted to code -- I'd probably be out chatting with friends on Floating Fortune Road if I hadn't stayed home to compose this post.)

The three functions all have 0.5 for a mean. Standard deviation is .2886. Median is .707, which is 1/sqrt(2). Haven't tracked that down as to significance.

The "discrepancy" you complain of is caused STRICTLY by the granularity "problem" I explain above. To show this discrepancy you invoked ruf() 2 Billion times (Billion with a B). You must have understood this in advance, or you wouldn't have chosen such a large iteration count. Did you think this demonstration was needed to understand the result of discarding 2 bits?

And by the way, the medians are all 0.5.

The C qsort() does not appear to work with numbers less than 1 so I had to write a sort routine.

qsort() works fine with doubles, large or small. You must supply a comparison function. If you wish, post your attempt and I'll debug it for you.
 
C macros are dangerously bug-prone. Consider:

#define ADDONE(x) x+1

The expression
2*ADDONE(x)
is intended to be 2*(x+1) but it expands into 2*x+1.

So what one has to do is

#define ADDONE(x) (x+1)

Parens are also a good way to avoid having to remember operator-precedence rules, and C has a *lot* of them.


C++ has a way to avoid that problem. In fact, many C++ features are ways of avoiding the preprocessor by doing what it does in safer ways.

template<typename T>
inline T& ADDONE(const T &x) {return x + 1;}

The keyword "inline" means that the content of the function should be inserted where the function is called, just like the preprocessor macro. But this expression is much safer: 2*ADDONE(x) will be interpreted as 2*(x+1).

The & is for calling by reference, as a way of transmitting data without using a pointer.

The const indicates that the code ought not to change the value of x. Trivial for a numeric type, not so trivial for many kinds of objects. The purpose of this is to try to catch such a mismatch at compile time.
Just a note, using reference to return a value to local variable like above is a potential hard-to-find bug, and the compiler will warn you about it (which is why it's usually a good idea to make sure you get zero warnings with -Werror).

Consider what the following function does:

Code:
template<typename T>
inline T& ADDONE(const T &x) {return x + 1;}

It takes a reference to x, adds 1, puts the result on the stack, and then returns a reference to that stack slot... but the stack frame is popped when the function returns, so the reference points to invalid data that will likely get overwritten by other garbage soon. In this sense, references are like pointers: you can misuse them if you don't consider the lifetime of the data they point to.

So the right way would be to return a value. Unless you want to return the same reference to x that was passed to the function, but in this case, then it can't be modified because it's a const reference.

Usually, const references are used to avoid overhead when passing larger structures. For integers it hardly makes a difference, since what's passed to the function "under the hood" is likely a pointer, i.e. 64 bits. (A good mental model for references in C++ is that they're just pointers that can't be NULL and can't be made to point to other things after initialization.)
 
Not really discrepancies, variations. The term is pseudo-random numbers. True random numbers are generated from a physical process. You can get devices that generate random numbers from electrical noise and cosmic radiation.

My point is your overly complex function is no better than a more common form. You made a point of declaring that ruf()'s mean is EXACTLY 0.5; well, so is the mean of each function I posted.

As an engineer the pragmatic question was always does it do what is needed not how perfect it is.

x = double(i|1)/double(RAND_MAX+2);

This generates the set
1/RAND_MAX,2/RAND_MAX,3/RAND_MAX...(RAND_MAX + 1)/(RAND_MAX+2).

Having nothing better to do, here are numbers from ruf() and the above code, from the bottom and top of the set.

n--ruf()---above code
1 0.0001220703 0.0000305166
2 0.0003662109 0.0000915499
3 0.0003662109 0.0000915499
4 0.0006103516 0.0001525832
5 0.0006103516 0.0001525832
6 0.0008544922 0.0002136165
7 0.0008544922 0.0002136165
8 0.0010986328 0.0002746498
9 0.0010986328 0.0002746498
10 0.0013427734 0.0003356831
11 0.0013427734 0.0003356831
12 0.0015869141 0.0003967164
13 0.0015869141 0.0003967164
14 0.0018310547 0.0004577497
15 0.0018310547 0.0004577497
16 0.0020751953 0.0005187830
17 0.0020751953 0.0005187830
18 0.0023193359 0.0005798163
19 0.0023193359 0.0005798163
20 0.0025634766 0.0006408496
21 0.0025634766 0.0006408496
22 0.0028076172 0.0007018829
23 0.0028076172 0.0007018829
24 0.0030517578 0.0007629162
25 0.0030517578 0.0007629162
26 0.0032958984 0.0008239495
27 0.0032958984 0.0008239495
28 0.0035400391 0.0008849828
29 0.0035400391 0.0008849828
30 0.0037841797 0.0009460161
31 0.0037841797 0.0009460161
32 0.0040283203 0.0010070493
33 0.0040283203 0.0010070493
34 0.0042724609 0.0010680826
35 0.0042724609 0.0010680826
36 0.0045166016 0.0011291159
37 0.0045166016 0.0011291159
38 0.0047607422 0.0011901492
39 0.0047607422 0.0011901492
40 0.0050048828 0.0012511825
41 0.0050048828 0.0012511825
42 0.0052490234 0.0013122158
43 0.0052490234 0.0013122158
44 0.0054931641 0.0013732491
45 0.0054931641 0.0013732491
46 0.0057373047 0.0014342824
47 0.0057373047 0.0014342824
48 0.0059814453 0.0014953157
49 0.0059814453 0.0014953157
50 0.0062255859 0.0015563490
51 0.0062255859 0.0015563490
52 0.0064697266 0.0016173823
53 0.0064697266 0.0016173823
54 0.0067138672 0.0016784156
55 0.0067138672 0.0016784156
56 0.0069580078 0.0017394489
57 0.0069580078 0.0017394489
58 0.0072021484 0.0018004822
59 0.0072021484 0.0018004822
60 0.0074462891 0.0018615155
61 0.0074462891 0.0018615155
62 0.0076904297 0.0019225488
63 0.0076904297 0.0019225488
64 0.0079345703 0.0019835820
65 0.0079345703 0.0019835820
66 0.0081787109 0.0020446153
67 0.0081787109 0.0020446153
68 0.0084228516 0.0021056486
69 0.0084228516 0.0021056486
70 0.0086669922 0.0021666819
71 0.0086669922 0.0021666819
72 0.0089111328 0.0022277152
73 0.0089111328 0.0022277152
74 0.0091552734 0.0022887485
75 0.0091552734 0.0022887485
76 0.0093994141 0.0023497818
77 0.0093994141 0.0023497818
78 0.0096435547 0.0024108151
79 0.0096435547 0.0024108151
80 0.0098876953 0.0024718484
81 0.0098876953 0.0024718484
82 0.0101318359 0.0025328817
83 0.0101318359 0.0025328817
84 0.0103759766 0.0025939150
85 0.0103759766 0.0025939150
86 0.0106201172 0.0026549483
87 0.0106201172 0.0026549483
88 0.0108642578 0.0027159816
89 0.0108642578 0.0027159816
90 0.0111083984 0.0027770149
91 0.0111083984 0.0027770149
92 0.0113525391 0.0028380482
93 0.0113525391 0.0028380482
94 0.0115966797 0.0028990814
95 0.0115966797 0.0028990814
96 0.0118408203 0.0029601147
97 0.0118408203 0.0029601147
98 0.0120849609 0.0030211480
99 0.0120849609 0.0030211480
100 0.0123291016 0.0030821813
101 0.0123291016 0.0030821813
102 0.0125732422 0.0031432146

32703 0.9920654297 0.9979859013
32704 0.9923095703 0.9980469346
32705 0.9923095703 0.9980469346
32706 0.9925537109 0.9981079679
32707 0.9925537109 0.9981079679
32708 0.9927978516 0.9981690012
32709 0.9927978516 0.9981690012
32710 0.9930419922 0.9982300345
32711 0.9930419922 0.9982300345
32712 0.9932861328 0.9982910678
32713 0.9932861328 0.9982910678
32714 0.9935302734 0.9983521011
32715 0.9935302734 0.9983521011
32716 0.9937744141 0.9984131344
32717 0.9937744141 0.9984131344
32718 0.9940185547 0.9984741677
32719 0.9940185547 0.9984741677
32720 0.9942626953 0.9985352010
32721 0.9942626953 0.9985352010
32722 0.9945068359 0.9985962342
32723 0.9945068359 0.9985962342
32724 0.9947509766 0.9986572675
32725 0.9947509766 0.9986572675
32726 0.9949951172 0.9987183008
32727 0.9949951172 0.9987183008
32728 0.9952392578 0.9987793341
32729 0.9952392578 0.9987793341
32730 0.9954833984 0.9988403674
32731 0.9954833984 0.9988403674
32732 0.9957275391 0.9989014007
32733 0.9957275391 0.9989014007
32734 0.9959716797 0.9989624340
32735 0.9959716797 0.9989624340
32736 0.9962158203 0.9990234673
32737 0.9962158203 0.9990234673
32738 0.9964599609 0.9990845006
32739 0.9964599609 0.9990845006
32740 0.9967041016 0.9991455339
32741 0.9967041016 0.9991455339
32742 0.9969482422 0.9992065672
32743 0.9969482422 0.9992065672
32744 0.9971923828 0.9992676005
32745 0.9971923828 0.9992676005
32746 0.9974365234 0.9993286338
32747 0.9974365234 0.9993286338
32748 0.9976806641 0.9993896671
32749 0.9976806641 0.9993896671
32750 0.9979248047 0.9994507004
32751 0.9979248047 0.9994507004
32752 0.9981689453 0.9995117337
32753 0.9981689453 0.9995117337
32754 0.9984130859 0.9995727669
32755 0.9984130859 0.9995727669
32756 0.9986572266 0.9996338002
32757 0.9986572266 0.9996338002
32758 0.9989013672 0.9996948335
32759 0.9989013672 0.9996948335
32760 0.9991455078 0.9997558668
32761 0.9991455078 0.9997558668
32762 0.9993896484 0.9998169001
32763 0.9993896484 0.9998169001
32764 0.9996337891 0.9998779334
32765 0.9996337891 0.9998779334
32766 0.9998779297 0.9999389667
 
Swami, you begged and pleaded for somebody to respond to your posts, and threatened to never post in the thread again if nobody responded.

A straightforward comparison. Generate the data sets over the range of rand() by substituting i in a loop for the output of rand(). Find the mean and standard deviation. Calculate the percentage of counts into bins .1 wide.

As can be seen at this 10-bin coarseness there does not appear to be a practical difference. Narrower bins may show a difference.


Code:
void ruf1(long long n,double *x){
  cout<<"ruf1"<<endl;
  long long i,MASK = (RAND_MAX >> 2);
  for(i=0;i<n;i++)x[i] = double(i & MASK | 1) / (double)(MASK+1);
}

void ruf2(long long n,double *x){
    //make lsb always 1
    cout<<"ruf2"<<endl;
    long long i;
    for(i=0;i<n;i++)
      x[i] = double(i|1)/double(RAND_MAX+2);
}

void mean_std(long long  n,double *x, double *u,double *sigma){
    cout<<"MEAN"<<endl;
    double sum = 0,mean,std =0 ;
    long long i;
    for(i=0;i<n;i++)sum += x[i];
    mean = sum/double(n);
    sum = 0;
    for(i=0;i<n;i++)sum += pow(mean-x[i],2);
    std = sqrt(sum/double(n));
    *u = mean;
    *sigma = std;
}

void eval(long long n,int *cnt,double *pc,double *x){
    long long int i;
    for(i = 0;i<n;i++){
        if(x[i] >  .0 && x[i] < .1)cnt[0] += 1;
        if(x[i] >= .1 && x[i] < .2)cnt[1] += 1;
        if(x[i] >= .2 && x[i] < .3)cnt[2] += 1;
        if(x[i] >= .3 && x[i] < .4)cnt[3] += 1;
        if(x[i] >= .4 && x[i] < .5)cnt[4] += 1;
        if(x[i] >= .5 && x[i] < .6)cnt[5] += 1;
        if(x[i] >= .6 && x[i] < .7)cnt[6] += 1;
        if(x[i] >= .7 && x[i] < .8)cnt[7] += 1;
        if(x[i] >= .8 && x[i] < .9)cnt[8] += 1;
        if(x[i] >= .9 && x[i] < 1.)cnt[9] += 1;
    }
    for(i = 0;i<10;i++)pc[i] = 100.*double(cnt[i])/double(n);
}//eval()

int main(){
    srand(1);
    double ave,med,std;
    double a1,a2,sd1,sd2,m1,m2;
    long long i,N = RAND_MAX;//pow(2,16);
    double *x1 = new double[N];
    double *cd = new double[N]; //cumulatve distribution
    double *x2 = new double[N];
    ruf1(N,x1);
    ruf2(N,x2);
    //ruf3(N,x);

    mean_std(N,x1,&a1,&sd1);
    mean_std(N,x2,&a2,&sd2);
    printf("%.4f  %.4f  %.4f  %.4f\n",a1,sd1,a2,sd2);

    int *cnt1 = new int[10];
    double *pc1 = new double[10];
    int *cnt2 = new int[10];
    double *pc2 = new double[10];
    for(i=0;i<10;i++){
       cnt1[i] = 0;pc1[i] = 0.;
       cnt2[i] = 0;pc2[i] = 0.;
       }
    eval(N,cnt1,pc1,x1);
    eval(N,cnt2,pc2,x2);
    for(i=0;i<10;i++)
    printf("%d   %2.4f  %d   %2.4f\n",cnt1[i],pc1[i],cnt2[i],pc2[i]);


    FILE *ptr = fopen("data.txt","w");
    //fprintf(ptr,"%d",N);
    //fprintf(ptr,"%d %.10f  %.10f\n",i,x1[i],x2[i]);
    for(i=0;i<10;i++)
        fprintf(ptr,"%d   %2.4f  %d   %2.4f\n",cnt1[i],pc1[i],cnt2[i],pc2[i]);
    fclose(ptr);

return 0;
}

Results. The means for both are 0.5 and the standard deviations are the same.

RUFF() ----- ruf2()
bin count   percent   bin count   percent
3280   10.0101  3276    9.9979
3272    9.9857  3278   10.0040
3280   10.0101  3276    9.9979
3272    9.9857  3278   10.0040
3280   10.0101  3276    9.9979
3280   10.0101  3278   10.0040
3272    9.9857  3276    9.9979
3280   10.0101  3278   10.0040
3272    9.9857  3276    9.9979
3279   10.0070  3275    9.9948
 
Correction

1/(RAND_MAX+2),2/(RAND_MAX+2),3/(RAND_MAX+2)...(RAND_MAX + 1)/(RAND_MAX+2).
 
Macros vs. functions? A macro does not have function-call overhead, and it may be optimized with reference to the surrounding code. So a macro can be faster, and the inline feature of C++ is intended to get the performance of a macro while being type-safe and parse-safe.
Personally, I like macros for another reason: when you have a piece of code that is absolutely 100% right, is made of heavy or clever math, and can be named in a sensible way, putting it in a big, hairy, scary macro can be more effective for keeping idiot newbies from touching it.

A functional macro that is clearly very fragile and put in a place where other scary and fragile things go is a great way to keep devs away.
 
Macros vs. functions? A macro does not have function-call overhead, and it may be optimized with reference to the surrounding code. So a macro can be faster, and the inline feature of C++ is intended to get the performance of a macro while being type-safe and parse-safe.
Personally, I like macros for another reason: when you have a piece of code that is absolutely 100% right, is made of heavy or clever math, and can be named in a sensible way, putting it in a big, hairy, scary macro can be more effective for keeping idiot newbies from touching it.

A functional macro that is clearly very fragile and put in a place where other scary and fragile things go is a great way to keep devs away.
I would have thought it better to have a code review process where a senior dev can stop newbies from making such mistakes and perhaps teach them the right way to do whatever it is they've been tasked to do. Then they won't be idiots anymore.

Funny to think that there might be orgs out there who obfuscate source code to make it harder for their own people to read.
 
Here you go Swami. I used Scilab for a long time for data analysis. Write a file in C and read it into Scilab.

The histc function automatically counts within a number of bins.

At 200 bins RUFF() and ruf2() are virtually identical. RUFF() is flat, and ruf2() is off by a few counts on some bins.

Scilab script
clear
clc
s1 = "data.txt"
mclose("all")
[f1,err] = mopen(s1,"rt")
if(err < 0)then disp("file open error ",err);end;
N = mfscanf(1,f1," %d")
y = mfscanf(N,f1," %f %f")
mclose("all")
//mprintf("%f %f \n",y)
h1 = histc(y([1:N],1),200)
h2 = histc(y([1:N],2),200)
mprintf("%d %d\n",h1',h2')
w1 = scf(1)
clf(w1)
subplot(1,2,1)
histplot(10,y([1:N],1))
subplot(1,2,2)
histplot(10,y([1:N],2))

A sample of 200 bins
656 656
656 654
648 656
656 656
656 654
656 656
656 656
656 654
656 656
656 656
656 654
656 656
655 655
 
Macros vs. functions? A macro does not have function-call overhead, and it may be optimized with reference to the surrounding code. So a macro can be faster, and the inline feature of C++ is intended to get the performance of a macro while being type-safe and parse-safe.
Personally, I like macros for another reason: when you have a piece of code that is absolutely 100% right, is made of heavy or clever math, and can be named in a sensible way, putting it in a big, hairy, scary macro can be more effective for keeping idiot newbies from touching it.

A functional macro that is clearly very fragile and put in a place where other scary and fragile things go is a great way to keep devs away.
I would have thought it better to have a code review process where a senior dev can stop newbies from making such mistakes and perhaps teach them the right way to do whatever it is they've been tasked to do. Then they won't be idiots anymore.

Funny to think that there might be orgs out there who obfuscate source code to make it harder for their own people to read.
Back in the 80s some business coders made their code impossible to maintain without them and held companies hostage.

If they left they'd get a retainer to be on call.
 
Macros vs. functions? A macro does not have function-call overhead, and it may be optimized with reference to the surrounding code. So a macro can be faster, and the inline feature of C++ is intended to get the performance of a macro while being type-safe and parse-safe.
Personally, I like macros for another reason: when you have a piece of code that is absolutely 100% right, is made of heavy or clever math, and can be named in a sensible way, putting it in a big, hairy, scary macro can be more effective for keeping idiot newbies from touching it.

A functional macro that is clearly very fragile and put in a place where other scary and fragile things go is a great way to keep devs away.
I would have thought it better to have a code review process where a senior dev can stop newbies from making such mistakes and perhaps teach them the right way to do whatever it is they've been tasked to do. Then they won't be idiots anymore.

Funny to think that there might be orgs out there who obfuscate source code to make it harder for their own people to read.
The source code for doing certain things deep in the OS has a certain look to it that reflects this same philosophy.

Oftentimes the macros are assembly code, even.
 
Swami, you begged and pleaded for somebody to respond to your posts, and threatened to never post in the thread again if nobody responded.

Hi Steeve. (I'll drop the doubled E when you double the M in my name.) Begged AND pleaded? Hendiadys a la Shakespeare! It was more of a promise than a threat. My posts seem to attract anger or nitpicking (one Infidel found fault with the comma placement in COMMENTS!), and I assume Swammi's absence will bring happiness. I've continued to post in this thread just to bring closure to a conversation I started.

But sincere thanks for participating. I find it fun to talk about programming and it seems you do too.

I was tempted to drop discussion of your code, but was surprised to notice that two UNRELATED minor nitpicks happen to share the exact same theme! That theme is simply that a right triangle has half the area of a rectangle!

(1) Bubble Sort. You can cut the number of compares in your bubble sort by about a factor of two. After K passes of the outer loop, the final K items in the list are already in their final position. If you take advantage of that and plot a time vs length graphic of the compares, you'll change a square into a triangle!

BTW, how much time did it take the computer to bubble-sort 2 billion items? Seems like quite a task!

(2) Median. The median you derive is a sort of "weighted median", not the usual median. I'm not sure if that was intentional or not. For starters, a normal median satisfies Median[X+1] = Median[X] + 1, but yours does not. Your weighted median is √0.5 for our (0,1) uniform dist, and √2.5 for a (1,2) uniform dist.

Why is your weighted median √0.5 = 0.707? Because starting from the narrow end of a triangle you must go 70.7% of the way to get half the area!

Is that weird or what? :cool: Two unrelated nits, and BOTH connect directly to the fact that a triangle has half the area of a rectangle.
 
My posts seem to attract anger or nitpicking (one Infidel found fault with the comma placement in COMMENTS!), and I assume Swammi's absence will bring happiness. I've continued to post in this thread just to bring closure to a conversation I started.
I think nitpicking is better than ignoring posts directed at somebody. I'm used to people nitpicking my posts in some other threads. My comments about your commas were based on this: (post #374).
I've mentioned that I am (or was) an idiot savant, though it is the idiot that may be most apparent... If it weren't for the idiot part of my brain, I might be famous.
I couldn't see any reason for you to be called an "idiot". The issue with your commas was the only thing I could think of so I brought it up as a way of trying to agree with your statement. Sorry if it offended you or whatever.
On the topic of nit picking I remember in high school the physics teacher was talking about the distance a tyre would travel based on the radius - I said to the teacher, what about the change in thickness of the tyre based on it wearing down? Now that I think about it, it would be negligible.
Though in grade 8 I had an argument with my maths teacher that Pi isn't 22/7, it is *approximately* 22/7.
 
There is an anomaly in both functions RUFF() and ruf2(). There are sequential duplicate numbers.

0.004028320312500
0.004028320312500

At first I thought it might be due to finite digital arithmetic. I found it to be the |1 in both functions. It creates duplicate integers.

So the probabilities of the numbers in the set are different, and as such ruf() is not suitable for use. At least I would not use it. You can't have the same number resulting from different outputs of rand().

Steve's the name and debug is my game.....

I changed ruf2() to set 0 random numbers to 1e-15.

The moral of the story, boys and girls? Unless you are a god who sees all ends, no matter how much theoretical analysis you do, you don't know for sure until you test and characterize software and hardware. There can always be unforeseen consequences.

Tested with this.

void ruf2(long long n,double *x){
    cout<<"ruf2"<<endl;
    long long i;
    unsigned int r;
    for(i=0;i<n;i++){
        r = rand();
        if(r)x[i] = double(r)/double(RAND_MAX+1);
        else x[i] = 1e-15;
    }
}



RUFF vs corrected ruf2
0.991577148437500 0.997863769531250
0.991577148437500 0.997894287109375
0.991821289062500 0.997924804687500
0.991821289062500 0.997955322265625
0.992065429687500 0.997985839843750
0.992065429687500 0.998016357421875
0.992309570312500 0.998046875000000
0.992309570312500 0.998077392578125
0.992553710937500 0.998107910156250
0.992553710937500 0.998138427734375
0.992797851562500 0.998168945312500
0.992797851562500 0.998199462890625
0.993041992187500 0.998229980468750
0.993041992187500 0.998260498046875
0.993286132812500 0.998291015625000
0.993286132812500 0.998321533203125
0.993530273437500 0.998352050781250
0.993530273437500 0.998382568359375
0.993774414062500 0.998413085937500
0.993774414062500 0.998443603515625
0.994018554687500 0.998474121093750
0.994018554687500 0.998504638671875
0.994262695312500 0.998535156250000
0.994262695312500 0.998565673828125
0.994506835937500 0.998596191406250
0.994506835937500 0.998626708984375
0.994750976562500 0.998657226562500
0.994750976562500 0.998687744140625
0.994995117187500 0.998718261718750
0.994995117187500 0.998748779296875
0.995239257812500 0.998779296875000
0.995239257812500 0.998809814453125
0.995483398437500 0.998840332031250
0.995483398437500 0.998870849609375
0.995727539062500 0.998901367187500
0.995727539062500 0.998931884765625
0.995971679687500 0.998962402343750
0.995971679687500 0.998992919921875
0.996215820312500 0.999023437500000
0.996215820312500 0.999053955078125
0.996459960937500 0.999084472656250
0.996459960937500 0.999114990234375
0.996704101562500 0.999145507812500
0.996704101562500 0.999176025390625
0.996948242187500 0.999206542968750
0.996948242187500 0.999237060546875
0.997192382812500 0.999267578125000
0.997192382812500 0.999298095703125
0.997436523437500 0.999328613281250
0.997436523437500 0.999359130859375
0.997680664062500 0.999389648437500
0.997680664062500 0.999420166015625
0.997924804687500 0.999450683593750
0.997924804687500 0.999481201171875
0.998168945312500 0.999511718750000
0.998168945312500 0.999542236328125
0.998413085937500 0.999572753906250
0.998413085937500 0.999603271484375
0.998657226562500 0.999633789062500
0.998657226562500 0.999664306640625
0.998901367187500 0.999694824218750
0.998901367187500 0.999725341796875
0.999145507812500 0.999755859375000
0.999145507812500 0.999786376953125
0.999389648437500 0.999816894531250
0.999389648437500 0.999847412109375
0.999633789062500 0.999877929687500
0.999633789062500 0.999908447265625
0.999877929687500 0.999938964843750

