This conversation started because I thought it interesting how easy it is to convert a uniform random variate into a variate from a different distribution, e.g. exponential or Cauchy. It seemed especially interesting that a 2D Gaussian variate is easier to derive than a 1D Gaussian variate.
But the conversation quickly mutated into a discussion of Swammi's coding style or lack thereof. That's fine; let me talk about that.
First of all, I use tools. For example, if a.out is a program which produces a list of floating-point numbers, I'll run the pipe (a.out | stats) to show the mean, standard deviation, skew, kurtosis, minimum, maximum and population count of that set of numbers. (I've never had reason to look at kurtosis, but it's easy to compute and provides a little check, since Wikipedia knows the kurtosis of various distributions.) In addition to executables like stats, I link to library routines I've written.
The program stats does NOT show the median; I didn't want to spend time doing a sort. When I want the median of, say, 1 million numbers I simply type
(a.out | sort -n | head -500000 | tail -1)
I wrote stats myself even though there are probably lots of downloadable programs that are similar. Almost the ONLY programs I run are tools I've written myself and antique Unix programs like sort, head, tail, awk, sed, etc. Do I "suffer" from NIH and DIY syndromes? When it comes to software, you betcha! I won't defend this as "appropriate": I program for fun, and my personality and preferences do not fit a typical model.
I write C code in three different styles:
- Fast and dirty. This is code I whip out, often to solve a puzzle, or to analyze an Internet dataset, perhaps relevant to some message-board discussion. Generally I have no intention of ever re-using the code. Although horrid in terms of organization, comments, and choice of identifiers, it does follow Spencer's One True Style, if only out of habit. Sometimes I do end up re-using some of this horrid code, and am then reminded of why horrid code is horrible! Oh well. This will not make my Top 100 List of Life's Biggest Regrets.
- Perfectionistic. This is for the code I deliver to paying customers, or have taken enough pride in to invite others to download. It is also written in Spencer's One True Style. Unfortunately even this code suffers to some extent from minimal comments and overly short, uninformative identifier names. Much of the time spent "perfecting" the code perfects it in ways that appeal to me but are irrelevant to a paying customer. Fortunately, I am fast and reliable enough to more than please any paying customer despite such "wasted" effort.
- Snippets for posting. I have posted several snippets in this thread, even before the present conversation. Draw your own conclusions. This code generally conforms to One True Style. Obviously the "one-liners" to convert random variates are an exception since I wanted to stress their alleged one-linedness.
My very first programming language was Fortran, in 1965. I programmed in a few other "high-level" languages from the Age of Dinosaurs, and in at least a dozen different machine or assembly languages. I liked machine languages, and sometimes programmed in them even when the usual reasons for doing so were missing. When I discovered C it was love at first sight: it combined what I loved about machine languages with the expressive power of higher-level languages. I am not interested in other modern languages, especially C++, which has a philosophy OPPOSITE to C's even though the syntax is nearly identical. I was especially taken aback to learn that Python uses indentation for block boundaries; I think C's approach to white space is perfect, and only perverts would want something different. Does PHP really object to certain spaces in COMMENTS? I can only shake my head in dismay.
I DO write JavaScript code out of necessity; I do some nifty things with it at my website, if I do say so myself.
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Getting down to the specifics of this conversation, I have my own library to generate and manipulate random numbers. (The actual PRNGs themselves are taken from public-domain source code written by experts. I do not try to reinvent EVERY wheel.)
The one-liner I posted showing the conversion of a random integer to a random double is NOT taken from my personal library; it was just a throw-away snippet for here. Instead of using 31 random bits, it uses 29 random bits (or 2 fewer than however many bits random() supplies).
I did this to minimize the risk that someone here would compile the snippets and run up against a limit. My actual library functions are perfectionistic and allow the caller to optionally specify how many random bits to spend when forming a random float.
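The throw-away one-liner isn't reproduced in this section, so here is a minimal sketch of what "29 random bits" means in C, plus one way a caller-chosen bit count could work. Both ruf() as written here and rand_double_bits() are hypothetical reconstructions, not the author's library code. They assume POSIX random(), which returns values in [0, 2^31 - 1], i.e. 31 random bits.

```c
#define _XOPEN_SOURCE 500   /* expose POSIX random() under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical one-liner: random() yields 31 bits; >> 2 keeps the top 29,
   scaled into [0, 1). Every result is a multiple of 2^-29 (about 1.86e-9). */
static double ruf(void)
{
    return (random() >> 2) / (double)(1L << 29);
}

/* Hypothetical caller-chosen variant: nbits in 1..53 (53 = the precision
   of an IEEE double). Concatenates 31-bit chunks from random(), then keeps
   the top nbits of them. */
static double rand_double_bits(int nbits)
{
    uint64_t bits = 0;
    int have = 0;

    assert(nbits >= 1 && nbits <= 53);
    while (have < nbits) {
        bits = (bits << 31) | (uint64_t)random();
        have += 31;                                  /* at most 62 bits */
    }
    bits >>= have - nbits;                           /* drop the surplus low bits */
    return (double)bits / (double)((uint64_t)1 << nbits);   /* in [0, 1) */
}
```

With the same random() state, rand_double_bits(29) computes exactly what this ruf() does, while rand_double_bits(53) can return any double of the form k/2^53 in [0, 1).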
Let's be clear on the effect of discarding those two random bits. When you call my ad-hoc ruf() to get a random float you might get, by chance
.0186264459
or you might get
.0186264496
But you will never get a number in between those two. Please. Look at those two numbers again. The ad hoc ruf() that I presented above cannot return a number larger than .0186264459 but smaller than .0186264496. The granularity imposed by the ">>2" is just too coarse. (Obviously it's the exact same granularity throughout the 0 < x < 1 range. No cherry-picking here.)
Look at the two numbers again. Raise your hand if you think this coarse granularity is likely to be a problem in any of 99.999% of applications.
This is getting me back into a work frame of mind.
Is that good or bad?
Color me perverse, but I write code for fun when others might be out hang-gliding or trying to pick up women in a bar! (My life isn't completely devoted to code -- I'd probably be out chatting with friends on Floating Fortune Road if I hadn't stayed home to compose this post.)
"The three functions all have 0.5 for a mean. Standard deviation is .2886. Median .707, which is 1/sqrt(2). Haven't tracked that down as to significance."
The "discrepancy" you complain of is caused STRICTLY by the granularity "problem" I explain above. To show this discrepancy you invoked ruf() 2 Billion times (Billion with a B). You must have understood this in advance, or you wouldn't have chosen such a large iteration count. Did you think this demonstration was needed to understand the result of discarding 2 bits?
And by the way, the medians are all 0.5.
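For reference, the exact values for a variate U uniform on [0, 1], which is what all three functions are supposed to produce. They agree with the quoted mean and standard deviation, and give 1/2, not 1/sqrt(2), for the median:

```latex
\begin{aligned}
F(x) &= x, \qquad 0 \le x \le 1 \\
\mathbb{E}[U] &= \int_0^1 x\,dx = \tfrac{1}{2} \\
\operatorname{Var}(U) &= \int_0^1 x^2\,dx - \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12},
  \qquad \sigma = \tfrac{1}{\sqrt{12}} \approx 0.2887 \\
F(m) &= \tfrac{1}{2} \;\Rightarrow\; m = \tfrac{1}{2}
\end{aligned}
```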
"The C qsort() does not appear to work with numbers less than 1, so I had to write a sort routine."
qsort() works fine with doubles, large or small. You must supply a comparison function. If you wish, post your attempt and I'll debug it for you.
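A minimal sketch of qsort() on doubles with a comparison function. One common way to get the symptom described (sorting seems to fail for values below 1) is a comparator that returns (int)(a - b): any difference smaller than 1 in magnitude truncates to 0, so qsort() treats such values as equal. Returning (a > b) - (a < b) avoids that. cmp_double() and median() are illustrative names, not code from the thread.

```c
#include <stdlib.h>

/* qsort() comparator for doubles: negative, zero, or positive.
   Never return (int)(a - b); that truncates differences below 1 to 0. */
static int cmp_double(const void *pa, const void *pb)
{
    double a = *(const double *)pa;
    double b = *(const double *)pb;
    return (a > b) - (a < b);
}

/* Median of n > 0 doubles; sorts x in place. For even n, averages
   the two middle values. */
static double median(double *x, size_t n)
{
    qsort(x, n, sizeof x[0], cmp_double);
    return (n % 2) ? x[n / 2] : 0.5 * (x[n / 2 - 1] + x[n / 2]);
}
```

Note that the (a.out | sort -n | head -500000 | tail -1) pipeline returns the lower of the two middle values for an even count, while median() here averages them; for a million uniform variates the difference is negligible.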