[racket] Number-crunching...

From: Sam Tobin-Hochstadt (samth at cs.indiana.edu)
Date: Tue Nov 12 10:55:16 EST 2013

I've realized one big slowdown in your code -- `random` by default
uses a parameter to find the `current-pseudo-random-generator`.  By
removing the parameter lookup from the loop, I cut the time by about
33%.

My new version is at the same Gist url, with some very minor
additional speedups.

Sam

On Tue, Nov 12, 2013 at 9:05 AM, Sam Tobin-Hochstadt
<samth at cs.indiana.edu> wrote:
> I now know why your custom RNG was slow.  Racket fixnums are (a)
> signed and (b) have a width 1 less than the width of the host
> architecture: [1]
>
> -> (fixnum? (- (expt 2 64) 1))
> #f
> -> (fixnum? (- (expt 2 63) 1))
> #f
> -> (fixnum? (- (expt 2 62) 1))
> #t
>
> Therefore, your random number generation calculation is constantly
> overflowing into bignums and allocating.
>
> [1] http://www.cs.utah.edu/plt/snapshots/current/doc/reference/numbers.html?q=fixnum#%28tech._fixnum%29
>
> Sam
>
> On Tue, Nov 12, 2013 at 8:59 AM, Sam Tobin-Hochstadt
> <samth at cs.indiana.edu> wrote:
>> Here's a start: https://gist.github.com/samth/7431213
>>
>> This is a Typed Racket version of your program, converted to use
>> flvectors instead of f64vectors. I haven't taken a look at the code
>> for your random number generator yet, but you could try doing the
>> timing by creating a large array of random numbers ahead of time, and
>> then timing the loop just referencing into that array.
>>
>> Sam
>>
>> On Tue, Nov 12, 2013 at 7:52 AM, William Cushing
>> <william.cushing at gmail.com> wrote:
>>> I'm in the process of benchmarking various systems for their ability to
>>> handle the inner loop of compilations of MCMC from a higher level
>>> probabilistic modeling language (BLOG).
>>>
>>> Attached are examples of the kind of output (instantiations of MCMC) we'd
>>> like to be able to generate given the tried-and-true Bayes Net linking the
>>> probabilities of earthquake, burglary, john calling, and mary calling.
>>> Classic AI fare.
>>>
>>> Predictably, C is fastest.  But Stalin and SBCL also put in a very fine
>>> showing.
>>>
>>> For 50,000,000 iterations on my machine, I get the following semi-serious,
>>> semi-bogus timings for a random assortment of systems (petite chez just for
>>> fun):
>>>
>>> 1 SBCL 2.20
>>> 2 Bigloo 5.25
>>> 3 Chicken 11.49
>>> 4 Stalin 2.15
>>> 5 Racket 16.67
>>> 6 Petite-Chez 30.42
>>> 7 GCC 0.82
>>> 8 Clang 0.92
>>>
>>> There is much about the comparison that isn't fair.  Chiefly, this is
>>> primarily a test of speed of random number generation, but I haven't
>>> controlled for the algorithm being used to generate random numbers.  Stalin
>>> may very well be using GLIBC random number generation, for example.  I know
>>> SBCL uses MT; in fact, the SBCL experts responded to my little encoding by
>>> halving the shown runtime figure via dynamically linking in a new and
>>> improved (SIMD-aware) implementation of MT.  Which is quite impressive, as
>>> that puts it within 30% of my C version...and my C version is "totally
>>> cheating" by employing a simple little LCPRNG.
>>>
>>> Porting the same generator to Racket in the obvious way more than triples
>>> its running time. (!)
>>> (Is it the global var?  Is the conversion to double being compiled as a real
>>> division?)
>>>
>>> In any case, even just sticking with Racket's default PRNG, it seems the
>>> performance of a simple number crunching loop leaves something to be
>>> desired.  I have done what I can to try and shave off time here and there.
>>> Using unsafe ops, and threading most of the state through the parameters of
>>> a tail-recursive loop (thus avoiding set!), I managed to shave off 3 seconds
>>> from my original encoding (so about 15%), which isn't too shabby.  Of
>>> course, for even less effort, Stalin is already 800% times faster.
>>>
>>> I couldn't get typed/racket to type check, and /no-check doesn't seem to
>>> help performance at all (not surprising).  I suspect that getting
>>> typed/racket to be able to process this file might help substantially...but
>>> I don't really understand its type theory enough to help it figure this
>>> benchmark out.
>>>
>>> I though Racket experts might have some suggestions for how to convince
>>> Racket to run a simple number crunching loop without quite so much in the
>>> way of constant-factor slowdown.
>>>
>>> -Will
>>>
>>>
>>> ____________________
>>>   Racket Users list:
>>>   http://lists.racket-lang.org/users
>>>

Posted on the users mailing list.