[plt-scheme] Statistics (V301.5 Speed Up)

From: Noel Welsh (noelwelsh at yahoo.com)
Date: Thu Feb 16 04:46:45 EST 2006

--- Jim Blandy <jimb at red-bean.com> wrote:

> As far as beer is concerned: my interpretation of Joe's point
> was that we should see a difference in the tails of the
> distributions: while the right tail (runs taking more time)
> should go on arbitrarily (maybe the machine decides to go off
> and do a Sudoku while it's running your benchmark), there
> should be a point on the left below which we see no outliers.
> What do folks think?

Yes, and that's pretty much a Poisson distribution. 
Imagine you have 1000 computers.  You set them all running
a benchmark.  Every 10ms (your timer resolution) you check
to see if any benchmarks have finished.  When all have
finished you can plot the results, and I'd expect them
to follow a Poisson distribution (as the definition of a
Poisson distribution is more or less the model I've
described above).  Now note that a Normal distribution is
an excellent approximation to a Poisson when the mean time
to completion is large.  See, e.g., Wikipedia:
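
As a rough numerical check of the Poisson-to-Normal claim
above, here is a minimal Python sketch (standard library only).
The function names and the choice of means (2, 20, 200) are
illustrative assumptions, not anything from the benchmarks
under discussion.  It compares the Poisson probabilities with
a Normal density of the same mean and variance, and reports
the worst mismatch relative to the height of the Poisson peak.

```python
import math

def poisson_pmf(k, lam):
    # Evaluated in log space so large means do not overflow.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def relative_gap(lam):
    """Largest |Poisson(k) - Normal(k)| over k within 10 standard
    deviations of the mean, divided by the Poisson probability at
    k = int(lam).  The Normal uses mean lam and variance lam."""
    sigma = math.sqrt(lam)
    lo = max(0, int(lam - 10 * sigma))
    hi = int(lam + 10 * sigma) + 1
    gap = max(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, sigma))
              for k in range(lo, hi + 1))
    return gap / poisson_pmf(int(lam), lam)

# Illustrative means only: small, medium, large.
for lam in (2.0, 20.0, 200.0):
    print(lam, relative_gap(lam))
```

The mismatch should shrink as the mean grows, which is the
sense in which the Normal is a good stand-in for long runs.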


So for little micro benchmarks I'd expect a
Poisson/binomial distribution.  For larger benchmarks a
Normal distribution is a good enough approximation.  How
can we tell if the models match the reality?  There are a
number of statistical goodness-of-fit tests one can do, e.g.
the Kolmogorov-Smirnov test.
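
To make the Kolmogorov-Smirnov idea concrete, here is a small
Python sketch (standard library only).  The timing data is
synthetic and the hypothesised model, a Normal with mean 100
and standard deviation 5, is an assumption chosen purely for
illustration.  The 1.36 / sqrt(n) threshold is the usual
large-sample 5% critical value, and it only applies when the
model's parameters are fixed in advance rather than estimated
from the same sample.

```python
import math
import random
from statistics import NormalDist

def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest vertical
    distance between the empirical CDF and the model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n up to (i+1)/n at x,
        # so check the distance on both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def rejects_at_5_percent(samples, cdf):
    # Large-sample approximation for a fully specified model.
    return ks_statistic(samples, cdf) > 1.36 / math.sqrt(len(samples))

# Synthetic "benchmark times"; none of this is measured data.
rng = random.Random(0)
model = NormalDist(mu=100.0, sigma=5.0)
normal_times = [rng.gauss(100.0, 5.0) for _ in range(500)]
skewed_times = [100.0 + rng.expovariate(1 / 5.0) for _ in range(500)]

print("normal-looking runs:", ks_statistic(normal_times, model.cdf))
print("hard floor + long tail:", ks_statistic(skewed_times, model.cdf))
```

The second data set mimics the shape Jim describes (a hard
floor on the left, a long tail on the right), and the test
statistic for it is far above the threshold.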

For the bimodal results Doug posted, it looks very much
like a mixture of Gaussians to me.  That's interesting in
itself -- what is happening to cause this behaviour?  The
GC has been hypothesised.  It should be possible to verify
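
For readers who have not met a mixture of Gaussians, here is a
minimal Python sketch (standard library only) that fits two
components to one-dimensional data with plain EM.  The data is
synthetic: a "fast" cluster around 100 and a "slow" cluster
around 130 are invented numbers, not Doug's results, and the
function name and starting values are assumptions made for
this example.

```python
import math
import random

def fit_two_gaussians(xs, iterations=200):
    """Fit a two-component 1-D Gaussian mixture with plain EM.
    Returns (weights, means, sigmas), lower-mean component first
    for well-separated data."""
    n = len(xs)
    spread = (max(xs) - min(xs)) or 1.0
    # Deterministic start: one component at each end of the data.
    means = [min(xs), max(xs)]
    sigmas = [spread / 4.0, spread / 4.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        # The 1/sqrt(2*pi) factor cancels in the normalisation.
        resp = []
        for x in xs:
            p = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                 for w, m, s in zip(weights, means, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its weighted points.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # component has no support; leave it alone
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)
    return weights, means, sigmas

# Synthetic bimodal timings (made-up numbers).
rng = random.Random(1)
times = ([rng.gauss(100.0, 2.0) for _ in range(300)] +
         [rng.gauss(130.0, 3.0) for _ in range(200)])
weights, means, sigmas = fit_two_gaussians(times)
print(weights, means, sigmas)
```

If one mode really is "runs that hit a collection", the fitted
weight of the slower component gives an estimate of how often
that happens, which could then be compared against GC logs.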

This is, at least to me, interesting stuff.  If I get time
I'll write more, possibly a workshop publication.


Email: noelwelsh <at> yahoo <dot> com   noel <at> untyped <dot> com
AIM: noelhwelsh
Blogs: http://monospaced.blogspot.com/  http://www.untyped.com/untyping/


Posted on the users mailing list.