[plt-scheme] Statistics (V301.5 Speed Up)
--- Jim Blandy <jimb at red-bean.com> wrote:
> As far as beer is concerned: my interpretation of Joe's point was that
> we should see a difference in the tails of the distributions: while
> the right tail (runs taking more time) should go on arbitrarily (maybe
> the machine decides to go off and do a Sudoku while it's running your
> benchmark), there should be a point on the left below which we see no
> outliers. What do folks think?
Yes, and that's pretty much a Poisson distribution.
Imagine you have 1000 computers. You set them all running
a benchmark. Every 10ms (your timer resolution) you check
to see if any benchmarks have finished. When all have
finished you can plot the results, and I'd expect them
to follow a Poisson distribution (the definition of a
Poisson distribution is more or less the model I've
described above). Now note that a Normal distribution is
an excellent approximation to a Poisson when the mean time
to completion is large. See, e.g., Wikipedia:
http://en.wikipedia.org/wiki/Poisson_distribution
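To make the approximation claim concrete, here is a rough sketch in
Python (standard library only). It assumes completion time is counted
in timer ticks and compares the Poisson CDF with the CDF of a Normal
with the same mean and variance. The function names and the choice of
means (2, 20, 200, 2000 ticks) are mine, purely for illustration.

```python
import math
from statistics import NormalDist

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def max_cdf_gap(lam):
    """Largest absolute gap between the Poisson(lam) CDF and the CDF of
    Normal(mean=lam, sd=sqrt(lam)), using a continuity correction of 0.5."""
    normal = NormalDist(mu=lam, sigma=math.sqrt(lam))
    # Sum far enough into the right tail that the remaining mass is negligible.
    upper = int(lam + 10 * math.sqrt(lam)) + 10
    cdf = 0.0
    worst = 0.0
    for k in range(upper + 1):
        cdf += poisson_pmf(k, lam)
        worst = max(worst, abs(cdf - normal.cdf(k + 0.5)))
    return worst

# Hypothetical mean completion times, measured in timer ticks.
gaps = {lam: max_cdf_gap(lam) for lam in (2, 20, 200, 2000)}
for lam, gap in gaps.items():
    print(f"mean = {lam:5d} ticks   max CDF gap = {gap:.4f}")
```

The gap should shrink as the mean grows, which is the sense in which
the Normal is a good stand-in for longer-running benchmarks.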
So for little micro benchmarks I'd expect a
Poisson/binomial distribution. For larger benchmarks a
Normal distribution is a good enough approximation. How
can we tell if the models match the reality? There are a
number of goodness-of-fit tests one can do, e.g. the
Kolmogorov-Smirnov test.
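As a sketch of what such a check might look like, the Python below
computes the one-sample Kolmogorov-Smirnov statistic (the largest gap
between the empirical CDF and a candidate CDF) against a Normal fitted
to the data. The timings are synthetic and the helper names are my
own. Note that when the Normal's parameters are estimated from the
same sample, the textbook K-S critical values are too lenient, so
treat the statistic as a rough guide rather than a formal test.

```python
import random
from statistics import NormalDist, mean, stdev

def ks_statistic(samples, dist):
    """One-sample K-S statistic: the maximum distance between the
    empirical CDF of `samples` and the CDF of `dist`."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_against_fitted_normal(samples):
    """K-S statistic against a Normal with the sample's mean and std dev."""
    return ks_statistic(samples, NormalDist(mean(samples), stdev(samples)))

# Synthetic timings in ms (made up for illustration, not real measurements).
rng = random.Random(0)
bell_shaped = [rng.gauss(500.0, 20.0) for _ in range(1000)]
long_tailed = [480.0 + rng.expovariate(1 / 20.0) for _ in range(1000)]

d_bell = ks_against_fitted_normal(bell_shaped)
d_tail = ks_against_fitted_normal(long_tailed)
print(f"bell-shaped sample: D = {d_bell:.3f}")
print(f"long-tailed sample: D = {d_tail:.3f}")
```

The long-tailed sample, which has a hard floor on the left and a long
right tail, should give a noticeably larger D than the bell-shaped one.
If SciPy is available, scipy.stats.kstest computes the same statistic
along with a p-value.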
For the bimodal results Doug posted, it looks very much
like a mixture of Gaussians to me. That's interesting in
itself -- what is happening to cause this behaviour? The
GC has been hypothesised. It should be possible to verify
this.
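One possible first step, sketched below in Python, is to fit a
two-component Gaussian mixture with EM and see whether the two modes
separate cleanly. This is only an illustration on synthetic data: the
"fast" and "slow" groups, their sizes, and the function names are my
assumptions, not Doug's measurements. Showing that the GC is actually
responsible would still need the slow mode to be matched against GC
activity, which this does not do.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(samples, iterations=200):
    """EM for a 1-D mixture of two Gaussians.
    Returns [(weight, mean, std dev), ...] sorted by mean."""
    xs = sorted(samples)
    n = len(xs)
    half = n // 2
    # Crude initialisation: one component per half of the sorted data.
    mus = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    spread = (xs[-1] - xs[0]) / 4.0 or 1.0
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s)
                 for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and std dev of each component.
        for j in range(2):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-6)
    return sorted(zip(weights, mus, sigmas), key=lambda c: c[1])

# Synthetic timings in ms: 700 "fast" runs and 300 "slow" runs (hypothetical).
rng = random.Random(1)
timings = ([rng.gauss(500.0, 10.0) for _ in range(700)]
           + [rng.gauss(560.0, 10.0) for _ in range(300)])

components = fit_two_gaussians(timings)
for weight, mu, sigma in components:
    print(f"weight = {weight:.2f}  mean = {mu:.1f} ms  sd = {sigma:.1f} ms")
```

If the fitted components are well separated, each run can be assigned
to a mode and the slow ones checked for whatever they have in common.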
This is, at least to me, interesting stuff. If I get time
I'll write more, possibly a workshop publication.
N.
Email: noelwelsh <at> yahoo <dot> com noel <at> untyped <dot> com
AIM: noelhwelsh
Blogs: http://monospaced.blogspot.com/ http://www.untyped.com/untyping/