[plt-scheme] Statistics (V301.5 Speed Up)
Whoops---just sent this to Greg. Sorry for the duplicate copy, Greg.
On 10 Feb 2006, at 3:03 PM, Greg Woodhouse wrote:
> The classical Central Limit Theorem (CLT)
> essentially says that if you have a collection {X(1), X(2), ...} of
> *independent* random variables sharing the same probability
> distribution, then in the limit the sum approaches a normal (Gaussian)
> distribution.
This is true, but for real distributions the number of samples needed
before "in the limit" kicks in is often so large as to be useless.
Also, there are some
restrictions on the probability distribution of the X(1), X(2), ...,
which may be relevant in this case (since the distribution of running
times is going to be very unusual---discontinuous, even). Given all
this, the ratio of the smallest running times of the compiled and
uncompiled code may be a more robust estimator, in the presence of
real-world noise, of the true "speedup" from compiling than the ratio
of averages.
(I'm not saying it is, but without some supporting data on the
gaussianity of the distribution of the sum of, say, 1000 trials, I
don't think the CLT gets you anywhere in practice.)
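Concretely, something like the following sketch is what I have in
mind (the thunks and the trial count are placeholders, not anything
from the actual v301.5 measurements):

;; Estimate the speedup as the ratio of the *minimum* run times over
;; many trials, rather than the ratio of the averages.  Assumes each
;; thunk runs long enough that its minimum isn't 0 ms.
(define (min-run-time thunk trials)
  ;; smallest elapsed real time (in milliseconds) over TRIALS runs
  (let loop ((n trials) (best #f))
    (if (zero? n)
        best
        (let* ((start (current-milliseconds))
               (ignored (thunk))
               (elapsed (- (current-milliseconds) start)))
          (loop (- n 1)
                (if (or (not best) (< elapsed best)) elapsed best))))))

(define (min-ratio-speedup uncompiled-thunk compiled-thunk trials)
  ;; a result > 1 means the compiled version looks faster
  (exact->inexact
   (/ (min-run-time uncompiled-thunk trials)
      (min-run-time compiled-thunk trials))))

Whether that is actually a better estimator is an empirical question,
but at least interpreting it doesn't lean on the CLT.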
Personally, I would be much less suspicious of the ratio of min run
times over 1000 trials than I would be of the ratio of averages, given
that the run time is, in principle, bounded below and unbounded above:
noise from the rest of the system can only make a run slower, so the
minimum stays close to the true cost, while the mean gets dragged
upward by however bad the outliers happen to be.
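A toy illustration of that last point (the numbers below are made up,
not taken from any real benchmark): add one-sided noise to a fixed
"true" run time and compare how the mean and the minimum of 1000
trials respond.

(define (simulate-trials true-time trials)
  ;; each trial is the "true" time plus a random non-negative delay
  (let loop ((n trials) (acc '()))
    (if (zero? n)
        acc
        (loop (- n 1) (cons (+ true-time (random 50)) acc)))))

(define (mean xs) (/ (apply + xs) (length xs)))

(let ((ts (simulate-trials 100 1000)))
  (printf "mean: ~a  min: ~a~n"
          (exact->inexact (mean ts))
          (apply min ts)))

The minimum comes out at (or very near) the true 100, while the mean
sits well above it; the more skewed the noise, the bigger the gap.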
Will