[plt-scheme] Statistics (V301.5 Speed Up)

From: Will M. Farr (farr at MIT.EDU)
Date: Fri Feb 10 15:18:49 EST 2006

Whoops---just sent this to Greg.  Sorry for the duplicate copy, Greg.

On 10 Feb 2006, at 3:03 PM, Greg Woodhouse wrote:

> The classical Central Limit Theorem (CLT)
> essentially says that if you have a collection {X(1), X(2), ...} of
> *independent* random variables sharing the same probability
> distribution, then in the limit the sum approaches a normal (Gaussian)
> distribution.

This is true, but the numerical size of "in the limit" for real 
distributions is often so large as to be useless.  Also, there are some 
restrictions on the probability distribution of the X(1), X(2), ..., 
which may be relevant in this case (since the distribution of running 
times is going to be very unusual---discontinuous, even).  Given all 
this, the ratio of smallest running times between compiled and 
uncompiled code may be a more robust estimator of the true "speedup" 
from compiling in the presence of real-world noise than an average.  
(I'm not saying it is, but without some supporting data on the 
gaussianity of the distribution of the sum of, say, 1000 trials, I 
don't think the CLT gets you anywhere in practice.)

Personally, I would be much less suspicious of the ratio of min run 
times over 1000 trials than I would be of the ratio of averages, given 
that the run time is, in principle, bounded below and unbounded above.

Will



Posted on the users mailing list.