[plt-scheme] Statistics (V301.5 Speed Up)

From: Joe Marshall (jmarshall at alum.mit.edu)
Date: Fri Feb 10 18:54:55 EST 2006

On 2/10/06, Greg Woodhouse <gregory.woodhouse at sbcglobal.net> wrote:
> Noise is precisely what statistical analysis is intended to eliminate
> (or, rather, it is intended to allow you to identify patterns that
> would be obscured by noise). The classical Central Limit Theorem (CLT)
> essentially says that if you have a collection {X(1), X(2), ...} of
> *independent* random variables sharing the same probability
> distribution, then in the limit the sum approaches a normal (Gaussian)
> distribution. See
> <http://en.wikipedia.org/wiki/Central_limit_theorem#Classical_central_limit_theorem>
> Now, this is exactly the situation we are dealing with.

I disagree with this statement.

> If you run the
> same program 1000 times, the probability distribution of the running
> time (a function) should be the same each time. As you note, there will
> be a low priority for very short running times (due to startup cost),
> but we're not looking at expectations here, but the actual function
> r(n) giving the probability that the run will take about n nanoseconds
> (i.e., will fall in a small interval around n).

Suppose we are running some program that sums the integers in the
range [0, 1000).  Presumably, there is an `inner loop' that
deterministically executes 1000 times.  There is *no* probability
whatsoever that the program will finish faster than the time it takes
that loop to run to completion.

It may be true that summing the probability distributions converges to
a gaussian, but that is irrelevant.  I argue that what we are
interested in is the minimum time for running the loop, not the
average time.

> The expectations will
> be skewed of course because r(1) << 1 (is much, much less than 100%)
> but the *average* running time (over many trials) *will* be normally distributed.

This is a statement that can be easily tested.


Posted on the users mailing list.