# [plt-scheme] Statistics (V301.5 Speed Up)

On 2/10/06, Greg Woodhouse <gregory.woodhouse at sbcglobal.net> wrote:
> Noise is precisely what statistical analysis is intended to eliminate
> (or, rather, it is intended to allow you to identify patterns that
> would be obscured by noise). The classical Central Limit Theorem (CLT)
> essentially says that if you have a collection {X(1), X(2), ...} of
> *independent* random variables sharing the same probability
> distribution, then in the limit the sum approaches a normal (Gaussian)
> distribution. See
>
> <http://en.wikipedia.org/wiki/Central_limit_theorem#Classical_central_limit_theorem>
>
> Now, this is exactly the situation we are dealing with.

I disagree with this statement.
> If you run the
> same program 1000 times, the probability distribution of the running
> time (a function) should be the same each time. As you note, there will
> be a low probability for very short running times (due to startup cost),
> but we're not looking at expectations here, but the actual function
> r(n) giving the probability that the run will take about n nanoseconds
> (i.e., will fall in a small interval around n).

Suppose we are running some program that sums the integers in the
range [0, 1000). Presumably, there is an `inner loop' that
deterministically executes 1000 times. There is *no* probability
whatsoever that the program will finish faster than the time it takes
that loop to run to completion.

It may be true that the distribution of the summed (or averaged) run
times converges to a Gaussian, but that is irrelevant. I argue that what
we are interested in is the minimum time for running the loop, not the
average time.
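
Here is a rough sketch of what I mean (all the names are mine; the only
PLT primitive I'm assuming is `current-inexact-milliseconds`). It times
the deterministic loop many times and compares the smallest elapsed time
with the mean:

```scheme
;; The deterministic inner loop: sum the integers in [0, 1000).
(define (sum-to-1000)
  (let loop ((i 0) (acc 0))
    (if (= i 1000)
        acc
        (loop (+ i 1) (+ acc i)))))

;; Time a single run of THUNK, in milliseconds.
(define (time-once thunk)
  (let ((start (current-inexact-milliseconds)))
    (thunk)
    (- (current-inexact-milliseconds) start)))

;; Collect N elapsed times for THUNK.
(define (collect-times n thunk)
  (let loop ((k n) (times '()))
    (if (zero? k)
        times
        (loop (- k 1) (cons (time-once thunk) times)))))

(define samples (collect-times 1000 sum-to-1000))
(printf "min:  ~a ms~n" (apply min samples))
(printf "mean: ~a ms~n" (/ (apply + samples) (length samples)))
```

In practice a loop this small runs well under the clock's granularity,
so you would want to repeat the summation inside each timed run (or use
time-apply), but the point stands: the minimum is bounded below by the
work the loop must do, while the mean drifts upward with every bit of
noise.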
>* The expectations will
*>* be skewed of course because r(1) << 1 (is much, much less than 100%)
> but the *average* running time (over many trials) *will* be normally distributed.

This is a statement that can be easily tested.
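For instance, one could batch the samples, average each batch, and
eyeball the histogram of the batch means (again just a sketch, reusing
collect-times and sum-to-1000 from above; the batch sizes and bucket
count are arbitrary):

```scheme
(define (mean xs) (/ (apply + xs) (length xs)))

;; Average each of BATCHES batches of RUNS timed runs.
(define (batch-means batches runs)
  (let loop ((b batches) (acc '()))
    (if (zero? b)
        acc
        (loop (- b 1)
              (cons (mean (collect-times runs sum-to-1000)) acc)))))

;; Crude text histogram: one row of stars per bucket.
(define (histogram xs buckets)
  (let* ((lo (apply min xs))
         (hi (apply max xs))
         (width (max (/ (- hi lo) buckets) 1e-9))
         (counts (make-vector buckets 0)))
    (for-each
     (lambda (x)
       (let ((i (min (- buckets 1)
                     (inexact->exact (floor (/ (- x lo) width))))))
         (vector-set! counts i (+ 1 (vector-ref counts i)))))
     xs)
    (do ((i 0 (+ i 1))) ((= i buckets))
      (printf "~a~n" (make-string (vector-ref counts i) #\*)))))

;; If the claim holds, this should look roughly bell-shaped.
(histogram (batch-means 200 30) 20)
```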
--
~jrm