[plt-scheme] Statistics (V301.5 Speed Up)

From: Greg Woodhouse (gregory.woodhouse at sbcglobal.net)
Date: Fri Feb 10 15:03:26 EST 2006

Noise is precisely what statistical analysis is intended to eliminate
(or, rather, it is intended to allow you to identify patterns that
would be obscured by noise). The classical Central Limit Theorem (CLT)
essentially says that if you have a collection {X(1), X(2), ...} of
*independent* random variables sharing the same probability
distribution, then in the limit the sum approaches a normal (Gaussian)
distribution. See

<http://en.wikipedia.org/wiki/Central_limit_theorem#Classical_central_limit_theorem>

Now, this is exactly the situation we are dealing with. If you run the
same program 1000 times, the probability distribution of the running
time (a function) should be the same each time. As you note, there will
be a low priority for very short running times (due to startup cost),
but we're not looking at expectations here, but the actual function
r(n) giving the probability that the run will take about n nanoseconds
(i.e., will fall in a small interval around n). The expectations will
be skewed of course because r(1) << 1 (is much, much less than 100%)
but the *average* running time (over many trials) *will* be normally distributed.

===
Gregory Woodhouse  <gregory.woodhouse at sbcglobal.net>
"All truth passes through three stages: First, it is ridiculed.
Second, it is violently opposed. Third, it is accepted as
being self-evident."
--Arthur Schopenhauer


Posted on the users mailing list.