[plt-scheme] V301.5 Speed Up
On 2/10/06, Noel Welsh <noelwelsh at yahoo.com> wrote:
> --- Gregory Woodhouse <gregory.woodhouse at sbcglobal.net>
> wrote:
>
> > It would be nice to be able to run a test 1000 times,
> > saving the data for statistical analysis.
>
> I've just written code to do this (run code 50 times,
> perform test for significance). It requires a hacked
> version of the science collection so it won't work till the
> next version of the science collection is out. If anyone
> wants it, email me off list.
>
> Anyway, some observations:
>
> - GC time is really long compared to run time (for the
> silly little benchmarks I tried)
>
> - unexpectedly, the variance of my measurements was
> crazy! When I made benchmarks (just loops adding up
> numbers) long enough to measure the time reliably I got
> results like this:
>
> The code:
>
> (let* ((test1 (lambda ()
>                 (for ((i 0 10000) (sum 0))
>                   (+ 1000 sum))))
>        (test2 (lambda ()
>                 (for ((i 0 10000000) (sum 0))
>                   (+ 1 sum))))
>        (s1 (measure test1))
>        (s2 (measure test2)))
>   (let-values (((faster? p) (faster s1 s2)))
>     (printf "p ~a\n" p)
>     (printf "s1 mean: ~a var: ~a\n" (mean s1) (variance s1))
>     (printf "s2 mean: ~a var: ~a\n" (mean s2) (variance s2))
>     (assert-true faster?)))
>
> The output:
>
> p 1.0
> s1 mean: 6.799999999999996 var: 22.204081632653068
> s2 mean: 13492.2 var: 18376.693877551028
>
> P is the value returned by the t-test (the probability the
> means differ by chance). Incidentally the assumptions for
> the t-test are almost certainly violated in this case.
>
> Anyway, I really can't explain the variance being that
> large. Here's how I collect the data:
>
> ;; measure : ((any ...) -> any any ...) -> (vector-of number)
> (define (measure proc . args)
>   (define (prepare)
>     (for! (i 0 3)
>       (collect-garbage)))
>   (list->vector
>    (for ((i 0 50) (times null))
>      (prepare)
>      (let-values (((results cpu-time real-time gc-time)
>                    (time-apply proc args)))
>        (cons cpu-time times)))))
>
> Hope that's of interest to someone!
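For anyone following along without the science collection, the quoted harness boils down to: force a few garbage collections, then time each of 50 runs. Here is a rough Python transliteration (my sketch, not the original code; the 3 collections and 50 samples come from the quoted `measure`, everything else is assumption):

```python
import gc
import time

def measure(proc, *args, samples=50):
    """Collect CPU timings for proc(*args), mirroring the quoted
    Scheme `measure`: collect garbage three times, then time a run."""
    times = []
    for _ in range(samples):
        for _ in range(3):        # the `prepare` step: collect-garbage x3
            gc.collect()
        start = time.process_time()
        proc(*args)
        times.append(time.process_time() - start)
    return times

# A stand-in for the quoted test1 (loop adding up numbers):
def test1():
    return sum(1000 for _ in range(10_000))

timings = measure(test1, samples=5)   # 5 samples to keep the demo quick
```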
I am *deeply* skeptical of statistics. I admit that this is largely
because I am ignorant, but it seems to me that if some assertion is to
be supported by statistics, it ought to be easy to understand the
assertion and the supporting statistics by application of good old
logic and common sense.
Yes, I'm a bit of a crank, but I think I'm in good company. Harold
Jeffreys (of the Jeffreys-Lindley paradox) points out that `p-values'
can vary wildly even in some rather mundane circumstances. Whenever I
see the phrase `p-value' it sets off alarm bells in my head.
That said, if anyone is still reading, I am curious about your
results. First of all, I don't expect benchmark timings to follow a
Gaussian distribution: there should be a certain minimum amount of
time that a benchmark takes. The measured time ought to be that
minimum plus some `noise' that comes from things not related to the
benchmark. The noise may be of some interest, but it seems to me that
the minimum running time is the most interesting thing to measure if
you are running a benchmark.
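To illustrate the point (a Python sketch; the constants and the exponential noise model are my own assumptions, not measurements from the thread): if each sample is the intrinsic cost plus non-negative noise, the sample mean is biased upward by the average noise, while the sample minimum converges on the intrinsic cost.

```python
import random

# Hypothetical model: each timing sample is the benchmark's true
# minimum cost plus non-negative, skewed noise from unrelated system
# activity -- hence no Gaussian.
TRUE_MIN = 100.0  # intrinsic running time, in arbitrary units

def one_run(rng):
    # Exponential noise with mean 5 units (an assumption for the demo).
    return TRUE_MIN + rng.expovariate(1 / 5.0)

def measure(n, seed=0):
    rng = random.Random(seed)   # fixed seed so the demo is repeatable
    return [one_run(rng) for _ in range(n)]

samples = measure(50)
mean_est = sum(samples) / len(samples)
min_est = min(samples)

# The mean is biased upward by the average noise (~5 units here),
# while the minimum of 50 samples sits just above the intrinsic cost.
print(f"min  estimate: {min_est:.2f}")
print(f"mean estimate: {mean_est:.2f}")
```

With 50 samples the minimum lands within a fraction of a unit of the true cost, while the mean carries the full noise bias; that is why the minimum is the more interesting summary for a benchmark.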
--
~jrm