[plt-scheme] V301.5 Speed Up
--- Joe Marshall <jmarshall at alum.mit.edu> wrote:
> I am *deeply* skeptical of statistics. ... it seems to me
> that if some assertion is to be supported by statistics, it
> ought to be easy to understand the assertion and the
> supporting statistics by application of good-old logic and
> common sense.
> ... Whenever I see the phrase `p-value' it sets off alarm
> bells in my head.
I don't have a good enough knowledge of statistics to make
authoritative statements, but hopefully I can illuminate a
few things without spreading too much misinformation.
I think most statistics is straightforward; it is just often
presented badly. For example, the basics of the t-test are
simple: it measures the probability that the two samples
you're comparing could have come from the same normal
distribution, adjusted for the amount of data on hand (the
more data, the more certain you can be of the results).
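As a concrete illustration (a sketch in Python rather than Scheme, with made-up sample data), the statistic behind the pooled two-sample t-test is just a difference of means scaled by a pooled estimate of the variance:

```python
# Sketch: a pooled two-sample Student's t statistic.
# The timing samples below are invented for illustration.
import math

def t_statistic(xs, ys):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(xs), len(ys)
    mx = sum(xs) / nx
    my = sum(ys) / ny
    # Unbiased sample variances.
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    # The pooled variance weights each sample by its degrees of freedom.
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled * (1 / nx + 1 / ny))

old = [102.1, 99.8, 101.5, 100.2, 98.9]   # hypothetical timings, v300
new = [95.3, 96.8, 94.9, 96.1, 95.5]      # hypothetical timings, v301.5
print(t_statistic(old, new))
```

The larger the statistic relative to its t distribution, the less plausible it is that both samples came from the same normal distribution; the sample sizes enter through the degrees of freedom.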
That said, I agree with you: p-values are a crock. Bayesian
statistics is consistent and simple. Frequentist statistics
(which is where all these different t-, F-, etc. tests come
from) is arbitrary and prone to inconsistent results, but it
is the standard, for better or worse. Other people will
argue the opposite, but I think Bayesian statistics is on
the rise as computing power increases.
The notes at:
http://www.statslab.cam.ac.uk/~rrw1/teaching/index.html
are a good introduction to stats and also illustrate the
inconsistencies of frequentist statistics.
> That said, if anyone is still reading, I am curious about
> your results. First of all, I don't expect benchmarks to
> follow a gaussian: there should be a certain minimum amount
> of time that a benchmark takes. The measured time ought to
> be that minimum plus some `noise' that comes from things
> not related to the benchmark. The noise may be of some
> interest, but it seems to me that the minimum running time
> is the most interesting thing to measure if you are running
> a benchmark.
Ok, here's my take: this is a philosophical issue. Do we
want to measure the best we can expect when the stars align,
or what the user can expect on a normal day? I choose the
latter; you could equally choose the former. It comes down
to personal preference.
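The two philosophies correspond to two different summaries of the same timing data; a minimal sketch with made-up numbers:

```python
# Sketch: two summaries of the same hypothetical timings (in ms).
# "Stars align" = the minimum; "normal day" = the mean (or median).
timings = [103.0, 101.0, 100.0, 115.0, 102.0, 104.0]

best_case = min(timings)               # minimum running time
typical = sum(timings) / len(timings)  # what a user sees on average

print(best_case)  # 100.0
print(typical)
```

Note the 115 ms outlier moves the mean but leaves the minimum untouched, which is exactly where the two philosophies part ways.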
On Gaussian noise: for the reasons Greg gives, I would
expect the distribution to be Gaussian. One could test for
this. I haven't, because I haven't implemented the code to
do so. However, it would be simple to confirm by eye: run
1000 benchmarks and eyeball the data (using the histogram
plotting functions in the science collection, of course).
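A rough sketch of that eyeballing procedure (in Python, with random.gauss standing in for real measurements, and a crude text histogram standing in for the science collection's plots):

```python
# Sketch: eyeballing the distribution of repeated benchmark timings.
# random.gauss simulates measurements: a 100 ms floor plus Gaussian noise.
import random

random.seed(42)
timings = [100.0 + random.gauss(5.0, 2.0) for _ in range(1000)]

# Bin the timings into 10 equal-width buckets.
lo, hi = min(timings), max(timings)
bins = 10
width = (hi - lo) / bins
counts = [0] * bins
for t in timings:
    i = min(int((t - lo) / width), bins - 1)
    counts[i] += 1

# One row per bucket; a Gaussian should look like a symmetric bump.
for i, c in enumerate(counts):
    left = lo + i * width
    print(f"{left:7.2f} ms | {'#' * (c // 10)}")
```

If the real data looked strongly skewed or bimodal instead of bump-shaped, the Gaussian assumption (and hence the t-test) would be suspect.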
If the running time were very small this assumption might be
in danger, as the Gaussian is symmetric but running time
obviously can't be less than zero. In that case a Poisson
might be more appropriate.
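A quick sketch of why the symmetry matters (assuming, purely for illustration, a mean of 5 ms and a standard deviation of 3 ms):

```python
# Sketch: a Gaussian with small mean puts real probability mass
# below zero, which no running time can reach; a Poisson is
# non-negative by construction.
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Probability the Gaussian model assigns to impossible negative times.
print(gaussian_cdf(0.0, 5.0, 3.0))
```

For large means relative to the noise this mass is negligible, which is why the Gaussian assumption is safe for longer-running benchmarks.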
As for discontinuities in the data, I'm not sure how much of
a worry they are. The timer seems to have a resolution of
10 ms on my system, but if the running times are large
enough relative to this resolution it shouldn't matter.
Again, this assumption should be tested.
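A sketch of why a coarse timer matters less as running times grow (assuming, for illustration, a clock that simply truncates to 10 ms ticks):

```python
# Sketch: effect of a 10 ms timer resolution on measured times.
RESOLUTION_MS = 10

def measured(true_ms):
    """Reported time when the clock only ticks every RESOLUTION_MS."""
    return (true_ms // RESOLUTION_MS) * RESOLUTION_MS

# The absolute error is bounded by one tick, so the relative
# error shrinks as the true running time grows.
for true_ms in [13, 97, 1042]:
    m = measured(true_ms)
    err = (true_ms - m) / true_ms
    print(true_ms, m, f"{err:.1%}")
```

A 13 ms run can be off by over 20%, while a one-second run is off by a fraction of a percent, which is the sense in which large running times make the resolution irrelevant.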
Cheers,
Noel
Email: noelwelsh <at> yahoo <dot> com noel <at> untyped <dot> com
AIM: noelhwelsh
Blogs: http://monospaced.blogspot.com/ http://www.untyped.com/untyping/