[plt-scheme] Statistics for Sequences

From: Doug Williams (m.douglas.williams at gmail.com)
Date: Wed Sep 9 11:37:06 EDT 2009

Thanks for running them for me. I guess it comes down to whether the
flexibility is worth the performance hit. I like the flexibility. In the
past there were times I have had to convert lists to vectors just to compute
statistics on them, which is even less efficient. I could include the old
ones as vector-mean, vector-variance, etc. for people who need or want the
performance.
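
To make the trade-off concrete, here is a minimal sketch of the two styles.
It is an illustration only, not the code from the science collection:
seq-mean is a hypothetical name, vector-mean borrows the name suggested
above, and both assume non-empty data containing only real numbers. It is
written against racket/base; under PLT Scheme 4.x the first line should be
#lang scheme/base instead.

```racket
#lang racket/base
;; Hypothetical sketch, not the science collection's implementation.

;; Sequence-based mean: accepts a list, a vector, or any other sequence,
;; iterating with for/fold. Assumes a non-empty sequence of reals.
(define (seq-mean data)
  (define-values (sum n)
    (for/fold ([sum 0.0] [n 0]) ([x data])
      (values (+ sum x) (add1 n))))
  (/ sum n))

;; Vector-only mean: iterates with do and indexes with vector-ref.
;; Assumes a non-empty vector of reals.
(define (vector-mean v)
  (define n (vector-length v))
  (do ([i 0 (add1 i)]
       [sum 0.0 (+ sum (vector-ref v i))])
      ((= i n) (/ sum n))))
```

The first version can be called as (seq-mean '(1 2 3 4)) or
(seq-mean (vector 1 2 3 4)); the second only accepts a vector.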

Doug

On Wed, Sep 9, 2009 at 9:29 AM, Matthew Flatt <mflatt at cs.utah.edu> wrote:

> I don't think the latest changes will affect the performance, since
> unsafe operations are only used for `in-vector' and (sometimes)
> `in-range' when they appear immediately in a `for' right-hand side.
>
> Times on my machine:
>
>  New
>  laptop% mzscheme time-statistics.ss
>  cpu time: 576 real time: 578 gc time: 11
>  cpu time: 450 real time: 451 gc time: 10
>  (that's without `in-vector'; times using `in-vector' are the same)
>
>  Old
>  laptop% mzscheme time-statistics.ss
>  cpu time: 233 real time: 237 gc time: 18
>  cpu time: 196 real time: 198 gc time: 10
>
> At Wed, 9 Sep 2009 09:18:55 -0600, Doug Williams wrote:
> > I've reimplemented the statistics module from the science collection to
> > use sequences instead of just vectors. I like the generality better - I
> > can use any sequence (e.g., vector or list) - but there is more of a
> > performance hit than I would have liked. I haven't timed it with the new
> > changes that Matthew just put in. The good news is that there isn't much
> > of a hit for using (variance data) as opposed to
> > (variance (in-vector data)) and there isn't a huge hit for using the
> > contract that ensures that the sequence is a sequence of real numbers.
> >
> > I created a 100000 element vector and timed a loop getting the variance
> > of the elements 10 times. Note that I created an executable that runs
> > compiled code in both cases. [Runs of the sequence code within DrScheme
> > are about twice the times of the compiled code - I assume they run from
> > byte code in that case. Runs of the science collection code are about the
> > same in DrScheme - I assume they run the compiled code.]
> >
> > Times using sequences [primarily using 'for/fold' for sequencing and
> > referencing]:
> >
> > (variance data) : cpu time: 625 real time: 625 gc time: 32
> > (unchecked-variance data) : cpu time: 531 real time: 531 gc time: 77
> >
> > (variance (in-vector data)) : cpu time: 609 real time: 609 gc time: 16
> > (unchecked-variance (in-vector data)) : cpu time: 485 real time: 484 gc time: 0
> >
> > Times using vectors (current science collection routines) [primarily
> > using 'do' for sequencing with 'vector-ref' for referencing]:
> >
> > (variance data) : cpu time: 235 real time: 234 gc time: 16
> > (unchecked-variance data) : cpu time: 187 real time: 188 gc time: 46
> >
> > All of the normal caveats about timing values apply - just because I'm
> > timing a statistics routine doesn't mean it's statistically relevant :).
> >
> > I will retime them when a nightly build with Matthew's performance
> > improvements is available (it seems that 4.2.1.7 from Saturday is the
> > latest) - or when I build it on my machine at home. I don't have the
> > development tools on my laptop to build from svn.
> >
> > I've attached the files in case anyone wants to look them over. If
> > someone could run them against the latest svn, it would be nice.
> >
> > Comments from anyone that uses these routines from the science collection
> > would be most welcome.
> >
> > Doug
>
>