[plt-scheme] Statistics for Sequences

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Sep 9 11:29:07 EDT 2009

I don't think the latest changes will affect the performance, since
unsafe operations are only used for `in-vector' and (sometimes)
`in-range' when they appear immediately in a `for' right-hand side.
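
To make the distinction concrete, here is a minimal sketch of the two
ways a sequence can reach a `for' clause. The function names
`sum-direct' and `sum-generic' are hypothetical and are not taken from
the attached files. Only the first form has `in-vector' written
immediately in the clause's right-hand side.

```scheme
#lang scheme

;; Hypothetical helper: `in-vector' is written directly in the
;; right-hand side of the clause, so the `for' expansion can see it.
(define (sum-direct v)
  (for/fold ([sum 0.0]) ([x (in-vector v)])
    (+ sum x)))

;; Hypothetical helper: the clause's right-hand side is just a
;; variable, so the kind of sequence is only known at run time.
(define (sum-generic seq)
  (for/fold ([sum 0.0]) ([x seq])
    (+ sum x)))

(define data (vector 1.0 2.0 3.0 4.0))

(sum-direct data)               ; in-vector appears in the clause
(sum-generic data)              ; a vector used directly as a sequence
(sum-generic (in-vector data))  ; in-vector is applied outside the clause
```

All three calls compute the same sum; they differ only in what the
`for' form can see at expansion time.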

Times on my machine:

 New
  laptop% mzscheme time-statistics.ss
  cpu time: 576 real time: 578 gc time: 11
  cpu time: 450 real time: 451 gc time: 10
  (that's without `in-vector'; times using `in-vector' are the same)

 Old
  laptop% mzscheme time-statistics.ss
  cpu time: 233 real time: 237 gc time: 18
  cpu time: 196 real time: 198 gc time: 10

At Wed, 9 Sep 2009 09:18:55 -0600, Doug Williams wrote:
> I've reimplemented the statistics module from the science collection to use
> sequences instead of just vectors. I like the generality better - I can use
> any sequence (e.g., vector or list) - but there is more of a performance hit
> than I would have liked. I haven't timed it with the new changes that
> Matthew just put in. The good news is that there isn't much of a hit for
> using (variance data) as opposed to (variance (in-vector data)) and there
> isn't a huge hit for using the contract that ensures that the sequence is a
> sequence of real numbers.
> 
> I created a 100000 element vector and timed a loop getting the variance of
> the elements 10 times. Note that I create an executable that runs compiled
> code in both cases. [Runs of the sequence code within DrScheme are about
> twice the times of the compiled code - I assume they run from byte code in
> that case. Runs of the science collection code are about the same in DrScheme
> - I assume they run the compiled code.]
> 
> Times using sequences [primarily using 'for/fold' for sequencing and
> referencing]:
> 
> (variance data) : cpu time: 625 real time: 625 gc time: 32
> (unchecked-variance data) : cpu time: 531 real time: 531 gc time: 77
> 
> (variance (in-vector data)) : cpu time: 609 real time: 609 gc time: 16
> (unchecked-variance (in-vector data)) : cpu time: 485 real time: 484 gc time: 0
> 
> Times using vectors (current science collection routines) [primarily using
> 'do' for sequencing with 'vector-ref' for referencing]:
> 
> (variance data) : cpu time: 235 real time: 234 gc time: 16
> (unchecked-variance data) : cpu time: 187 real time: 188 gc time: 46
> 
> All of the normal caveats about timing values apply - just because I'm
> timing a statistics routine doesn't mean it's statistically relevant :).
> 
> I will retime them when a nightly build with Matthew's performance
> improvements is available (it seems that 4.2.1.7 from Saturday is the
> latest) - or I build it on my machine at home. I don't have the development
> tools on my laptop to build from svn.
> 
> I've attached the files in case anyone wants to look them over. If someone
> could run them against the latest svn, it would be nice.
> 
> Comments from anyone that uses these routines from the science collection
> would be most welcome.
> 
> Doug
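
For readers without the attached files, the two looping styles compared
in the quoted message can be sketched as follows. This is an
illustration only, not the science collection's code: the names
`variance/seq' and `variance/vec' are hypothetical, and the running
(Welford-style) update is an assumed algorithm chosen so that both
versions do the same arithmetic.

```scheme
#lang scheme

;; Hypothetical sequence-based version: `for/fold' both steps through
;; the data and supplies each element, so any sequence is accepted.
(define (variance/seq data)
  (define-values (n mean m2)
    (for/fold ([n 0] [mean 0.0] [m2 0.0]) ([x data])
      (let* ([n (add1 n)]
             [delta (- x mean)]
             [mean (+ mean (/ delta n))])
        (values n mean (+ m2 (* delta (- x mean)))))))
  (/ m2 (sub1 n)))

;; Hypothetical vector-only version: `do' steps an index and
;; `vector-ref' fetches each element. Every step expression in a `do'
;; sees the previous iteration's values of i, mean and m2.
(define (variance/vec v)
  (define len (vector-length v))
  (do ([i 0 (add1 i)]
       [mean 0.0 (+ mean (/ (- (vector-ref v i) mean) (add1 i)))]
       [m2 0.0 (let* ([x (vector-ref v i)]
                      [delta (- x mean)]
                      [new-mean (+ mean (/ delta (add1 i)))])
                 (+ m2 (* delta (- x new-mean))))])
      ((= i len) (/ m2 (sub1 len)))))

;; A timing loop in the spirit of the one described above: a
;; 100000-element vector, with the variance computed 10 times.
(define data (build-vector 100000 (lambda (i) (exact->inexact i))))
(time (for ([k (in-range 10)]) (variance/seq data)))
(time (for ([k (in-range 10)]) (variance/vec data)))
```

Each `time' form prints a line in the same "cpu time: ... real time: ...
gc time: ..." format as the figures quoted in this thread.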



Posted on the users mailing list.