[plt-scheme] Statistics for Sequences

From: Doug Williams (m.douglas.williams at gmail.com)
Date: Wed Sep 9 11:18:55 EDT 2009

I've reimplemented the statistics module from the science collection to use
sequences instead of just vectors. I like the generality better - I can use
any sequence (e.g., vector or list) - but there is more of a performance hit
than I would have liked. I haven't timed it with the new changes that
Matthew just put in. The good news is that there isn't much of a hit for
using (variance data) as opposed to (variance (in-vector data)) and there
isn't a huge hit for using the contract that ensures that the sequence is a
sequence of real numbers.
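
To make the checked/unchecked distinction concrete, here is a minimal sketch in Python rather than Scheme. It is not the code from statistics.ss: the names `unchecked_variance` and `variance` only mirror the calls timed below, the one-pass Welford update is an assumed algorithm, and the up-front element check is a stand-in for the contract.

```python
from numbers import Real
from typing import Iterable


def unchecked_variance(data: Iterable[float]) -> float:
    """Sample variance in one pass over any iterable (Welford's update)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("variance needs at least two values")
    return m2 / (n - 1)


def variance(data: Iterable[float]) -> float:
    """Checked variant: verify every element is a real number first.

    Assumption: this stands in for the contract mentioned above. It
    materializes the input so that it can be traversed twice.
    """
    values = list(data)
    for x in values:
        if isinstance(x, bool) or not isinstance(x, Real):
            raise TypeError(f"expected a real number, got {x!r}")
    return unchecked_variance(values)
```

Because both functions only iterate over their argument, a list, a tuple, or a generator works equally well. The cost of that generality is that the loop cannot rely on direct indexing.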

I created a 100000 element vector and timed a loop getting the variance of
the elements 10 times. Note that I created an executable that runs compiled
code in both cases. [Runs of the sequence code within DrScheme take about
twice as long as the compiled code - I assume they run from byte code in
that case. Runs of the science collection code are about the same in DrScheme
- I assume they run the compiled code.]
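
The measurement described above can be sketched as follows, again in Python rather than Scheme. This is not the contents of time-statistics.ss: `time_repeated` and `two_pass_variance` are hypothetical names, and only CPU and wall-clock time are reported, not GC time.

```python
import time
from typing import Callable, Sequence, Tuple


def two_pass_variance(data: Sequence[float]) -> float:
    """Plain sample variance, used here only as something to time."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def time_repeated(
    fn: Callable[[Sequence[float]], float],
    data: Sequence[float],
    repeats: int = 10,
) -> Tuple[float, float]:
    """Return (cpu_ms, real_ms) for calling fn(data) `repeats` times."""
    cpu_start = time.process_time()
    real_start = time.perf_counter()
    for _ in range(repeats):
        fn(data)
    cpu_ms = (time.process_time() - cpu_start) * 1000.0
    real_ms = (time.perf_counter() - real_start) * 1000.0
    return cpu_ms, real_ms


if __name__ == "__main__":
    # Same shape as the experiment: 100000 elements, 10 repetitions.
    data = [float(i) for i in range(100_000)]
    cpu_ms, real_ms = time_repeated(two_pass_variance, data)
    print(f"cpu time: {cpu_ms:.0f} real time: {real_ms:.0f}")
```

Timing the whole loop rather than a single call smooths out some noise, but the numbers still depend on the machine and on whatever else is running.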

Times using sequences [primarily using 'for/fold' for sequencing and
referencing]:

(variance data) : cpu time: 625 real time: 625 gc time: 32
(unchecked-variance data) : cpu time: 531 real time: 531 gc time: 77

(variance (in-vector data)) : cpu time: 609 real time: 609 gc time: 16
(unchecked-variance (in-vector data)) : cpu time: 485 real time: 484 gc time: 0

Times using vectors (current science collection routines) [primarily using
'do' for sequencing with 'vector-ref' for referencing]:

(variance data) : cpu time: 235 real time: 234 gc time: 16
(unchecked-variance data) : cpu time: 187 real time: 188 gc time: 46

All of the normal caveats about timing values apply - just because I'm
timing a statistics routine doesn't mean it's statistically relevant :).

I will retime them when a nightly build with Matthew's performance
improvements is available (it seems that 4.2.1.7 from Saturday is the
latest) - or I build it on my machine at home. I don't have the development
tools on my laptop to build from svn.

I've attached the files in case anyone wants to look them over. If someone
could run them against the latest svn, it would be nice.

Comments from anyone that uses these routines from the science collection
would be most welcome.

Doug
-------------- next part --------------
A non-text attachment was scrubbed...
Name: statistics.ss
Type: application/octet-stream
Size: 23772 bytes
Desc: not available
URL: <http://lists.racket-lang.org/users/archive/attachments/20090909/464ec8c5/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: statistics-test.ss
Type: application/octet-stream
Size: 5437 bytes
Desc: not available
URL: <http://lists.racket-lang.org/users/archive/attachments/20090909/464ec8c5/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: time-statistics.ss
Type: application/octet-stream
Size: 1176 bytes
Desc: not available
URL: <http://lists.racket-lang.org/users/archive/attachments/20090909/464ec8c5/attachment-0002.obj>

Posted on the users mailing list.