[plt-scheme] Iteration Performance

From: Doug Williams (m.douglas.williams at gmail.com)
Date: Fri Feb 27 14:04:33 EST 2009

Iteration Performance

I am updating the statistics operations in the science collection to allow
arbitrary sequences and have I a question about performance. Section 11.8,
Iteration Performance, has some good guidance.

Consider the following:

(define (mean data)
  (let ((mu 0.0))
    (for ((datum data)
          (i (in-naturals 1)))
      (set! mu (+ mu (/ (- datum mu) i))))
    mu))

(define (mean-faster data)
  (let ((mu 0.0))
    (cond ((vector? data)
           (for ((datum (in-vector data))
                 (i (in-naturals 1)))
             (set! mu (+ mu (/ (- datum mu) i)))))
          ((list? data)
           (for ((datum (in-list data))
                 (i (in-naturals 1)))
             (set! mu (+ mu (/ (- datum mu) i)))))
          (else
           (for ((datum data)
                 (i (in-naturals 1)))
             (set! mu (+ mu (/ (- datum mu) i))))))
    mu))

Here are some times for lists.

> (define l (build-list 10000000 (lambda (i) i)))
> (time (mean v))
cpu time: 5453 real time: 5438 gc time: 391
4999999.5
> (time (mean-faster v))
cpu time: 3312 real time: 3313 gc time: 188
4999999.5

Here are some times for vectors.

> (define v (build-vector 10000000 (lambda (i) i)))
> (time (mean v))
cpu time: 18515 real time: 18625 gc time: 628
4999999.5
> (time (mean-faster v))
cpu time: 11609 real time: 11672 gc time: 672
4999999.5

(and here are some times for an in-range as a general case.

> (time (mean (in-range 10000000)))
cpu time: 15907 real time: 15953 gc time: 607
4999999.5
> (time (mean-faster (in-range 10000000)))
cpu time: 18172 real time: 18235 gc time: 781
4999999.5

Doing my own dispatching for vectors, lists, and general sequences is faster
for vectors and lists - enough so that I would likely use it. [I suppose I
could hide it in a macro.] But, does it make sense for 'for' itself to do
such dispatching internally for the common types? That is, generate the
equivalent of a cond around the fast code cases (e.g., vectors and lists)
with an else for the general case?

I was also surprised at how slow iteration over vectors is. It seems like it
would be pretty much as fast as lists - it should just need to stride down
the vector. [Unless that isn't possible because the vector could move during
a GC or something.]

Sequence Contracts

And, for the contracts guys, I need to write a sequenceof contract to check
the contracts - so, something like (sequenceof real?) for the data here.
This isn't a big deal for this case - although I would probably do the same
internal dispatching when I write it to. But, there are sequences with side
effects, like in-lines, that would not be cool to iterate over in a
contract.  Also, there are sequences that return multiple values. Would
there be a reasonable way to write a contract to accept a sequence of reals
that didn't error when  given a multi-valued sequence? [I won't even think
about infinite sequences.]

Sequence Operations

Has anyone written any general sequence operations already? So far,
sequence-length is the only one I've needed and I just wrote one. It seems
like there are some general operations that would be useful (yes, for is
very powerful, but not always what is needed). This would be along the same
lines as the way dictionaries were abstracted. [I've looked in PLaneT and
didn't see anything similar.] Just a thought.

Doug
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/users/archive/attachments/20090227/f6cecfdf/attachment.html>

Posted on the users mailing list.