Iteration Performance<br><br>I am updating the statistics operations in the science collection to allow arbitrary sequences and have I a question about performance. Section 11.8, Iteration Performance, has some good guidance.<br>
<br>Consider the following:<br><br>(define (mean data)<br> (let ((mu 0.0))<br> (for ((datum data)<br> (i (in-naturals 1)))<br> (set! mu (+ mu (/ (- datum mu) i))))<br> mu))<br><br>(define (mean-faster data)<br>
(let ((mu 0.0))<br> (cond ((vector? data)<br> (for ((datum (in-vector data))<br> (i (in-naturals 1)))<br> (set! mu (+ mu (/ (- datum mu) i)))))<br> ((list? data)<br> (for ((datum (in-list data))<br>
(i (in-naturals 1)))<br> (set! mu (+ mu (/ (- datum mu) i)))))<br> (else<br> (for ((datum data)<br> (i (in-naturals 1)))<br> (set! mu (+ mu (/ (- datum mu) i))))))<br>
mu))<br><br>Here are some times for lists.<br><br>> (define l (build-list 10000000 (lambda (i) i)))<br>> (time (mean v))<br>cpu time: 5453 real time: 5438 gc time: 391<br>4999999.5<br>> (time (mean-faster v))<br>
cpu time: 3312 real time: 3313 gc time: 188<br>4999999.5<br><br>Here are some times for vectors.<br><br>> (define v (build-vector 10000000 (lambda (i) i)))<br>> (time (mean v))<br>cpu time: 18515 real time: 18625 gc time: 628<br>
4999999.5<br>> (time (mean-faster v))<br>cpu time: 11609 real time: 11672 gc time: 672<br>4999999.5<br><br>(and here are some times for an in-range as a general case.<br><br>> (time (mean (in-range 10000000)))<br>cpu time: 15907 real time: 15953 gc time: 607<br>
4999999.5<br>> (time (mean-faster (in-range 10000000)))<br>cpu time: 18172 real time: 18235 gc time: 781<br>4999999.5<br><br>Doing my own dispatching for vectors, lists, and general sequences is faster for vectors and lists - enough so that I would likely use it. [I suppose I could hide it in a macro.] But, does it make sense for 'for' itself to do such dispatching internally for the common types? That is, generate the equivalent of a cond around the fast code cases (e.g., vectors and lists) with an else for the general case?<br>
<br>I was also surprised at how slow iteration over vectors is. It seems like it would be pretty much as fast as lists - it should just need to stride down the vector. [Unless that isn't possible because the vector could move during a GC or something.]<br>
<br>Sequence Contracts<br><br>And, for the contracts guys, I need to write a sequenceof contract to check the contracts - so, something like (sequenceof real?) for the data here. This isn't a big deal for this case - although I would probably do the same internal dispatching when I write it to. But, there are sequences with side effects, like in-lines, that would not be cool to iterate over in a contract. Also, there are sequences that return multiple values. Would there be a reasonable way to write a contract to accept a sequence of reals that didn't error when given a multi-valued sequence? [I won't even think about infinite sequences.]<br>
<br>Sequence Operations<br><br>Has anyone written any general sequence operations already? So far, sequence-length is the only one I've needed and I just wrote one. It seems like there are some general operations that would be useful (yes, for is very powerful, but not always what is needed). This would be along the same lines as the way dictionaries were abstracted. [I've looked in PLaneT and didn't see anything similar.] Just a thought.<br>
<br>Doug<br><br>