[plt-scheme] Statistics for Sequences

From: Doug Williams (m.douglas.williams at gmail.com)
Date: Fri Sep 11 12:46:04 EDT 2009

I wrote 'dispatching-for' and 'dispatching-for/fold' macros that generate a
cond that dispatches to for (or fold/fold) with specific vector, list, and
general versions.  For example:

(define (mean data)
  (dispatch-for/fold ((m-old 0.0))
                     ((i (in-naturals))
                      (x data))
    (+ m-old (/ (- x m-old) (add1 i)))))

==>

(define (mean data)
  (cond ((vector? data)
         (for/fold ((m-old 0.0))
                      ((i (in-naturals))
                       (x (in-vector data)))
            (+ m-old (/ (- x m-old) (add1 i)))))
        ((pair? data)
         (for/fold ((m-old 0.0))
                      ((i (in-naturals))
                       (x (in-list data)))
            (+ m-old (/ (- x m-old) (add1 i)))))
        (else
         (for/fold ((m-old 0.0))
                      ((i (in-naturals))
                       (x data))
            (+ m-old (/ (- x m-old) (add1 i)))))))

This runs at about the same speed as the current science collection routines
for vectors. Also, lists run at pretty much the same speed.

Times using 'dispatch-for' and 'dispatch-for/fold':
cpu time: 250 real time: 250 gc time: 0
cpu time: 235 real time: 234 gc time: 0

Times using the current science collection:
cpu time: 250 real time: 250 gc time: 0
cpu time: 203 real time: 203 gc time: 0

The first time in each block is for the contracted version that ensures that
the inputs are a sequence (or vector for the current science collection) of
reals. The second time in each block is for the unchecked version that
bypasses the contract check.

I'm not sure that any special cases other than vectors and lists make sense
to break out separately.

All times were done using version 4.2.1.8-svn10sep2009, which should have
Matthew's performance code.

Doug


On Thu, Sep 10, 2009 at 10:12 AM, Doug Williams <
m.douglas.williams at gmail.com> wrote:

> Maybe the issue of when "just in time" is. The JIT currently compiles
>> at the granularity of `lambda', and it compiles each `lambda' only
>> once.
>>
>
>> So, although `sequence-generate' may produce vector-specific operations
>> for a given invocation of the `for' loop, the JIT produces a generic
>> function call to the vector-specific operation (which might not be
>> vector-specific next time around).
>>
>
> So, the vector-specific lambda(s) returned by sequence-generate are
> optimized when they are first seen by the JIT compiler. But, there are
> additional optimizations (for the sequencing operation) that would not be
> possible within the for itself because there is not much left but generic
> function invocations. Specifically in the case of vectors, by having the for
> macro generate the increment and test of the index and the vector-refs, the
> JIT compiler can generate specific instructions for those operations. While,
> for a generic sequence that happens to be a vector, there is no alternative
> but to generate a call to see if more data is available and, if there is,
> another call to get the next element of the sequence.
>
>
>> One possibility for improvement, then, is to change the JIT strategy to
>> generate code later and more frequently.
>>
>>
> But, writing some kind of 'dispatching for' macro is probably the only
> near-term way of getting nearly the same performance for a generic sequence
> that happens to be a vector and a 'for' specific written for a vector.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/users/archive/attachments/20090911/d542323f/attachment.html>

Posted on the users mailing list.