[racket] generator performance

From: Greg Hendershott (greghendershott at gmail.com)
Date: Tue Sep 18 14:07:04 EDT 2012

p.s. Actually the closest to the simple producer function is this
buffer generator:

;; TO-DO: macro-ize
(define buffer-generator
  (let* ([xs '()]
         [ready? #f]
         [yield (lambda (x) (set! xs (cons x xs)))])
    (lambda ()
      (unless ready?
        ;; --> User code here
        (for ([i (in-range N)])
          (yield i))
        ;; <-- User code here
        (set! ready? #t)
        (set! xs (reverse xs))
        )
      (cond [(empty? xs) (void)]
            [(begin0 (car xs) (set! xs (cdr xs)))]))))

For N = 100000

real-generator:   cpu time: 1369 real time: 1442 gc time: 198
thread-generator: cpu time:  364 real time: 365  gc time:  10
simple-producer:  cpu time:    5 real time:   4  gc time:   0
buffer-generator: cpu time:   11 real time:  11  gc time:   5

Admittedly this is throwing RAM at the problem and greedily running
the whole generator up-front. (Although interestingly it hardly moves
the GC needle; benefit of the generational GC?)

This wouldn't be a plausible choice for a general-purpose generator in
a library. But for some N and some applications, it's not necessarily
a _completely_ stupid space/speed trade-off. If the trade-off were
acceptable, it _might_ be something Patrick you could use as part of a
strategy to do an initial port of a large body of existing Python
code? e.g. a "shim" for version 0.1. Yeah, I know. Just a thought.

On Tue, Sep 18, 2012 at 12:52 PM, Greg Hendershott
<greghendershott at gmail.com> wrote:
>> I agree with others that the cost of capturing continuations is the
>> culprit in this case. The run-time system has support for faster
>> continuations that work in constrained settings (currently used to
>> implement futures), and it might be possible to make those
>> continuations kick in work for a typical generator, but I'm not sure.
>
> IIUC continuations are also the foundation for Racket threads.
>
> Which got me wondering. The following toy/naive implementation of a
> generator using threads, seems about 4 or 5 times faster than the real
> one.
>
> However a simple producer function blows the doors off both.
>
> #lang racket
>
> (define N 100000)
>
> (require racket/generator)
> (define real-generator
>   (generator
>    ()
>    (for ([i (in-range N)])
>      (yield i))))
>
> ;; TO-DO: macro-ize.
> (require racket/async-channel)
> (define thread-generator
>   (let* ([ch (make-channel)]
>          [state 'fresh]
>          [yield (lambda (x) (channel-put ch x))])
>     (lambda ()
>       (cond [(eq? state 'fresh)
>              (set! state 'running)
>              (thread
>               (lambda ()
>                 ;; --> User code here
>                 (for ([i (in-range N)])
>                   (yield i))
>                 ;; <-- User code here
>                 (set! state 'expired)
>                 (channel-put ch (void))))])
>       (cond [(eq? state 'expired) (void)]
>             [else (channel-get ch)]))))
>
> (define simple-producer
>   (let ([n 0])
>   (lambda ()
>     (cond [(< n N) (begin0 n (set! n (add1 n)))]
>           [else (void)]))))
>
> (time
>  (length (for/list ([x (in-producer real-generator void?)])
>            x)))
> => cpu time: 1775 real time: 1813 gc time: 236
> => 100000
>
> (time
>  (length (for/list ([x (in-producer thread-generator void?)])
>            x)))
> => cpu time: 402 real time: 402 gc time: 6
> => 100000
>
> (time
>  (length (for/list ([x (in-producer simple-producer  void?)])
>            x)))
> => cpu time: 7 real time: 8 gc time: 4
> => 100000
>
> I'm sure this is nothing you didn't already realize. It was just
> interesting for me to try.
>
>> Meanwhile, it might be interesting to try implementing a `generator'
>> form that rewrites its body to implement `yield's that are
>> syntactically apparent (after local-expansion), leaving parameter and
>> continuations to handle the general case.
>
> That went waaaay over my head. :)
>
> On Tue, Sep 18, 2012 at 8:28 AM, Matthew Flatt <mflatt at cs.utah.edu> wrote:
>> I don't think the parameter overhead is relevant. In Jay's post, he
>> performs 100k parameter accesses, and it takes 16 milliseconds; we can
>> reasonably extrapolate 160 msec for 1M accesses. Patrick's generator
>> example yields 1M times, and takes 18 seconds --- 100x the parameter
>> overhead.
>>
>> I agree with others that the cost of capturing continuations is the
>> culprit in this case. The run-time system has support for faster
>> continuations that work in constrained settings (currently used to
>> implement futures), and it might be possible to make those
>> continuations kick in work for a typical generator, but I'm not sure.
>>
>> Meanwhile, it might be interesting to try implementing a `generator'
>> form that rewrites its body to implement `yield's that are
>> syntactically apparent (after local-expansion), leaving parameter and
>> continuations to handle the general case.
>>
>> At Tue, 18 Sep 2012 07:07:51 -0400, Greg Hendershott wrote:
>>> This stuff is still over my head; I'm at that "know just enough to be
>>> dangerous" stage. With that caveat:
>>>
>>> Jay's blog post
>>> http://jeapostrophe.github.com/blog/2012/07/25/cont-marks2/ mentions
>>> that parameters can be >10X slower than dynamic-wind.
>>>
>>> In generator.rkt I notice an early version that didn't support
>>> multiple values. The new version does, keeping a "yielder" proc in a
>>> parameter.
>>>
>>> Could a single-valued generator yield be faster enough?  Or am I
>>> misapplying my misunderstanding? :)
>>>
>>> On Sun, Sep 16, 2012 at 11:15 AM, Eli Barzilay <eli at barzilay.org> wrote:
>>> > Two hours ago, Patrick Useldinger wrote:
>>> >> Hello
>>> >>
>>> >> I have a Python program doing some intensive computing and would
>>> >> like to port it to Racket for performance reasons.
>>> >>
>>> >> I use a generator in Python which has a very low overhead. While
>>> >> writing some test programs, I seem to have an substantial overhead
>>> >> on using generators in Racket:
>>> >> [...]
>>> >
>>> > IIRC, Python optimizes generators in some ways when possible so when
>>> > they're used with loops there's no big overhead.  The racket
>>> > generators are implemented with continuations and the cost that comes
>>> > with that is unfortunately pretty high.  If you need performance, your
>>> > best bet is probably to translate things to `for...' loops if
>>> > possible.
>>> >
>>> > --
>>> >           ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>>> >                     http://barzilay.org/                   Maze is Life!
>>> > ____________________
>>> >   Racket Users list:
>>> >   http://lists.racket-lang.org/users
>>> ____________________
>>>   Racket Users list:
>>>   http://lists.racket-lang.org/users

Posted on the users mailing list.