[racket] Performance of delay/force vs lambda/apply

From: Eli Barzilay (eli at barzilay.org)
Date: Tue Jul 17 02:12:35 EDT 2012

Two days ago, Adolfo Pérez Álvarez wrote:
> Hi everyone,
> While playing around with delay/force I got some unexpected results.
> It seems that the performance of delay/force is far worse than that
> of an equivalent construction based on lambda/apply.
> For example, this implementation of the tak function:
> [...]
> takes around 6 seconds to calculate (g 242501) on my machine:
> > (time (g 242501))
> cpu time: 5963 real time: 6005 gc time: 3980
> 242502

The numbers that I'm getting for this (each is the median of 7 runs)

  delay:      cpu time: 3850 real time: 3874 gc time: 1997
  lazy:       cpu time: 2240 real time: 2253 gc time: 1363
  delay/name: cpu time:  790 real time:  794 gc time:  470
  naive:      cpu time:  945 real time:  951 gc time:  477
  thunks:     cpu time:  392 real time:  393 gc time:  155

Note that the overall speedup for thunks is much bigger -- that's
probably due to better inlining on the recent git head.

The thing is that the default `delay' deals with a bunch of stuff that
makes things slower compared to a naive implementation.  One of them
is dealing with multiple values -- `lazy' doesn't do that which makes
it a bit faster.  This is a substantial performance hit, but the
biggest one which both `delay' and `lazy' suffer from is catching
errors and caching them as a result.  For example, try forcing this a
few times:

    (define foo (let ([c 0]) (delay (begin (set! c (+ c 1))
                                           (/ 1 (- c 1))))))

with `delay' or `lazy' and compare with the naive implementation.

The cost is because each promise has its computation inside a
`with-handlers' (roughly), and I don't know of a good way to make it
faster.  That's not because I didn't try... I would love to have it
run faster, since just that thing is responsible for a 2x slowup --
here are numbers from the same code without the exception catching:

  no-exn-delay: cpu time: 2520 real time: 2532 gc time: 1211
  no-exn-lazy:  cpu time: 1407 real time: 1415 gc time: 789

and in lazy racket this becomes a major issue...

> Simplifying even further the definition of mdelay/mforce:
> (define-syntax-rule (mdelay expr) (thunk expr))
> (define-syntax-rule (mforce expr) (expr))
> got me impressive results (comparable to a C implementation of the
> same function):
> > (time (g 242501))
> cpu time: 597 real time: 602 gc time: 319
> 242502
> Given that the compiler is so good optimizing function creation and
> application, ¿Is there any reason to use delay/force over
> lambda/apply? ¿Am I missing something here?

One thing that will remain consistent is that this kind of
implementation will always be faster since there's enough baggage in
delay/force that it's unlikely to be something that racket could
optimize through.  The simple thunks approach, OTOH, are things can
are likely to be optimized.

          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the users mailing list.