[racket] Performance of delay/force vs lambda/apply

From: Eli Barzilay (eli at barzilay.org)
Date: Tue Jul 17 02:12:35 EDT 2012

Two days ago, Adolfo Pérez Álvarez wrote:
> Hi everyone,
> 
> While playing around with delay/force I got some unexpected results.
> It seems that the performance of delay/force is far worse than that
> of an equivalent construction based on lambda/apply.
> 
> For example, this implementation of the tak function:
> [...]
> takes around 6 seconds to calculate (g 242501) on my machine:
> > (time (g 242501))
> cpu time: 5963 real time: 6005 gc time: 3980
> 242502

The numbers that I'm getting for this (each is the median of 7 runs)
are:

  delay:      cpu time: 3850 real time: 3874 gc time: 1997
  lazy:       cpu time: 2240 real time: 2253 gc time: 1363
  delay/name: cpu time:  790 real time:  794 gc time:  470
  naive:      cpu time:  945 real time:  951 gc time:  477
  thunks:     cpu time:  392 real time:  393 gc time:  155

Note that the overall speedup for thunks is much bigger -- that's
probably due to better inlining on the recent git head.

The thing is that the default `delay' deals with a bunch of stuff that
makes things slower compared to a naive implementation.  One of them
is dealing with multiple values -- `lazy' doesn't do that which makes
it a bit faster.  This is a substantial performance hit, but the
biggest one which both `delay' and `lazy' suffer from is catching
errors and caching them as a result.  For example, try forcing this a
few times:

    (define foo (let ([c 0]) (delay (begin (set! c (+ c 1))
                                           (/ 1 (- c 1))))))

with `delay' or `lazy' and compare with the naive implementation.

The cost is because each promise has its computation inside a
`with-handlers' (roughly), and I don't know of a good way to make it
faster.  That's not because I didn't try... I would love to have it
run faster, since just that thing is responsible for a 2x slowup --
here are numbers from the same code without the exception catching:

  no-exn-delay: cpu time: 2520 real time: 2532 gc time: 1211
  no-exn-lazy:  cpu time: 1407 real time: 1415 gc time: 789

and in lazy racket this becomes a major issue...


> Simplifying even further the definition of mdelay/mforce:
> 
> (define-syntax-rule (mdelay expr) (thunk expr))
> (define-syntax-rule (mforce expr) (expr))
> 
> got me impressive results (comparable to a C implementation of the
> same function):
> > (time (g 242501))
> cpu time: 597 real time: 602 gc time: 319
> 242502
> 
> Given that the compiler is so good optimizing function creation and
> application, ¿Is there any reason to use delay/force over
> lambda/apply? ¿Am I missing something here?

One thing that will remain consistent is that this kind of
implementation will always be faster since there's enough baggage in
delay/force that it's unlikely to be something that racket could
optimize through.  The simple thunks approach, OTOH, are things can
are likely to be optimized.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the users mailing list.