[racket-dev] Inline caching (was Re: my '312' this semester, how we compare to others)

From: Ryan Culpepper (ryanc at ccs.neu.edu)
Date: Wed May 4 19:11:57 EDT 2011

On 05/04/2011 01:57 PM, Tony Garnock-Jones wrote:
> On 2011-05-04 12:04 PM, Matthias Felleisen wrote:
>> I still believe that the Java implementation (just under 1s without
>> their 'Google' contract) benefits from typed dispatches.
> Maybe it does, but it's almost certain that it is benefiting from inline
> caching at send sites (i.e. dynamic type information) much more than it
> will be benefiting from static type information.
> A quick-and-dirty comparison of raw send performance on my Mac:
> Language Virtual machine Nanoseconds per send
> ------------------------------------------------------------
> Java Hotspot 64-bit 1.6.0_24 1.4
> Smalltalk Cog r2382 21
> Smalltalk SqueakVM 4.2.4beta1U 122
> Racket Racket v5.1 ~350
> Note that Cog is a JITting VM and SqueakVM is a plain (but very well
> optimised) interpreter. Both Cog and SqueakVM use a couple of levels of
> method lookup cache.
> A simple experiment I just performed suggests that a monomorphic inline
> cache hit can reduce the time needed for a send in Racket from 350ns to
> around 60ns, which is a massive win. I've attached the program I used to
> measure this, FWIW. (Run it using command-line Racket, not DrRacket: I
> got some *very* weird timing artifacts out of DrRacket during this
> experiment!)
> The question, then, is: how do we implement MICs or PICs using Racket's
> macro system? Each send site needs to expand into
> - a piece of global state
> - a test involving that global state
> - a possible update to that global state
> Hypothesising some kind of (let-static) form that introduces a
> lexically-scoped piece of global state,

Here's 'let-static':

   (define-syntax (let-static stx)
     (syntax-case stx ()
       [(let-static ([var rhs] ...) . body)
        (with-syntax ([(gvar ...)
                        (length (syntax->list #'(var ...)))
                        #'(values rhs ...))])
          #'(let-syntax ([var (make-rename-transformer #'gvar)] ...)
              . body))]))

 > this kind of thing might Just
> Work to provide a speedup of almost six-fold on almost-monomorphic
> send-heavy code:

> (define-syntax cached-send
>   (syntax-rules ()
>     ((_ obj msg arg ...)
>      (let-static ((bc (box #f))
>               (bm (box #f)))
>        (let* ((tmp obj)
>           (cls (object-ref tmp)))
>      (if (eq? (unbox bc) cls)
>          ((unbox bm) tmp arg ...)
>          (let-values (((method _)
>                            (find-method/who 'send tmp 'msg)))
>            (set-box! bc cls)
>            (set-box! bm method)
>            (method tmp arg ...))))))))

That code has a space-safety problem: the caches might keep around 
classes that should be garbage collected. It also has a race condition: 
there's a period of time when the class cache has been updated but the 
method cache hasn't.

The safe-for-space issue can be fixed by using weak boxes. The race 
condition could be fixed by using one instead of two... but there's no 
structure that combines a class and a particular method implementation 
that's strongly held elsewhere. Creating a new pair for the weak box to 
hold works, but it means that every GC is likely to flush the cache, but 
maybe that's okay. It also means that every cache miss triggers heap 
allocation, but I don't know any way around that.

I've attached an updated benchmark script. Still looks like a win on 
speed, but slightly less than before.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: time-method-send-v2.rkt
URL: <http://lists.racket-lang.org/dev/archive/attachments/20110504/04e18a57/attachment.ksh>

Posted on the dev mailing list.