[racket-dev] Inline caching (was Re: my '312' this semester, how we compare to others)

From: Matthias Felleisen (matthias at ccs.neu.edu)
Date: Wed May 4 19:28:08 EDT 2011

Wow. Why didn't I think of asking for this before :-) 


On May 4, 2011, at 7:11 PM, Ryan Culpepper wrote:

> On 05/04/2011 01:57 PM, Tony Garnock-Jones wrote:
>> On 2011-05-04 12:04 PM, Matthias Felleisen wrote:
>>> I still believe that the Java implementation (just under 1s without
>>> their 'Google' contract) benefits from typed dispatches.
>> 
>> Maybe it does, but it's almost certain that it is benefiting from inline
>> caching at send sites (i.e. dynamic type information) much more than it
>> will be benefiting from static type information.
>> 
>> A quick-and-dirty comparison of raw send performance on my Mac:
>> 
>> Language Virtual machine Nanoseconds per send
>> ------------------------------------------------------------
>> Java Hotspot 64-bit 1.6.0_24 1.4
>> Smalltalk Cog r2382 21
>> Smalltalk SqueakVM 4.2.4beta1U 122
>> Racket Racket v5.1 ~350
>> 
>> Note that Cog is a JITting VM and SqueakVM is a plain (but very well
>> optimised) interpreter. Both Cog and SqueakVM use a couple of levels of
>> method lookup cache.
>> 
>> A simple experiment I just performed suggests that a monomorphic inline
>> cache hit can reduce the time needed for a send in Racket from 350ns to
>> around 60ns, which is a massive win. I've attached the program I used to
>> measure this, FWIW. (Run it using command-line Racket, not DrRacket: I
>> got some *very* weird timing artifacts out of DrRacket during this
>> experiment!)
>> 
>> The question, then, is: how do we implement MICs or PICs using Racket's
>> macro system? Each send site needs to expand into
>> 
>> - a piece of global state
>> - a test involving that global state
>> - a possible update to that global state
>> 
>> Hypothesising some kind of (let-static) form that introduces a
>> lexically-scoped piece of global state,
> 
> Here's 'let-static':
> 
>  (define-syntax (let-static stx)
>    (syntax-case stx ()
>      [(let-static ([var rhs] ...) . body)
>       (with-syntax ([(gvar ...)
>                      (syntax-local-lift-values-expression
>                       (length (syntax->list #'(var ...)))
>                       #'(values rhs ...))])
>         #'(let-syntax ([var (make-rename-transformer #'gvar)] ...)
>             . body))]))
> 
> > this kind of thing might Just
>> Work to provide a speedup of almost six-fold on almost-monomorphic
>> send-heavy code:
> 
>> (define-syntax cached-send
>>  (syntax-rules ()
>>    ((_ obj msg arg ...)
>>     (let-static ((bc (box #f))
>>              (bm (box #f)))
>>       (let* ((tmp obj)
>>          (cls (object-ref tmp)))
>>     (if (eq? (unbox bc) cls)
>>         ((unbox bm) tmp arg ...)
>>         (let-values (((method _)
>>                           (find-method/who 'send tmp 'msg)))
>>           (set-box! bc cls)
>>           (set-box! bm method)
>>           (method tmp arg ...))))))))
> 
> That code has a space-safety problem: the caches might keep around classes that should be garbage collected. It also has a race condition: there's a period of time when the class cache has been updated but the method cache hasn't.
> 
> The safe-for-space issue can be fixed by using weak boxes. The race condition could be fixed by using one instead of two... but there's no structure that combines a class and a particular method implementation that's strongly held elsewhere. Creating a new pair for the weak box to hold works, but it means that every GC is likely to flush the cache, but maybe that's okay. It also means that every cache miss triggers heap allocation, but I don't know any way around that.
> 
> I've attached an updated benchmark script. Still looks like a win on speed, but slightly less than before.
> 
> Ryan
> <time-method-send-v2.rkt>_________________________________________________
>  For list-related administrative tasks:
>  http://lists.racket-lang.org/listinfo/dev




Posted on the dev mailing list.