[racket-dev] Contract barrier inefficiencies

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Fri Dec 28 20:04:35 EST 2012

By "selection" I mean applying a selector a struct value. Sorry, a bit too
terse, eh....

As for the result, that looks like some great things for us to put into
benchmarks (or maybe the math library itself should be what we should look
at).

I think that the reason the profiler isn't helping you is because the time
you care about is being spent in the implementation of the primitives and
that's visible only at the C level. If you can run gprof (or Apple's
tools), you could probably see more of what's going on.

Overall, I feel like some of what you're asking has to do with what TR is
doing and some with what's going on inside the procedure chaperones and so
I'm not sure the contract library itself is a place where fixes can happen.
(But I didn't try to look into what exactly gets put into the chaperones by
the contract library to be sure of that -- maybe there is something dumb
happening there.)

Robby

On Fri, Dec 28, 2012 at 12:03 PM, Neil Toronto <neil.toronto at gmail.com>
wrote:
>
> On 12/27/2012 06:21 PM, Robby Findler wrote:
>>
>> One other place that we realized pretty quickly where we were wrong
>> about the boundaries and inefficiencies is struct declarations. It isn't
>> uncommon to put contracts on structs and put them in their own module
>> (scribble does this). Currently the contract system can't tell that a
>> struct contract is actually on a struct and thus avoid checking when
>> doing a selection, but maybe that's somewhere that could help.
>
>
> What's a selection? Is it something I could understand by reading a
certain paper?
>
> Because using arrays for examples is pretty abstract and carries a lot of
baggage, I've made some independent examples that demonstrate the same
inefficiencies.
>
> The first uses Typed Racket to generate contracts. The type ((Vectorof
Index) -> A) corresponds with the type (Array A), and is in fact the type
of an array's procedure in `math/array'.
>
> In the following code, `make-array' makes an unbounded-ish, constant
"array". The `array-touch' function retrieves every element from 0 to n-1,
and throws them away.
>
>
> #lang racket
>
> (module typed-defs typed/racket/base
>   (require racket/fixnum racket/unsafe/ops)
>   (provide make-array array-touch)
>
>   (: make-array (All (A) (A -> ((Vectorof Index) -> A))))
>   (define (make-array e) (lambda (js) e))
>
>   (: array-touch (All (A) (((Vectorof Index) -> A) Index -> Void)))
>   (define (array-touch arr-proc n)
>     (define: js : (Vectorof Index)  (vector 0))
>     (let: loop : Void ([i : Nonnegative-Fixnum  0])
>       (when (i . fx< . n)  ; proves i : Index
>         (unsafe-vector-set! js 0 i)
>         (arr-proc js)
>         (loop (fx+ i 1)))))
>
>   ;; Takes 20ms to call `arr-proc' 2 million times
>   (define arr-proc (make-array 10))
>   (time (for ([_  (in-range 200)])
>           (array-touch arr-proc 10000))))
>
> (require 'typed-defs)
>
> ;; Takes 3300ms
> (define arr-proc (make-array 10))
> (time (for ([_  (in-range 200)])
>         (array-touch arr-proc 10000)))
>
>
> Summing up the numbers:
>  * inside loop: 20ms
>  * outside loop: 3300ms
>
> These numbers correspond with what I get using arrays in untyped code, so
this example probably isolates the main performance problems.
>
> Notice that `array-touch' mutates the indexes vector it passes to
`arr-proc'. The array library also does this in its nested loops over
elements to reduce allocation to one object (the mutated indexes vector).
Still, it's worth checking out what would happen if immutable vectors were
passed instead. If I'm reading "racket/contract/private/vector.rkt"
correctly, this could be faster because immutable vectors are checked
immediately instead of getting wrapped in a chaperone.
>
>   Changing (arr-proc js) to (arr-proc (vector->immutable-vector js)):
>    * inside loop: 100ms
>    * outside loop: 2950ms
>
> The "inside loop" timing is what I expected from my earlier experiments
that motivated using mutable indexes. The "outside loop" timing suggests
the chaperone isn't the biggest performance problem. (Of course, this
example doesn't even look at the chaperoned value.)
>
>   Changing (Vectorof Index) to (Vectorof Any): no change
>
> I think this is because the chaperone checks the contents of `js' only
when `js' is indexed - which never happens in this example.
>
>   Changing (Vectorof Index) to Any:
>    * inside loop: 20ms (no change)
>    * outside loop: 2200ms
>
> This suggests that putting any kind of contract on indexes is a big
performance hit, but not the biggest.
>
> There's not a lot left to investigate here without diving into TR's
internals, so here's an untyped version that uses contracts:
>
>
> #lang racket
>
> (module untyped-defs racket/base
>   (require racket/fixnum racket/unsafe/ops
>            racket/contract (only-in typed/racket/base index?))
>   (provide
>    (contract-out
>     [make-array  (-> any/c (-> (vectorof index?) any/c))]
>     [array-touch  (-> (-> (vectorof index?) any/c) index? void?)]))
>
>   (define (make-array e) (lambda (js) e))
>
>   (define (array-touch arr-proc n)
>     (define js (vector 0))
>     (let loop ([i 0])
>       (when (i . unsafe-fx< . n)
>         (unsafe-vector-set! js 0 i)
>         (arr-proc js)
>         (loop (unsafe-fx+ i 1)))))
>
>   (define arr-proc (make-array 10))
>   (time (for ([_  (in-range 200)])
>           (array-touch arr-proc 10000))))
>
> (require 'untyped-defs)
>
> (define arr-proc (make-array 10))
> (time (for ([_  (in-range 200)])
>         (array-touch arr-proc 10000)))
>
>
> The timings for the loops in this version are identical to those from the
typed version: 20ms inside vs. 3300ms outside. But this is surprising:
>
>   Changing (vectorof index?) to any/c:
>    * inside loop: 20ms
>    * outside loop: 1700ms
>
> Doing the same thing in the typed version reduced the "outside loop" time
to 2200ms, not 1700ms. It must not be the same thing after all. What's
going on?
>
>   Changing the "array" contract to (-> (vectorof index?) any):
>    * inside loop: 20ms
>    * outside loop: 2340ms
>
> This suggests that checking for multiple values is pretty slow.
>
>   Changing the "array" contract to (-> any/c any):
>    * inside loop: 20ms
>    * outside loop: 20ms
>
> As expected; the procedure can be checked without wrapping.
>
> FWIW, I would have used the profiler to do this, but it keeps being wrong
about where time is spent. It's as if the contract wrappers don't exist. Is
there a way to change that?
>
> Neil ⊥
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/dev/archive/attachments/20121228/cc51f718/attachment-0001.html>

Posted on the dev mailing list.