[racket-dev] Contract barrier inefficiencies

From: Neil Toronto (neil.toronto at gmail.com)
Date: Fri Dec 28 13:03:17 EST 2012

On 12/27/2012 06:21 PM, Robby Findler wrote:
> One other place that we realized pretty quickly where we were wrong
> about the boundaries and inefficiencies is struct declarations. It isn't
> uncommon to put contracts on structs and put them in their own module
> (scribble does this). Currently the contract system can't tell that a
> struct contract is actually on a struct and thus avoid checking when
> doing a selection, but maybe that's somewhere that could help.

What's a selection? Is it something I could understand by reading a 
certain paper?

Because using arrays themselves as examples is pretty abstract and 
carries a lot of baggage, I've made some independent examples that 
demonstrate the same inefficiencies.

The first uses Typed Racket to generate contracts. The type ((Vectorof 
Index) -> A) corresponds to the type (Array A), and is in fact the 
type of an array's underlying procedure in `math/array'.

In the following code, `make-array' makes an unbounded-ish, constant 
"array". The `array-touch' function retrieves every element from 0 to 
n-1, and throws them away.


#lang racket

(module typed-defs typed/racket/base
   (require racket/fixnum racket/unsafe/ops)
   (provide make-array array-touch)

   (: make-array (All (A) (A -> ((Vectorof Index) -> A))))
   (define (make-array e) (lambda (js) e))

   (: array-touch (All (A) (((Vectorof Index) -> A) Index -> Void)))
   (define (array-touch arr-proc n)
     (define: js : (Vectorof Index)  (vector 0))
     (let: loop : Void ([i : Nonnegative-Fixnum  0])
       (when (i . fx< . n)  ; proves i : Index
         (unsafe-vector-set! js 0 i)
         (arr-proc js)
         (loop (fx+ i 1)))))

   ;; Takes 20ms to call `arr-proc' 2 million times
   (define arr-proc (make-array 10))
   (time (for ([_  (in-range 200)])
           (array-touch arr-proc 10000))))

(require 'typed-defs)

;; Takes 3300ms
(define arr-proc (make-array 10))
(time (for ([_  (in-range 200)])
         (array-touch arr-proc 10000)))


Summing up the numbers:
  * inside loop: 20ms
  * outside loop: 3300ms

These numbers match what I get using arrays in untyped code, so this 
example probably isolates the main performance problems.

Notice that `array-touch' mutates the indexes vector it passes to 
`arr-proc'. The array library also does this in its nested loops over 
elements to reduce allocation to one object (the mutated indexes 
vector). Still, it's worth checking out what would happen if immutable 
vectors were passed instead. If I'm reading 
"racket/contract/private/vector.rkt" correctly, this could be faster 
because immutable vectors are checked immediately instead of getting 
wrapped in a chaperone.
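
For example, I'd expect something like this (just a sketch, separate 
from the timing code) to blow up right at the boundary:

#lang racket

;; Immutable vector + flat element contract: the elements should be
;; checked right at the contract boundary, so no chaperone is created.
(contract (vectorof exact-nonnegative-integer?)
          (vector->immutable-vector (vector -1))
          'pos 'neg)
;; => should blame 'pos immediately, before the vector is ever indexed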

   Changing (arr-proc js) to (arr-proc (vector->immutable-vector js)):
    * inside loop: 100ms
    * outside loop: 2950ms

The "inside loop" timing is what I expected from my earlier experiments 
that motivated using mutable indexes. The "outside loop" timing suggests 
the chaperone isn't the biggest performance problem. (Of course, this 
example doesn't even look at the chaperoned value.)

   Changing (Vectorof Index) to (Vectorof Any): no change

I think this is because the chaperone checks the contents of `js' only 
when `js' is indexed, which never happens in this example.
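
A quick way to see that laziness (again just a sketch, not part of the 
timings):

#lang racket

;; Mutable vector: the contract just installs a chaperone; the element
;; check fires only when the vector is actually indexed.
(define v (contract (vectorof exact-nonnegative-integer?) (vector -1) 'pos 'neg))
;; ... no error yet ...
(vector-ref v 0)  ; contract violation raised here, at the reference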

   Changing (Vectorof Index) to Any:
    * inside loop: 20ms (no change)
    * outside loop: 2200ms

This suggests that putting any kind of contract on indexes is a big 
performance hit, but not the biggest.

There's not a lot left to investigate here without diving into TR's 
internals, so here's an untyped version that uses contracts:


#lang racket

(module untyped-defs racket/base
   (require racket/fixnum racket/unsafe/ops
            racket/contract (only-in typed/racket/base index?))
   (provide
    (contract-out
     [make-array  (-> any/c (-> (vectorof index?) any/c))]
     [array-touch  (-> (-> (vectorof index?) any/c) index? void?)]))

   (define (make-array e) (lambda (js) e))

   (define (array-touch arr-proc n)
     (define js (vector 0))
     (let loop ([i 0])
       (when (i . unsafe-fx< . n)
         (unsafe-vector-set! js 0 i)
         (arr-proc js)
         (loop (unsafe-fx+ i 1)))))

   ;; Takes 20ms, same as the typed version
   (define arr-proc (make-array 10))
   (time (for ([_  (in-range 200)])
           (array-touch arr-proc 10000))))

(require 'untyped-defs)

;; Takes 3300ms, same as the typed version
(define arr-proc (make-array 10))
(time (for ([_  (in-range 200)])
         (array-touch arr-proc 10000)))


The timings for the loops in this version are identical to those from 
the typed version: 20ms inside vs. 3300ms outside. But this is surprising:

   Changing (vectorof index?) to any/c:
    * inside loop: 20ms
    * outside loop: 1700ms

Doing the same thing in the typed version reduced the "outside loop" 
time to 2200ms, not 1700ms. It must not be the same thing after all. 
What's going on?
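
One way to find out might be to ask the wrapper itself (assuming TR's 
generated wrappers record their contracts where `value-contract' can 
see them, which I haven't verified):

;; From the untyped side, after (require 'typed-defs):
(contract-name (value-contract array-touch))
;; => whatever contract TR actually generated for the export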

   Changing the "array" contract to (-> (vectorof index?) any):
    * inside loop: 20ms
    * outside loop: 2340ms

This suggests that checking for multiple values is pretty slow.
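
For reference, the difference being measured: an `any/c' range makes 
the wrapper check that exactly one value comes back, while `any' skips 
result checking entirely. A minimal sketch (not from the timing code):

#lang racket

(define/contract (f x) (-> any/c any/c) (values x x))
(define/contract (g x) (-> any/c any)   (values x x))

(g 1)  ; fine: `any' places no constraint on the results
(f 1)  ; contract violation: promised one value, returned two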

   Changing the "array" contract to (-> any/c any):
    * inside loop: 20ms
    * outside loop: 20ms

As expected: the contract can be checked up front, without wrapping the 
procedure.
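
A quick way to check that claim (a sketch; I'd expect these results, 
but haven't verified the `eq?' behavior):

#lang racket
(require (only-in typed/racket/base index?))

(define (f js) 10)

;; `(-> any/c any)' needs only a first-order check (procedure? and arity),
;; so the contract system can hand back the procedure itself:
(eq? f (contract (-> any/c any) f 'pos 'neg))                ; expect #t
;; `(-> (vectorof index?) any/c)' has to wrap to check arguments and results:
(eq? f (contract (-> (vectorof index?) any/c) f 'pos 'neg))  ; expect #f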

FWIW, I would have used the profiler to do this, but it keeps being 
wrong about where time is spent. It's as if the contract wrappers don't 
exist. Is there a way to change that?

Neil ⊥

