[racket] Cost of the contract boundary in the math library
Moving to dev so as to not upset the locals with preliminary results. :D
On 11/03/2012 09:20 AM, Matthias Felleisen wrote:
> Last night Sam, Tony and I had a discussion on TR/R boundaries
> for his "racket on a router" project. Tony ported his software
> from Racket to Typed Racket and stopped halfway in between. The
> 'framework' (aka kernel) is now in TR and the 'user program' aka
> 'client' lives in R. I had predicted that the boundary between the
> two would cause a severe performance and Tony has now confirmed
> this conjecture. (We are talking factors not small percentages.)
>
> As you get racket/math ready for production, I think you too should
> measure the performance hit from going across the boundary.
I've already done that somewhat. First-order functions are okay; all the
new flonum functions, special functions, bigfloats, number theory, etc.,
run at a decent clip on the untyped side of the contract boundary. For
example, I get 4 million `gamma' applications per second in TR, and 1.2
million per second untyped.
(I think the difference for `gamma' is more about how well Vincent and
Matthew have done with TR's optimizer and the JIT. Thanks to their work,
computing gamma comes down to about 100 flops running right on the CPU.)
Higher-order functions, though, are dog slow. In particular, all the
array functions are higher-order, because an array is just a function
with a rectangular domain; e.g. `array-map' is composition. Here's a
program that times computing the elements of an array:
#lang racket
(require math/array)
(define arr
(build-array #(3 3) (λ (js)
(match-define (vector j0 j1) js)
(+ j0 j1))))
arr
(for ([_ (in-range 5)])
(time (for ([_ (in-range 50000)])
(array-strict arr))))
This is the output I get:
(array [[0 1 2] [1 2 3] [2 3 4]])
cpu time: 2680 real time: 2684 gc time: 170
cpu time: 2650 real time: 2659 gc time: 140
cpu time: 2660 real time: 2662 gc time: 170
cpu time: 2650 real time: 2653 gc time: 170
cpu time: 2660 real time: 2660 gc time: 160
Changing the language to "typed/racket", I get this:
(array [[0 1 2] [1 2 3] [2 3 4]])
cpu time: 90 real time: 90 gc time: 20
cpu time: 70 real time: 77 gc time: 0
cpu time: 80 real time: 75 gc time: 10
cpu time: 80 real time: 77 gc time: 10
cpu time: 80 real time: 82 gc time: 20
So here, the contract boundary slows things down 33x.
Huh. I just tried `make-array', which creates the array's function in TR
code, and I get 53x. I didn't expect that. I also get 20x for using
distribution objects, which are immutable structs that contain functions
that are only created in TR code.
> If it
> is bad, we should consider including both a typed and an untyped
> variant where the latter is generated from the former (I believe
> you are working in TR so that's why I wrote the last sentence).
> That is, when the library is installed the Untyped one should be
> generated by disabling types and type checking.
We should consider it, then, unless there's a way to significantly speed
up a type's generated, higher-order contracts.
I'm a bit confused about how this would help, though. The interface
between the library and the user will still have to be contracted, so
where does the performance gain come from?
Neil ⊥