[racket-dev] [plt] Push #27909: master branch updated

From: Neil Toronto (neil.toronto at gmail.com)
Date: Tue Jan 14 13:22:00 EST 2014

An update on my math-centered tests. The first is the built-in 
`magnitude' vs. a TR near-transliteration of its C implementation, 
called from *untyped* Racket. The numbers are in milliseconds, for 5 
million calls, by input type:

Function         Flonum  Rational  Fixnum  Integer  Float-Complex
-------------------------------------------------------------------
Pre-static contracts:
magnitude*         385      419      378     414         686
magnitude           59       44       40      40         390
Post-static contracts:
magnitude*         175      196      164     195         475
magnitude           68       43       36      40         406

All but the Float-Complex case just return the absolute value of the 
argument, so there's not much computation. They're dominated by contract 
checking, and the speedup is 0.45 to 0.5. Nice. The Float-Complex case 
does something substantive, and is close to C. That's awesome.

So for really small functions (i.e. just a test and a call to `abs'), 
writing them in C is still better. But for anything much larger (that 
part of `magnitude*' is about 10 lines and 7-13 numeric ops), a TR 
implementation isn't much slower than C (about 17%).

--

1 million iterations of some math library flonum functions, in TR, 
untyped before Robby's changes, untyped after Robby's 12/12 changes, and 
untyped after Eric's static contract changes, in milliseconds:

Function         TR     Untyped    Robby's   Robby's+Eric's
--------------------------------------------------------------
flrational?       5       322        98           37
flsinh           55       343       121           86
fllog1p          47       351       117           80
lg+              61       384       154          115
flgamma         165       521       262          234

So calling TR floating-point functions from untyped Racket takes only 
0.11 to 0.45 times the amount of time it took only a month ago, with 
typical cases being about 0.25. That's awesome.

Neil ⊥

On 01/14/2014 10:27 AM, Eric Dobson wrote:
> The changes to TR contract generation are now in at HEAD.
>
> If you can find any cases where contracts are still slowing programs
> down by a lot I'd like to take a look at them. (Excepting the case of
> structs where I know it is still an issue).
>
> On Thu, Dec 12, 2013 at 10:52 AM, Robby Findler
> <robby at eecs.northwestern.edu> wrote:
>> Re-reading your message I see that you're not actually asserting something
>> different from what I said, but just for some precision here I wish to point
>> out that I wasn't basing my opinion on intuition from the code, but on some
>> microbenchmark timings. (There was a much more substantial difference
>> yesterday because the loop inside any-wrap/c wasn't as cheap as it could
>> have been.)
>>
>> I'd be interested to see if your improvements to type->contract improve the
>> situation any! I expect they will make things better again for the Number
>> case, but at the moment, there isn't a big difference.
>>
>> Program 1:
>>
>>
>> #lang racket/base
>> (module m typed/racket/base
>>    (: f (Any -> Any))
>>    (define (f x) 1)
>>    (provide f))
>> (require 'm)
>> (time
>>   (for ([x (in-range 20000)])
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)))
>>
>> Timings:
>>
>> cpu time: 142 real time: 142 gc time: 8
>> cpu time: 144 real time: 144 gc time: 7
>> cpu time: 144 real time: 143 gc time: 6
>> cpu time: 142 real time: 142 gc time: 6
>> cpu time: 142 real time: 142 gc time: 7
>> cpu time: 146 real time: 146 gc time: 6
>>
>> Program 2:
>>
>>
>> #lang racket/base
>> (module m typed/racket/base
>>    (: f (Any -> Integer))
>>
>>    (define (f x) 1)
>>    (provide f))
>> (require 'm)
>> (time
>>   (for ([x (in-range 20000)])
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)))
>>
>>
>> Timings:
>>
>> cpu time: 139 real time: 138 gc time: 7
>> cpu time: 145 real time: 144 gc time: 7
>> cpu time: 140 real time: 140 gc time: 6
>> cpu time: 151 real time: 150 gc time: 6
>> cpu time: 139 real time: 138 gc time: 6
>> cpu time: 139 real time: 139 gc time: 8
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Dec 12, 2013 at 12:33 PM, Eric Dobson <eric.n.dobson at gmail.com>
>> wrote:
>>>
>>> any-wrap/c still requires the check for one value, while any (which is
>>> from Number not Any) does not. So I would still guess at Number being
>>> faster, but Robby's changes may make it so that inlining and dead code
>>> elimination can see through everything and turn it into the same code.
>>>
>>> On Thu, Dec 12, 2013 at 10:27 AM, Robby Findler
>>> <robby at eecs.northwestern.edu> wrote:
>>>> FWIW, my push speeds up the any-wrap/c implementation a bunch. Those two
>>>> should have similar speeds after you get that, I guess.
>>>>
>>>> Robby
>>>>
>>>>
>>>> On Thu, Dec 12, 2013 at 11:03 AM, Neil Toronto <neil.toronto at gmail.com>
>>>> wrote:
>>>>>
>>>>> I tried your branch that implements it and saw about 3.5x speedup for
>>>>> the
>>>>> `magnitude*' test. This is of course without Robby's recent first-order
>>>>> contract changes.
>>>>>
>>>>> (I think it's about 3.5x: I tried with magnitude* : Number -> Any first
>>>>> and got 2400ms on the easy tests. I changed it to magnitude* : Number
>>>>> ->
>>>>> Number and got 690ms. Apparently, for an `Any' return type, an
>>>>> `any-wrap/c'
>>>>> contract is generated instead of nothing. If that's much faster to
>>>>> check
>>>>> than `number?', though, the speedup is even better.)
>>>>>
>>>>> I'd love to see this with Robby's recent changes. Hint? Nudge? Please?
>>>>>
>>>>> I didn't see very much speedup with arrays (about 1.2x). Speed tests on
>>>>> the math library's distribution objects were very interesting, though,
>>>>> and
>>>>> indicate why the arrays might not be much faster. Here's my test
>>>>> program:
>>>>>
>>>>>
>>>>> #lang racket
>>>>>
>>>>> (require math/distributions)
>>>>>
>>>>> (define d (normal-dist 0 1))
>>>>>
>>>>> (printf "pdf d 0~n")
>>>>> (for ([_  (in-range 5)])
>>>>>    (time (for ([_  (in-range 100000)])
>>>>>            (pdf d 0))))
>>>>> (newline)
>>>>>
>>>>> (define p (distribution-pdf d))
>>>>> (printf "p 0~n")
>>>>> (for ([_  (in-range 5)])
>>>>>    (time (for ([_  (in-range 100000)])
>>>>>            (p 0))))
>>>>> (newline)
>>>>>
>>>>>
>>>>> The two tests are equivalent, as `pdf' just pulls the pdf function out
>>>>> of
>>>>> the distribution struct and applies it. In TR, the tests are exactly
>>>>> the
>>>>> same speed (extremely fast). In untyped Racket, on the main branch, the
>>>>> second test is 16x faster, and on your branch, it's 44x faster. (It's
>>>>> still
>>>>> 10x slower than TR on your branch, so again... I'd love to see your
>>>>> changes
>>>>> and Robby's together. :D)
>>>>>
>>>>> Neil ⊥
>>>>>
>>>>>
>>>>> On 12/12/2013 12:40 AM, Eric Dobson wrote:
>>>>>>
>>>>>> Removing the return value checking is in the works. It actually is
>>>>>> removing all of the checks that would blame typed code, so higher
>>>>>> order functions/datastructure get improvements too. It is actually
>>>>>> functional the last time I checked, but lacking documentation which is
>>>>>> what is holding up merging with mainline.
>>>>>>
>>>>>> https://github.com/plt/racket/pull/453
>>>>>>
>>>>>> On Wed, Dec 11, 2013 at 7:57 PM, Robby Findler
>>>>>> <robby at eecs.northwestern.edu> wrote:
>>>>>>>
>>>>>>> I see that TR's type->contract returns
>>>>>>>
>>>>>>>    (-> (flat-named-contract (quote Float) flonum?)
>>>>>>> (flat-named-contract
>>>>>>> (quote
>>>>>>> Float) flonum?))
>>>>>>>
>>>>>>> for the type (Float -> Float), but it could return
>>>>>>>
>>>>>>>    (-> (flat-named-contract (quote Float) flonum?) any)
>>>>>>>
>>>>>>> which wouldn't do any result value checking (this being different
>>>>>>> from
>>>>>>> any/c
>>>>>>> as the range of the arrow contract).
>>>>>>>
>>>>>>> Robby
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Dec 11, 2013 at 6:18 PM, Neil Toronto
>>>>>>> <neil.toronto at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/11/2013 02:49 PM, Neil Toronto wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/11/2013 01:55 PM, Stephen Bloch wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
>>>>>>>>>>>
>>>>>>>>>>>> numeric primitives implemented in Typed Racket are faster than
>>>>>>>>>>>> the
>>>>>>>>>>>> same primitives implemented in C.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Whoa!  How did that happen?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Whoa! That's not what I meant! O_o
>>>>>>>>>
>>>>>>>>> I said "we might be getting close" to that. I haven't tried porting
>>>>>>>>> a
>>>>>>>>> numeric C primitive to TR yet, but I have a hunch that it'll still
>>>>>>>>> be
>>>>>>>>> slower. I'll try one now and report what I find.
>>>>>>>>>
>>>>>>>>> Neil ⊥
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I can't figure out why `flsinh' is faster to call from untyped
>>>>>>>> Racket
>>>>>>>> than
>>>>>>>> `sinh'. All my tests with a Typed Racket `magnitude' show calls from
>>>>>>>> untyped
>>>>>>>> code are significantly slower, except in the one case that it
>>>>>>>> computes
>>>>>>>> Euclidean distance. That case is only twice as slow.
>>>>>>>>
>>>>>>>> I've attached the benchmark program. The `magnitude*' function is
>>>>>>>> more
>>>>>>>> or
>>>>>>>> less a direct translation of `magnitude' from "number.c" into Typed
>>>>>>>> Racket.
>>>>>>>> Here's a summary of the results I get on my computer, in
>>>>>>>> milliseconds,
>>>>>>>> for 5
>>>>>>>> million calls from untyped Racket, by data type.
>>>>>>>>
>>>>>>>>
>>>>>>>> Function         Flonum  Rational  Fixnum  Integer  Float-Complex
>>>>>>>> -------------------------------------------------------------------
>>>>>>>> magnitude*         385      419      378     414         686
>>>>>>>> magnitude           59       44       40      40         390
>>>>>>>>
>>>>>>>>
>>>>>>>> The only one that's close in relative terms is Float-Complex. The
>>>>>>>> others
>>>>>>>> just call `abs'. The decompiled code doesn't show any inlining of
>>>>>>>> `magnitude', so this comparison should be good.
>>>>>>>>
>>>>>>>> I'll bet checking the return value contract (which is unnecessary)
>>>>>>>> is
>>>>>>>> the
>>>>>>>> main slowdown. It has to check for number of values.
>>>>>>>>
>>>>>>>> For comparison, here are the timings for running the benchmarks in
>>>>>>>> TR
>>>>>>>> with
>>>>>>>> #:no-optimize:
>>>>>>>>
>>>>>>>>
>>>>>>>> Function         Flonum  Rational  Fixnum  Integer  Float-Complex
>>>>>>>> -------------------------------------------------------------------
>>>>>>>> magnitude*          45       70*      37     102*       318
>>>>>>>> magnitude           61       45       39      91*       394
>>>>>>>>
>>>>>>>>                                                 * = unexpectedly high
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's what I understand from comparing the numbers:
>>>>>>>>
>>>>>>>>    * Except for non-fixnum integers, calling `magnitude' in TR is
>>>>>>>> just
>>>>>>>> as
>>>>>>>> fast as in untyped Racket. I have no idea why it would be slower on
>>>>>>>> big
>>>>>>>> integers. That's just weird.
>>>>>>>>
>>>>>>>>    * Calling `abs' in Racket is faster than calling `scheme_abs' in
>>>>>>>> C,
>>>>>>>> except on rationals and big integers.
>>>>>>>>
>>>>>>>>    * Operating on flonums in Typed Racket, using generic numeric
>>>>>>>> functions,
>>>>>>>> is faster than doing the same in C.
>>>>>>>>
>>>>>>>> Overall, it looks like the TR code is within the same order of
>>>>>>>> magnitude
>>>>>>>> (pun not intended) as the C code. I would love to try this benchmark
>>>>>>>> with
>>>>>>>> either 1) a `magnitude*' with an `AnyValues' return type; or 2) a
>>>>>>>> contract
>>>>>>>> boundary that doesn't check TR's return types for first-order
>>>>>>>> functions.
>>>>>>>>
>>>>>>>> (I managed to make a `magnitude*' with type Number -> AnyValues, but
>>>>>>>> TR
>>>>>>>> couldn't make a contract for it.)
>>>>>>>>
>>>>>>>> Neil ⊥
>>>>>>>>
>>>>>>>>
>>>>>>>> _________________________
>>>>>>>>     Racket Developers list:
>>>>>>>>     http://lists.racket-lang.org/dev
>>>>>>>>
>>>>>>>
>>>>>>> _________________________
>>>>>>>     Racket Developers list:
>>>>>>>     http://lists.racket-lang.org/dev
>>>>>>>
>>>>>
>>>>


Posted on the dev mailing list.