[racket-dev] [plt] Push #27909: master branch updated
An update on my math-centered tests. The first is the built-in
`magnitude' vs. a TR near-transliteration of its C implementation,
called from *untyped* Racket. The numbers are in milliseconds, for 5
million calls, by input type:
Function Flonum Rational Fixnum Integer Float-Complex
-------------------------------------------------------------------
Pre-static contracts:
magnitude* 385 419 378 414 686
magnitude 59 44 40 40 390
Post-static contracts:
magnitude* 175 196 164 195 475
magnitude 68 43 36 40 406
All but the Float-Complex case just return the absolute value of the
argument, so there's not much computation. They're dominated by contract
checking, and the speedup is 0.45 to 0.5. Nice. The Float-Complex case
does something substantive, and is close to C. That's awesome.
So for really small functions (i.e. just a test and a call to `abs'),
writing them in C is still better. But for anything much larger (that
part of `magnitude*' is about 10 lines and 7-13 numeric ops), a TR
implementation isn't much slower than C (about 17%).
--
1 million iterations of some math library flonum functions, in TR,
untyped before Robby's changes, untyped after Robby's 12/12 changes, and
untyped after Eric's static contract changes, in milliseconds:
Function TR Untyped Robby's Robby's+Eric's
--------------------------------------------------------------
flrational? 5 322 98 37
flsinh 55 343 121 86
fllog1p 47 351 117 80
lg+ 61 384 154 115
flgamma 165 521 262 234
So calling TR floating-point functions from untyped Racket takes only
0.11 to 0.45 times the amount of time it took only a month ago, with
typical cases being about 0.25. That's awesome.
Neil ⊥
On 01/14/2014 10:27 AM, Eric Dobson wrote:
> The changes to TR contract generation are now in at HEAD.
>
> If you can find any cases where contracts are still slowing programs
> down by a lot I'd like to take a look at them. (Excepting the case of
> structs where I know it is still an issue).
>
> On Thu, Dec 12, 2013 at 10:52 AM, Robby Findler
> <robby at eecs.northwestern.edu> wrote:
>> Re-reading your message I see that you're not actually asserting something
>> different from what I said, but just for some precision here I wish to point
>> out that I wasn't basing my opinion on intuition from the code, but on some
>> microbenchmark timings. (There was a much more substantial difference
>> yesterday because the loop inside any-wrap/c wasn't as cheap as it could
>> have been.)
>>
>> I'd be interested to see if your improvements to type->contract improve the
>> situation any! I expect they will make things better again for the Number
>> case, but at the moment, there isn't a big difference.
>>
>> Program 1:
>>
>>
>> #lang racket/base
>> (module m typed/racket/base
>> (: f (Any -> Any))
>> (define (f x) 1)
>> (provide f))
>> (require 'm)
>> (time
>> (for ([x (in-range 20000)])
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)))
>>
>> Timings:
>>
>> cpu time: 142 real time: 142 gc time: 8
>> cpu time: 144 real time: 144 gc time: 7
>> cpu time: 144 real time: 143 gc time: 6
>> cpu time: 142 real time: 142 gc time: 6
>> cpu time: 142 real time: 142 gc time: 7
>> cpu time: 146 real time: 146 gc time: 6
>>
>> Program 2:
>>
>>
>> #lang racket/base
>> (module m typed/racket/base
>> (: f (Any -> Integer))
>>
>> (define (f x) 1)
>> (provide f))
>> (require 'm)
>> (time
>> (for ([x (in-range 20000)])
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>> (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)))
>>
>>
>> Timings:
>>
>> cpu time: 139 real time: 138 gc time: 7
>> cpu time: 145 real time: 144 gc time: 7
>> cpu time: 140 real time: 140 gc time: 6
>> cpu time: 151 real time: 150 gc time: 6
>> cpu time: 139 real time: 138 gc time: 6
>> cpu time: 139 real time: 139 gc time: 8
>>
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Dec 12, 2013 at 12:33 PM, Eric Dobson <eric.n.dobson at gmail.com>
>> wrote:
>>>
>>> any-wrap/c still requires the check for one value, while any (which is
>>> from Number not Any) does not. So I would still guess at Number being
>>> faster, but Robby's changes may make it so that inlining and dead code
>>> elimination can see through everything and turn it into the same code.
>>>
>>> On Thu, Dec 12, 2013 at 10:27 AM, Robby Findler
>>> <robby at eecs.northwestern.edu> wrote:
>>>> FWIW, my push speeds up the any-wrap/c implementation a bunch. Those two
>>>> should have similar speeds after you get that, I guess.
>>>>
>>>> Robby
>>>>
>>>>
>>>> On Thu, Dec 12, 2013 at 11:03 AM, Neil Toronto <neil.toronto at gmail.com>
>>>> wrote:
>>>>>
>>>>> I tried your branch that implements it and saw about 3.5x speedup for
>>>>> the
>>>>> `magnitude*' test. This is of course without Robby's recent first-order
>>>>> contract changes.
>>>>>
>>>>> (I think it's about 3.5x: I tried with magnitude* : Number -> Any first
>>>>> and got 2400ms on the easy tests. I changed it to magnitude* : Number
>>>>> ->
>>>>> Number and got 690ms. Apparently, for an `Any' return type, an
>>>>> `any-wrap/c'
>>>>> contract is generated instead of nothing. If that's much faster to
>>>>> check
>>>>> than `number?', though, the speedup is even better.)
>>>>>
>>>>> I'd love to see this with Robby's recent changes. Hint? Nudge? Please?
>>>>>
>>>>> I didn't see very much speedup with arrays (about 1.2x). Speed tests on
>>>>> the math library's distribution objects were very interesting, though,
>>>>> and
>>>>> indicate why the arrays might not be much faster. Here's my test
>>>>> program:
>>>>>
>>>>>
>>>>> #lang racket
>>>>>
>>>>> (require math/distributions)
>>>>>
>>>>> (define d (normal-dist 0 1))
>>>>>
>>>>> (printf "pdf d 0~n")
>>>>> (for ([_ (in-range 5)])
>>>>> (time (for ([_ (in-range 100000)])
>>>>> (pdf d 0))))
>>>>> (newline)
>>>>>
>>>>> (define p (distribution-pdf d))
>>>>> (printf "p 0~n")
>>>>> (for ([_ (in-range 5)])
>>>>> (time (for ([_ (in-range 100000)])
>>>>> (p 0))))
>>>>> (newline)
>>>>>
>>>>>
>>>>> The two tests are equivalent, as `pdf' just pulls the pdf function out
>>>>> of
>>>>> the distribution struct and applies it. In TR, the tests are exactly
>>>>> the
>>>>> same speed (extremely fast). In untyped Racket, on the main branch, the
>>>>> second test is 16x faster, and on your branch, it's 44x faster. (It's
>>>>> still
>>>>> 10x slower than TR on your branch, so again... I'd love to see your
>>>>> changes
>>>>> and Robby's together. :D)
>>>>>
>>>>> Neil ⊥
>>>>>
>>>>>
>>>>> On 12/12/2013 12:40 AM, Eric Dobson wrote:
>>>>>>
>>>>>> Removing the return value checking is in the works. It actually is
>>>>>> removing all of the checks that would blame typed code, so higher
>>>>>> order functions/datastructure get improvements too. It is actually
>>>>>> functional the last time I checked, but lacking documentation which is
>>>>>> what is holding up merging with mainline.
>>>>>>
>>>>>> https://github.com/plt/racket/pull/453
>>>>>>
>>>>>> On Wed, Dec 11, 2013 at 7:57 PM, Robby Findler
>>>>>> <robby at eecs.northwestern.edu> wrote:
>>>>>>>
>>>>>>> I see that TR's type->contract returns
>>>>>>>
>>>>>>> (-> (flat-named-contract (quote Float) flonum?)
>>>>>>> (flat-named-contract
>>>>>>> (quote
>>>>>>> Float) flonum?))
>>>>>>>
>>>>>>> for the type (Float -> Float), but it could return
>>>>>>>
>>>>>>> (-> (flat-named-contract (quote Float) flonum?) any)
>>>>>>>
>>>>>>> which wouldn't do any result value checking (this being different
>>>>>>> from
>>>>>>> any/c
>>>>>>> as the range of the arrow contract).
>>>>>>>
>>>>>>> Robby
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Dec 11, 2013 at 6:18 PM, Neil Toronto
>>>>>>> <neil.toronto at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/11/2013 02:49 PM, Neil Toronto wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/11/2013 01:55 PM, Stephen Bloch wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
>>>>>>>>>>>
>>>>>>>>>>>> numeric primitives implemented in Typed Racket are faster than
>>>>>>>>>>>> the
>>>>>>>>>>>> same primitives implemented in C.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Whoa! How did that happen?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Whoa! That's not what I meant! O_o
>>>>>>>>>
>>>>>>>>> I said "we might be getting close" to that. I haven't tried porting
>>>>>>>>> a
>>>>>>>>> numeric C primitive to TR yet, but I have a hunch that it'll still
>>>>>>>>> be
>>>>>>>>> slower. I'll try one now and report what I find.
>>>>>>>>>
>>>>>>>>> Neil ⊥
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I can't figure out why `flsinh' is faster to call from untyped
>>>>>>>> Racket
>>>>>>>> than
>>>>>>>> `sinh'. All my tests with a Typed Racket `magnitude' show calls from
>>>>>>>> untyped
>>>>>>>> code are significantly slower, except in the one case that it
>>>>>>>> computes
>>>>>>>> Euclidean distance. That case is only twice as slow.
>>>>>>>>
>>>>>>>> I've attached the benchmark program. The `magnitude*' function is
>>>>>>>> more
>>>>>>>> or
>>>>>>>> less a direct translation of `magnitude' from "number.c" into Typed
>>>>>>>> Racket.
>>>>>>>> Here's a summary of the results I get on my computer, in
>>>>>>>> milliseconds,
>>>>>>>> for 5
>>>>>>>> million calls from untyped Racket, by data type.
>>>>>>>>
>>>>>>>>
>>>>>>>> Function Flonum Rational Fixnum Integer Float-Complex
>>>>>>>> -------------------------------------------------------------------
>>>>>>>> magnitude* 385 419 378 414 686
>>>>>>>> magnitude 59 44 40 40 390
>>>>>>>>
>>>>>>>>
>>>>>>>> The only one that's close in relative terms is Float-Complex. The
>>>>>>>> others
>>>>>>>> just call `abs'. The decompiled code doesn't show any inlining of
>>>>>>>> `magnitude', so this comparison should be good.
>>>>>>>>
>>>>>>>> I'll bet checking the return value contract (which is unnecessary)
>>>>>>>> is
>>>>>>>> the
>>>>>>>> main slowdown. It has to check for number of values.
>>>>>>>>
>>>>>>>> For comparison, here are the timings for running the benchmarks in
>>>>>>>> TR
>>>>>>>> with
>>>>>>>> #:no-optimize:
>>>>>>>>
>>>>>>>>
>>>>>>>> Function Flonum Rational Fixnum Integer Float-Complex
>>>>>>>> -------------------------------------------------------------------
>>>>>>>> magnitude* 45 70* 37 102* 318
>>>>>>>> magnitude 61 45 39 91* 394
>>>>>>>>
>>>>>>>> * = unexpectedly high
>>>>>>>>
>>>>>>>>
>>>>>>>> Here's what I understand from comparing the numbers:
>>>>>>>>
>>>>>>>> * Except for non-fixnum integers, calling `magnitude' in TR is
>>>>>>>> just
>>>>>>>> as
>>>>>>>> fast as in untyped Racket. I have no idea why it would be slower on
>>>>>>>> big
>>>>>>>> integers. That's just weird.
>>>>>>>>
>>>>>>>> * Calling `abs' in Racket is faster than calling `scheme_abs' in
>>>>>>>> C,
>>>>>>>> except on rationals and big integers.
>>>>>>>>
>>>>>>>> * Operating on flonums in Typed Racket, using generic numeric
>>>>>>>> functions,
>>>>>>>> is faster than doing the same in C.
>>>>>>>>
>>>>>>>> Overall, it looks like the TR code is within the same order of
>>>>>>>> magnitude
>>>>>>>> (pun not intended) as the C code. I would love to try this benchmark
>>>>>>>> with
>>>>>>>> either 1) a `magnitude*' with an `AnyValues' return type; or 2) a
>>>>>>>> contract
>>>>>>>> boundary that doesn't check TR's return types for first-order
>>>>>>>> functions.
>>>>>>>>
>>>>>>>> (I managed to make a `magnitude*' with type Number -> AnyValues, but
>>>>>>>> TR
>>>>>>>> couldn't make a contract for it.)
>>>>>>>>
>>>>>>>> Neil ⊥
>>>>>>>>
>>>>>>>>
>>>>>>>> _________________________
>>>>>>>> Racket Developers list:
>>>>>>>> http://lists.racket-lang.org/dev
>>>>>>>>
>>>>>>>
>>>>>>> _________________________
>>>>>>> Racket Developers list:
>>>>>>> http://lists.racket-lang.org/dev
>>>>>>>
>>>>>
>>>>