[racket-dev] [plt] Push #27909: master branch updated

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Tue Jan 14 13:24:45 EST 2014

Do any of these functions turn out to have case-> contracts by the time TR
gets done with them?

Robby



On Tue, Jan 14, 2014 at 12:22 PM, Neil Toronto <neil.toronto at gmail.com>wrote:

> An update on my math-centered tests. The first is the built-in `magnitude'
> vs. a TR near-transliteration of its C implementation, called from
> *untyped* Racket. The numbers are in milliseconds, for 5 million calls, by
> input type:
>
>
> Function         Flonum  Rational  Fixnum  Integer  Float-Complex
> -------------------------------------------------------------------
> Pre-static contracts:
>
> magnitude*         385      419      378     414         686
> magnitude           59       44       40      40         390
> Post-static contracts:
> magnitude*         175      196      164     195         475
> magnitude           68       43       36      40         406
>
> All but the Float-Complex case just return the absolute value of the
> argument, so there's not much computation. They're dominated by contract
> checking, and the speedup is 0.45 to 0.5. Nice. The Float-Complex case does
> something substantive, and is close to C. That's awesome.
>
> So for really small functions (i.e. just a test and a call to `abs'),
> writing them in C is still better. But for anything much larger (that part
> of `magnitude*' is about 10 lines and 7-13 numeric ops), a TR
> implementation isn't much slower than C (about 17%).
>
> --
>
> 1 million iterations of some math library flonum functions, in TR, untyped
> before Robby's changes, untyped after Robby's 12/12 changes, and untyped
> after Eric's static contract changes, in milliseconds:
>
> Function         TR     Untyped    Robby's   Robby's+Eric's
> --------------------------------------------------------------
> flrational?       5       322        98           37
> flsinh           55       343       121           86
> fllog1p          47       351       117           80
> lg+              61       384       154          115
> flgamma         165       521       262          234
>
> So calling TR floating-point functions from untyped Racket takes only 0.11
> to 0.45 times the amount of time it took only a month ago, with typical
> cases being about 0.25. That's awesome.
>
> Neil ⊥
>
>
> On 01/14/2014 10:27 AM, Eric Dobson wrote:
>
>> The changes to TR contract generation are now in at HEAD.
>>
>> If you can find any cases where contracts are still slowing programs
>> down by a lot I'd like to take a look at them. (Excepting the case of
>> structs where I know it is still an issue).
>>
>> On Thu, Dec 12, 2013 at 10:52 AM, Robby Findler
>> <robby at eecs.northwestern.edu> wrote:
>>
>>> Re-reading your message I see that you're not actually asserting
>>> something
>>> different from what I said, but just for some precision here I wish to
>>> point
>>> out that I wasn't basing my opinion on intuition from the code, but on
>>> some
>>> microbenchmark timings. (There was a much more substantial difference
>>> yesterday because the loop inside any-wrap/c wasn't as cheap as it could
>>> have been.)
>>>
>>> I'd be interested to see if your improvements to type->contract improve
>>> the
>>> situation any! I expect they will make things better again for the Number
>>> case, but at the moment, there isn't a big difference.
>>>
>>> Program 1:
>>>
>>>
>>> #lang racket/base
>>> (module m typed/racket/base
>>>    (: f (Any -> Any))
>>>    (define (f x) 1)
>>>    (provide f))
>>> (require 'm)
>>> (time
>>>   (for ([x (in-range 20000)])
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)))
>>>
>>> Timings:
>>>
>>> cpu time: 142 real time: 142 gc time: 8
>>> cpu time: 144 real time: 144 gc time: 7
>>> cpu time: 144 real time: 143 gc time: 6
>>> cpu time: 142 real time: 142 gc time: 6
>>> cpu time: 142 real time: 142 gc time: 7
>>> cpu time: 146 real time: 146 gc time: 6
>>>
>>> Program 2:
>>>
>>>
>>> #lang racket/base
>>> (module m typed/racket/base
>>>    (: f (Any -> Integer))
>>>
>>>    (define (f x) 1)
>>>    (provide f))
>>> (require 'm)
>>> (time
>>>   (for ([x (in-range 20000)])
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>>     (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)))
>>>
>>>
>>> Timings:
>>>
>>> cpu time: 139 real time: 138 gc time: 7
>>> cpu time: 145 real time: 144 gc time: 7
>>> cpu time: 140 real time: 140 gc time: 6
>>> cpu time: 151 real time: 150 gc time: 6
>>> cpu time: 139 real time: 138 gc time: 6
>>> cpu time: 139 real time: 139 gc time: 8
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Dec 12, 2013 at 12:33 PM, Eric Dobson <eric.n.dobson at gmail.com>
>>> wrote:
>>>
>>>>
>>>> any-wrap/c still requires the check for one value, while any (which is
>>>> from Number not Any) does not. So I would still guess at Number being
>>>> faster, but Robby's changes may make it so that inlining and dead code
>>>> elimination can see through everything and turn it into the same code.
>>>>
>>>> On Thu, Dec 12, 2013 at 10:27 AM, Robby Findler
>>>> <robby at eecs.northwestern.edu> wrote:
>>>>
>>>>> FWIW, my push speeds up the any-wrap/c implementation a bunch. Those
>>>>> two
>>>>> should have similar speeds after you get that, I guess.
>>>>>
>>>>> Robby
>>>>>
>>>>>
>>>>> On Thu, Dec 12, 2013 at 11:03 AM, Neil Toronto <neil.toronto at gmail.com
>>>>> >
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> I tried your branch that implements it and saw about 3.5x speedup for
>>>>>> the
>>>>>> `magnitude*' test. This is of course without Robby's recent
>>>>>> first-order
>>>>>> contract changes.
>>>>>>
>>>>>> (I think it's about 3.5x: I tried with magnitude* : Number -> Any
>>>>>> first
>>>>>> and got 2400ms on the easy tests. I changed it to magnitude* : Number
>>>>>> ->
>>>>>> Number and got 690ms. Apparently, for an `Any' return type, an
>>>>>> `any-wrap/c'
>>>>>> contract is generated instead of nothing. If that's much faster to
>>>>>> check
>>>>>> than `number?', though, the speedup is even better.)
>>>>>>
>>>>>> I'd love to see this with Robby's recent changes. Hint? Nudge? Please?
>>>>>>
>>>>>> I didn't see very much speedup with arrays (about 1.2x). Speed tests
>>>>>> on
>>>>>> the math library's distribution objects were very interesting, though,
>>>>>> and
>>>>>> indicate why the arrays might not be much faster. Here's my test
>>>>>> program:
>>>>>>
>>>>>>
>>>>>> #lang racket
>>>>>>
>>>>>> (require math/distributions)
>>>>>>
>>>>>> (define d (normal-dist 0 1))
>>>>>>
>>>>>> (printf "pdf d 0~n")
>>>>>> (for ([_  (in-range 5)])
>>>>>>    (time (for ([_  (in-range 100000)])
>>>>>>            (pdf d 0))))
>>>>>> (newline)
>>>>>>
>>>>>> (define p (distribution-pdf d))
>>>>>> (printf "p 0~n")
>>>>>> (for ([_  (in-range 5)])
>>>>>>    (time (for ([_  (in-range 100000)])
>>>>>>            (p 0))))
>>>>>> (newline)
>>>>>>
>>>>>>
>>>>>> The two tests are equivalent, as `pdf' just pulls the pdf function out
>>>>>> of
>>>>>> the distribution struct and applies it. In TR, the tests are exactly
>>>>>> the
>>>>>> same speed (extremely fast). In untyped Racket, on the main branch,
>>>>>> the
>>>>>> second test is 16x faster, and on your branch, it's 44x faster. (It's
>>>>>> still
>>>>>> 10x slower than TR on your branch, so again... I'd love to see your
>>>>>> changes
>>>>>> and Robby's together. :D)
>>>>>>
>>>>>> Neil ⊥
>>>>>>
>>>>>>
>>>>>> On 12/12/2013 12:40 AM, Eric Dobson wrote:
>>>>>>
>>>>>>>
>>>>>>> Removing the return value checking is in the works. It actually is
>>>>>>> removing all of the checks that would blame typed code, so higher
>>>>>>> order functions/datastructure get improvements too. It is actually
>>>>>>> functional the last time I checked, but lacking documentation which
>>>>>>> is
>>>>>>> what is holding up merging with mainline.
>>>>>>>
>>>>>>> https://github.com/plt/racket/pull/453
>>>>>>>
>>>>>>> On Wed, Dec 11, 2013 at 7:57 PM, Robby Findler
>>>>>>> <robby at eecs.northwestern.edu> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> I see that TR's type->contract returns
>>>>>>>>
>>>>>>>>    (-> (flat-named-contract (quote Float) flonum?)
>>>>>>>> (flat-named-contract
>>>>>>>> (quote
>>>>>>>> Float) flonum?))
>>>>>>>>
>>>>>>>> for the type (Float -> Float), but it could return
>>>>>>>>
>>>>>>>>    (-> (flat-named-contract (quote Float) flonum?) any)
>>>>>>>>
>>>>>>>> which wouldn't do any result value checking (this being different
>>>>>>>> from
>>>>>>>> any/c
>>>>>>>> as the range of the arrow contract).
>>>>>>>>
>>>>>>>> Robby
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Dec 11, 2013 at 6:18 PM, Neil Toronto
>>>>>>>> <neil.toronto at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/11/2013 02:49 PM, Neil Toronto wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 12/11/2013 01:55 PM, Stephen Bloch wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Dec 11, 2013, at 2:36 PM, Neil Toronto wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>  numeric primitives implemented in Typed Racket are faster than
>>>>>>>>>>>>> the
>>>>>>>>>>>>> same primitives implemented in C.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Whoa!  How did that happen?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Whoa! That's not what I meant! O_o
>>>>>>>>>>
>>>>>>>>>> I said "we might be getting close" to that. I haven't tried
>>>>>>>>>> porting
>>>>>>>>>> a
>>>>>>>>>> numeric C primitive to TR yet, but I have a hunch that it'll still
>>>>>>>>>> be
>>>>>>>>>> slower. I'll try one now and report what I find.
>>>>>>>>>>
>>>>>>>>>> Neil ⊥
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can't figure out why `flsinh' is faster to call from untyped
>>>>>>>>> Racket
>>>>>>>>> than
>>>>>>>>> `sinh'. All my tests with a Typed Racket `magnitude' show calls
>>>>>>>>> from
>>>>>>>>> untyped
>>>>>>>>> code are significantly slower, except in the one case that it
>>>>>>>>> computes
>>>>>>>>> Euclidean distance. That case is only twice as slow.
>>>>>>>>>
>>>>>>>>> I've attached the benchmark program. The `magnitude*' function is
>>>>>>>>> more
>>>>>>>>> or
>>>>>>>>> less a direct translation of `magnitude' from "number.c" into Typed
>>>>>>>>> Racket.
>>>>>>>>> Here's a summary of the results I get on my computer, in
>>>>>>>>> milliseconds,
>>>>>>>>> for 5
>>>>>>>>> million calls from untyped Racket, by data type.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Function         Flonum  Rational  Fixnum  Integer  Float-Complex
>>>>>>>>> ------------------------------------------------------------
>>>>>>>>> -------
>>>>>>>>> magnitude*         385      419      378     414         686
>>>>>>>>> magnitude           59       44       40      40         390
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The only one that's close in relative terms is Float-Complex. The
>>>>>>>>> others
>>>>>>>>> just call `abs'. The decompiled code doesn't show any inlining of
>>>>>>>>> `magnitude', so this comparison should be good.
>>>>>>>>>
>>>>>>>>> I'll bet checking the return value contract (which is unnecessary)
>>>>>>>>> is
>>>>>>>>> the
>>>>>>>>> main slowdown. It has to check for number of values.
>>>>>>>>>
>>>>>>>>> For comparison, here are the timings for running the benchmarks in
>>>>>>>>> TR
>>>>>>>>> with
>>>>>>>>> #:no-optimize:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Function         Flonum  Rational  Fixnum  Integer  Float-Complex
>>>>>>>>> ------------------------------------------------------------
>>>>>>>>> -------
>>>>>>>>> magnitude*          45       70*      37     102*       318
>>>>>>>>> magnitude           61       45       39      91*       394
>>>>>>>>>
>>>>>>>>>                                                 * = unexpectedly
>>>>>>>>> high
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's what I understand from comparing the numbers:
>>>>>>>>>
>>>>>>>>>    * Except for non-fixnum integers, calling `magnitude' in TR is
>>>>>>>>> just
>>>>>>>>> as
>>>>>>>>> fast as in untyped Racket. I have no idea why it would be slower on
>>>>>>>>> big
>>>>>>>>> integers. That's just weird.
>>>>>>>>>
>>>>>>>>>    * Calling `abs' in Racket is faster than calling `scheme_abs' in
>>>>>>>>> C,
>>>>>>>>> except on rationals and big integers.
>>>>>>>>>
>>>>>>>>>    * Operating on flonums in Typed Racket, using generic numeric
>>>>>>>>> functions,
>>>>>>>>> is faster than doing the same in C.
>>>>>>>>>
>>>>>>>>> Overall, it looks like the TR code is within the same order of
>>>>>>>>> magnitude
>>>>>>>>> (pun not intended) as the C code. I would love to try this
>>>>>>>>> benchmark
>>>>>>>>> with
>>>>>>>>> either 1) a `magnitude*' with an `AnyValues' return type; or 2) a
>>>>>>>>> contract
>>>>>>>>> boundary that doesn't check TR's return types for first-order
>>>>>>>>> functions.
>>>>>>>>>
>>>>>>>>> (I managed to make a `magnitude*' with type Number -> AnyValues,
>>>>>>>>> but
>>>>>>>>> TR
>>>>>>>>> couldn't make a contract for it.)
>>>>>>>>>
>>>>>>>>> Neil ⊥
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _________________________
>>>>>>>>>     Racket Developers list:
>>>>>>>>>     http://lists.racket-lang.org/dev
>>>>>>>>>
>>>>>>>>>
>>>>>>>> _________________________
>>>>>>>>     Racket Developers list:
>>>>>>>>     http://lists.racket-lang.org/dev
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/dev/archive/attachments/20140114/b9e843c9/attachment-0001.html>

Posted on the dev mailing list.