[racket-dev] [plt] Push #27909: master branch updated

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Tue Jan 14 14:15:29 EST 2014

I expect that one'll still be horrible, then. :)


On Tue, Jan 14, 2014 at 12:37 PM, Neil Toronto <neil.toronto at gmail.com>wrote:

>
> I don't think so. The flonum functions are Flonum -> Flonum, and
> `magnitude*' is Number -> Number.
>
> The best candidate from the math library is probably `mean'.
>
> Neil ⊥
>
>
> On 01/14/2014 11:24 AM, Robby Findler wrote:
>
>> Do any of these functions turn out to have case-> contracts by the time
>> TR gets done with them?
>>
>> Robby
>>
>>
>>
>> On Tue, Jan 14, 2014 at 12:22 PM, Neil Toronto <neil.toronto at gmail.com
>> <mailto:neil.toronto at gmail.com>> wrote:
>>
>>     An update on my math-centered tests. The first is the built-in
>>     `magnitude' vs. a TR near-transliteration of its C implementation,
>>     called from *untyped* Racket. The numbers are in milliseconds, for 5
>>     million calls, by input type:
>>
>>
>>     Function         Flonum  Rational  Fixnum  Integer  Float-Complex
>>     ------------------------------__----------------------------
>> --__-------
>>
>>     Pre-static contracts:
>>
>>     magnitude*         385      419      378     414         686
>>     magnitude           59       44       40      40         390
>>     Post-static contracts:
>>     magnitude*         175      196      164     195         475
>>     magnitude           68       43       36      40         406
>>
>>     All but the Float-Complex case just return the absolute value of the
>>     argument, so there's not much computation. They're dominated by
>>     contract checking, and the speedup is 0.45 to 0.5. Nice. The
>>     Float-Complex case does something substantive, and is close to C.
>>     That's awesome.
>>
>>     So for really small functions (i.e. just a test and a call to
>>     `abs'), writing them in C is still better. But for anything much
>>     larger (that part of `magnitude*' is about 10 lines and 7-13 numeric
>>     ops), a TR implementation isn't much slower than C (about 17%).
>>
>>     --
>>
>>     1 million iterations of some math library flonum functions, in TR,
>>     untyped before Robby's changes, untyped after Robby's 12/12 changes,
>>     and untyped after Eric's static contract changes, in milliseconds:
>>
>>     Function         TR     Untyped    Robby's   Robby's+Eric's
>>     ------------------------------__------------------------------__--
>>
>>     flrational?       5       322        98           37
>>     flsinh           55       343       121           86
>>     fllog1p          47       351       117           80
>>     lg+              61       384       154          115
>>     flgamma         165       521       262          234
>>
>>     So calling TR floating-point functions from untyped Racket takes
>>     only 0.11 to 0.45 times the amount of time it took only a month ago,
>>     with typical cases being about 0.25. That's awesome.
>>
>>     Neil ⊥
>>
>>
>>     On 01/14/2014 10:27 AM, Eric Dobson wrote:
>>
>>         The changes to TR contract generation are now in at HEAD.
>>
>>         If you can find any cases where contracts are still slowing
>> programs
>>         down by a lot I'd like to take a look at them. (Excepting the
>>         case of
>>         structs where I know it is still an issue).
>>
>>         On Thu, Dec 12, 2013 at 10:52 AM, Robby Findler
>>         <robby at eecs.northwestern.edu
>>         <mailto:robby at eecs.northwestern.edu>> wrote:
>>
>>             Re-reading your message I see that you're not actually
>>             asserting something
>>             different from what I said, but just for some precision here
>>             I wish to point
>>             out that I wasn't basing my opinion on intuition from the
>>             code, but on some
>>             microbenchmark timings. (There was a much more substantial
>>             difference
>>             yesterday because the loop inside any-wrap/c wasn't as cheap
>>             as it could
>>             have been.)
>>
>>             I'd be interested to see if your improvements to
>>             type->contract improve the
>>             situation any! I expect they will make things better again
>>             for the Number
>>             case, but at the moment, there isn't a big difference.
>>
>>             Program 1:
>>
>>
>>             #lang racket/base
>>             (module m typed/racket/base
>>                 (: f (Any -> Any))
>>                 (define (f x) 1)
>>                 (provide f))
>>             (require 'm)
>>             (time
>>                (for ([x (in-range 20000)])
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)))
>>
>>             Timings:
>>
>>             cpu time: 142 real time: 142 gc time: 8
>>             cpu time: 144 real time: 144 gc time: 7
>>             cpu time: 144 real time: 143 gc time: 6
>>             cpu time: 142 real time: 142 gc time: 6
>>             cpu time: 142 real time: 142 gc time: 7
>>             cpu time: 146 real time: 146 gc time: 6
>>
>>             Program 2:
>>
>>
>>             #lang racket/base
>>             (module m typed/racket/base
>>                 (: f (Any -> Integer))
>>
>>                 (define (f x) 1)
>>                 (provide f))
>>             (require 'm)
>>             (time
>>                (for ([x (in-range 20000)])
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)
>>                  (f 0) (f 1) (f 2) (f 3) (f 4) (f 5) (f 6) (f 7)))
>>
>>
>>             Timings:
>>
>>             cpu time: 139 real time: 138 gc time: 7
>>             cpu time: 145 real time: 144 gc time: 7
>>             cpu time: 140 real time: 140 gc time: 6
>>             cpu time: 151 real time: 150 gc time: 6
>>             cpu time: 139 real time: 138 gc time: 6
>>             cpu time: 139 real time: 139 gc time: 8
>>
>>
>>
>>
>>
>>
>>
>>
>>             On Thu, Dec 12, 2013 at 12:33 PM, Eric Dobson
>>             <eric.n.dobson at gmail.com <mailto:eric.n.dobson at gmail.com>>
>>
>>             wrote:
>>
>>
>>                 any-wrap/c still requires the check for one value, while
>>                 any (which is
>>                 from Number not Any) does not. So I would still guess at
>>                 Number being
>>                 faster, but Robby's changes may make it so that inlining
>>                 and dead code
>>                 elimination can see through everything and turn it into
>>                 the same code.
>>
>>                 On Thu, Dec 12, 2013 at 10:27 AM, Robby Findler
>>                 <robby at eecs.northwestern.edu
>>                 <mailto:robby at eecs.northwestern.edu>> wrote:
>>
>>                     FWIW, my push speeds up the any-wrap/c
>>                     implementation a bunch. Those two
>>                     should have similar speeds after you get that, I
>> guess.
>>
>>                     Robby
>>
>>
>>                     On Thu, Dec 12, 2013 at 11:03 AM, Neil Toronto
>>                     <neil.toronto at gmail.com <mailto:
>> neil.toronto at gmail.com>>
>>
>>                     wrote:
>>
>>
>>                         I tried your branch that implements it and saw
>>                         about 3.5x speedup for
>>                         the
>>                         `magnitude*' test. This is of course without
>>                         Robby's recent first-order
>>                         contract changes.
>>
>>                         (I think it's about 3.5x: I tried with
>>                         magnitude* : Number -> Any first
>>                         and got 2400ms on the easy tests. I changed it
>>                         to magnitude* : Number
>>                         ->
>>                         Number and got 690ms. Apparently, for an `Any'
>>                         return type, an
>>                         `any-wrap/c'
>>                         contract is generated instead of nothing. If
>>                         that's much faster to
>>                         check
>>                         than `number?', though, the speedup is even
>> better.)
>>
>>                         I'd love to see this with Robby's recent
>>                         changes. Hint? Nudge? Please?
>>
>>                         I didn't see very much speedup with arrays
>>                         (about 1.2x). Speed tests on
>>                         the math library's distribution objects were
>>                         very interesting, though,
>>                         and
>>                         indicate why the arrays might not be much
>>                         faster. Here's my test
>>                         program:
>>
>>
>>                         #lang racket
>>
>>                         (require math/distributions)
>>
>>                         (define d (normal-dist 0 1))
>>
>>                         (printf "pdf d 0~n")
>>                         (for ([_  (in-range 5)])
>>                             (time (for ([_  (in-range 100000)])
>>                                     (pdf d 0))))
>>                         (newline)
>>
>>                         (define p (distribution-pdf d))
>>                         (printf "p 0~n")
>>                         (for ([_  (in-range 5)])
>>                             (time (for ([_  (in-range 100000)])
>>                                     (p 0))))
>>                         (newline)
>>
>>
>>                         The two tests are equivalent, as `pdf' just
>>                         pulls the pdf function out
>>                         of
>>                         the distribution struct and applies it. In TR,
>>                         the tests are exactly
>>                         the
>>                         same speed (extremely fast). In untyped Racket,
>>                         on the main branch, the
>>                         second test is 16x faster, and on your branch,
>>                         it's 44x faster. (It's
>>                         still
>>                         10x slower than TR on your branch, so again...
>>                         I'd love to see your
>>                         changes
>>                         and Robby's together. :D)
>>
>>                         Neil ⊥
>>
>>
>>                         On 12/12/2013 12:40 AM, Eric Dobson wrote:
>>
>>
>>                             Removing the return value checking is in the
>>                             works. It actually is
>>                             removing all of the checks that would blame
>>                             typed code, so higher
>>                             order functions/datastructure get
>>                             improvements too. It is actually
>>                             functional the last time I checked, but
>>                             lacking documentation which is
>>                             what is holding up merging with mainline.
>>
>>                             https://github.com/plt/racket/__pull/453
>>
>>                             <https://github.com/plt/racket/pull/453>
>>
>>                             On Wed, Dec 11, 2013 at 7:57 PM, Robby Findler
>>                             <robby at eecs.northwestern.edu
>>                             <mailto:robby at eecs.northwestern.edu>> wrote:
>>
>>
>>                                 I see that TR's type->contract returns
>>
>>                                     (-> (flat-named-contract (quote
>>                                 Float) flonum?)
>>                                 (flat-named-contract
>>                                 (quote
>>                                 Float) flonum?))
>>
>>                                 for the type (Float -> Float), but it
>>                                 could return
>>
>>                                     (-> (flat-named-contract (quote
>>                                 Float) flonum?) any)
>>
>>                                 which wouldn't do any result value
>>                                 checking (this being different
>>                                 from
>>                                 any/c
>>                                 as the range of the arrow contract).
>>
>>                                 Robby
>>
>>
>>                                 On Wed, Dec 11, 2013 at 6:18 PM, Neil
>>                                 Toronto
>>                                 <neil.toronto at gmail.com
>>                                 <mailto:neil.toronto at gmail.com>>
>>
>>                                 wrote:
>>
>>
>>
>>                                     On 12/11/2013 02:49 PM, Neil Toronto
>>                                     wrote:
>>
>>
>>
>>                                         On 12/11/2013 01:55 PM, Stephen
>>                                         Bloch wrote:
>>
>>
>>
>>                                                 On Dec 11, 2013, at 2:36
>>                                                 PM, Neil Toronto wrote:
>>
>>                                                     numeric primitives
>>                                                     implemented in Typed
>>                                                     Racket are faster than
>>                                                     the
>>                                                     same primitives
>>                                                     implemented in C.
>>
>>
>>
>>
>>                                             Whoa!  How did that happen?
>>
>>
>>
>>
>>                                         Whoa! That's not what I meant! O_o
>>
>>                                         I said "we might be getting
>>                                         close" to that. I haven't tried
>>                                         porting
>>                                         a
>>                                         numeric C primitive to TR yet,
>>                                         but I have a hunch that it'll
>> still
>>                                         be
>>                                         slower. I'll try one now and
>>                                         report what I find.
>>
>>                                         Neil ⊥
>>
>>
>>
>>
>>                                     I can't figure out why `flsinh' is
>>                                     faster to call from untyped
>>                                     Racket
>>                                     than
>>                                     `sinh'. All my tests with a Typed
>>                                     Racket `magnitude' show calls from
>>                                     untyped
>>                                     code are significantly slower,
>>                                     except in the one case that it
>>                                     computes
>>                                     Euclidean distance. That case is
>>                                     only twice as slow.
>>
>>                                     I've attached the benchmark program.
>>                                     The `magnitude*' function is
>>                                     more
>>                                     or
>>                                     less a direct translation of
>>                                     `magnitude' from "number.c" into Typed
>>                                     Racket.
>>                                     Here's a summary of the results I
>>                                     get on my computer, in
>>                                     milliseconds,
>>                                     for 5
>>                                     million calls from untyped Racket,
>>                                     by data type.
>>
>>
>>                                     Function         Flonum  Rational
>>                                       Fixnum  Integer  Float-Complex
>>                                     ------------------------------
>> __------------------------------__-------
>>
>>                                     magnitude*         385      419
>>                                       378     414         686
>>                                     magnitude           59       44
>>                                        40      40         390
>>
>>
>>                                     The only one that's close in
>>                                     relative terms is Float-Complex. The
>>                                     others
>>                                     just call `abs'. The decompiled code
>>                                     doesn't show any inlining of
>>                                     `magnitude', so this comparison
>>                                     should be good.
>>
>>                                     I'll bet checking the return value
>>                                     contract (which is unnecessary)
>>                                     is
>>                                     the
>>                                     main slowdown. It has to check for
>>                                     number of values.
>>
>>                                     For comparison, here are the timings
>>                                     for running the benchmarks in
>>                                     TR
>>                                     with
>>                                     #:no-optimize:
>>
>>
>>                                     Function         Flonum  Rational
>>                                       Fixnum  Integer  Float-Complex
>>                                     ------------------------------
>> __------------------------------__-------
>>
>>                                     magnitude*          45       70*
>>                                       37     102*       318
>>                                     magnitude           61       45
>>                                        39      91*       394
>>
>>
>>                                                  * = unexpectedly high
>>
>>
>>                                     Here's what I understand from
>>                                     comparing the numbers:
>>
>>                                         * Except for non-fixnum
>>                                     integers, calling `magnitude' in TR is
>>                                     just
>>                                     as
>>                                     fast as in untyped Racket. I have no
>>                                     idea why it would be slower on
>>                                     big
>>                                     integers. That's just weird.
>>
>>                                         * Calling `abs' in Racket is
>>                                     faster than calling `scheme_abs' in
>>                                     C,
>>                                     except on rationals and big integers.
>>
>>                                         * Operating on flonums in Typed
>>                                     Racket, using generic numeric
>>                                     functions,
>>                                     is faster than doing the same in C.
>>
>>                                     Overall, it looks like the TR code
>>                                     is within the same order of
>>                                     magnitude
>>                                     (pun not intended) as the C code. I
>>                                     would love to try this benchmark
>>                                     with
>>                                     either 1) a `magnitude*' with an
>>                                     `AnyValues' return type; or 2) a
>>                                     contract
>>                                     boundary that doesn't check TR's
>>                                     return types for first-order
>>                                     functions.
>>
>>                                     (I managed to make a `magnitude*'
>>                                     with type Number -> AnyValues, but
>>                                     TR
>>                                     couldn't make a contract for it.)
>>
>>                                     Neil ⊥
>>
>>
>>                                     _________________________
>>                                          Racket Developers list:
>>                                     http://lists.racket-lang.org/__dev
>>
>>                                     <http://lists.racket-lang.org/dev>
>>
>>
>>                                 _________________________
>>                                      Racket Developers list:
>>                                 http://lists.racket-lang.org/__dev
>>                                 <http://lists.racket-lang.org/dev>
>>
>>
>>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/dev/archive/attachments/20140114/0bd37caf/attachment-0001.html>

Posted on the dev mailing list.