[racket-dev] [plt] Push #27909: master branch updated

From: Neil Toronto (neil.toronto at gmail.com)
Date: Wed Dec 11 14:36:39 EST 2013

On 12/11/2013 11:07 AM, robby at racket-lang.org wrote:
> robby has updated `master' from 542e256206 to c321f6dd0c.
>    http://git.racket-lang.org/plt/542e256206..c321f6dd0c
>
> =====[ One Commit ]=====================================================
> Directory summary:
>    37.6% pkgs/racket-pkgs/racket-test/tests/racket/contract/
>     5.5% pkgs/
>    46.3% racket/collects/racket/contract/private/
>    10.0% racket/collects/racket/private/
>
> ~~~~~~~~~~
>
> c321f6d Robby Findler <robby at racket-lang.org> 2013-12-04 22:35
> :
> | Change contract system so that projections are more first-order friendly

Awesome. I've attached some more benchmarks, for `flrational?', 
`flsinh', `fllog1p', `lg+', and `flgamma'. These functions are pretty 
representative, and have a range of complexity from trivial to 
complicated. (For example, `flrational?' is implemented using two flops, 
and `flgamma' usually does ~50 flops in the range I tested.)

Approximate average times in milliseconds for 1 million calls:

Function         TR     Untyped pre-push     Untyped post-push
--------------------------------------------------------------
flrational?       5             322                   98
flsinh           55             343                  121
fllog1p          47             351                  117
lg+              61             384                  154
flgamma         165             521                  262

There's also less variance in the timings, probably because there are 
fewer minor GC pauses during the tests.
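
To make the table easier to compare, here is a small sketch that derives two figures from it: how much faster untyped calls became after the push, and how much time per call still separates untyped code from TR. The numbers are copied from the table; the names `timings', `speedup' and `overhead-ns' are made up for this illustration. Since each entry is milliseconds per 1 million calls, a difference of 1 ms corresponds to 1 ns per call.

```racket
#lang racket/base

;; Rows copied from the table above: (name TR pre-push post-push),
;; each an approximate average in milliseconds for 1 million calls.
(define timings
  '((flrational?   5 322  98)
    (flsinh       55 343 121)
    (fllog1p      47 351 117)
    (lg+          61 384 154)
    (flgamma     165 521 262)))

;; Speedup of untyped calls from the push: pre-push time / post-push time.
(define (speedup row)
  (exact->inexact (/ (caddr row) (cadddr row))))

;; Remaining cost of an untyped call over a TR call, in nanoseconds.
;; (ms per 1e6 calls is numerically the same as ns per call.)
(define (overhead-ns row)
  (- (cadddr row) (cadr row)))

(for ([row (in-list timings)])
  (printf "~a: ~ax faster after the push, ~a ns/call above TR~n"
          (car row)
          (/ (round (* 10 (speedup row))) 10)  ; one decimal place
          (overhead-ns row)))
```

By this arithmetic the remaining untyped-over-TR cost is roughly 65 to 100 ns per call for all five functions, regardless of how much work the function itself does.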

Not shown on the table: untyped `sinh' calls take 140ms in the same 
test, so it's now faster to use `flsinh' from `math/flonum' in untyped 
code, if operating on flonums. Cool. We might be getting close to where 
numeric primitives implemented in Typed Racket are faster than the same 
primitives implemented in C.

The `flrational?' test is still amazing in TR. The function's two flops 
get inlined (I checked the decompiled module), which I suppose allows 
more JIT-level optimizations.

The only things I can think of that account for the remaining time over 
TR are range/domain checking and boxing flonum return values. I think I 
remember hearing something from someone (maybe Eric?) at RacketCon about 
inlining contract checks. Is that in the works?
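
As a rough illustration of the kind of checking discussed here, the sketch below times the same function body with and without a hand-written flonum contract. It is only an analogue: `raw-sinh' and `checked-sinh' are hypothetical names, the contract is written by hand with `define/contract' rather than generated by Typed Racket, and the contracts TR actually attaches to `math/flonum' exports may look different. Absolute timings will vary by machine and Racket version.

```racket
#lang racket/base

(require racket/contract
         racket/math)   ; for sinh

;; Same body twice: once plain, once behind a domain/range contract.
(define (raw-sinh x) (sinh x))

(define/contract (checked-sinh x)
  (-> flonum? flonum?)
  (sinh x))

(define x (random))
(define n 1000000)
(define result 0.0)

(printf "raw-sinh~n")
(time (for ([i (in-range n)])
        (set! result (raw-sinh x))))

(printf "checked-sinh~n")
(time (for ([i (in-range n)])
        (set! result (checked-sinh x))))
```

Both versions return the same value for a flonum argument; only the checked one rejects a non-flonum such as `1' with a contract violation.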

Neil ⊥

-------------- next part --------------
#lang racket

(require math/flonum
         math/special-functions
         racket/unsafe/ops
         (only-in typed/racket/base :))

(define x (random))

(: bx Boolean)
(define bx #f)

(define vec (make-flvector 1))

(define n 1000000)

(printf "flrational?~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (set! bx (flrational? x)))))
(newline)

(printf "flsinh~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (flsinh x)))))
(newline)

(printf "fllog1p~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (fllog1p x)))))
(newline)

(printf "lg+~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (lg+ x x)))))
(newline)

(printf "flgamma~n")
(for ([_  (in-range 5)])
  (time (for ([_  (in-range n)])
          (unsafe-flvector-set! vec 0 (flgamma x)))))
(newline)

-------------- next part --------------
flrational?
cpu time: 4 real time: 5 gc time: 0
cpu time: 4 real time: 6 gc time: 0
cpu time: 8 real time: 5 gc time: 0
cpu time: 4 real time: 6 gc time: 0
cpu time: 8 real time: 5 gc time: 0

flsinh
cpu time: 52 real time: 55 gc time: 4
cpu time: 56 real time: 54 gc time: 0
cpu time: 52 real time: 54 gc time: 0
cpu time: 56 real time: 54 gc time: 4
cpu time: 56 real time: 56 gc time: 0

fllog1p
cpu time: 48 real time: 46 gc time: 0
cpu time: 44 real time: 48 gc time: 0
cpu time: 48 real time: 47 gc time: 4
cpu time: 48 real time: 47 gc time: 0
cpu time: 48 real time: 47 gc time: 0

lg+
cpu time: 60 real time: 61 gc time: 0
cpu time: 60 real time: 61 gc time: 0
cpu time: 60 real time: 61 gc time: 4
cpu time: 64 real time: 61 gc time: 0
cpu time: 60 real time: 63 gc time: 0

flgamma
cpu time: 168 real time: 167 gc time: 4
cpu time: 164 real time: 165 gc time: 0
cpu time: 164 real time: 164 gc time: 0
cpu time: 164 real time: 164 gc time: 4
cpu time: 168 real time: 165 gc time: 0

-------------- next part --------------
flrational?
cpu time: 316 real time: 315 gc time: 0
cpu time: 316 real time: 314 gc time: 0
cpu time: 328 real time: 328 gc time: 4
cpu time: 324 real time: 326 gc time: 0
cpu time: 328 real time: 327 gc time: 0

flsinh
cpu time: 348 real time: 350 gc time: 12
cpu time: 336 real time: 336 gc time: 0
cpu time: 340 real time: 338 gc time: 0
cpu time: 348 real time: 349 gc time: 16
cpu time: 344 real time: 343 gc time: 4

fllog1p
cpu time: 348 real time: 347 gc time: 4
cpu time: 352 real time: 354 gc time: 4
cpu time: 348 real time: 349 gc time: 8
cpu time: 348 real time: 347 gc time: 4
cpu time: 360 real time: 359 gc time: 4

lg+
cpu time: 376 real time: 379 gc time: 0
cpu time: 384 real time: 384 gc time: 8
cpu time: 388 real time: 387 gc time: 4
cpu time: 380 real time: 381 gc time: 0
cpu time: 388 real time: 387 gc time: 4

flgamma
cpu time: 516 real time: 517 gc time: 8
cpu time: 532 real time: 530 gc time: 12
cpu time: 516 real time: 518 gc time: 8
cpu time: 528 real time: 529 gc time: 4
cpu time: 516 real time: 515 gc time: 16

-------------- next part --------------
flrational?
cpu time: 100 real time: 99 gc time: 4
cpu time: 96 real time: 98 gc time: 0
cpu time: 100 real time: 100 gc time: 0
cpu time: 96 real time: 96 gc time: 0
cpu time: 100 real time: 98 gc time: 0

flsinh
cpu time: 124 real time: 125 gc time: 0
cpu time: 120 real time: 120 gc time: 0
cpu time: 120 real time: 120 gc time: 0
cpu time: 120 real time: 120 gc time: 0
cpu time: 120 real time: 122 gc time: 0

fllog1p
cpu time: 116 real time: 116 gc time: 0
cpu time: 120 real time: 118 gc time: 0
cpu time: 116 real time: 118 gc time: 0
cpu time: 120 real time: 120 gc time: 0
cpu time: 116 real time: 115 gc time: 8

lg+
cpu time: 152 real time: 153 gc time: 4
cpu time: 152 real time: 152 gc time: 0
cpu time: 156 real time: 154 gc time: 4
cpu time: 156 real time: 155 gc time: 4
cpu time: 152 real time: 154 gc time: 4

flgamma
cpu time: 268 real time: 266 gc time: 8
cpu time: 260 real time: 261 gc time: 0
cpu time: 260 real time: 261 gc time: 0
cpu time: 264 real time: 262 gc time: 0
cpu time: 260 real time: 259 gc time: 0


Posted on the dev mailing list.