[plt-dev] `unsafe-fl' and unboxing

From: Doug Williams (m.douglas.williams at gmail.com)
Date: Sat Oct 3 12:29:01 EDT 2009

About a 2x speed improvement is worth the rewrite. And, in the case of the
statistics functions, for example, most always return a float result (even
with non-float inputs) and would benefit from the rewrite. [Most that don't
always return a float, like minimum and maximum, aren't don't much
computation anyway.] I'd really like to try in on things like the Chebyshev
evaluator (which is basically used for all of the special functions) and the
differential equation solver (which is used by the simulation engine for
continuous simulations).

Are the basic unsafe-fl, unsafe-fx, and unsafe-vector-ref operations
included in 4.2.2? [I realize that the JIT enhancements here would not be.]
If they are, then I can put them into a new version of the science
collection that is dependent on 4.2.2 instead of some upcoming release.

Doug

On Sat, Oct 3, 2009 at 10:34 AM, Matthew Flatt <mflatt at cs.utah.edu> wrote:

> As of v4.2.2.3 (in SVN), the JIT can unbox some intermediate flonum
> values in `unsafe-fl' combinations. For example, in
>
>  (unsafe-fl+ 1.0 (unsafe-fl- x y))
>
> the JIT can see that `unsafe-fl-' will produce a flonum while
> `unsafe-fl+' consumes a flonum, so there is no reason to box the flonum
> and then immediately unbox it. Instead, the subtraction result is left
> in a register for the addition operation.
>
> As an example application, Doug's statistics library includes
>
>  (define (mean-and-variance data)
>   (let-values (((n m s)
>                 (dispatch-for/fold ((i-old 0)
>                                     (m-old 0.0)
>                                     (s-old 0.0))
>                                    ((x data))
>                   (let ((i-new (add1 i-old)))
>                     (if (= i-new 1)
>                         (values i-new (exact->inexact x) 0.0)
>                         (let* ((m-new (+ m-old (/ (- x m-old) i-new)))
>                                (s-new (+ s-old (* (- x m-old) (- x
> m-new)))))
>                           (values i-new m-new s-new)))))))
>     (values m (if (> n 1) (/ s (sub1 n)) 0.0))))
>
> If you're willing to re-write that as
>
>  (define (mean-and-variance data)
>   (let-values (((n m s)
>                 (dispatch-for/fold ((i-old 0)
>                                     (m-old 0.0)
>                                     (s-old 0.0))
>                                    ((x data))
>                   (let ((i-new (unsafe-fx+ i-old 1))
>                         (x (exact->inexact x))) ; ensure flonum, assuming
> real
>                     (if (unsafe-fx= i-new 1)
>                         (values i-new x 0.0)
>                         (let* ((m-new (unsafe-fl+ m-old
>                                                   (unsafe-fl/
>                                                    (unsafe-fl- x m-old)
>                                                    (unsafe-fx->fl i-new))))
>                                (s-new (unsafe-fl+ s-old
>                                                   (unsafe-fl*
>                                                    (unsafe-fl- x m-old)
>                                                    (unsafe-fl- x m-new)))))
>                           (values i-new m-new s-new)))))))
>     (values m (if (> n 1) (/ s (sub1 n)) 0.0))))
>
> then the JIT can avoid about 6 boxes every iteration (leaving abut 2
> boxes, assuming that `data' is a vector of flonums).
>
>
> To measure the effect of unsafe operations and unboxing, I used code
> that Doug posted recently:
>
>  (let ((data1 (build-vector
>                100000
>                (lambda (i)
>                  (random-unit-gaussian)))))
>    ;; Don't time first call
>    (variance data1)
>    ;; Checked:
>    (time
>     (for ((i (in-range 10)))
>       (variance data1)))
>    (collect-garbage)
>    ;; Unchecked:
>    (time
>     (for ((i (in-range 10)))
>       (unchecked-variance data1))))
>
>
> The times in msec on Mac OS X 10.6.1 using a 2.53 GHz MacBook Pro:
>
>                             x86          x86_64
>                         chkd/unchkd    chkd/unchkd
>
>   original               165 / 155     143 / 129
>   unsafe (~v4.2.2)       138 / 128     138 / 123
>   unsafe+unboxed (new)    94 / 70       67 / 55
>
> Each table cell shows the "checked" version of the function, which
> includes a contract (to ensure that `data' is a sequence of reals), and
> the contract-less "unchecked" version. The "unchecked" comparison is
> not apples-to-apples for users of the library, because using unsafe
> operations makes the "unchecked" version unsafe at the Scheme level
> (while the contract on the "checked" version ensures that the unsafe
> operations will not cause a crash).
>
>
> The JIT only unboxes in expressions that combine
>
>  * `unsafe-fl' operations,
>  * `unsafe-fx->fl', or
>  * variable access (local or global)
>
> For example, the bytecode compiler currently leaves the `z' intact in
>
>  (let ([z (unsafe-fl- x y)])
>   (unsafe-fl+ 1.0 z))
>
> which means that the JIT cannot avoid the box for `z'. I'm planning
> improvements for the bytecode compiler to help the JIT in such cases.
>
> Other directions for future improvement include allowing more
> `unsafe-fx' operations in unboxable combinations and adding support for
> reading and writing to an array of flonums without extra boxing.
>
> _________________________________________________
>  For list-related administrative tasks:
>  http://list.cs.brown.edu/mailman/listinfo/plt-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/dev/archive/attachments/20091003/ff48384d/attachment.html>

Posted on the dev mailing list.