[plt-dev] `unsafe-fl' and unboxing
As of v4.2.2.3 (in SVN), the JIT can unbox some intermediate flonum
values in `unsafe-fl' combinations. For example, in
 (unsafe-fl+ 1.0 (unsafe-fl- x y))
the JIT can see that `unsafe-fl-' will produce a flonum while
`unsafe-fl+' consumes a flonum, so there is no reason to box the flonum
and then immediately unbox it. Instead, the subtraction result is left
in a register for the addition operation.
As an example application, Doug's statistics library includes
 (define (mean-and-variance data)
   (let-values (((n m s)
                 (dispatch-for/fold ((i-old 0)
                                     (m-old 0.0)
                                     (s-old 0.0))
                                    ((x data))
                   (let ((i-new (add1 i-old)))
                     (if (= i-new 1)
                         (values i-new (exact->inexact x) 0.0)
                         (let* ((m-new (+ m-old (/ (- x m-old) i-new)))
                                (s-new (+ s-old (* (- x m-old) (- x m-new)))))
                           (values i-new m-new s-new)))))))
     (values m (if (> n 1) (/ s (sub1 n)) 0.0))))
If you're willing to re-write that as
 (define (mean-and-variance data)
   (let-values (((n m s)
                 (dispatch-for/fold ((i-old 0)
                                     (m-old 0.0)
                                     (s-old 0.0))
                                    ((x data))
                   (let ((i-new (unsafe-fx+ i-old 1))
                         (x (exact->inexact x))) ; ensure flonum, assuming real
                     (if (unsafe-fx= i-new 1)
                         (values i-new x 0.0)
                         (let* ((m-new (unsafe-fl+ m-old 
                                                   (unsafe-fl/ 
                                                    (unsafe-fl- x m-old) 
                                                    (unsafe-fx->fl i-new))))
                                (s-new (unsafe-fl+ s-old
                                                   (unsafe-fl*
                                                    (unsafe-fl- x m-old) 
                                                    (unsafe-fl- x m-new)))))
                           (values i-new m-new s-new)))))))
     (values m (if (> n 1) (/ s (sub1 n)) 0.0))))
then the JIT can avoid about 6 boxes every iteration (leaving abut 2
boxes, assuming that `data' is a vector of flonums).
To measure the effect of unsafe operations and unboxing, I used code
that Doug posted recently:
  (let ((data1 (build-vector
                100000
                (lambda (i)
                  (random-unit-gaussian)))))
    ;; Don't time first call
    (variance data1)
    ;; Checked:
    (time
     (for ((i (in-range 10)))
       (variance data1)))
    (collect-garbage)
    ;; Unchecked:
    (time
     (for ((i (in-range 10)))
       (unchecked-variance data1))))
The times in msec on Mac OS X 10.6.1 using a 2.53 GHz MacBook Pro:
                             x86          x86_64
                         chkd/unchkd    chkd/unchkd
   original               165 / 155     143 / 129
   unsafe (~v4.2.2)       138 / 128     138 / 123
   unsafe+unboxed (new)    94 / 70       67 / 55
Each table cell shows the "checked" version of the function, which
includes a contract (to ensure that `data' is a sequence of reals), and
the contract-less "unchecked" version. The "unchecked" comparison is
not apples-to-apples for users of the library, because using unsafe
operations makes the "unchecked" version unsafe at the Scheme level
(while the contract on the "checked" version ensures that the unsafe
operations will not cause a crash).
The JIT only unboxes in expressions that combine
 * `unsafe-fl' operations,
 * `unsafe-fx->fl', or
 * variable access (local or global)
For example, the bytecode compiler currently leaves the `z' intact in
 (let ([z (unsafe-fl- x y)])
   (unsafe-fl+ 1.0 z))
which means that the JIT cannot avoid the box for `z'. I'm planning
improvements for the bytecode compiler to help the JIT in such cases.
Other directions for future improvement include allowing more
`unsafe-fx' operations in unboxable combinations and adding support for
reading and writing to an array of flonums without extra boxing.