About a 2x speed improvement is worth the rewrite. And, in the case of the statistics functions, for example, most always return a float result (even with non-float inputs) and would benefit from the rewrite. [Most that don't always return a float, like minimum and maximum, aren't don't much computation anyway.] I'd really like to try in on things like the Chebyshev evaluator (which is basically used for all of the special functions) and the differential equation solver (which is used by the simulation engine for continuous simulations).<br>
<br>Are the basic unsafe-fl, unsafe-fx, and unsafe-vector-ref operations included in 4.2.2? [I realize that the JIT enhancements here would not be.] If they are, then I can put them into a new version of the science collection that is dependent on 4.2.2 instead of some upcoming release.<br>
<br>Doug<br><br><div class="gmail_quote">On Sat, Oct 3, 2009 at 10:34 AM, Matthew Flatt <span dir="ltr"><<a href="mailto:mflatt@cs.utah.edu">mflatt@cs.utah.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
As of v4.2.2.3 (in SVN), the JIT can unbox some intermediate flonum<br>
values in `unsafe-fl' combinations. For example, in<br>
<br>
(unsafe-fl+ 1.0 (unsafe-fl- x y))<br>
<br>
the JIT can see that `unsafe-fl-' will produce a flonum while<br>
`unsafe-fl+' consumes a flonum, so there is no reason to box the flonum<br>
and then immediately unbox it. Instead, the subtraction result is left<br>
in a register for the addition operation.<br>
<br>
As an example application, Doug's statistics library includes<br>
<br>
(define (mean-and-variance data)<br>
(let-values (((n m s)<br>
(dispatch-for/fold ((i-old 0)<br>
(m-old 0.0)<br>
(s-old 0.0))<br>
((x data))<br>
(let ((i-new (add1 i-old)))<br>
(if (= i-new 1)<br>
(values i-new (exact->inexact x) 0.0)<br>
(let* ((m-new (+ m-old (/ (- x m-old) i-new)))<br>
(s-new (+ s-old (* (- x m-old) (- x m-new)))))<br>
(values i-new m-new s-new)))))))<br>
(values m (if (> n 1) (/ s (sub1 n)) 0.0))))<br>
<br>
If you're willing to re-write that as<br>
<br>
(define (mean-and-variance data)<br>
(let-values (((n m s)<br>
(dispatch-for/fold ((i-old 0)<br>
(m-old 0.0)<br>
(s-old 0.0))<br>
((x data))<br>
(let ((i-new (unsafe-fx+ i-old 1))<br>
(x (exact->inexact x))) ; ensure flonum, assuming real<br>
(if (unsafe-fx= i-new 1)<br>
(values i-new x 0.0)<br>
(let* ((m-new (unsafe-fl+ m-old<br>
(unsafe-fl/<br>
(unsafe-fl- x m-old)<br>
(unsafe-fx->fl i-new))))<br>
(s-new (unsafe-fl+ s-old<br>
(unsafe-fl*<br>
(unsafe-fl- x m-old)<br>
(unsafe-fl- x m-new)))))<br>
(values i-new m-new s-new)))))))<br>
(values m (if (> n 1) (/ s (sub1 n)) 0.0))))<br>
<br>
then the JIT can avoid about 6 boxes every iteration (leaving abut 2<br>
boxes, assuming that `data' is a vector of flonums).<br>
<br>
<br>
To measure the effect of unsafe operations and unboxing, I used code<br>
that Doug posted recently:<br>
<br>
(let ((data1 (build-vector<br>
100000<br>
(lambda (i)<br>
(random-unit-gaussian)))))<br>
;; Don't time first call<br>
(variance data1)<br>
;; Checked:<br>
(time<br>
(for ((i (in-range 10)))<br>
(variance data1)))<br>
(collect-garbage)<br>
;; Unchecked:<br>
(time<br>
(for ((i (in-range 10)))<br>
(unchecked-variance data1))))<br>
<br>
<br>
The times in msec on Mac OS X 10.6.1 using a 2.53 GHz MacBook Pro:<br>
<br>
x86 x86_64<br>
chkd/unchkd chkd/unchkd<br>
<br>
original 165 / 155 143 / 129<br>
unsafe (~v4.2.2) 138 / 128 138 / 123<br>
unsafe+unboxed (new) 94 / 70 67 / 55<br>
<br>
Each table cell shows the "checked" version of the function, which<br>
includes a contract (to ensure that `data' is a sequence of reals), and<br>
the contract-less "unchecked" version. The "unchecked" comparison is<br>
not apples-to-apples for users of the library, because using unsafe<br>
operations makes the "unchecked" version unsafe at the Scheme level<br>
(while the contract on the "checked" version ensures that the unsafe<br>
operations will not cause a crash).<br>
<br>
<br>
The JIT only unboxes in expressions that combine<br>
<br>
* `unsafe-fl' operations,<br>
* `unsafe-fx->fl', or<br>
* variable access (local or global)<br>
<br>
For example, the bytecode compiler currently leaves the `z' intact in<br>
<br>
(let ([z (unsafe-fl- x y)])<br>
(unsafe-fl+ 1.0 z))<br>
<br>
which means that the JIT cannot avoid the box for `z'. I'm planning<br>
improvements for the bytecode compiler to help the JIT in such cases.<br>
<br>
Other directions for future improvement include allowing more<br>
`unsafe-fx' operations in unboxable combinations and adding support for<br>
reading and writing to an array of flonums without extra boxing.<br>
<br>
_________________________________________________<br>
For list-related administrative tasks:<br>
<a href="http://list.cs.brown.edu/mailman/listinfo/plt-dev" target="_blank">http://list.cs.brown.edu/mailman/listinfo/plt-dev</a><br>
</blockquote></div><br>