[racket] performance problem in math/matrix

From: Berthold Bäuml (berthold.baeuml at dlr.de)
Date: Mon Jan 21 18:33:39 EST 2013

> 
> I just did that. Here are the types:
> 
>  real-matrix* : (Array Real) (Array Real) -> (Array Real)
> 
>  flonum-matrix* : (Array Flonum) (Array Flonum) -> (Array Flonum)
> 
>  flmatrix* : FlArray FlArray -> FlArray
> 
> Results so far, measured in DrRacket with debugging off:
> 
> Function           Size              Time
> -----------------------------------------
> matrix*            100x100          340ms
> real-matrix*       100x100           40ms
> flonum-matrix*     100x100           10ms
> flmatrix*          100x100            6ms
> 
> matrix*           1000x1000      418000ms
> real-matrix*      1000x1000       76000ms
> flonum-matrix*    1000x1000        7000ms
> flmatrix*         1000x1000        4900ms
> 
> The only difference between `real-matrix*' and `flonum-matrix*' is that the former uses `+' and `*' and the latter uses `fl+' and `fl*'. But if I can inline `real-matrix*', TR's optimizer will change the former to the latter, making `flonum-matrix*' unnecessary. (FWIW, this would be the largest speedup TR's optimizer will have ever shown me.)
> 
> It looks like the biggest speedup comes from doing only flonum ops in the inner loop sum, which keeps all the intermediate flonums unboxed (i.e. not heap-allocated or later garbage-collected).
> 
> Right now, `flmatrix*' is implemented a bit stupidly, so I could speed it up further. I won't yet, because I haven't settled on a type for matrices of unboxed flonums. The type has to work with LAPACK if it's installed, which `FlArray' doesn't do because its data layout is row-major and LAPACK expects column-major.
> 
> I'll change `matrix*' to look like `real-matrix*'. It won't give the very best performance, but it's a 60x speedup for 1000x1000 matrices.
> 
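To make "only flonum ops in the inner loop sum" concrete, here is a minimal Typed Racket sketch. It assumes both matrices are stored row-major in flat `FlVector`s from `racket/flonum`. The name `flvector-matrix*` and its argument order are hypothetical and chosen for illustration. This is not the math/matrix implementation. The accumulator is annotated as `Flonum` and updated only with `fl+` and `fl*`, which is the kind of loop the quoted message describes as keeping intermediate sums unboxed.

```racket
#lang typed/racket
(require racket/flonum)

;; Hypothetical helper (not part of math/matrix): multiply an m x n matrix A
;; by an n x p matrix B. Both are flat FlVectors in row-major order, so
;; element (i, k) of A lives at index i*n + k. A column-major (LAPACK-style)
;; layout would instead put that element at index i + k*m.
(: flvector-matrix* (FlVector FlVector Index Index Index -> FlVector))
(define (flvector-matrix* a b m n p)
  ;; Result C is m x p, also row-major, initialised to 0.0.
  (define c (make-flvector (* m p) 0.0))
  (for* ([i (in-range m)]
         [j (in-range p)])
    ;; The inner sum over k uses only fl+ and fl*, so every intermediate
    ;; value is a Flonum; no generic + or * touches the matrix entries.
    (flvector-set!
     c
     (+ (* i p) j)
     (for/fold ([acc : Flonum 0.0])
               ([k (in-range n)])
       (fl+ acc (fl* (flvector-ref a (+ (* i n) k))
                     (flvector-ref b (+ (* k p) j)))))))
  c)
```

For example, with the 2x2 row-major inputs `(flvector 1.0 2.0 3.0 4.0)` and `(flvector 5.0 6.0 7.0 8.0)`, calling `(flvector-matrix* a b 2 2 2)` returns a vector holding 19.0, 22.0, 43.0 and 50.0. Replacing `fl+` and `fl*` with generic `+` and `*` gives the `real-matrix*` style of inner loop that the message compares against.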

These results look very promising, especially if, as you mentioned, real-matrix* eventually reaches flonum-matrix* performance on Flonums automatically, and flmatrix* switches to a LAPACK-based variant whenever LAPACK is available. For the latter, it would be great if one could also choose which library is used, e.g., to redirect to an installation of Intel's highly efficient MKL library.

Looking forward to benchmarking it against NumPy and Mathematica (which is MKL-based) again!

Berthold



Posted on the users mailing list.