[racket] performance problem in math/matrix

From: Berthold Bäuml (berthold.baeuml at dlr.de)
Date: Mon Jan 21 18:33:39 EST 2013

> 
> I just did that. Here are the types:
> 
>  real-matrix* : (Array Real) (Array Real) -> (Array Real)
> 
>  flonum-matrix* : (Array Flonum) (Array Flonum) -> (Array Flonum)
> 
>  flmatrix* : FlArray FlArray -> FlArray
> 
> Results so far, measured in DrRacket with debugging off:
> 
> Function           Size              Time
> -----------------------------------------
> matrix*            100x100          340ms
> real-matrix*       100x100           40ms
> flonum-matrix*     100x100           10ms
> flmatrix*          100x100            6ms
> 
> matrix*           1000x1000      418000ms
> real-matrix*      1000x1000       76000ms
> flonum-matrix*    1000x1000        7000ms
> flmatrix*         1000x1000        4900ms
> 
> The only difference between `real-matrix*' and `flonum-matrix*' is that the former uses `+' and `*' and the latter uses `fl+' and `fl*'. But if I can inline `real-matrix*', TR's optimizer will change the former to the latter, making `flonum-matrix*' unnecessary. (FWIW, this would be the largest speedup TR's optimizer will have ever shown me.)
> 
> It looks like the biggest speedup comes from doing only flonum ops in the inner loop sum, which keeps all the intermediate flonums unboxed (i.e. not heap-allocated or later garbage-collected).
> 
> Right now, `flmatrix*' is implemented a bit stupidly, so I could speed it up further. I won't yet, because I haven't settled on a type for matrices of unboxed flonums. The type has to work with LAPACK if it's installed, which `FlArray' doesn't do because its data layout is row-major and LAPACK expects column-major.
> 
> I'll change `matrix*' to look like `real-matrix*'. It won't give the very best performance, but it's a 60x speedup for 1000x1000 matrices.
> 
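
For readers following the thread, the difference between the generic and the flonum-only inner loop described in the quoted message can be sketched roughly as below. This is only a minimal illustration over flat row-major vectors, not the actual math/matrix code; the names `generic-dot' and `flonum-dot' are hypothetical.

```racket
#lang racket/base
(require racket/flonum)

;; Hypothetical helpers for illustration; not part of math/matrix.
;; Both compute the dot product of row i of A with column j of B,
;; where A (m x n) and B (n x p) are stored row-major.

;; Generic version: `+' and `*' accept any number, so the running
;; sum may end up as a boxed (heap-allocated) value.
(define (generic-dot a b n p i j)
  (for/fold ([sum 0]) ([k (in-range n)])
    (+ sum (* (vector-ref a (+ (* i n) k))
              (vector-ref b (+ (* k p) j))))))

;; Flonum version: only `fl+' and `fl*' over flvectors, which gives
;; the compiler the chance to keep the running sum unboxed.
(define (flonum-dot a b n p i j)
  (for/fold ([sum 0.0]) ([k (in-range n)])
    (fl+ sum (fl* (flvector-ref a (+ (* i n) k))
                  (flvector-ref b (+ (* k p) j))))))
```

For the 2x2 matrices A = [[1 2] [3 4]] and B = [[5 6] [7 8]], both functions called with n = 2, p = 2, i = 0, j = 1 compute 1*6 + 2*8.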

These results look very promising, especially if, as you mentioned, real-matrix* will in the end automatically reach the performance of flonum-matrix* for Flonums, and flmatrix* automatically switches to a LAPACK-based variant when one is available. For the latter, it would be great if one could even choose which library is used, e.g., to redirect to an installation of Intel's highly efficient MKL library.
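
The layout mismatch mentioned in the quoted message (row-major storage versus the column-major storage LAPACK expects) comes down to index arithmetic. The following is only a sketch under that assumption; `row-major->column-major' is a hypothetical helper and not part of math/matrix or any LAPACK binding.

```racket
#lang racket/base
(require racket/flonum)

;; Hypothetical helper for illustration only.
;; Copies an m x n matrix from a row-major flvector into a new
;; column-major flvector.
(define (row-major->column-major src m n)
  (define dst (make-flvector (* m n)))
  (for* ([i (in-range m)]
         [j (in-range n)])
    ;; Element (i, j) sits at index i*n + j in row-major order
    ;; and at index j*m + i in column-major order.
    (flvector-set! dst (+ (* j m) i)
                   (flvector-ref src (+ (* i n) j))))
  dst)
```

Such a copy costs an extra pass over the data on every call into the library, which is presumably one reason to settle on a suitable storage type up front rather than convert on the fly.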

Looking forward to benchmarking it against NumPy and Mathematica (which is MKL-based) again!

Berthold

Posted on the users mailing list.
