[racket] performance problem in math/matrix
>
> I just did that. Here are the types:
>
> real-matrix* : (Array Real) (Array Real) -> (Array Real)
>
> flonum-matrix* : (Array Flonum) (Array Flonum) -> (Array Flonum)
>
> flmatrix* : FlArray FlArray -> FlArray
>
> Results so far, measured in DrRacket with debugging off:
>
> Function          Size         Time
> -----------------------------------------
> matrix*           100x100      340ms
> real-matrix*      100x100      40ms
> flonum-matrix*    100x100      10ms
> flmatrix*         100x100      6ms
>
> matrix*           1000x1000    418000ms
> real-matrix*      1000x1000    76000ms
> flonum-matrix*    1000x1000    7000ms
> flmatrix*         1000x1000    4900ms
>
> The only difference between `real-matrix*' and `flonum-matrix*' is that the former uses `+' and `*' and the latter uses `fl+' and `fl*'. But if I can inline `real-matrix*', TR's optimizer will change the former to the latter, making `flonum-matrix*' unnecessary. (FWIW, this would be the largest speedup TR's optimizer will have ever shown me.)
>
> It looks like the biggest speedup comes from doing only flonum ops in the inner loop sum, which keeps all the intermediate flonums unboxed (i.e. not heap-allocated or later garbage-collected).
>
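To make the point about the inner loop concrete for myself, here is a minimal sketch in plain Racket. It is not the math/matrix implementation: `flvector-matrix*` is a hypothetical helper, and I assume both matrices are stored row-major in flvectors with the dimensions passed explicitly. The accumulator only ever flows through `fl+` and `fl*`, which is what should give the compiler a chance to keep the intermediate flonums unboxed.

```racket
#lang racket/base
(require racket/flonum)

;; Hypothetical helper (not part of math/matrix): multiply an m x k
;; matrix a by a k x n matrix b, both stored row-major in flvectors.
(define (flvector-matrix* a b m k n)
  (define c (make-flvector (* m n) 0.0))
  (for* ([i (in-range m)]
         [j (in-range n)])
    ;; Inner sum: only flonum-specific operations touch the accumulator.
    (define sum
      (for/fold ([acc 0.0]) ([l (in-range k)])
        (fl+ acc (fl* (flvector-ref a (+ (* i k) l))
                      (flvector-ref b (+ (* l n) j))))))
    (flvector-set! c (+ (* i n) j) sum))
  c)

;; [[1 2] [3 4]] times [[5 6] [7 8]]
(flvector-matrix* (flvector 1.0 2.0 3.0 4.0)
                  (flvector 5.0 6.0 7.0 8.0)
                  2 2 2)
;; elements of the result: 19.0 22.0 43.0 50.0
```

Replacing `fl+` and `fl*` with the generic `+` and `*` gives the same results on flonum input, but without the flonum-only guarantee on the accumulator.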
> Right now, `flmatrix*' is implemented a bit stupidly, so I could speed it up further. I won't yet, because I haven't settled on a type for matrices of unboxed flonums. The type has to work with LAPACK if it's installed, which `FlArray' doesn't do because its data layout is row-major and LAPACK expects column-major.
>
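For anyone following along, the layout mismatch can be sketched as below. The function names are hypothetical and flvectors stand in for the real storage; this only illustrates the index arithmetic, not how math/matrix or a LAPACK binding would actually handle it. In row-major order, element (i, j) of an m x n matrix sits at i*n + j; in column-major order (the Fortran convention LAPACK uses) it sits at j*m + i.

```racket
#lang racket/base
(require racket/flonum)

;; Position of element (i, j) of an m x n matrix in each layout.
(define (row-major-index i j m n) (+ (* i n) j))
(define (column-major-index i j m n) (+ (* j m) i))

;; Hypothetical helper: copy a row-major m x n matrix into a fresh
;; column-major flvector.
(define (row-major->column-major v m n)
  (define out (make-flvector (* m n)))
  (for* ([i (in-range m)]
         [j (in-range n)])
    (flvector-set! out
                   (column-major-index i j m n)
                   (flvector-ref v (row-major-index i j m n))))
  out)

;; The 2 x 3 matrix [[1 2 3] [4 5 6]] stored row-major
(row-major->column-major (flvector 1.0 2.0 3.0 4.0 5.0 6.0) 2 3)
;; elements of the result: 1.0 4.0 2.0 5.0 3.0 6.0
```

A copy like this costs a full pass over the data on every call, which is presumably one reason to settle on a storage type that already matches what LAPACK expects.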
> I'll change `matrix*' to look like `real-matrix*'. It won't give the very best performance, but it's a 60x speedup for 1000x1000 matrices.
>
These results look very promising, especially if, as you mentioned, real-matrix* will in the end automatically reach the performance of flonum-matrix* for Flonums, and flmatrix* automatically switches to a LAPACK-based variant when one is available. For the latter, it would be great if one could even choose which library is used, e.g., to redirect to an installation of Intel's highly efficient MKL library.
Looking forward to benchmarking it against NumPy and Mathematica (which is MKL-based) again!
Berthold