[racket] TR req: could 'modulo' be more optimized or specialized?
Replacing `modulo' with `fxmodulo' removes the allocation for me.
Thankfully, Typed Racket can do it for you, and Optimization Coach
can help you get there.
OC reports that the upper bound on `i', `(* 10 44100)', can't be proven
to be a fixnum. If you add an explicit check (i.e.
`(assert (* 10 44100) index?)'), TR will know that `i' can only be an
`Index' and will optimize `modulo', as well as do partial bounds
checking elimination on both vector operations.
This particular course of action is not currently suggested by OC
directly, but I'll fix it. Thanks for the report!
Vincent
At Fri, 14 Sep 2012 16:59:39 -0700,
John Clements wrote:
>
> [1 <multipart/signed (7bit)>]
> [1.1 <text/plain; windows-1252 (quoted-printable)>]
> NB: you may wish to read the last sentences of this e-mail before the rest:
>
> Currently, the "modulo" function seems to perform a lot of allocation. Specifically, here's a program that copies one flvector into another one:
>
> #lang typed/racket
>
> (require racket/flonum)
>
> (define f (make-flvector (* 10 44100)))
>
> (define k (* 440.0 1/44100 2 pi))
>
> (define g (make-flvector 44100))
> (for ([i 44100])
> (define i#i (exact->inexact i))
> (flvector-set! g i (sin (* k i#i))))
>
> (time
> (for ([j 100])
> (define ctr 0)
> (for ([i (* 10 44100)])
> (flvector-set! f i
> (flvector-ref g ctr #;(modulo i 44100)))
> (set! ctr (cond [(< ctr 44099) (add1 ctr)]
> [else 0])))))
>
> It runs in only two seconds, because I'm doing nasty mutation to avoid the modulo. If you comment out the explicit counter and comment in the call to modulo, you get this:
>
> #lang typed/racket
>
> (require racket/flonum)
>
> (define f (make-flvector (* 10 44100)))
>
> (define k (* 440.0 1/44100 2 pi))
>
> (define g (make-flvector 44100))
> (for ([i 44100])
> (define i#i (exact->inexact i))
> (flvector-set! g i (sin (* k i#i))))
>
> (time
> (for ([j 100])
> (define ctr 0)
> (for ([i (* 10 44100)])
> (flvector-set! f i
> (flvector-ref g #;ctr (modulo i 44100)))
> #;(set! ctr (cond [(< ctr 44099) (add1 ctr)]
> [else 0])))))
>
> … which takes *five* seconds, including 2 seconds of GC.
>
> It certainly seems to me like modulo could do its work without a lot of gc. I'm guessing that the basic issue here is that modulo isn't usually a high-priority item in inner loops.
>
> In fact, it occurs to me that a better way to solve this is probably to write a looping (in-range/modulo …) generator.
>
> I'm going to send this mail anyhow, just in case there's low-hanging fruit here, or some kind of bug.
>
> John
>
> [1.2 smime.p7s <application/pkcs7-signature (base64)>]
>
> [2 <text/plain; us-ascii (7bit)>]
> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users