[racket] TR memory optimization: 240 Bytes of garbage for calling TR?

From: John Clements (clements at brinckerhoff.org)
Date: Sat Nov 10 10:22:14 EST 2012

I'm trying to implement some simple comb filters for a reverb, using Racket and/or Typed Racket.  I have six of these running in parallel; each one has a vector, and each time a sample arrives, each comb needs to perform two floating-point multiplies and two floating-point adds, increment a counter with possible reset, and mutate two locations in memory to prepare for next time.

The problem for code like this isn't runtime, directly; it's all the GC. Adding this filter to a simple playback was observed to generate an additional 1.6 GB of garbage for a 60-second session[*], which sounds like a lot until you divide by 60 seconds and the 44.1 kHz sample rate, to get 606 bytes/sample frame. Regardless, you could definitely do it with zero garbage in C, so I set out to try to reduce this.
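
As a quick check of that arithmetic, here is a small sketch. It assumes "1.6 GB" means 1.6e9 bytes (decimal), which is a guess; with that assumption the result lands at roughly 605 bytes per frame, in line with the 606 quoted above once rounding of the 1.6 GB figure is taken into account. All names are made up for illustration.

```racket
#lang racket/base
;; Back-of-the-envelope check of the per-frame garbage figure.
;; Assumption: "1.6 GB" is taken as 1.6e9 bytes (decimal gigabytes).
(define total-garbage-bytes 1.6e9)
(define session-seconds 60)
(define sample-rate 44100)                       ; sample frames per second

;; Total number of sample frames processed during the session.
(define frames (* session-seconds sample-rate))

;; Average garbage generated per sample frame.
(define bytes-per-frame (/ total-garbage-bytes frames))

(printf "~a frames, about ~a bytes/frame\n" frames (round bytes-per-frame))
```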

I guessed that most of the garbage in this case was related to boxing of floats, so I decided to use TR to try to eliminate this. I hauled my code over to TR, and it worked completely without modification, which was a joy. Also, the Optimization Coach tells me that everything is green and staying in the Float realm. Unfortunately, it didn't improve the memory use much. After some experiments, it looks like TR reduces the memory overhead per comb filter by roughly half, to 278 bytes/sample frame, *but* imposes its own fixed overhead of 240 bytes/sample frame, which pretty much negates the benefit of the reduction.

So, my question is this: should making a call from Racket to this TR code

(: dummy2 (Float -> Float))
(define (dummy2 in)
  (* 0.1 in))

... generate about 240 bytes in garbage? 
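
One way to put a number on this, independent of the GC log, might be a harness like the sketch below. It assumes a Racket recent enough to support (current-memory-use 'cumulative), which is newer than the release this post was written against. The untyped dummy2 here is only a stand-in so the sketch runs on its own, and bytes-per-call is a hypothetical helper name; in the real experiment dummy2 would be required from the typed module.

```racket
#lang racket/base
;; Stand-in for the typed function in the post (assumption: in the real
;; experiment this is required from a Typed Racket module instead).
(define (dummy2 in)
  (* 0.1 in))

;; bytes-per-call: estimates the average number of bytes allocated per
;; call of f over n calls. It reads the cumulative allocation counter,
;; which keeps counting across collections, so a GC in the middle of the
;; loop does not make the difference go negative.
(define (bytes-per-call f n)
  (define before (current-memory-use 'cumulative))
  (for ([i (in-range n)])
    (f 1.0))
  (define after (current-memory-use 'cumulative))
  (/ (- after before) (exact->inexact n)))

(printf "~a bytes/call (estimate)\n" (bytes-per-call dummy2 1000000))
```

The figure includes whatever the loop itself allocates, and a same-module call like this one may be inlined by the compiler, so it is best compared against the same loop calling the function across the typed/untyped module boundary.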

FWIW, here's what a comb filter function looks like:

(: comb1 (Float -> Float))
(define (comb1 in)
  (define delayed1 (flvector-ref v1 c1))
  (define midnode1 (fl+ delayed1 (fl* g11 m1)))
  (define out1 (fl+ (fl* g21 midnode1) in))
  (flvector-set! v1 c1 out1)
  (define next-c1 (add1 c1))
  (set! c1 (cond [(<= d1 next-c1) 0]
                 [else next-c1]))
  (set! m1 midnode1)

I can't see anything in this that would cause allocation.
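
For anyone who wants to try this, a self-contained version might look like the sketch below. The state definitions (v1, c1, d1, g11, g21, m1) and their values are assumptions, since the post does not show them, and returning out1 at the end is also a guess, because the snippet above stops before the function's last line.

```racket
#lang typed/racket/base
(require racket/flonum)

;; Assumed state, for illustration only; none of these definitions or
;; values appear in the post.
(define d1 : Integer 1000)                      ; delay length in samples
(define v1 : FlVector (make-flvector d1 0.0))   ; delay line
(define c1 : Integer 0)                         ; current index into v1
(define m1 : Float 0.0)                         ; previous midnode value
(define g11 : Float 0.5)                        ; gain on previous midnode
(define g21 : Float 0.5)                        ; gain on current midnode

(: comb1 (Float -> Float))
(define (comb1 in)
  (define delayed1 (flvector-ref v1 c1))
  (define midnode1 (fl+ delayed1 (fl* g11 m1)))
  (define out1 (fl+ (fl* g21 midnode1) in))
  (flvector-set! v1 c1 out1)
  ;; Advance the index, wrapping back to 0 at the end of the delay line.
  (define next-c1 (add1 c1))
  (set! c1 (cond [(<= d1 next-c1) 0]
                 [else next-c1]))
  (set! m1 midnode1)
  out1) ; assumed return value
```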

Maybe the next step is to take a look at the compiled bytecode....


[*] FWIW, I'm observing this by running at the command line with -W debug and then parsing the GC output that appears on the console.

Posted on the users mailing list.