When I first ran with vectors of 8192, I got exactly the opposite - the radix-2 version was much slower (although still < 500ms). But, when I looked, the longer time was almost exclusively GC - it just happen to hit that particular call. When I put a (collect-garbage) before each call, they were all < 50ms on my machine. So, the GC penalty is likely what you're seeing. The other thing is to make sure you aren't calling one of the dft routines instead of the fft routines. Those are the only ones that get anywhere near 24s on my machine - they are about 20s on my machine. The dft routines are the 'straight' mathematical discrete Fourier transform and do not use the FFT algorithms at all. [They're really only useful for a sanity check on results.] <div class="gmail_quote">On Thu, Oct 14, 2010 at 4:34 PM, John Clements <<a href="mailto:clements@brinckerhoff.org">clements@brinckerhoff.org</a>> wrote: <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"> On a vector of length 8192 (a power of 2, natch), fft-complex-radix2-forward takes about 1/8 of a second (lost in the noise, essentially), but fft-complex-forward takes more than 24 seconds, a difference of about 200x. They do produce the same answers, up to differences of 1e-4 in the magnitude (didn't check the phase). 1) Is this expected? I thought the non-radix2 one was still fairly clever about subdividing when the number of points is divisible by 2. 2) If so, would it make sense to test for powers of 2? John Clements </blockquote></div>