[racket-dev] better x86 performance

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Sun Apr 24 21:00:59 EDT 2011

On Sun, Apr 24, 2011 at 7:56 PM, Eli Barzilay <eli at barzilay.org> wrote:
> An hour and a half ago, Matthew Flatt wrote:
>> [...] Later, the `ret' to return from the non-tail call would
>> confuse the processor and cause stalls, because the `ret' wasn't
>> matched with its `call'.  It's easy enough to put the return address
>> in place using `call' when setting up a frame, which exposes the
>> right nesting to the processor.
> Does this mean that the code was correct, only it followed a pattern
> that is not commonly produced by most compilers?

Yes, except that the issue here is branch (jump) prediction, not so
much the fact that compilers commonly produce call/ret pairs. That is,
the processor can do a much better job of keeping things running fast
when it can predict which instruction is going to come after the
current one (often it wants to predict 5 or 10 or so instructions
ahead). It uses this for pipelining, and some processors even have
two pipelines running in parallel, using additional hardware that
predicts which instructions' results will be needed by which other
ones to keep the parallelism going.
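
To make the call/ret point concrete, here is a rough toy model of how a
processor might predict return targets with a small stack of return
addresses that it fills on `call' and pops on `ret'. This is only a
sketch: the function name, the event format, and the depth of 16 are
made up for illustration, and real x86 chips differ in the details.

```python
def count_ret_mispredictions(trace, depth=16):
    """Replay a trace of events and count mispredicted returns.

    Events (a hypothetical format, invented for this sketch):
      ("call", addr) - a `call`: addr goes on the real stack AND is
                       recorded by the predictor
      ("push", addr) - the return address is stored by hand (e.g. a
                       push followed by a jump): real stack only
      ("ret",)       - a `ret`: the real target comes off the real
                       stack, the predicted one off the predictor
    """
    real_stack = []   # what is actually in memory
    predictor = []    # the processor's guess at the call nesting
    misses = 0
    for event in trace:
        kind = event[0]
        if kind == "call":
            real_stack.append(event[1])
            if len(predictor) == depth:
                predictor.pop(0)  # oldest entry falls off when full
            predictor.append(event[1])
        elif kind == "push":
            real_stack.append(event[1])  # the predictor never sees this
        elif kind == "ret":
            actual = real_stack.pop()
            predicted = predictor.pop() if predictor else None
            if predicted != actual:
                misses += 1
        else:
            raise ValueError(f"unknown event: {kind!r}")
    return misses


# Every `ret` has a matching `call`: the predictor stays in sync.
matched = [("call", 0x100), ("call", 0x200), ("ret",), ("ret",)]

# The inner frame's return address is put in place without `call`,
# so the first `ret` pops the wrong prediction and the second one
# finds the predictor empty.
unmatched = [("call", 0x100), ("push", 0x200), ("ret",), ("ret",)]

if __name__ == "__main__":
    print("matched:  ", count_ret_mispredictions(matched))
    print("unmatched:", count_ret_mispredictions(unmatched))
```

In this model one unmatched `ret' throws off not just itself but the
returns after it too, which is why setting up the frame with a real
`call' helps: the predictor sees the same nesting as the code.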


Posted on the dev mailing list.