[racket-dev] cross-module function inlining

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Thu Dec 1 09:49:44 EST 2011

The bytecode compiler now supports cross-module inlining of functions.
As a result, for example, `empty?' and `cons?' should now perform just
as well as `null?' and `pair?'.
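
As a small illustration, here is a sketch of a client module that calls
`empty?' from `racket/list' in a loop. The name `count-items' is made
up for this example. With cross-module inlining, the `empty?' call
should compile to the same test as a direct call to `null?'.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical helper: counts the elements of a list.
;; `empty?' is imported from another module, so it is the kind of
;; call that cross-module inlining is meant to make as cheap as `null?'.
(define (count-items lst)
  (let loop ([lst lst] [n 0])
    (if (empty? lst)
        n
        (loop (rest lst) (add1 n)))))
```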

To avoid expanding bytecode too much, the compiler is especially
conservative about which functions it chooses as candidates for
cross-module inlining. For now, the function body must be very small
--- roughly, fewer than 8+N expressions for a function of N arguments.

Based on that size constraint, the compiler would not automatically
determine that `map', `for-each', `andmap' and `ormap' are good
candidates for inlining. Those functions have been annotated to
encourage the compiler to make them candidates for inlining, anyway.
You can similarly annotate your own functions using the pattern

  (define-values (<id>)
    (begin
      'compiler-hint:cross-module-inline
      <proc-expr>))

Yes, this pattern is a hack; I don't have a better idea for the
annotation at the moment, but it may change.
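
A concrete instance of the pattern might look like the sketch below.
The function `sum-squares' is invented for illustration. Its loop is
probably past the 8+N size limit, which is why it carries the hint.
Whether the hint is honored depends on the compiler version, so treat
this as an example of the shape rather than a guarantee.

```racket
#lang racket/base
(provide sum-squares)

;; Hypothetical exported function, annotated with the hint pattern
;; described above.  The quoted symbol has no effect at run time;
;; it only marks the definition for the compiler.
(define-values (sum-squares)
  (begin
    'compiler-hint:cross-module-inline
    (lambda (lst)
      (let loop ([lst lst] [acc 0])
        (if (null? lst)
            acc
            (loop (cdr lst) (+ acc (* (car lst) (car lst)))))))))
```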

Given an imported function that is a candidate for inlining, the usual
heuristics apply at a call site to determine whether the function is
actually inlined. The heuristics should invariably allow functions like
`empty?' to be inlined, but `map' may or may not be inlined for a given
use --- depending, for example, on how much inlining has already
happened at the call site.


I have not yet found any useful programs that benefit immediately from
this improvement. (Some traditional Scheme benchmarks benefit from
inlining `map', of course.) The benefits are probably down the road, as
various little parts of Racket shift to take advantage of the
improvement.


As always, you can use `raco decompile' to see whether a given function
call was inlined. To check whether the compiler made a particular
exported function a candidate for inlining, look for

  (define-values (<id>)
      (begin
        '%%inline-variant%%
        <proc-1>
        <proc-2>))

in decompiled output; the '%%inline-variant%% pattern reports that <id>
is a candidate for inlining, and <proc-1> is the variant of the
function that is used for inlining, while <proc-2> is the normal
variant of the function. (The <proc-1> and <proc-2> code may be the
same, or <proc-1> may be less optimized in ways that keep its code
smaller and easier to inline.)


The current implementation of cross-module function inlining is just a
first cut. If you try it and don't get the kind of inlining that you
want or expect, let me know, and we can see whether improvements are in
order.

As an example, given the definitions

  (define (f x) <something-big>)
  (define (g y) (f y))

a call to the `g' function will not actually be inlined, even though `g' is
considered a candidate for inlining. The inliner doesn't currently know
how to move the reference to `f' into a different module when inlining
`g'. This limitation isn't difficult to fix, I think, but it hasn't
come up in the examples that I looked at, so I haven't tried to fix it.
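
To make the limitation concrete, here is a sketch of a module with that
shape. The body of `f' is an arbitrary stand-in for <something-big>,
chosen only to be larger than the size limit; `f' is deliberately not
exported, so an inlined copy of `g' in another module would need a
reference to `f' that the inliner cannot yet create.

```racket
#lang racket/base
(provide g)

;; Hypothetical stand-in for <something-big>: builds a list of i*i for
;; even i and -i for odd i, for i from 0 up to (but not including) x.
(define (f x)
  (let loop ([i 0] [acc '()])
    (cond
      [(= i x) (reverse acc)]
      [(even? i) (loop (add1 i) (cons (* i i) acc))]
      [else (loop (add1 i) (cons (- i) acc))])))

;; Small enough to be a candidate for inlining, but its body refers
;; to the module-local `f'.
(define (g y) (f y))
```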



Posted on the dev mailing list.