[racket-dev] An implementation of 'case' based on Clinger's "Rapid Case Dispatch in Scheme"

From: Jon Zeppieri (zeppieri at gmail.com)
Date: Sat Jul 21 23:25:40 EDT 2012

I implemented a version of 'case' following Clinger's design from
"Rapid Case Dispatch in Scheme"
[http://scheme2006.cs.uchicago.edu/07-clinger.pdf]. (This came up a
while back on the users list:
In my case, I've been writing a virtual machine, and the opcode
dispatch could benefit from a faster 'case.')

On Clinger's micro-benchmarks at
[http://www.ccs.neu.edu/home/will/Research/SW2006/case.sch], my
implementation performs the same as racket's current 'case' on small
expressions and outperforms it as the expressions get bigger.

So, for example, on the f10 and s10 benchmarks (testing case
expressions with 10 fixnum and 10 symbol constants, respectively),
both implementations run in 6-7 milliseconds on my machine, whereas on
f1000, my implementation runs in 63-67 milliseconds while the current
racket implementation takes about 3 seconds.

I had previously sent a pull request on the racket github site, but
Eli mentioned that it would be better to discuss the matter here
first. In that pull request, I mentioned a few caveats, the first of
which is that my implementation tends to produce larger code. As a
data point: the total size of all the .zos in the collects tree is
102485877 when compiled with my implementation and 102272930 compiled
with the current one, so the difference represents about 0.2% of the
current size.

A more serious issue, in my view, is that the new implementation
appears to slow down the full build (raco setup) by about 7-15 seconds
on my machine. Since the full build takes a long time to complete,
I've been trying to use smaller benchmarks to find out what, exactly,
is slower. I could definitely use some help with this; I think I've
barked up several wrong trees.

For example, at one point I was trying to test expansion performance like this:

   time raco expand file.rkt > /dev/null

until I realized that, in at least one case, I was testing the
pretty-printer as much as anything else.

In the github thread, Vincent mentioned the syntax.rktl tests at
collects/tests/racket/syntax.rktl. Averaging 5 runs of "time racket -f
syntax.rktl" results in:

Clinger 'case'
real 0m0.810
user 0m0.648
sys  0m0.129

Current 'case'
real 0m0.807
user 0m0.644
sys  0m0.130

So, not much difference there.

The chameneos benchmark (under
collects/tests/racket/benchmarks/shootout) was interesting. It uses
case expressions, but only small ones. The expanded code is very
similar for both implementations, but the current version seems to
perform very slightly better. However, I don't know if this is an edge
at runtime or at expand time (or compile time). I've begun to wonder
if just adding more code to the expander, whether or not it's used,
might be the issue.

I'd appreciate any help in trying to determine exactly where the
performance difference lies.

The latest version of the code is at:

The only changed files are:

Note that the current version is in the 'case-v3' branch, if you
decide to clone my git tree. (My initial version is in the master
branch, and a second version is in branch 'case-v2' if anyone cares to
compare them.


Posted on the dev mailing list.