[racket-dev] Immediate characters

From: Jon Zeppieri (zeppieri at gmail.com)
Date: Sun May 12 13:38:23 EDT 2013

My branch with immediately represented characters is available at:

https://github.com/97jaz/racket/tree/immediate-chars

I'm interested to hear any opinions from the members of this list on
the implementation.

For my own part, I'm ambivalent about actually merging this work into
the official tree. Benchmarks that are heavy on character manipulation
(and there are few enough of these) benefit from this change, to
varying degrees.

A few micro-benchmarks

1. Constructing a list of 10,000,000 characters using integer->char on
integers chosen randomly from [0, 256) (average of 5 runs, CPU time):

immediate-chars: 4248.8
Racket v5.3.4.10: 4297.2

2. Constructing a list of 10,000,000 characters using integer->char on
integers chosen randomly from the entire field of valid Unicode code
points (average of 5 runs, CPU time):

immediate-chars: 4441.4
Racket v5.3.4.10: 5953.0

3. The 'wc' shootout benchmark:

immediate-chars: 3789
Racket v5.3.4.10: 4155


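Unsurprisingly, the difference between the two is more noticeable when
many characters outside the first 256 code points are used: Racket
already preallocates character objects for code points below 256, so
only characters beyond that range needed a fresh allocation.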
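To illustrate why the [0, 256) benchmark barely moves, here is a minimal C sketch of boxed characters backed by a preallocated table for small code points. The names (`BoxedChar`, `make_boxed_char`, `char_cache`, `boxed_char_allocs`) and the struct layout are hypothetical, not Racket's actual definitions. Small code points return a shared object, while every other code point costs a heap allocation. That allocation is the cost immediate characters avoid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical boxed character object (not Racket's real struct). */
typedef struct {
    int type_tag;          /* stands in for a heap object's type header */
    uint32_t code_point;
} BoxedChar;

enum { CHAR_TYPE = 1 };

/* Preallocated objects for code points 0..255, shared by all callers. */
static BoxedChar char_cache[256];
static int char_cache_ready = 0;

/* Counts heap allocations so the cost difference is observable. */
static size_t boxed_char_allocs = 0;

static void init_char_cache(void) {
    for (uint32_t i = 0; i < 256; i++) {
        char_cache[i].type_tag = CHAR_TYPE;
        char_cache[i].code_point = i;
    }
    char_cache_ready = 1;
}

/* Small code points come from the shared table; anything else gets a
   fresh heap object. A real runtime would leave reclamation to its GC;
   this sketch simply aborts if malloc fails and never frees. */
static BoxedChar *make_boxed_char(uint32_t cp) {
    assert(cp <= 0x10FFFF);   /* caller passes a valid code point */
    if (!char_cache_ready)
        init_char_cache();
    if (cp < 256)
        return &char_cache[cp];
    BoxedChar *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->type_tag = CHAR_TYPE;
    c->code_point = cp;
    boxed_char_allocs++;
    return c;
}
```

For example, two calls to `make_boxed_char(65)` yield the same pointer with no allocation, while two calls with `0x3BB` allocate two distinct objects.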


The downside of this change is that a (Scheme_Object *) can now be one
of three things (a pointer to a heap object, a fixnum, or an immediate
character) rather than one of two. Telling these apart sometimes
requires an additional bit test in the interpreter and in the JIT.
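The extra bit test can be sketched in C as follows. This assumes a hypothetical encoding, not necessarily the one used in the immediate-chars branch: fixnums keep the low bit set (the convention Racket BC uses for fixnums), immediate characters use the low two bits `10`, and heap pointers are at least 4-byte aligned so their low two bits are `00`. All names (`Value`, `kind_of`, `make_char`, `is_pointer`, ...) are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value word. Low bits select the kind:
     ...xxx1  fixnum
     ...xx10  immediate character
     ...xx00  pointer to a heap object (assumed 4-byte aligned) */
typedef uintptr_t Value;

enum Kind { KIND_POINTER, KIND_FIXNUM, KIND_CHAR };

static Value make_fixnum(intptr_t n) { return ((uintptr_t)n << 1) | 1u; }

static intptr_t fixnum_value(Value v) {
    assert(v & 1u);
    return (intptr_t)v >> 1;   /* assumes arithmetic right shift */
}

static Value make_char(uint32_t cp) {
    assert(cp <= 0x10FFFF);
    return ((Value)cp << 2) | 2u;
}

static uint32_t char_value(Value v) {
    assert((v & 3u) == 2u);
    return (uint32_t)(v >> 2);
}

static Value make_pointer(void *p) {
    assert(((uintptr_t)p & 3u) == 0);   /* alignment frees the tag bits */
    return (Value)p;
}

/* With only two kinds, one test on the low bit sufficed. With a third
   kind, a word whose low bit is 0 needs a second test. */
static enum Kind kind_of(Value v) {
    if (v & 1u)
        return KIND_FIXNUM;   /* first bit test */
    if (v & 2u)
        return KIND_CHAR;     /* the additional bit test */
    return KIND_POINTER;
}

/* Code about to dereference a value must rule out both immediate kinds. */
static int is_pointer(Value v) { return (v & 3u) == 0; }
```

The trade-off shows up in `kind_of`: recognizing a fixnum still costs one test, but separating a character from a pointer costs a second, and every "is this a heap object?" check now has to mask two bits instead of one.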

At any rate, I'm interested to know what people think.

-Jon

Posted on the dev mailing list.