[plt-scheme] Red black trees in PLT Scheme

From: Andrew Reilly (andrew-scheme at areilly.bpc-users.org)
Date: Fri Jul 24 00:52:18 EDT 2009

On Wed, Jul 22, 2009 at 09:18:10AM -0400, Shriram Krishnamurthi wrote:
> If "code size" and "elegance" are your *primary* criteria, then
> Okasaki's paper explains very clearly why his version will win hands
> down.  (It is a truly beautiful presentation, and his reduction of all
> four invariant-violating cases to one has a "God's Book" feel to it.)

That is indeed a beautiful paper.  Thanks for the link!

I was so inspired by it (I've never been able to bother myself
to properly understand red-black trees before) that I dove
straight in and wrote myself a scheme version.  Thanks to
"match" it even looks pretty similar to the Hascall version in
the paper, and is about as long...

I have been using a hash-based set implementation in my code
for a while.  It works beautifully in all respects except one:
when I get my set-elements back as a list, the order changes
from run to run.  This is not "wrong" per se, but I've found
that producing different output given the same input results in
raised eyebrows and distrust from colleagues.

I've countered that with a list-based implementation which works
beautifully too, but of course has that undesirable O(N^2) (or
worse) behaviour for many sorts of operations.

So I thought that an r-b tree version might give me better
lookup (set membership test) time than the list version while
having some sort of known order... but it doesn't.  I need a <
operator that works between elements, and if I use
(lambda (x y) (< (eq-hash-code x) (eq-hash-code y)) 
as a comparison operation I'll wind up with the same sort of
randomness as the hash table version, I suspect.  Clearly I'm
going to have to put an extra, deterministic "ID" tag into
my "node" structures if I want to have some sort of stable
ordering.  Oh, well.

What I wonder, though, is: will this work at all?  Is the
eq-hash-code of a struct stable across garbage collections?  I
had assumed that eq-hash-code was somehow derived from the
memory address of the argument object, but the 3m version of
PLT uses a copying collector, I believe, so object locations can
change over time.  That could result in the "order" in my r-b
tree becoming wrong (and searches failing) over time.

How is this handelled with hash-tables?  I assume that they
ultimately use numbers derived from eq-hash-code for vector
indexes...

Cheers,

-- 
Andrew


Posted on the users mailing list.