UNCLE! (was Re: eq and hashing (was Re: [plt-scheme] How to make unit functors?))

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Thu May 29 17:01:27 EDT 2003

At Thu, 29 May 2003 16:23:24 -0400, "Joe Marshall" wrote:
> From: "Matthew Flatt" <mflatt at cs.utah.edu>
> > The bottom line is that `eq-hash-code' isn't always the right thing,
> > just as `eq?' isn't always the right thing. But `eq?' is sometimes
> > sufficient, and so is `eq-hash-code'.
> 
> It seems bizarre that EQ?, which is the right thing for (interned) symbols
> doesn't play well with EQ-HASH-CODE when you use them together
> on interned symbols.

Ok, already. I yield!

I started a reply along the lines of "yes, eq-hash-code implies a more
detailed model of computation, and the details are a little surprising,
but implementing the alternative is too expensive".

In trying to explain why it was too expansive, though, I discovered
that it isn't, at least not in the current implementation.

 * The only problem is with symbols, where programmers expect 'a to be
   a direct reference to the platonic "a", not the "a" that lives
   during such-and-such a period.

 * Computing a content-based hash would be too expensive in principle
   --- O(n) for a symbol of size n --- but it's ok if the hash is
   computed once and then stored with the symbol for future use. (I.e.,
   the cost becomes part of the cost of interning.)

 * 3m already solves the hash-storing problem.

 * The 3m solution wouldn't necessarily translate to non-3m without
   cost, since 3m representations often need an extra few bits (and
   this cost offset by better memory management in 3m).

 * But the relevant bits are, in fact, sitting unused in the non-3m
   reprsentation of symbols.

 * Even so, making the primitive hash table implementation consult
   those bits for symbols would slow down non-3m hash-table operations.

 * But there's no inherent need for the result of `eq-hash-code' to
   have anything to do with the operations of the hash table.

So there you go. I'll fix it, and there's negligible overhead.

If any future implementors complain about the cost of `eq-hash-code' on
symbols, I'll send them to you guys. :)

Matthew



Posted on the users mailing list.