eq and hashing (was Re: [plt-scheme] How to make unit functors?)
[warning - copied as email; you may want to reply to the list. what's
correct protocol on this list?]
Matthew Flatt <mflatt at cs.utah.edu> writes:
> Again, it sounds like you had to keep the item (such as 'a) in the
> tree, so there's no bug.
(if i understand correctly) they're using the hash to locate the
item in the tree. the item itself would only be used once you got
to the leaf (since two items could have the same hash value they'd
be located at the same leaf)
> The underlying principle here is that you're going to compare the items
> eventually (since a hash key does not uniquely determine an item), so
> the hash key will work correctly, by definition.
they only do the comparison on the subset (often empty) of items
that have the same hash code.
surely this problem also occurs with traditional hashed dictionaries
(using arrays + some kind of clash resolution), not just trees? you
have a (key, value) pair, take the hash of the key and store the
value at "that index" (with some local resolution over a restricted
subset of items that map to the same index in the table). if you
then use the key to recover the item, agin you start with the hash
code to find the index and do comparisons over a small subset.
to make this explicit, consider storing "foo" with key 'j and (for
simplicity) a hashtable so big that array indices are hash values.
at the time of storage the hash code of 'j is 42, so we put "foo" at
index 42. later, after garbage collection, we want to recover the
value associated with 'j. so we get the hashcode of 'j, which is
now 666, and don't find anything. that's not right, surely?
apologies if i've missed some scheme-specific point (i'm just
lurking here). maybe eq-hash should only be used to test
(memory-address based) equality and there are other hash functions
that should be used when using the hash in data structures?
andrew
--
http://www.acooke.org