[racket-dev] plausible hash function for s16vectors

From: John Clements (clements at brinckerhoff.org)
Date: Tue Nov 26 16:39:31 EST 2013

I want to equip my rsounds with a gen:equal+hash implementation, so that I can compare them using equal? and thereby use them in check-expect test cases.  I have to roll this myself, since s16vectors don't do non-eq? equal?. My question: is there an accepted hash function for an s16vector, or more generally, for a big block of memory? These vectors are relatively unlikely to be sparse--that is, have lots of zeros--so it seems like it would be reasonable just to take three or four values (in this case s16s) and just XOR them together.  

Taking a look at the behavior of vectors, though, it looks like *every* element is considered in computing the hash. This code:

#lang racket

(equal-hash-code (make-vector 100000 0))
(define v (make-vector 100000 0))
(vector-set! v 7982 1)
(equal-hash-code v)
(vector-set! v 7182 1)
(equal-hash-code v)
(vector-set! v 17982 1)
(equal-hash-code v)
(vector-set! v 55089 1)
(equal-hash-code v)

produces this output:

-1113903107084523788
369193263530696121
555229980507179346
2232258794360948312
-1281429753928160859

Which suggests that every change to the vector changes the result of the hash function. This seems... really expensive!  My current guess is that Racket uses a highly optimized (a.k.a. no safety checks) hash function that works over arbitrary blocks of data, but that the hash computation is still linear in the size of the data. I haven't tried it, but I would guess that if I wrote my own looks-at-every-value hash function, it would be a lot slower. Questions:

1) Am I guessing right?
2) Is this documented somewhere?
3) Is there a generic memory-hash function in the unsafe interface somewhere?
4) Does the hash function affect the time taken by 'equal?' -- i.e., the hash value is cached for faster equal? checking ?

Thanks!

John



Posted on the dev mailing list.