[racket-dev] plausible hash function for s16vectors
I want to equip my rsounds with a gen:equal+hash implementation, so that I can compare them using equal? and thereby use them in check-expect test cases. I have to roll this myself, since s16vectors don't do non-eq? equal?. My question: is there an accepted hash function for an s16vector, or more generally, for a big block of memory? These vectors are relatively unlikely to be sparse--that is, have lots of zeros--so it seems like it would be reasonable just to take three or four values (in this case s16s) and just XOR them together.
Taking a look at the behavior of vectors, though, it looks like *every* element is considered in computing the hash. This code:
#lang racket
(equal-hash-code (make-vector 100000 0))
(define v (make-vector 100000 0))
(vector-set! v 7982 1)
(equal-hash-code v)
(vector-set! v 7182 1)
(equal-hash-code v)
(vector-set! v 17982 1)
(equal-hash-code v)
(vector-set! v 55089 1)
(equal-hash-code v)
produces this output:
-1113903107084523788
369193263530696121
555229980507179346
2232258794360948312
-1281429753928160859
Which suggests that every change to the vector changes the result of the hash function. This seems... really expensive! My current guess is that Racket uses a highly optimized (a.k.a. no safety checks) hash function that works over arbitrary blocks of data, but that the hash computation is still linear in the size of the data. I haven't tried it, but I would guess that if I wrote my own looks-at-every-value hash function, it would be a lot slower. Questions:
1) Am I guessing right?
2) Is this documented somewhere?
3) Is there a generic memory-hash function in the unsafe interface somewhere?
4) Does the hash function affect the time taken by 'equal?' -- i.e., the hash value is cached for faster equal? checking ?
Thanks!
John