[plt-scheme] eq-hash-code

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Tue Oct 7 22:22:22 EDT 2008

At Tue, 7 Oct 2008 16:41:58 -0700, "Mark Engelberg" wrote:
> Just out of curiosity, what are the odds that two structs which are
> not eq? return the same eq-hash-code?

On a 32-bit platform, using 3m, the current implementation uses 24 bits
in each object to keep a hash code, or 22 bits under Windows. On a
64-bit platform, it's 55 or 57 bits. The number of bits varies because
they're essentially leftover bits in object headers, and Windows has a
different allocation granularity.

Hash codes are generated on demand, essentially by incrementing a
counter; one bit in the counter has a different use, though, so it's
essentially discarded. If you create 2^25 struct instances on a 32-bit
non-Windows machine with 3m, and if you force the hash code on each of
them, then you're likely to create 2 structs with the same hash code as
a given other struct (or 8 of them on a 32-bit Windows machine).

To confirm, on my 32-bit non-Windows machine, the program below prints

 collision at 33550335; #<a> = #<a>
 collision at 33554431; #<a> = #<a>
 collision at 67104767; #<a> = #<a>
 collision at 67108863; #<a> = #<a>
 ...


The answer is simpler with CGC. Objects don't move in memory, so the
address is used as the hash code; the bottom two bits are dropped,
since they're always 0, so the interesting part of the address fits
into a fixnum. And since two objects can't be at the same address, the
hash code is always different for any two objects (as long as neither
has been GCed).

Matthew

----------------------------------------

#lang scheme

(define-struct a ())

(let* ([a1 (make-a)]
       [h1 (eq-hash-code a1)])
  (let loop ([i 0])
    (let ([a2 (make-a)])
      (when (= (eq-hash-code a2) h1)
        (printf "collision at ~a; ~s = ~s\n" i a1 a2)))
    (loop (add1 i))))



Posted on the users mailing list.