[racket] General-purpose "hash-consing"

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Tue Jul 1 00:52:46 EDT 2014

You may already know that `datum-intern-literal` is implemented
similarly, but `datum-intern-literal` hashes only certain kinds of
values. The `canonicalize` function below accepts arbitrary values, so
the `equal-hash-code` and `equal?` operations by the hash table can
call arbitrary code. Running arbitrary code in atomic mode is generally
not a good idea, and it's not allowed in the run-time system.

Also, if the `canonical-values` table were truly global within the
run-time system, then it would add a problematic communication channel
(via mutable values).

Assuming well-behaved arguments, I think that `canonicalize` needs to
remove the value from `b` before deciding whether to return that value,
otherwise the value might get GCed between the time that the weak box
is found it in the table and the time that it's content is returned.

For that second problem, the `datum-intern-literal` implementation can
"cheat", since it gets an extra level of atomicity with respect to GCs
(by restricting to certain values and being in the run-time system).

The `datum-intern-literal` function also avoids a weak-box allocation
through an internal operation to extract the key (instead of value)
from a hash-table match. I see a speed difference between
`datum-intern-literal` and `canonicalize` when adding new strings, and
I think the difference is due to allocating the extra box. Looking up a
previously registered string with `canonicalize` is just as fast,
though. I don't think `canonicalize` would get much faster by being
built-in.

At Mon, 30 Jun 2014 20:54:38 -0400, Tony Garnock-Jones wrote:
> Hi all,
> 
> I'd love any comments on the following as a general technique for
> "hash-consing" - that is, making (equal?) imply (eq?). (Corollary: two
> canonicalized values will be (eq?) iff they are (equal?).)
> 
> Is it worth considering this as something to push into the C runtime for
> speed?
> 
>   #lang racket/base
>   (provide canonicalize)
>   (require ffi/unsafe/atomic)
> 
>   (define canonical-values (make-weak-hash))
> 
>   (define (canonicalize val)
>     (start-atomic)
>     (define b (hash-ref canonical-values
>                         val
>                         (lambda ()
>                           (hash-set! canonical-values
>                                      val
>                                      (make-weak-box val))
>                           #f)))
>     (end-atomic)
>     (if b (weak-box-value b) val))
> 
> See also
> https://gist.github.com/tonyg/86d0052c9c28431e1546#file-hashcons-rkt-L9-L17,
> where I have test cases demonstrating simple uses of canonicalize and
> its interactions with the garbage collector.
> 
> Regards,
>   Tony
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users

Posted on the users mailing list.