[plt-dev] symbol->string and mutability

From: Carl Eastlund (cce at ccs.neu.edu)
Date: Thu Jun 18 16:27:02 EDT 2009

On Thu, Jun 18, 2009 at 4:03 PM, Matthew Flatt <mflatt at cs.utah.edu> wrote:
>>
>> I see.  I have contracts set up to accept only symbols and keywords
>> whose names are ASCII strings; I was planning to use a weak, eq?-based
>> hash of their names to shortcut the test.  Apparently, though, I
>> cannot get eq?-unique names for symbols and keywords.  If I hash the
>> symbols and keywords themselves, I believe the weak table can never
>> reclaim the space (since interned symbols and keywords are forgeable);
>
> No --- symbols and keywords are GCed, so a weak hash table would work.
>
> (And weakness in hash tables isn't about whether you could synthesize
> the key. We have `equal?'-based hash tables with weak keys, after all.)
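For illustration, the contract shortcut described above might look like the following Racket sketch. It keys a weak eq?-based table on the symbols and keywords themselves, as Matthew suggests, rather than on their names. The names ascii-name?, ascii-string?, and ascii-cache are hypothetical; make-weak-hasheq, hash-ref!, symbol->string, and keyword->string are standard racket/base functions. This is a sketch under those assumptions, not the contracts actually used in the thread.

```racket
#lang racket/base
;; Sketch: memoized check that a symbol or keyword has an all-ASCII name.
;; ascii-name?, ascii-string?, and ascii-cache are hypothetical names.

;; Keys are held weakly, so an entry can disappear once its symbol or
;; keyword is otherwise unreachable and has been collected.
(define ascii-cache (make-weak-hasheq))

;; #t when every character's code point is below 128.
(define (ascii-string? str)
  (for/and ([c (in-string str)])
    (< (char->integer c) 128)))

;; Returns #f for anything that is not a symbol or keyword, so the
;; predicate can also be used directly as a flat contract.
(define (ascii-name? v)
  (and (or (symbol? v) (keyword? v))
       (hash-ref! ascii-cache v
                  (lambda ()
                    (ascii-string?
                     (if (symbol? v)
                         (symbol->string v)
                         (keyword->string v)))))))
```

The first call on a given symbol converts its name and scans it; later calls on the same (eq?) symbol only do a table lookup. Because the stored values are booleans, they hold no reference back to the key, so the weak keys can actually be reclaimed.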

I see.  I tried to demonstrate this to myself one way or the other
with a weak box containing a symbol and a call to collect-garbage, and
the box never "emptied".  Perhaps that experiment was inconclusive;
maybe collect-garbage doesn't guarantee that all weak boxes are
emptied, or I had a reference to the symbol lying around somewhere
that I didn't know about.
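A minimal reconstruction of that kind of experiment might look like the sketch below; the original code was not posted, and probe-box and result are hypothetical names. One plausible culprit in such tests is a quoted literal such as 'probe, since compiled code keeps its literals reachable, so this version builds the symbol at run time from the current time. Even so, garbage collection timing is not something to rely on here, so no particular output is claimed.

```racket
#lang racket/base
;; Reconstruction of the weak-box experiment (hypothetical names).
;; The symbol is interned at run time so that no quoted literal in the
;; program text keeps the same symbol alive.
(define probe-box
  (make-weak-box
   (string->symbol (format "probe-~a" (current-inexact-milliseconds)))))

;; Request a collection; with no argument this is a major collection.
(collect-garbage)

;; weak-box-value returns #f once the box has been emptied by the GC.
(define result (weak-box-value probe-box))
(if result
    (printf "still reachable: ~a\n" result)
    (printf "collected\n"))
```

Running the same code with a quoted literal in place of the string->symbol call is a simple way to compare the two situations.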

>> However, while I'm musing out loud... would it be possible to have
>> symbol->bytes and keyword->bytes that produce the UTF-8 representation
>> (presumably with guarantees of uniqueness, immutability, and proper
>> UTF-8 encoding)?
>
> Do you mean that `symbol->bytes' would avoid allocation, which is
> possible because the symbol is UTF-8 encoded?
>
> If so, there's another part of the representation story that I left out
> last time. A symbol's content is "inlined" into the allocated symbol
> record, while a string or a byte string is a record containing a
> pointer to the string's characters. This difference has to do with C
> interoperability and a GC-based prohibition on pointers into the
> interior of an allocated object. So, there are many ways in which the
> current representations don't yield a cheap `symbol->bytes' operation.

I figured there might be something like that going on.  It's not a
major issue; I will get along without this.
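Since the representation Matthew describes rules out a cheap, allocation-free `symbol->bytes', one workaround is to pay the conversion cost once per symbol and cache the result in a weak eq?-keyed table, as discussed above. In this sketch symbol->bytes is a user-defined, hypothetical helper (not assumed to exist in racket/base); it uses string->bytes/utf-8 and bytes->immutable-bytes to approximate the requested guarantees: the result is UTF-8 encoded, immutable, and eq?-unique for as long as the symbol stays alive.

```racket
#lang racket/base
;; Hypothetical helper: symbol->bytes is defined here, not imported.
;; Each symbol's name is converted once (that conversion still allocates),
;; then cached so repeated calls return the same immutable byte string.
(define name-cache (make-weak-hasheq))

(define (symbol->bytes s)
  (hash-ref! name-cache s
             (lambda ()
               ;; symbol->string copies the name; string->bytes/utf-8
               ;; encodes it; the immutable copy guards against mutation.
               (bytes->immutable-bytes
                (string->bytes/utf-8 (symbol->string s))))))
```

For example, two calls of (symbol->bytes 'abc) return eq? results, and (symbol->bytes (string->symbol "caf\u00e9")) ends with the two UTF-8 bytes #xC3 #xA9. A keyword->bytes variant would differ only in calling keyword->string.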

--Carl


Posted on the dev mailing list.