[plt-dev] symbol->string and mutability

From: Carl Eastlund (cce at ccs.neu.edu)
Date: Thu Jun 18 11:30:53 EDT 2009

On Thu, Jun 18, 2009 at 3:35 AM, Matthew Flatt <mflatt at cs.utah.edu> wrote:
> At Wed, 17 Jun 2009 20:28:10 -0400, Carl Eastlund wrote:
>> Why do symbol->string and keyword->string produce mutable strings?  In
>> so doing, they have to allocate a new string every time.  Is there any
>> way to get at an immutable string that is not allocated more than
>> once?  I would prefer that this be the default behavior; R6RS already
>> specifies that symbol->string produces an immutable string, for
>> instance.
>
> Symbols and keywords are represented internally in UTF-8, while strings
> are represented internally as UTF-32. So, there's not an obvious way to
> have `symbol->string' avoid allocation, except by either caching a
> string reference in the symbol (probably not worth the extra space,
> since most symbols are never converted) or keeping a symbol-to-string
> mapping in a hash table (which any programmer can do externally).
>
> I think it would be a good idea to switch to an immutable-string
> result, but considering potential incompatibility, it has never seemed
> worthwhile in the short run.

I see.  I have contracts set up to accept only symbols and keywords
whose names are ASCII strings; I was planning to use a weak, eq?-based
hash of their names to shortcut the test.  Apparently, though, I
cannot get eq?-unique names for symbols and keywords.  If I hash the
symbols and keywords themselves, I believe the weak table can never
reclaim the space (since interned symbols and keywords are forgeable);
if I use an equal? hash, it defeats the purpose.  In the end, this is
probably premature optimization; symbol and keyword names are usually
short, so I can just use an unhashed check.
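
A minimal sketch of the two options discussed above, assuming current
Racket (`racket/base`).  The names `name->string`, `ascii-name?`,
`name-cache`, and `cached-name` are hypothetical and not part of any
library.  The first check is the plain unhashed test; the second is the
external symbol-to-string table Matthew describes, which hands back one
immutable string per symbol or keyword.  Whether a weak table keyed on
interned symbols ever releases its entries is exactly the open question
raised above, so treat the cache as an illustration, not a
recommendation.

```racket
#lang racket/base

;; Hypothetical helper: the name of a symbol or keyword as a fresh string.
(define (name->string v)
  (cond [(symbol? v) (symbol->string v)]
        [(keyword? v) (keyword->string v)]
        [else (raise-argument-error 'name->string
                                    "(or/c symbol? keyword?)" v)]))

;; Unhashed check: walk the name and test every character.
;; Allocates one string per call, which is cheap for short names.
(define (ascii-name? v)
  (for/and ([c (in-string (name->string v))])
    (< (char->integer c) 128)))

;; External cache: weak, eq?-based table from symbol/keyword to a single
;; immutable copy of its name.
(define name-cache (make-weak-hasheq))

(define (cached-name v)
  (hash-ref! name-cache v
             (lambda ()
               (string->immutable-string (name->string v)))))
```

With this table, two calls to `cached-name` on the same symbol return the
same immutable string object, so the result can itself serve as an eq?
key.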

However, while I'm musing out loud... would it be possible to have
symbol->bytes and keyword->bytes that produce the UTF-8 representation
(presumably with guarantees of uniqueness, immutability, and proper
UTF-8 encoding)?
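
To make the request concrete, here is a user-level approximation,
assuming current Racket (`racket/base`).  The names `symbol->utf8-bytes`
and `keyword->utf8-bytes` are hypothetical stand-ins for the proposed
primitives.  This version still goes through the intermediate string and
an external weak table, so it only imitates the interface; a built-in
could presumably expose the internal UTF-8 data directly.

```racket
#lang racket/base

;; Weak, eq?-based table so each symbol or keyword maps to one byte string.
(define bytes-cache (make-weak-hasheq))

;; Shared worker: encode the name as UTF-8 once, freeze it, and cache it.
(define (cached-utf8 v name)
  (hash-ref! bytes-cache v
             (lambda ()
               (bytes->immutable-bytes (string->bytes/utf-8 name)))))

;; Hypothetical stand-in for the proposed symbol->bytes.
(define (symbol->utf8-bytes s)
  (cached-utf8 s (symbol->string s)))

;; Hypothetical stand-in for the proposed keyword->bytes.
(define (keyword->utf8-bytes k)
  (cached-utf8 k (keyword->string k)))
```

Repeated calls on the same symbol return the same immutable byte string,
which gives the uniqueness and immutability properties asked for, at the
cost of the extra table.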

--Carl

Posted on the dev mailing list.