[plt-dev] symbol->string and mutability
Just curious, but why the different representations? Is it because you
never need to index into a symbol, so UTF-8's (usually) more compact
representation is a win, whereas for strings, where you do need to
index, UTF-32 is the right choice because finding a character is a
simple computation that avoids searching?
Robby
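
To make the indexing question concrete, here is a minimal sketch of the
difference. It is an illustration only, not a description of the actual
runtime: `utf8-offset' is a hypothetical helper that scans a UTF-8 byte
string to find where the k-th character starts, whereas `string-ref' on a
fixed-width string can jump straight to position k.

```racket
#lang racket/base
;; (On PLT Scheme 4.x, use scheme/base instead of racket/base.)

;; Hypothetical helper: byte offset at which the k-th code point of the
;; UTF-8 byte string `bs` begins. Assumes `bs` is well-formed UTF-8.
;; The scan is linear because characters occupy 1 to 4 bytes each.
(define (utf8-offset bs k)
  (let loop ([i 0] [seen 0])
    (cond
      [(= i (bytes-length bs))
       (error 'utf8-offset "index out of range")]
      ;; Continuation bytes have the form 10xxxxxx; skip them.
      [(= (bitwise-and (bytes-ref bs i) #xC0) #x80)
       (loop (add1 i) seen)]
      ;; A lead byte: stop if it starts the k-th code point.
      [(= seen k) i]
      [else (loop (add1 i) (add1 seen))])))

(define s "aλb")
(define bs (string->bytes/utf-8 s))

(string-ref s 2)     ; direct index into the string: #\b
(utf8-offset bs 2)   ; scan past the two-byte λ: 3
```

The byte string is one byte longer than the string has characters, which
is why the offset of the third character is 3 rather than 2.
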
On Thu, Jun 18, 2009 at 2:35 AM, Matthew Flatt <mflatt at cs.utah.edu> wrote:
> At Wed, 17 Jun 2009 20:28:10 -0400, Carl Eastlund wrote:
>> Why do symbol->string and keyword->string produce mutable strings? In
>> so doing, they have to allocate a new string every time. Is there any
>> way to get at an immutable string that is not allocated more than
>> once? I would prefer that this be the default behavior; R6RS already
>> specifies that symbol->string produces an immutable string, for
>> instance.
>
> Symbols and keywords are represented internally in UTF-8, while strings
> are represented internally as UTF-32. So, there's not an obvious way to
> have `symbol->string' avoid allocation, except by either caching a
> string reference in the symbol (probably not worth the extra space,
> since most symbols are never converted) or keeping a symbol-to-string
> mapping in a hash table (which any programmer can do externally).
>
> I think it would be a good idea to switch to an immutable-string
> result, but considering potential incompatibility, it has never seemed
> worthwhile in the short run.
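
The external hash-table approach mentioned above could look roughly like
the following. This is a sketch under assumptions, not anything provided
by PLT Scheme: `cached-symbol->string' and `symbol-strings' are
hypothetical names, and the choice of a weak table is mine, made so that
an entry can be reclaimed once its symbol is no longer reachable.

```racket
#lang racket/base
;; (On PLT Scheme 4.x, use scheme/base instead of racket/base.)

;; Weak, eq?-keyed table mapping each symbol to its immutable string.
(define symbol-strings (make-weak-hasheq))

;; Hypothetical helper: like symbol->string, but returns the same
;; immutable string on every call for a given symbol.
(define (cached-symbol->string sym)
  (or (hash-ref symbol-strings sym #f)
      (let ([str (string->immutable-string (symbol->string sym))])
        (hash-set! symbol-strings sym str)
        str)))

;; Repeated conversions share one string instead of allocating anew.
(eq? (cached-symbol->string 'lambda)
     (cached-symbol->string 'lambda))        ; #t
(immutable? (cached-symbol->string 'lambda)) ; #t
```

The first conversion of each symbol still allocates, and the table itself
costs space, so this only pays off for symbols that are converted
repeatedly.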