[plt-scheme] immutable strings vs. uninterned symbols

From: Eli Barzilay (eli at barzilay.org)
Date: Tue Jun 6 15:28:38 EDT 2006

On Jun  6, Doug Orleans wrote:
> Matthew Flatt writes:
>  > At Tue, 6 Jun 2006 08:52:07 -0400, Doug Orleans wrote:
>  > > What's the difference between immutable strings and uninterned
>  > > symbols? 
>  > 
>  > Besides the printing and reading conventions, immutable strings support
>  > `string-ref' to access individual characters.
>  > 
>  > Matthias points out that strings support `string-append', too.
> 
> As Carl pointed out, this is just a matter of library support.
> 
> You can easily make symbol-ref and symbol-append.  But as you both
> point out, the real question is performance: string-ref is constant
> in time and space, but I think symbol-ref is linear in both, since
> symbol->string has to copy the whole string.  (Would it be possible to
> make a symbol->immutable-string that was constant time?)

Much more, IMO.  It's the concept of a different type for different
uses.  Otherwise you would feel just as comfortable in a world that
uses numbers/strings/Church encodings/Goedel numbers for everything.

Sure, you have enough conversion back doors that you can write a `+'
that adds two strings that represent numbers -- but do you want to?
The fact that you need to do some extra work is like a little red
light bulb telling you that something is wrong.
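
To make that concrete, here is a minimal sketch of such a `+' on
strings.  The name `string+' is made up for illustration; it is not
part of any library, and it does no error checking.

```scheme
;; Hypothetical `string+': adds two strings that represent numbers.
;; Both arguments are converted to numbers and the sum is converted
;; back, so every call pays for three conversions.
;; If an argument is not a valid numeral, string->number returns #f
;; and `+' then raises an error.
(define (string+ a b)
  (number->string (+ (string->number a) (string->number b))))

(string+ "1" "2")   ; => "3"
```

The conversions on the way in and on the way out are exactly the extra
work that should make you suspicious.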

In the same way you can write your own symbol-append, symbol-ref,
symbol->number, symbol<?, subsymbol, regexp-symbol-match,
open-output-symbol, read-symbol-avail!, etc., and then double the
whole thing for uninterned symbols.  But the *real* real question is:
do you want to?  (Not performance at all -- the question stands even
if symbol->string is an O(1) operation.)
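
For illustration only, the first two entries of that imaginary library
could be sketched like this.  Both names are hypothetical, and both
definitions just round-trip through strings, assuming nothing beyond
the standard symbol->string and string->symbol.

```scheme
;; Hypothetical helpers from the imaginary symbol library.
;; Each one converts to a string, does the real work there, and
;; (for symbol-append) converts the result back to a symbol.
(define (symbol-append . syms)
  (string->symbol (apply string-append (map symbol->string syms))))

(define (symbol-ref sym k)
  (string-ref (symbol->string sym) k))

(symbol-append 'foo 'bar)   ; => foobar
(symbol-ref 'foo 0)         ; => #\f
```

Note that string->symbol returns an interned symbol, so an
uninterned-symbol version would need its own set of definitions.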

There are certain features of numbers and strings that make them
useful representations of numbers and strings.  In the case of symbols,
it's the lack of features that makes them useful.  Every time you use a
symbol function from the above imaginary library, it means that you
probably should just use strings...

Sometime in the last year I had an exam question to turn a list of 'a
and 'd symbols into a c---r function -- some people tried to
concatenate the symbol 'c, the list, and the symbol 'r, and use the
result as a function.  I later verified that these people had problems
distinguishing values and names, and that was made worse by an
illusion that a symbol is a kind of string.  My guess is that using
strings instead would have made it much more confusing.
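
For reference, one possible answer to that kind of question, sketched
here under the assumption that the input is a list containing only the
symbols 'a and 'd.  The name `make-cxr' is made up.  The point is that
the symbols are only used to choose between the values `car' and
`cdr'; no name is ever built.

```scheme
;; Hypothetical solution sketch: build a c---r procedure from a list
;; of 'a and 'd symbols by composing car and cdr.
;; As in the name "cadr", the rightmost operation is applied first.
(define (make-cxr ops)
  (if (null? ops)
      (lambda (x) x)                      ; no ops: identity
      (let ((outer (case (car ops)
                     ((a) car)
                     ((d) cdr)
                     (else (error "make-cxr: expected a or d, got"
                                  (car ops)))))
            (inner (make-cxr (cdr ops))))
        (lambda (x) (outer (inner x))))))

((make-cxr '(a d)) '(1 2 3))   ; => 2, same as (cadr '(1 2 3))
```

Concatenating 'c, the list, and 'r would at best produce the symbol
cadr, which is a name and not the procedure it is bound to.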

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the users mailing list.