[plt-scheme] immutable strings vs. uninterned symbols

From: Doug Orleans (dougorleans at gmail.com)
Date: Tue Jun 6 12:20:55 EDT 2006

Matthew Flatt writes:
 > At Tue, 6 Jun 2006 08:52:07 -0400, Doug Orleans wrote:
 > > What's the difference between immutable strings and uninterned
 > > symbols? 
 > Besides the printing and reading conventions, immutable strings support
 > `string-ref' to access individual characters.
 > Matthias points out that strings support `string-append', too.

As Carl pointed out, this is just a matter of library support.  You
can easily make symbol-ref and symbol-append.  But as you both point
out, the real question is performance: string-ref is constant in time
and space, but I think symbol-ref is linear in both, since
symbol->string has to copy the whole string.  (Would it be possible to
make a symbol->immutable-string that was constant time?)

Now that I think about it, though, the only reason ever to use symbols
(especially uninterned symbols) is for performance, so you can use eq? 
rather than equal?, right?

Oh wait, I just discovered the most important answer to my question:
uninterned symbols are never equal? (unless they're eq?).

> (define foo "foo")
> (equal? (string->uninterned-symbol foo) (string->uninterned-symbol foo))

 > A minor complaint is that `string-append' produces mutable strings,
 > not immutable strings.

Same with substring, string-copy, and all of SRFI 13.

 > > I mean, I realize they're different types, but they're both
 > > strings that can't be changed and are only eq? to themselves. 
 > In this way, aren't integers (including bignums) also like immutable
 > strings and uninterned symbols? The difference is that the internal
 > representation of integers is optimized for operations like `+' instead
 > of operations like `string-ref'.

You're right, you could encode a string as an integer.  I could
imagine there would be some applications where this would make sense.
In particular, equal strings would be eqv? (and =).

 > > Is there any reason to prefer one over the other? 
 > If you want `string-ref', then uninterned strings are better (just like
 > you'd use integers if you want `+').
 > Offhand, I can't think of any other reason. I'm inclined to avoid
 > uninterned symbols, though, probably because I expect symbols to be
 > interned (more than I expect strings to be mutable).

And if the main reason to use symbols is performance, you might as
well use interned symbols.  But this makes me wonder: does a symbol
take up more space than a string?  Obviously it takes less space than
two strings, but if I have a string that I never expect to appear more
than once, is it a waste to make it a symbol?

 > > Which one is more like java.lang.String?
 > I think immutable strings are a little close to java.lang.String.

I was thinking about String.intern(), but because of the equal? thing
I think you're right.  Except that literal strings are interned in
Java, so "foo" == "foo" is always true but (eq? "foo" "foo") is usually
false.  (It's unspecified by R5RS, but is it guaranteed by MzScheme?)

--dougorleans at gmail.com

Posted on the users mailing list.