[plt-scheme] immutable strings vs. uninterned symbols
Matthew Flatt writes:
> At Tue, 6 Jun 2006 08:52:07 -0400, Doug Orleans wrote:
> > What's the difference between immutable strings and uninterned
> > symbols?
>
> Besides the printing and reading conventions, immutable strings support
> `string-ref' to access individual characters.
>
> Matthias points out that strings support `string-append', too.
As Carl pointed out, this is just a matter of library support. You
can easily make symbol-ref and symbol-append. But as you both point
out, the real question is performance: string-ref is constant in time
and space, but I think symbol-ref is linear in both, since
symbol->string has to copy the whole string. (Would it be possible to
make a symbol->immutable-string that was constant time?)
Now that I think about it, though, the only reason ever to use symbols
(especially uninterned symbols) is for performance, so you can use eq?
rather than equal?, right?
Oh wait, I just discovered the most important answer to my question:
uninterned symbols are never equal? (unless they're eq?).
> (define foo "foo")
> (equal? (string->uninterned-symbol foo) (string->uninterned-symbol foo))
#f
> A minor complaint is that `string-append' produces mutable strings,
> not immutable strings.
Same with substring, string-copy, and all of SRFI 13.
> > I mean, I realize they're different types, but they're both
> > strings that can't be changed and are only eq? to themselves.
>
> In this way, aren't integers (including bignums) also like immutable
> strings and uninterned symbols? The difference is that the internal
> representation of integers is optimized for operations like `+' instead
> of operations like `string-ref'.
You're right, you could encode a string as an integer. I could
imagine there would be some applications where this would make sense.
In particular, equal strings would be eqv? (and =).
> > Is there any reason to prefer one over the other?
>
> If you want `string-ref', then uninterned strings are better (just like
> you'd use integers if you want `+').
>
> Offhand, I can't think of any other reason. I'm inclined to avoid
> uninterned symbols, though, probably because I expect symbols to be
> interned (more than I expect strings to be mutable).
And if the main reason to use symbols is performance, you might as
well use interned symbols. But this makes me wonder: does a symbol
take up more space than a string? Obviously it takes less space than
two strings, but if I have a string that I never expect to appear more
than once, is it a waste to make it a symbol?
> > Which one is more like java.lang.String?
>
> I think immutable strings are a little close to java.lang.String.
I was thinking about String.intern(), but because of the equal? thing
I think you're right. Except that literal strings are interned in
Java, so "foo" == "foo" is always true but (eq? "foo" "foo") is usually
false. (It's unspecified by R5RS, but is it guaranteed by MzScheme?)
--dougorleans at gmail.com