[plt-scheme] Why do MzScheme ports not respect the locale's encoding by default?
I don't want to broaden the problem too much. I'm not concerned with
proper bidirectional rendering behavior or with input methods for
Hebrew, Japanese, etc. I think printf's use of the locale-specific
decimal point for %f, %g, etc. was a dumb choice; MzScheme introduces
separate locale-sensitive variants of functions, which is a much
better approach. And so on.
At the moment, MzScheme does not, by default, translate from the
current locale's multi-byte character encoding on ports connected to
files, pipes, and file descriptors inherited from its parent. When
those carry data in an encoding other than UTF-8, MzScheme will
misinterpret it.
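To make the failure concrete, here is a minimal sketch of what happens when a reader assumes UTF-8 for bytes that were written under another locale encoding. It is in Python rather than MzScheme, purely as an illustration; the helper name read_as, the sample word, and the choice of ISO-8859-1 as the locale's encoding are all my own assumptions, and MzScheme's exact handling of undecodable bytes may differ.

```python
# Illustration only: Python stands in for any reader that assumes UTF-8.
# The helper name, the sample word, and the ISO-8859-1 locale are
# assumptions made for this example.

def read_as(data: bytes, encoding: str) -> str:
    """Decode raw port bytes, substituting U+FFFD for undecodable bytes."""
    return data.decode(encoding, errors="replace")

# "cafe" with an acute accent on the e, as a program running in an
# ISO-8859-1 locale would write it: the accented letter is one byte, 0xE9.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

assumed_utf8 = read_as(latin1_bytes, "utf-8")       # ignores the locale
used_locale = read_as(latin1_bytes, "iso-8859-1")   # honors the locale

print(repr(assumed_utf8))
print(repr(used_locale))
```

The first result has a replacement character where the accented letter should be; the second round-trips the original text. Nothing about the bytes changed, only the reader's assumption about their encoding.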
The reasons I know of for ignoring the current locale's encoding are
these:
- The current locale's encoding may not match the encoding actually in
  use.
    Where this is so, all the system utilities on a POSIX system (sh,
    grep, etc.) will misbehave as well. MzScheme will be in good
    company, and the mismatch is already the user's problem.
- Locale sensitivity makes programs' behavior less predictable.
    True. But ignoring the locale's encoding also introduces
    unpredictability for people who have correctly set it to something
    other than "C" or "POSIX". Which sort of misbehavior will affect
    people more often? (Not a rhetorical question.)
- It's hard to implement.
    I'm only questioning which of two behaviors, both of which
    MzScheme wants to support, should be the default. If it's hard to
    implement, the same difficulties apply when the user knows exactly
    what encoding they've got and is willing to request it explicitly.
    From looking at the docs, MzScheme already wants to support that
    case.
- It's not worth it. Everyone should use Unicode code points and
  UTF-8.
    I agree. But reasonable and knowledgeable people have told me that
    locale encodings are useful to them.
- I'm typing, I'm typing!
    All right! :)