[plt-scheme] Why do MzScheme ports not respect the locale's encoding by default?

From: Jim Blandy (jimb at redhat.com)
Date: Sat Feb 19 18:40:18 EST 2005

I don't want to broaden the problem too much.  I'm not concerned with
proper bidirectional rendering behavior, input methods for Hebrew or
Japanese, and so on.  I think printf's use of the locale-specific
decimal point for %f, %g, etc. was a dumb choice.  MzScheme introduces
separate locale-sensitive variants of functions, which is a much
better approach.

At the moment, MzScheme does not, by default, decode the current
locale's multi-byte character encoding on ports connected to files,
pipes, and file descriptors inherited from its parent process.  When
those carry data in an encoding other than UTF-8, MzScheme will
misinterpret it.
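
To make that failure concrete, here is a minimal sketch, written in
Python rather than MzScheme.  The sample string and the choice of
ISO-8859-1 as the locale's encoding are assumptions made purely for
illustration; the sketch only shows what a reader that assumes UTF-8
sees when the bytes were written under a different locale encoding.

```python
# Bytes as a program running in an ISO-8859-1 locale would write them
# (the sample text and the encoding are illustrative assumptions).
text = "café"
raw = text.encode("iso-8859-1")  # the 'é' becomes the single byte 0xE9

# A reader that assumes UTF-8 cannot decode the lone 0xE9 byte, so the
# character is lost and a replacement character takes its place.
as_utf8 = raw.decode("utf-8", errors="replace")

# A reader that honors the locale's encoding recovers the original text.
as_latin1 = raw.decode("iso-8859-1")

print(repr(raw), repr(as_utf8), repr(as_latin1))
```

The bytes are the same in both cases; only the reader's assumption
about their encoding differs.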

The reasons I know of for ignoring the current locale's encoding are
these:

- The current locale's encoding may not match the encoding actually in
  use.

  Where this is so, all the system utilities on a POSIX system (sh,
  grep, etc.) will misbehave as well.  MzScheme will be in good
  company.  It's already the user's problem.

- Locale sensitivity makes programs' behavior less predictable.

  True.  But ignoring the locale's encoding also introduces
  unpredictability, for people who have correctly set it to something
  other than "C" or "POSIX".  Which sort of misbehavior will affect
  people more often?  (Not a rhetorical question.)

- It's hard to implement.

  I'm just questioning which of two behaviors, both of which MzScheme
  wants to support, should be the default.  If it's hard to implement,
  then those difficulties apply equally when the user knows exactly
  what encoding they've got and is willing to request it explicitly.
  From looking at the docs, MzScheme already wants to support that
  case.

- It's not worth it.  Everyone should use Unicode code points and
  UTF-8.

  I agree.  But reasonable and knowledgeable people have told me that
  locale encodings are useful to them.

- I'm typing, I'm typing!

  All right! :)



Posted on the users mailing list.