[plt-scheme] Why do MzScheme ports not respect the locale's encoding by default?

From: Alex Shinn (foof at synthcode.com)
Date: Fri Feb 18 02:06:50 EST 2005

At 17 Feb 2005 17:17:37 -0500, Jim Blandy wrote:
> You're right that locale-sensitivity leads to all sorts of
> unpredictability and hidden portability issues.  But as I said, locale
> encodings are something the user explicitly asks for.  If I understand
> POSIX correctly, the user can always say "LC_ALL=C" and turn
> everything off if that's what they want.

Perhaps I misunderstand, but I thought the LC_* POSIX settings were
meant for i18n, specifically messages to and input from the user.
That would be an argument for automatic conversion on ports connected
to ttys, which I'm all for. It certainly wouldn't make sense to try to
display hieroglyphs to a Latin-1 terminal.  But I don't think that
should necessarily apply to file or network ports.

> The only way out I see for the authors of your Japanese dictionary
> software is to write out their own code for parsing EUC-JP, and use
> that explicitly.  But now that locales exist, programmers must
> consider when it's appropriate to respect them and when they should be
> ignored.  (And I wouldn't be surprised to hear of situations where
> there's no good answer.)

The programmer should be able to open the dictionary explicitly in
EUC-JP, then simply display the text to stdout (which would convert
according to the user's locale).  It's simple and portable.
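
As a rough sketch of that idea in C, assuming a POSIX system with
iconv(3) and nl_langinfo(3): the source encoding is named explicitly,
and only the output side consults the locale.  The names transcode and
print_eucjp are hypothetical helpers made up for this illustration, not
part of any library or of SRFI-56.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: convert `inlen` bytes of `in`, encoded as `from`,
   into encoding `to`.  Writes at most `outcap - 1` bytes plus a
   terminating NUL into `out`.  Returns the number of bytes written
   (excluding the NUL), or (size_t)-1 on any failure. */
size_t transcode(const char *from, const char *to,
                 const char *in, size_t inlen,
                 char *out, size_t outcap)
{
    if (outcap == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open(to, from);   /* note the order: target first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;               /* unknown encoding name */

    char *src = (char *)in;              /* iconv wants char **, but only reads */
    char *dst = out;
    size_t srcleft = inlen;
    size_t dstleft = outcap - 1;         /* reserve room for the NUL */

    size_t rc = iconv(cd, &src, &srcleft, &dst, &dstleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dstleft);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;               /* bad input or output too small */
    *dst = '\0';
    return (size_t)(dst - out);
}

/* Hypothetical helper: the data is declared to be EUC-JP by the
   programmer; only the output encoding comes from the user's locale.
   Fails (returns -1) if the converted text does not fit in the fixed
   buffer, which is a simplification for this sketch. */
int print_eucjp(const char *text, size_t len)
{
    char buf[1024];
    const char *codeset = nl_langinfo(CODESET);

    if (transcode("EUC-JP", codeset, text, len, buf, sizeof buf) == (size_t)-1)
        return -1;
    return fputs(buf, stdout) == EOF ? -1 : 0;
}
```

The caller is assumed to have run setlocale(LC_ALL, "") at startup so
that nl_langinfo(CODESET) reports the user's encoding rather than that
of the default "C" locale.  Whether "EUC-JP" is available depends on the
iconv implementation installed.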

> First, a small correction: in msg00027.html, you mention the C I/O
> API.  ISO C doesn't allow you to mix character and byte operations on
> a single port.  The first operation on a port sets its orientation
> ("byte" or "wide"), which is fixed from that point onward; operations
> of the other orientation are an error.

Well, to be precise, it doesn't allow you to mix wide character (wchar_t)
and byte (char) operations.  If you stick to char operations you're
free to mix, say, fgetc and fread.
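
To illustrate, here is a minimal sketch in which fgetc and fread are
used on the same stream, after which fwide(fp, 0) is used to query the
orientation without changing it.  The function name mix_byte_reads is
hypothetical; tmpfile is used only so the example needs no input file.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical demo: write `len` bytes of `data` to a temporary stream,
   then read them back with two different byte-oriented calls.
   Stores the first byte in *first, up to `cap` following bytes in `rest`,
   and the result of fwide(fp, 0) in *orientation (negative means the
   stream is byte-oriented, positive means wide-oriented).
   Returns the number of bytes stored in `rest`, or -1 on error. */
long mix_byte_reads(const char *data, size_t len,
                    int *first, char *rest, size_t cap, int *orientation)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    /* fwrite is itself a byte operation, so it fixes the orientation. */
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    *first = fgetc(fp);                  /* one byte via fgetc ...       */
    size_t n = fread(rest, 1, cap, fp);  /* ... the remainder via fread  */

    *orientation = fwide(fp, 0);         /* mode 0: query only */
    fclose(fp);
    return (long)n;
}
```

Calling this with the five bytes of "hello" should leave 'h' in *first
and the remaining four bytes in rest, with a negative orientation.  A
wide-character call such as fgetwc on the same stream would be the
mixing that ISO C rules out.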

> However, the revised text you posted in msg00064.html is more
> restrictive than ISO C.  In ISO C, the "orientation" of a port is
> determined by the first I/O operation, not before.  In the revised
> SRFI-56, it looks to me as if the port's orientation is determined
> when it is created.  Matthew mentioned that restriction as a source
> of troubles --- whether anticipated or actually experienced I don't
> know.

The current compromise is meant to be as accommodating as possible to
all systems, including Java, which requires the port's orientation to
be specified at creation time.  However, the SRFI is careful to leave
the question of mixing byte and character operations on the same port
unspecified; an implementation is free to allow this.  This weakened
stance seemed to satisfy everyone, while still allowing you to write a
great range of portable programs using binary I/O, which is likely why
the discussion died down.

I'd be very curious to see examples where delaying the port
orientation until the first operation is useful, though.


Posted on the users mailing list.