[plt-scheme] Why do MzScheme ports not respect the locale's encoding by default?

From: Jim Blandy (jimb at redhat.com)
Date: Mon Feb 28 17:22:37 EST 2005

Michael Sperber <sperber at informatik.uni-tuebingen.de> writes:
> >>>>> "Jim" == Jim Blandy <jimb at redhat.com> writes:
> Jim> My claim is that it's impossible to precisely specify the behavior of
> Jim> mixed byte and character reads on a port if the character encoding
> Jim> doesn't have some restrictions imposed on it.  It can't be left
> Jim> completely unspecified. 
> 
> Sure it needs to be specified---but I don't think it needs to be
> *restricted* in unreasonable ways.  Somebody needs to sit down and say
> *per encoding* (or per encoding conversion) what bytes a READ-CHAR
> will remove from the port.  This happens to be easy for the various
> Unicode encodings, and that's what should guide the design.

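It's obvious and unambiguous for UTF-8 and the other Unicode character
encoding schemes.  But ISO-2022 switches between character sets with
escape sequences that set a shift state, so any choice about which
bytes a READ-CHAR removes will be useless in some cases.  It'll be
well-defined, but not actually useful.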
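
A small sketch makes the contrast concrete. It uses Python purely for
illustration. `utf8_char_len` and `read_char` are hypothetical helpers,
not MzScheme or SRFI-56 procedures, and the ISO-2022-JP part relies on
Python's standard `iso2022_jp` codec. In UTF-8 the lead byte alone fixes
how many bytes one READ-CHAR consumes, so a byte read afterwards starts
at a well-defined position. In ISO-2022-JP the bytes of a kanji are
ordinary 7-bit values whose meaning depends on a shift state set by an
earlier escape sequence.

```python
# Sketch: how many bytes does one READ-CHAR consume?
# Python stands in for Scheme; read_char and utf8_char_len are hypothetical
# helpers, not MzScheme or SRFI-56 procedures.

def utf8_char_len(lead: int) -> int:
    """Byte length of a UTF-8 sequence, determined by its lead byte alone."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02X}")


def read_char(buf: bytes, pos: int) -> tuple[str, int]:
    """Decode one character at byte offset pos; return it and the next offset."""
    end = pos + utf8_char_len(buf[pos])
    return buf[pos:end].decode("utf-8"), end


# UTF-8: each READ-CHAR removes a predictable number of bytes, so the
# following byte-level read starts at an unambiguous position.
utf8 = "aé€".encode("utf-8") + b"\x00\xff"   # three chars, then raw bytes
pos, chars = 0, []
for _ in range(3):
    ch, pos = read_char(utf8, pos)
    chars.append(ch)
raw_tail = utf8[pos:]                         # bytes left for READ-BYTE
print(chars, raw_tail)                        # ['a', 'é', '€'] b'\x00\xff'

# ISO-2022-JP: ESC $ B switches into JIS X 0208 and ESC ( B switches back.
jis = "日本".encode("iso2022_jp")             # ESC $ B, two kanji, ESC ( B
second_kanji = jis[5:7]                       # the two bytes of the 2nd kanji
# The same two bytes, decoded without the shift state, come out as ASCII.
without_state = second_kanji.decode("iso2022_jp")
# With the escape sequence in front of them, they decode to the kanji.
with_state = (b"\x1b$B" + second_kanji + b"\x1b(B").decode("iso2022_jp")
print(repr(without_state), repr(with_state))
```

ISO-2022 text can certainly be decoded; the trouble is that after a
READ-CHAR the port carries hidden shift state. Whether that state, or a
pending escape sequence, counts as "removed" has to be decided one way
or another, and a byte-level reader sees raw bytes whose meaning it
cannot recover on its own.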

> Jim> 2) Amend SRFI-56 to restrict ports to be either char-only or
> Jim>    byte-only.
> 
> That just seems totally unacceptable to me---there are so many
> applications where byte and character data is interleaved, and where
> there aren't any semantic issues.  (Specifically in the PLT media
> editors.)  The sheer existence of all the SHIFT-JIS crap (which a lot
> of people, including a lot in the multi-language encoding business
> don't care about at all) shouldn't make things hard for everyone.

In other words, you're only interested in supporting encodings that
don't complicate the interface much.  Is that a fair restatement of
what you're saying?

Does it concern you that the interface you've selected requires
iconv-based implementations to use iconv inefficiently?

Posted on the users mailing list.