[plt-scheme] Why do MzScheme ports not respect the locale's encoding by default?
Michael Sperber <sperber at informatik.uni-tuebingen.de> writes:
> >>>>> "Jim" == Jim Blandy <jimb at redhat.com> writes:
> Jim> My claim is that it's impossible to precisely specify the behavior of
> Jim> mixed byte and character reads on a port if the character encoding
> Jim> doesn't have some restrictions imposed on it. It can't be left
> Jim> completely unspecified.
>
> Sure it needs to be specified---but I don't think it needs to be
> *restricted* in unreasonable ways. Somebody needs to sit down and say
> *per encoding* (or per encoding conversion) what bytes a READ-CHAR
> will remove from the port. This happens to be easy for the various
> Unicode encodings, and that's what should guide the design.

It's obvious and unambiguous for UTF-8, and the rest of the Unicode
character encoding schemes. But any such choice for ISO-2022 will be
useless in some cases. It'll be well-defined, but not actually useful.

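To make the contrast concrete, here is a minimal Python sketch (not
MzScheme or SRFI-56 code) of one possible rule: READ-CHAR removes bytes
one at a time until the decoder produces a character. The names
`read_char` and `demo` are hypothetical, and Python's `codecs`
incremental decoders stand in for whatever conversion a real port
would use.

```python
import codecs
import io

# Sample inputs (chosen for illustration only).
UTF8_SAMPLE = b"\xc3\xa9!"            # "é" followed by "!"
ISO2022_SAMPLE = b'\x1b$B$"\x1b(BA'   # ESC $ B, "あ", ESC ( B, "A"


def read_char(port, decoder):
    """Feed bytes one at a time until the decoder yields a character.

    Returns (char, bytes_consumed), or (None, bytes_consumed) at end
    of input. This is one possible rule, not the only one.
    """
    consumed = 0
    while True:
        b = port.read(1)
        if not b:
            return None, consumed
        consumed += 1
        out = decoder.decode(b)
        if out:
            return out, consumed


def demo(encoding, data):
    """Read one character, then one raw byte, from the same port."""
    port = io.BytesIO(data)
    decoder = codecs.getincrementaldecoder(encoding)()
    ch, n = read_char(port, decoder)
    next_byte = port.read(1)  # what a byte-level read sees afterwards
    return ch, n, next_byte
```

Under this rule, `demo("utf-8", UTF8_SAMPLE)` removes exactly the two
bytes of "é" and leaves "!" as the next byte. With
`demo("iso2022_jp", ISO2022_SAMPLE)`, the character read removes the
three-byte shift sequence plus the two bytes of "あ", and the following
byte read sees ESC, the start of the shift back to ASCII. That is
well-defined, but byte-level code can't do much with it.
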
> Jim> 2) Amend SRFI-56 to restrict ports to be either char-only or
> Jim> byte-only.
>
> That just seems totally unacceptable to me---there are so many
> applications where byte and character data is interleaved, and where
> there aren't any semantic issues. (Specifically in the PLT media
> editors.) The sheer existence of all the SHIFT-JIS crap (which a lot
> of people, including a lot in the multi-language encoding business
> don't care about at all) shouldn't make things hard for everyone.

In other words, you're only interested in supporting encodings that
don't complicate the interface much. Is that a fair restatement of
what you're saying?

Does it concern you that the interface you've selected requires
iconv-based implementations to use iconv inefficiently?
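
A rough sketch of the kind of inefficiency in question, under the
assumption that knowing exactly which bytes each READ-CHAR removed
forces the converter to be driven in tiny steps instead of over a whole
buffer. Python's `codecs` incremental decoder is only a stand-in for
iconv here, and `CountingDecoder`, `decode_bulk`, and `decode_bytewise`
are hypothetical names.

```python
import codecs


class CountingDecoder:
    """Wraps an incremental decoder and counts conversion calls."""

    def __init__(self, encoding):
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self.calls = 0

    def decode(self, data):
        self.calls += 1
        return self._decoder.decode(data)


def decode_bulk(encoding, data):
    """Convert the whole buffer with a single conversion call."""
    d = CountingDecoder(encoding)
    return d.decode(data), d.calls


def decode_bytewise(encoding, data):
    """Convert one byte per call, as a precise READ-CHAR might need."""
    d = CountingDecoder(encoding)
    text = "".join(d.decode(data[i:i + 1]) for i in range(len(data)))
    return text, d.calls
```

Both functions produce the same text, but the byte-wise version makes
one conversion call per input byte where the bulk version makes one
call in total.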