[plt-scheme] Why do MzScheme ports not respect the locale's encoding by default?

From: Alex Shinn (foof at synthcode.com)
Date: Sat Feb 26 15:10:04 EST 2005

At 25 Feb 2005 14:40:34 -0500, Jim Blandy wrote:
> 
> Alex Shinn <foof at synthcode.com> writes:
> > 
> > Intuitively the port has a character encoding that takes effect when
> > you perform character-level operations, and is ignored when you
> > perform binary operations.  It may be hard to implement, but not to
> > use.
> 
> It's hard to implement, and to use.  In order to use a facility
> properly, you need to be able to distinguish the properties it happens
> to have as you do your development from the properties the designers
> promise it will always have.  I've argued that there are too few
> properties one can guarantee without making restrictive assumptions
> about the encodings at hand.
> 
> But you try it.

I have.  SRFI-56 is based on a binary I/O library I had been using
for some time.  With it I have implemented many applications in pure
Scheme, including a web server, web browser, mail client, x86
assembler, and gettext replacement.  All of them mix binary and
textual data, and all except the assembler are fully
internationalized, supporting the character encodings specified by
their formats.

I'm honestly puzzled as to what could be hard to use about it.  I
consider the C model downright painful to use.
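
For illustration only, here is a minimal Python sketch of the model
under discussion: a port that carries a character encoding which
applies to character-level reads and is ignored by byte-level reads.
The class MixedPort and its method names are hypothetical; they are
not the SRFI-56 or MzScheme API.  The sketch also assumes a stateless
encoding such as UTF-8, since it creates a fresh decoder for each
character.

```python
import codecs
import io


class MixedPort:
    """Hypothetical port: one byte stream, character and binary reads."""

    def __init__(self, raw, encoding="utf-8"):
        self.raw = raw              # underlying binary stream
        self.encoding = encoding    # consulted only by character reads

    def read_bytes(self, n):
        # Binary operation: the encoding is ignored.
        return self.raw.read(n)

    def read_char(self):
        # Character operation: feed bytes to an incremental decoder
        # until it yields one complete character.
        # Assumption: the encoding is stateless (e.g. UTF-8), so a
        # fresh decoder per character is acceptable.
        decoder = codecs.getincrementaldecoder(self.encoding)()
        while True:
            b = self.raw.read(1)
            if not b:
                return ""           # end of input
            ch = decoder.decode(b)
            if ch:
                return ch


port = MixedPort(io.BytesIO("é".encode("utf-8") + b"\x00\x01z"))
first = port.read_char()      # two bytes decoded into one character
blob = port.read_bytes(2)     # raw bytes, no decoding
last = port.read_char()       # back to character-level reading
```

A stateful encoding would need the decoder kept on the port between
calls, which is the kind of case the disagreement above is about.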

One thing to keep in mind is that in Scheme we're talking about a
binary vs. character distinction, whereas C has a byte vs. wide
distinction.  C's "binary" data freely allows you to use character
arrays which form the basis of most C string libraries.  I'd rather
not have two disjoint string types and the corresponding duplication
of libraries.

> > Many network protocols mix byte and character data, including HTTP and
> > FTP.  To read a response you need to parse in terms of lines of
> > characters, and then possibly switch to binary operations for the body
> > of the response.
> 
> Did you actually check?

Checked and implemented.
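
A rough sketch of that pattern in Python, not the code from the
applications mentioned above: the status line and header fields are
read as lines of characters, then the body is read with binary
operations.  The function names are hypothetical, and decoding the
header lines as ISO-8859-1 and relying on Content-Length are
simplifying assumptions.

```python
import io


def read_line(stream):
    """Read one CRLF-terminated line and decode it as ISO-8859-1 text."""
    return stream.readline().rstrip(b"\r\n").decode("iso-8859-1")


def read_response(stream):
    """Parse the header section as text, then read the body as bytes."""
    status = read_line(stream)
    headers = {}
    while True:
        line = read_line(stream)
        if line == "":              # blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Switch to binary operations for the body.
    length = int(headers.get("content-length", "0"))
    body = stream.read(length)
    return status, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: application/octet-stream\r\n"
       b"Content-Length: 4\r\n"
       b"\r\n"
       b"\x89PNG")
status, headers, body = read_response(io.BytesIO(raw))
```

The same stream object serves both kinds of read; only the header
lines are ever decoded.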

> RFC 2047 <http://www.ietf.org/rfc/rfc2047.txt> is the spec for using
> non-ASCII characters in message headers.

Actually I said "body" of the response.  Did you read the section on
chunked encoding?  A good summary of the difficulties was referenced
in the SRFI-56 discussion:

  http://www.haskell.org/pipermail/haskell-cafe/2004-September/006801.html
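
For readers unfamiliar with it, chunked transfer coding (RFC 2616,
section 3.6.1) interleaves the two kinds of data: each chunk starts
with a textual line giving its size in hexadecimal, followed by that
many raw bytes.  A minimal Python sketch of a decoder, with a
hypothetical function name and no error handling, might look like:

```python
import io


def read_chunked(stream):
    """Decode a chunked transfer-coded body from a binary stream."""
    body = bytearray()
    while True:
        # Chunk-size line: hexadecimal text, optionally followed by
        # ";extension" parameters, which this sketch ignores.
        size_line = stream.readline().decode("ascii")
        size = int(size_line.split(";", 1)[0].strip(), 16)
        if size == 0:
            break
        body += stream.read(size)   # chunk data: raw bytes
        stream.readline()           # CRLF that terminates the chunk
    # Skip optional trailer fields up to the closing blank line.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)


data = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
decoded = read_chunked(io.BytesIO(data))
```

The reader has to alternate between character-level and byte-level
operations on one stream, once per chunk.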

-- 
Alex



Posted on the users mailing list.