[plt-scheme] Why do MzScheme ports not respect the locale's encoding by default?

From: Alex Shinn (foof at synthcode.com)
Date: Sun Feb 27 04:57:28 EST 2005

At 26 Feb 2005 18:26:18 -0500, Jim Blandy wrote:
> 
> RFC 2616 section 3.6.1 defines the "Chunked Transfer Coding".  It says
> that the chunk length is specified in octets.  So this is another case
> where the containing protocol provides information allowing one to
> identify the bytes that represent characters in some encoding, without
> parsing the characters.

The length is a hex string; the chunk data itself is binary.
A simplified decoder might look like:

  (define (read-chunked-data port)
    (let lp ((res '()))
      (let ((line (read-line port)))   ; chunk length, as hex text
        (if (eof-object? line)
          (block-concatenate-reverse res)
          ;; parse the hex length, then read that many octets
          (lp (cons (read-block (string->number line 16) port) res))))))

As you can see, we need to interleave character-port operations
(read-line) and binary operations (read-block).  In C this is *all*
binary, which isn't a problem because there are more functions and
libraries to treat char* data as strings and parse them than there are
to handle wchar_t* data as strings.
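
For the record, here is a slightly fuller sketch of the same
interleaving, written against MzScheme's byte-string procedures
(read-bytes, bytes-append, open-input-bytes) rather than the
hypothetical read-block above.  The name read-chunked-body is just for
illustration.  It assumes the size line carries no chunk extensions
and it ignores trailers; it stops at the zero-length chunk, at EOF, or
at a size line it cannot parse.

```scheme
;; Sketch only: assumes MzScheme/Racket byte-string ports.
;; read-line reads the chunk-size line as characters; read-bytes then
;; reads LEN octets of chunk data from the same port.
(define (read-chunked-body port)
  (let lp ((res '()))
    (let* ((line (read-line port 'return-linefeed))
           (len  (and (string? line) (string->number line 16))))
      (if (or (not len) (zero? len))
          ;; EOF, unparsable size, or the terminating zero-length chunk
          (apply bytes-append (reverse res))
          (let ((chunk (read-bytes len port)))
            (read-line port 'return-linefeed) ; consume CRLF after the data
            (lp (if (eof-object? chunk) res (cons chunk res))))))))

;; Usage, reading from an in-memory byte port:
(read-chunked-body
 (open-input-bytes #"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
;; => #"Wikipedia"
```

The point is only that the character-level read-line and the
octet-level read-bytes have to share one port and one position.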

To be clear, are you advocating the use of two disjoint string types
in Scheme as in C?

Layering and/or procedural ports would be a nice approach, and one I
would like to see in a future SRFI, but it is higher-level than I wanted
to get into with SRFI-56.  Given SRFI-56 you can implement layering (and
given layering you can implement SRFI-56) so it seems logical to start
with the lowest-level approach first.  People are then free to come up
with competing layering approaches.

-- 
Alex



Posted on the users mailing list.