[plt-scheme] Why do MzScheme ports not respect the locale's encoding by default?

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Mon Feb 21 16:56:17 EST 2005

At 19 Feb 2005 18:40:18 -0500, Jim Blandy wrote:
> The reasons I know of for ignoring the current locale's encoding are
> these:
> [...]
> - It's hard to implement.
> [...]
> - It's not worth it.

My current conclusion is a combination of these, with one more piece:
it doesn't seem worth investing the programming effort needed to
produce a general-purpose implementation without performance surprises.


I've added `reencode-input-port' and `reencode-output-port' to MzLib's
"port.ss", and you can use these functions to get a port that decodes
or encodes according to the locale's encoding. These ports still work
with MzScheme's regexp matching, etc. --- more or less as you
suggested.
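
As a minimal sketch of how these functions can be used, the example
below assumes a present-day Racket, where the same functions are
exported from `racket/port' (the library path and #lang line are
assumptions, not the 2005 setup). It uses "ISO-8859-1" so that the
result does not depend on the machine's locale, which in turn assumes
that iconv is available; passing "" as the encoding selects the
current locale's encoding instead. The helper `locale-input-port' is
a hypothetical name introduced only for illustration.

```racket
#lang racket/base
(require racket/port)

;; Decode: the bytes 99 97 102 233 are "caf" plus e-acute in Latin-1.
(define latin1-in
  (reencode-input-port (open-input-bytes (bytes 99 97 102 233))
                       "ISO-8859-1"))
(define decoded (read-line latin1-in)) ; a 4-character string

;; Encode: characters written to latin1-out reach raw-out as Latin-1.
(define raw-out (open-output-bytes))
(define latin1-out (reencode-output-port raw-out "ISO-8859-1"))
(write-string decoded latin1-out)
(flush-output latin1-out)              ; push converted bytes through
(define encoded (get-output-bytes raw-out))

;; Hypothetical helper: "" names the current locale's encoding.
(define (locale-input-port in)
  (reencode-input-port in ""))
```

A program that wants locale-sensitive standard input could then wrap
it once, for example with
(parameterize ([current-input-port (locale-input-port (current-input-port))]) ...).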

We could call MzScheme's current "port" construct a "proto-port",
instead, and then define a "port" to be a pair of proto-ports: one for
byte operations, and one for character operations. The char proto-port
would be obtained by using `reencode-XXX-proto-port' on the byte
proto-port, and the char proto-port could even be created lazily.
Regexp matching with a string pattern, for example, would use the char
proto-port, while regexp matching with a byte pattern would use the
byte proto-port.
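
The pairing described above might be sketched as follows. This is an
illustration of the idea only, not anything MzScheme implements: the
names `dual-port', `make-dual-input-port', `dual-read-byte',
`dual-read-char', and `dual-regexp-match' are all hypothetical, and
the code assumes a present-day Racket with `racket/port' and
`racket/promise'.

```racket
#lang racket/base
(require racket/port racket/promise)

;; A "port" as a pair of proto-ports: a byte side, and a char side
;; that is a promise so that it is created only on first use.
(struct dual-port (bytes-side chars-side))

(define (make-dual-input-port in encoding)
  (dual-port in
             (delay (reencode-input-port in encoding))))

;; Byte operations go straight to the underlying port.
(define (dual-read-byte p)
  (read-byte (dual-port-bytes-side p)))

;; Character operations force the reencoded port into existence.
(define (dual-read-char p)
  (read-char (force (dual-port-chars-side p))))

;; Byte patterns match on the byte side, string patterns on the
;; char side.
(define (dual-regexp-match pattern p)
  (regexp-match pattern
                (if (or (bytes? pattern) (byte-regexp? pattern))
                    (dual-port-bytes-side p)
                    (force (dual-port-chars-side p)))))
```

Mixing byte and character reads on one such pair is not safe in this
sketch, since the reencoded side may read ahead in the underlying
port.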

The [proto-]ports created by `reencode-XXX-port' are somewhat heavy,
though, and the guarantees on the ports are not yet as good as I would
like. For example, peeking from a reencoded input port doesn't
translate into peeks of the original input port.
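
One way to observe the peeking behavior is sketched below, assuming a
present-day Racket and `racket/port'. The peek on the reencoded port
leaves that port's own position unchanged, but what remains visible in
the original port afterward depends on how much the wrapper has
already read, so the script prints the result rather than asserting
one.

```racket
#lang racket/base
(require racket/port)

(define orig (open-input-bytes #"hello"))
(define in (reencode-input-port orig "UTF-8"))

;; Peeking on the reencoded port does not consume from that port...
(define peeked (peek-char in))
(define next (read-char in))        ; same character as `peeked`

;; ...but the wrapper may already have consumed bytes from `orig`,
;; so this can be a later byte or eof rather than the next byte.
(define orig-next (peek-byte orig))

(printf "peeked ~s, read ~s, original port now shows ~s\n"
        peeked next orig-next)
```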

These problems could be resolved with more effort, and possibly with
some performance cost, but I don't think that it will matter for most
contexts where locale-sensitive encoding is important. Therefore, I'm
inclined to keep reencoding separate from the core port mechanism, and
leave the cost--benefit analysis to the next programmer.


At Fri, 18 Feb 2005 01:06:50 -0600, Alex Shinn wrote:
> I'd be very curious to see examples where delaying the port
> orientation until the first operation is useful though.

If I understand the question, then the example that I always struggle
with is `current-input-port'. The port has to be created on startup,
without knowing anything about how the program will use it.


Matthew



Posted on the users mailing list.