[plt-scheme] Unicode strings in mzscheme

From: Ian Oversby (oversby at googlemail.com)
Date: Mon Apr 23 15:22:10 EDT 2007

Hi folks,

Thanks for your responses.  I tried the snippet Matthew suggested and it
didn't work for me (occasionally it entered an infinite loop), so I've decided
to use DrScheme for this particular bit of code and see how it goes.

I think the console can handle Unicode: "echo más" works as you would
expect. But assuming that DrScheme works out, I won't look into this for
the moment.

Cheers,

Ian

On 23/04/07, Matthew Flatt <mflatt at cs.utah.edu> wrote:
> Some clarifications:
>
> At Sun, 22 Apr 2007 14:47:09 -0400, Richard Cobbe wrote:
> > I suspect that this is an issue not with PLT's support for Unicode strings,
> > but rather with Unicode I/O.  In the case of mzscheme, AFAICT, console I/O
> > relies heavily on the Unicode capabilities of the console in which MzScheme
> > is running.
>
> MzScheme communicates with the world in UTF-8. A reasonable console
> uses the current locale's encoding, which may not be UTF-8.
>
> Probably it would be better for MzScheme to use the current locale's
> encoding for stdin, stdout, and stderr when they're connected to a tty.
> We haven't yet tried this, mostly due to lack of demand (relative to
> lots of other things).
>
> For now, putting something like
>
>   (require (lib "port.ss")) ; provides reencode-output-port
>   (when (terminal-port? (current-output-port))
>     ;; "" selects the current locale's encoding:
>     (current-output-port (reencode-output-port (current-output-port) "")))
>
> in your ".mzschemerc" usually works under Unix. There is an issue with
> flushing output on exit, though, since MzScheme flushes only the
> original ports before exiting.
>
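
One possible workaround for the flushing issue, as a minimal sketch only:
wrap the current exit handler so that the re-encoded port is flushed first.
The names `out', `old-exit', and `code' are mine, and I am assuming that the
exit in question goes through `exit' (and so through `exit-handler'); other
exit paths would not be covered.

```scheme
;; Sketch for ~/.mzschemerc; assumes a Unix terminal.
(require (lib "port.ss")) ; provides reencode-output-port

(when (terminal-port? (current-output-port))
  (let ([out (reencode-output-port (current-output-port) "")]
        [old-exit (exit-handler)])
    ;; Send subsequent output through the locale-encoding port.
    (current-output-port out)
    ;; Flush the re-encoded port, then defer to the previous handler.
    ;; Assumption: this only helps for exits that call `exit'.
    (exit-handler
     (lambda (code)
       (flush-output out)
       (old-exit code)))))
```

Calling (flush-output) by hand before leaving the REPL should have the same
effect for a one-off session.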
>
> In the case of files, the right answer is less clear to me. Using UTF-8
> everywhere means that we avoid all sorts of problems where a file works
> on one machine and not on another. But many programs and libraries use
> the current locale's encoding by default for files.
>
>
> > Unfortunately, I can't translate this to Windows.  It wouldn't surprise me
> > to learn that cmd.exe, or the graphical window that sits on top of that,
> > can't handle Unicode I/O.  But I don't know how to get a terminal that
> > does.
>
> In Windows, there is a notion of a current code page, which is
> essentially the same as having a locale with an implied encoding.
> Currently, however, MzScheme always pretends that the default locale's
> encoding is UTF-8 under Windows. So, the above re-encode operation
> doesn't help: it re-encodes UTF-8 to UTF-8.
>
> For now, you can force MzScheme to use a particular encoding under
> Windows by setting `current-locale':
>
>   (current-locale "en_US.CP437") ; CP437 is the IBM PC OEM code page, not Latin-1
>
> That is, on my Windows machine,
>
>   (current-locale "en_US.CP437")
>   (require (lib "port.ss"))
>   (current-output-port (reencode-output-port (current-output-port) ""))
>   #\uE1
>
> shows #\á instead of #\<garbage> for the last result.
>
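
To see where the garbage comes from, it may help to look at the bytes. This
is a rough sketch, not something from Matthew's message: `utf8-bytes' and
`conv' are names I made up, and whether the "CP437" converter exists at all
depends on the platform's iconv, so the code guards for that. The UTF-8 form
of á is the two bytes #xC3 #xA1, which a CP437 console shows as "├í"; CP437
itself puts á at the single byte #xA0.

```scheme
;; á is U+00E1; MzScheme writes it to the port as two UTF-8 bytes.
(define utf8-bytes (string->bytes/utf-8 (string #\uE1)))
(bytes->list utf8-bytes) ; => (195 161), i.e. #xC3 #xA1

;; Ask the platform converter for UTF-8 -> CP437.
;; bytes-open-converter returns #f when the conversion is not
;; supported, so check before using it.
(define conv (bytes-open-converter "UTF-8" "CP437"))
(when conv
  (let-values ([(out used status) (bytes-convert conv utf8-bytes)])
    ;; Print the converted bytes and the conversion status.
    (printf "~s ~s\n" (bytes->list out) status))
  (bytes-close-converter conv))
```

The re-encoding port does the same conversion on the fly, which is why the
console then displays the character correctly.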
> Matthew


Posted on the users mailing list.