[plt-scheme] Encodings

From: Sylvain Beucler (beuc at beuc.net)
Date: Mon May 31 04:40:32 EDT 2004

Hello,

I have studied the documentation and the new MzScheme. More precisely,
the question I am asking myself is how to make Scheme code containing
accented characters work in both v2xx and v3xx. OK, it is still the same
question, but here are the details :)

For example, if I write in test.scm:
(display "Je m'écrie:")(newline)
with the file encoded in Latin-1, 'mzscheme -r test.scm' under v2xx
produces Latin-1 output:
Je m'écrie:
More importantly, DrScheme also produces Latin-1 output.
The output displays correctly only in terminals and shells that use the
Latin-1 encoding, but it always displays correctly in DrScheme.

Under v3xx, 'mzscheme -r test.scm' produces:
Je m'?crie:
since it assumes the source is UTF-8, and the Latin-1 byte for 'é'
(0xE9), followed here by 'c', is not a valid UTF-8 sequence. The output
will not display correctly in any locale.
If the forthcoming DrScheme behaves the same way, the text will not
display correctly there either.
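To make the failure concrete, here is a small sketch in modern Racket (the later name of PLT Scheme/MzScheme), so the exact behavior of v299 may differ. It builds the Latin-1 bytes of the test.scm string by hand and decodes them both ways; `bytes->string/utf-8` is given `#\?` as its error character, which it substitutes for the byte it cannot decode.

```racket
#lang racket/base
;; Sketch in modern Racket; v299-era MzScheme may behave differently.

;; The string from test.scm as stored in a Latin-1 file:
;; 'é' is the single byte #xE9.
(define latin-1-bytes (bytes-append #"Je m'" (bytes #xE9) #"crie:"))

;; Read as UTF-8: #xE9 starts a multi-byte sequence, but the next byte
;; (#\c = #x63) is not a continuation byte, so #xE9 is undecodable and
;; is replaced by the error character #\?.
(define read-as-utf-8 (bytes->string/utf-8 latin-1-bytes #\?))

;; Read as Latin-1: every byte maps directly to one character.
(define read-as-latin-1 (bytes->string/latin-1 latin-1-bytes))

(displayln read-as-utf-8)   ; Je m'?crie:
(displayln read-as-latin-1)
```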

So, in the current situation, I have to keep all my source code in
Latin-1 and convert it to UTF-8 when v300 is released.
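The conversion itself is mechanical, because every byte is a valid Latin-1 character. A minimal sketch in modern Racket follows; `latin-1-file->utf-8!` is a hypothetical helper name, not a library function, and the file names in the usage comment are placeholders. Outside Scheme, `iconv -f ISO-8859-1 -t UTF-8` does the same job.

```racket
#lang racket/base
(require racket/file) ; for file->bytes

;; Hypothetical helper: rewrite a Latin-1 source file as UTF-8.
;; Decoding Latin-1 never fails (each byte 0-255 is one character), and
;; non-ASCII characters such as 'é' grow from one byte to two.
(define (latin-1-file->utf-8! in-path out-path)
  (define text (bytes->string/latin-1 (file->bytes in-path)))
  (call-with-output-file out-path
    (lambda (out) (write-bytes (string->bytes/utf-8 text) out))
    #:exists 'truncate))

;; Usage (placeholder file names):
;; (latin-1-file->utf-8! "test.scm" "test-utf8.scm")
```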


Also, if I convert my source file to UTF-8, the output is always UTF-8,
so a system configured e.g. with LANG=fr_FR will see the output:
Je m'Ã©crie:
i.e. UTF-8 interpreted as Latin-1 by the shell/terminal.
DrScheme should not have this problem, since it will surely use UTF-8
everywhere.
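A Latin-1 terminal shows each of the two UTF-8 bytes of 'é' (#xC3 #xA9) as a separate character, 'Ã' and '©'. This short sketch in modern Racket reproduces the effect by decoding UTF-8 output as if it were Latin-1; it simulates the terminal rather than calling any terminal API.

```racket
#lang racket/base
;; Sketch in modern Racket: simulate a Latin-1 terminal reading UTF-8 output.

;; What a UTF-8 MzScheme writes for the string: 'é' becomes #xC3 #xA9.
(define utf-8-output (string->bytes/utf-8 "Je m'\u00E9crie:"))

;; A Latin-1 terminal maps each byte to one character,
;; so #xC3 shows as 'Ã' and #xA9 shows as '©'.
(define what-terminal-shows (bytes->string/latin-1 utf-8-output))

(displayln (bytes-length utf-8-output)) ; 12 bytes for 11 characters
(displayln what-terminal-shows)
```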


I am looking for a solution where:
- Latin-1 code can be read correctly by MzScheme/DrScheme v3xx, so that
my code works in both v2xx and v3xx,
- output is locale-dependent for MzScheme,
i.e. like Jikes+Java, where the source file encoding is guessed, strings
are stored as Unicode in the .class file, and output is converted to the
current locale's encoding.
Beware that javac behaves incorrectly here: by default it assumes all
source files are encoded in the current locale's encoding, which may
prevent people from compiling other people's code.
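The first two requirements can be approximated in user code. The sketch below, in modern Racket, uses a hypothetical heuristic that is not what MzScheme or Jikes actually does: accept the bytes as UTF-8 if they decode cleanly, otherwise fall back to Latin-1, and on output encode for a Latin-1 locale, replacing characters it cannot represent with '?'. `decode-source` and `encode-for-latin-1` are illustrative names, and Latin-1 is hard-coded as the assumed locale encoding.

```racket
#lang racket/base
;; Hypothetical heuristic, not MzScheme's or Jikes's actual algorithm.

;; Guess the source encoding: strict UTF-8 decoding raises
;; exn:fail:contract on invalid input, in which case fall back to
;; Latin-1, which accepts any byte sequence.
(define (decode-source bs)
  (with-handlers ([exn:fail:contract?
                   (lambda (e) (bytes->string/latin-1 bs))])
    (bytes->string/utf-8 bs)))

;; Encode a string for a Latin-1 locale (assumed here); characters
;; above U+00FF have no Latin-1 byte and are written as #\? (byte 63).
(define (encode-for-latin-1 str)
  (string->bytes/latin-1 str (char->integer #\?)))

;; Both encodings of the test.scm string decode to the same text.
(define from-latin-1 (decode-source (bytes-append #"Je m'" (bytes #xE9) #"crie:")))
(define from-utf-8 (decode-source (bytes-append #"Je m'" (bytes #xC3 #xA9) #"crie:")))
(displayln (equal? from-latin-1 from-utf-8)) ; #t
```

The guess can be wrong: a Latin-1 file whose accented bytes happen to form valid UTF-8 (for example the pair 'Ã©') would be read as UTF-8. In modern Racket, `reencode-output-port` from `racket/port` is one way to wrap an output port so the conversion happens on every write instead of string by string.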

-- 
Sylvain


Matthew Flatt wrote:
> At Sat, 22 May 2004 16:45:10 +0200, Sylvain Beucler wrote:
> > I would like to test the Unicode support, so let me make my question more precise.
> > Is it tagged, or should I check out the HEAD?
> 
> It's tagged as "v299".
> 
> Only MzScheme, MrEd, and a few assorted applications (e.g., SirMail,
> Slideshow, Games) work so far --- not DrScheme or Help Desk. Also,  
> you have to get docs from a temporary location:
> 
>   http://www.cs.utah.edu/~mflatt/tmp/mzscheme-doc.plt
>   http://www.cs.utah.edu/~mflatt/tmp/mzlib-doc.plt
>   http://www.cs.utah.edu/~mflatt/tmp/mred-doc.plt
>   http://www.cs.utah.edu/~mflatt/tmp/insidemz-doc.plt


Posted on the users mailing list.