[plt-scheme] Encodings

From: Sylvain Beucler (beuc at beuc.net)
Date: Mon May 31 05:17:31 EDT 2004

I just figured out that gettext bindings (i.e. automatic iconv() calls) may do
the trick while keeping MzScheme's simple encoding behavior.
I will run some tests right now.

By the way, the only gettext bindings I found (actually a
re-implementation) are at http://www.synthcode.com/scheme/. Do you know of any
other bindings that can be used in PLT Scheme?

-- 
Sylvain


I wrote:
> Hello,
> 
> I studied the documentation and the new MzScheme. More precisely, my
> question is how to make Scheme code containing accented characters
> work in both v2xx and v3xx. OK, it is still the same question, but
> here are the details :)
> 
> For example, if I write in test.scm:
> (display "Je m'écrie:")(newline)
> encoded in Latin-1, 'mzscheme -r test.scm' v2xx produces output in
> Latin-1:
> Je m'écrie:
> More importantly, DrScheme will also produce Latin-1 output.
> The output displays correctly only in terminals and shells that use the
> Latin-1 encoding, and always displays correctly in DrScheme.
> 
> 'mzscheme -r test.scm' v3xx produces:
> Je m'?crie:
> since it assumes the text is UTF-8, and the Latin-1 byte for 'é' (0xE9)
> is not a valid UTF-8 sequence on its own. The output will not display
> correctly in any locale.
> If the forthcoming DrScheme behaves the same way, the text will not
> display correctly there either.
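
The v3xx result described above can be reproduced at the byte level. The following is only an illustration in Python, not MzScheme code: it assumes a decoder that substitutes '?' for each byte it cannot decode as UTF-8, which matches the output shown but is not taken from the MzScheme implementation. The helper name `decode_as_utf8` is hypothetical.

```python
# Illustration in Python (not MzScheme): a UTF-8 decoder that substitutes
# '?' for undecodable bytes, applied to a Latin-1 encoded source line.

# What an editor saves when the file is stored as Latin-1: 'é' is byte 0xE9.
source_bytes = "Je m'\u00e9crie:".encode("latin-1")

def decode_as_utf8(data: bytes) -> str:
    # errors="replace" yields U+FFFD for each undecodable byte;
    # map it to '?' to mirror the output quoted in the message.
    return data.decode("utf-8", errors="replace").replace("\ufffd", "?")

decoded = decode_as_utf8(source_bytes)
print(decoded)
```

The information about which character was intended is lost at decoding time, which is why no output locale can recover it afterwards.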
> 
> So, in the current situation, I have to encode all my source code in
> Latin-1 and convert it to UTF-8 when v300 is released.
> 
> 
> Also, if I convert my source file to UTF-8, the output is always UTF-8,
> so a system configured with e.g. LANG=fr_FR will show the output:
> Je m'écrie:
> i.e. UTF-8 interpreted as Latin-1 by the shell/terminal.
> DrScheme should not have this problem since it will surely use UTF-8  
> everywhere.
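
The garbled output in this second case comes from the opposite mismatch: the two UTF-8 bytes of 'é' are each displayed as a separate Latin-1 character. A minimal sketch in Python (again only an illustration, not MzScheme code):

```python
# Illustration in Python (not MzScheme): UTF-8 output read by a terminal
# that assumes Latin-1.

text = "Je m'\u00e9crie:"
utf8_bytes = text.encode("utf-8")        # 'é' becomes the two bytes C3 A9
misread = utf8_bytes.decode("latin-1")   # each byte shown as one character

# ascii() keeps the printout independent of the terminal's own encoding.
print(ascii(misread))
```

Here U+00C3 and U+00A9 are the characters 'Ã' and '©', which is the pair that appears in place of 'é' in the output quoted above.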
> 
> 
> I am looking for a solution where:
> - Latin-1 code could be read well in MzScheme/DrScheme v3xx, so that  
> my code works in both v2xx and v3xx,
> - output is locale-dependent for MzScheme,
> i.e. like Jikes+Java, where the source file encoding is guessed,
> strings are stored as Unicode in the .class file, and output is
> converted to the current locale's encoding.
> Note that javac behaves incorrectly here: it assumes all source files
> are encoded in the current locale's encoding, which may prevent people
> from compiling other people's code.
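
The pipeline asked for here (guess the source encoding, keep strings as Unicode internally, encode output for the current locale) can be sketched as follows. This is an illustration in Python under stated assumptions, not MzScheme code and not the algorithm Jikes uses: the "try strict UTF-8, fall back to Latin-1" guess and the function names `read_source` and `write_output` are hypothetical.

```python
import locale

def read_source(data: bytes) -> str:
    """Guess the source encoding: strict UTF-8 first, then Latin-1.

    Assumption for this sketch only; a Latin-1 file whose bytes happen
    to form valid UTF-8 would be misread by this heuristic.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")  # Latin-1 accepts every byte value

def write_output(text: str, encoding: str) -> bytes:
    """Encode for the terminal; unencodable characters become '?'."""
    return text.encode(encoding, errors="replace")

latin1_source = b"Je m'\xe9crie:"     # file saved as Latin-1
utf8_source = b"Je m'\xc3\xa9crie:"   # same text saved as UTF-8

# Both files yield the same internal Unicode string.
internal = read_source(latin1_source)
same = read_source(utf8_source) == internal

# The output bytes then depend only on the locale, not on the source file.
terminal_encoding = locale.getpreferredencoding(False)
output_bytes = write_output(internal, terminal_encoding)
print(same, terminal_encoding, output_bytes)
```

With this split, the same source file works whether it was saved as Latin-1 or UTF-8, and the conversion to the terminal's encoding happens once, at output time.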
> 
> 
> Matthew Flatt wrote:
>> At Sat, 22 May 2004 16:45:10 +0200, Sylvain Beucler wrote:
>> > I would like to test the Unicode support, so let me make my question
>> > more precise.
>> > Is it tagged, or should I check out the HEAD?
>> 
>> It's tagged as "v299".
>> 
>> Only MzScheme, MrEd, and a few assorted applications (e.g., SirMail,
>> Slideshow, Games) work so far --- not DrScheme or Help Desk. Also,  
>> you have to get docs from a temporary location:
>> 
>>   http://www.cs.utah.edu/~mflatt/tmp/mzscheme-doc.plt
>>   http://www.cs.utah.edu/~mflatt/tmp/mzlib-doc.plt
>>   http://www.cs.utah.edu/~mflatt/tmp/mred-doc.plt
>>   http://www.cs.utah.edu/~mflatt/tmp/insidemz-doc.plt


Posted on the users mailing list.