[plt-scheme] Lexical char-downcase for extended character sets

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Oct 9 10:56:53 EDT 2002

At Tue, 8 Oct 2002 17:15:40 +0200, Erich Rast wrote:
> I need to index text documents containing MacRoman character values
> > 128. Now I've encountered the following problems:
> 
> 1.) For case-insensitive index keys, I intended to use char-downcase,
> but this doesn't work as expected for German umlauts: (char-downcase
> #\Ä) ==> #\304, which is the same as #\Ä, but I'd need #\344, also known
> as #\ä (the small letter 'a' with two dots on top).

In principle, you could set `current-locale' so that
`char-locale-downcase' converts using MacRoman. But a quick check
suggests that no such locale mapping exists.

In case it's useful, there's a MacRoman -> Latin-1 table in
    plt/src/mzscheme/src/mac_roman.inc
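Without a MacRoman locale, one option is a hand-built case table. Below
is a minimal sketch in modern Racket, assuming the documents are kept as
raw MacRoman bytes (e.g. read with `read-bytes') rather than decoded to
Unicode. The names `macroman-downcase-byte' and `macroman-downcase-bytes'
are hypothetical, and the table is partial: it covers only the German
umlauts (MacRoman 0x80 Ä, 0x85 Ö, 0x86 Ü, 0x8A ä, 0x9A ö, 0x9F ü).

```racket
#lang racket/base
;; Sketch: case-fold raw MacRoman bytes for index keys.
;; Assumption: text is held as MacRoman bytes, not Unicode strings.

;; Upper -> lower pairs for the German umlauts in MacRoman.
;; Partial table; extend it from a full MacRoman chart as needed.
(define macroman-upper->lower
  (hasheqv #x80 #x8A    ; Ä -> ä
           #x85 #x9A    ; Ö -> ö
           #x86 #x9F))  ; Ü -> ü

;; Downcase one byte: ASCII A-Z by offset, high bytes via the table,
;; everything else unchanged.
(define (macroman-downcase-byte b)
  (cond
    [(<= 65 b 90) (+ b 32)]
    [else (hash-ref macroman-upper->lower b b)]))

;; Downcase a whole byte string into a fresh byte string.
(define (macroman-downcase-bytes bs)
  (define out (make-bytes (bytes-length bs)))
  (for ([b (in-bytes bs)] [i (in-naturals)])
    (bytes-set! out i (macroman-downcase-byte b)))
  out)
```

If the text is instead decoded to Unicode first, modern Racket's
`char-downcase' already maps #\Ä to #\ä, so no table is needed.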

> 2.) Is there a simple way to convert a high character range to some
> reasonable lexicographic mapping into ASCII? Examples: Ä=>A, ü=>u. Or
> do I need to build substitution tables?

I can't think of any existing function that would perform that
conversion.
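A substitution table is straightforward to build by hand. Here is a
minimal sketch in modern Racket, again assuming raw MacRoman bytes as
input. The name `fold-to-ascii' is hypothetical, and the table lists
only the umlauts. Each entry maps to a byte string rather than a single
byte, so a multi-letter substitute (such as "ae" for ä, if that ordering
is preferred) fits the same table.

```racket
#lang racket/base
;; Sketch: fold MacRoman bytes to ASCII for index keys, e.g. Ä => A, ü => u.
;; Assumption: input is raw MacRoman bytes; the table is partial.

(define macroman->ascii
  (hasheqv #x80 #"A"    ; Ä
           #x85 #"O"    ; Ö
           #x86 #"U"    ; Ü
           #x8A #"a"    ; ä
           #x9A #"o"    ; ö
           #x9F #"u"))  ; ü

;; Replace each mapped byte by its ASCII substitute; keep other bytes.
(define (fold-to-ascii bs)
  (apply bytes-append
         (for/list ([b (in-bytes bs)])
           (hash-ref macroman->ascii b (lambda () (bytes b))))))
```

Combined with the downcasing step, this gives keys in which "Äpfel" and
"apfel" compare equal.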

Matthew



Posted on the users mailing list.