[plt-scheme] mzchar and wchar_t

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Mar 15 08:13:31 EST 2006

At Tue, 14 Mar 2006 15:55:54 -0800, "Jim Blandy" wrote:
> On 3/14/06, Matthew Flatt <mflatt at cs.utah.edu> wrote:
> > At Tue, 14 Mar 2006 12:36:05 +0100, Jean-Guillaume wrote:
> > > Is there a simple relation between mzchar and wchar_t (and wint_t)
> > > types ?
> >
> > If you have a wchar_t whose value is in [0, #xD7FF] or [#xE000,
> > #xFFFF], then you can use it as a mzchar and vice versa.
> >
> > More generally, a mzchar is a 4-byte value that is a Unicode scalar
> > value, and a wchar_t is a 2-byte value that is potentially a surrogate.
> > You can convert between strings of each type of character by UTF-16
> > encoding/decoding, perhaps using scheme_utf8_encode() and
> > scheme_utf8_decode() (which, despite the names, support a UTF-16 mode).
> > Beware of decoding a wchar_t string that may have unpaired surrogates.
> 
> So MzScheme assumes that wchar_t is a UTF-16 code unit?

Oops - no. (I got confused, thinking that wchar_t is Windows-specific.)

To get from wchar_t to mzchar, the only strategy I know is to use
wcstombs to get to a locale-specific encoding, and then use
bytes->string/locale. The reverse mapping (mzchar to wchar_t) is
string->bytes/locale followed by mbstowcs.

Matthew



Posted on the users mailing list.