[plt-scheme] 299.7

From: Bradd W. Szonye (bradd+plt at szonye.com)
Date: Wed May 12 13:41:20 EDT 2004

Alex Ott wrote:
>> What do you think about the C++ approach to encoding and locale
>> conversion for ports and strings?

Matthew Flatt wrote:
> If I understand C++'s approach to ports/streams correctly, the
> encoding between strings and bytes is a (mutable) property of the
> stream object.

FYI, here's some information about C/C++ streams and encodings.

Streams are initially encoding-neutral. The first actual I/O operation
(or an earlier call to fwide()) assigns an orientation to the stream,
either "byte-oriented" or "wide-oriented." A byte-oriented stream reads
and writes raw bytes; it generally performs no encoding conversions
(except for minor details like handling newline conventions in "text"
mode). A wide-oriented stream converts between the stream's encoding and
the internal wide-character (wchar_t) encoding. The stream's encoding
rule is fixed at the moment it becomes wide-oriented, taken from the
LC_CTYPE locale in effect at that time.
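
As a rough illustration of the orientation rule, here is a minimal C
sketch. The helper names orientation() and orientation_demo() are made
up for this example; fwide(), tmpfile(), fputc() and fputwc() are
standard C. Passing 0 to fwide() only queries the orientation and does
not change it.

```c
#include <stdio.h>
#include <wchar.h>

/* Report a stream's orientation without changing it:
   +1 = wide-oriented, -1 = byte-oriented, 0 = not yet oriented.
   (Hypothetical helper; fwide(fp, 0) is a pure query.) */
int orientation(FILE *fp)
{
    int o = fwide(fp, 0);
    return (o > 0) - (o < 0);
}

/* Print the orientation of two scratch streams before and after
   their first I/O operation. Returns 0 on success, -1 on failure. */
int orientation_demo(void)
{
    FILE *bytes = tmpfile();
    FILE *wide = tmpfile();
    if (!bytes || !wide) {
        if (bytes) fclose(bytes);
        if (wide) fclose(wide);
        return -1;
    }

    printf("fresh streams: %d %d\n", orientation(bytes), orientation(wide));

    fputc('x', bytes);    /* first I/O is a byte function */
    fputwc(L'x', wide);   /* first I/O is a wide function */

    printf("after first write: %d %d\n", orientation(bytes), orientation(wide));

    fclose(bytes);
    fclose(wide);
    return 0;
}
```

Calling orientation_demo() from main() should show both streams
starting out unoriented and then diverging after the first write.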

A stream's orientation and (if applicable) encoding are /not/ mutable
once set. Only a call to freopen -- which flushes and resets the stream
-- can change them. Yes, they start out uninitialized, but they are not
generally mutable.
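
A small sketch of that behaviour, assuming a writable scratch file
whose name the caller supplies (the function name and the path
parameter are hypothetical, chosen only for this example): once the
stream is byte-oriented, asking fwide() for wide orientation has no
effect, while a successful freopen() leaves the stream unoriented
again.

```c
#include <stdio.h>
#include <wchar.h>

/* Returns 1 if the stream behaved as described, 0 if it did not,
   and -1 if the scratch file could not be opened. */
int freopen_resets_orientation(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (!fp) return -1;

    fputc('x', fp);                    /* first I/O: now byte-oriented */
    int stuck = fwide(fp, 1) < 0;      /* request for wide is ignored */

    fp = freopen(path, "w", fp);       /* flushes and resets the stream */
    if (!fp) {                         /* on failure the stream is closed */
        remove(path);
        return -1;
    }
    int reset = fwide(fp, 0) == 0;     /* unoriented again */
    int wide = fwide(fp, 1) > 0;       /* so it can now become wide */

    fclose(fp);
    remove(path);
    return stuck && reset && wide;
}
```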

There are only two basic ways to handle encoding in C. If the wchar_t
internal encoding (usually UTF-16 or UTF-32) is sufficient for your
needs, you can use setlocale() and fwide() to set the stream encoding,
then use the wide-oriented functions to convert automatically between
the stream encoding and the internal encoding. Otherwise, you must use
byte-oriented I/O and convert everything manually.
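
The first approach might look something like the sketch below. The
function name encoded_size(), the scratch-file path, and any locale
name passed in are assumptions for illustration; which locale names
exist depends on the system. The function writes a wide string through
a wide-oriented stream, then reopens the file byte-oriented and counts
the bytes that were actually stored.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Write `text` through a wide-oriented stream under the named locale,
   then count the stored bytes with byte-oriented I/O.
   Returns the byte count, or -1 if the locale or the I/O failed. */
long encoded_size(const char *path, const char *locale_name,
                  const wchar_t *text)
{
    /* The locale must be selected before the stream is oriented. */
    if (!setlocale(LC_CTYPE, locale_name)) return -1;

    FILE *fp = fopen(path, "w");
    if (!fp) return -1;

    /* Make the stream wide-oriented, then let the library convert. */
    int ok = fwide(fp, 1) > 0 && fputws(text, fp) >= 0;
    if (fclose(fp) != 0) ok = 0;       /* conversion may fail at flush */

    long n = -1;
    if (ok && (fp = fopen(path, "rb")) != NULL) {
        n = 0;
        while (fgetc(fp) != EOF) n++;  /* raw bytes, no conversion */
        fclose(fp);
    }

    remove(path);
    setlocale(LC_CTYPE, "C");          /* restore the default locale */
    return n;
}
```

For example, on a system that provides a UTF-8 locale, passing that
locale's name together with a one-character string such as L"\u00e9"
would be expected to report two bytes, since that character takes two
bytes in UTF-8.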

I'm a bit rusty on C++ iostreams, but IIRC they're a lot more flexible
and easier to use for encoding conversion, mainly because they use C++'s
more powerful locale facilities.
-- 
Bradd W. Szonye
http://www.szonye.com/bradd


Posted on the users mailing list.