[plt-scheme] Unicode on the cheap

From: Paul Schlie (schlie at comcast.net)
Date: Sun Jan 25 10:32:11 EST 2004

Or possibly (string ...) could/should be extended to simply accept arbitrary
Scheme expressions and produce a string by concatenating their string
equivalents; e.g.

(define s (string 'a "bc" (make-N-chinese-chars 3) 1 "2" (+ 1 2)))

s -> "abc???123"
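A rough sketch of how such a constructor might behave, written against
present-day Racket rather than the PLT Scheme under discussion. The names
`string*` and `make-N-chinese-chars` are hypothetical, `format` with `~a` is
only one possible reading of "string equivalent", and U+4E2D is an arbitrary
choice of Chinese character:

```scheme
#lang racket/base

;; Hypothetical helper: N copies of one Chinese character
;; (U+4E2D, chosen arbitrarily for illustration).
(define (make-N-chinese-chars n)
  (make-string n (integer->char #x4E2D)))

;; Hypothetical variadic constructor: turn each argument into its
;; display form with `format`, then concatenate the pieces.
(define (string* . args)
  (apply string-append
         (map (lambda (x) (format "~a" x)) args)))

(define s (string* 'a "bc" (make-N-chinese-chars 3) 1 "2" (+ 1 2)))
```

Here the symbol, the numbers, and the existing strings all pass through the
same conversion, so the caller never has to spell out `symbol->string` or
`number->string`.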

Strings of more complex characters may then be thought of as concatenations
of characters that are each themselves encoded as short UTF-8 byte sequences.

And where:

(string-length s) -> 9  ; length in logical characters (Unicode code-points)
(string-UTF-8 s) -> 15  ; length in physical UTF-8 code-units (bytes):
                        ; 3 + 3*3 + 3, at three bytes per Chinese character,
                        ; or maybe 16 if including the terminating null.
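
For comparison, present-day Racket (not the byte-oriented PLT Scheme this
thread is about) draws this same distinction: `string-length` counts
characters and `string-utf-8-length` counts encoded bytes. A small check,
assuming U+4E2D as the Chinese character and building `s` directly:

```scheme
#lang racket/base

;; Stand-in for the example string: "abc", three copies of U+4E2D, "123".
(define s
  (string-append "abc" (make-string 3 (integer->char #x4E2D)) "123"))

(define n-chars (string-length s))        ; logical characters (code points)
(define n-bytes (string-utf-8-length s))  ; UTF-8 code units (bytes)

;; One Chinese character on its own: the byte values of its UTF-8 encoding.
(define one-char-bytes
  (bytes->list (string->bytes/utf-8 (string (integer->char #x4E2D)))))

(printf "~a characters, ~a bytes~n" n-chars n-bytes)
```

This prints `9 characters, 15 bytes`, and `one-char-bytes` holds three byte
values, consistent with three "char"s per Chinese character in a
byte-oriented string.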

-paul-

At Sun, 25 Jan 2004, Matthew Flatt wrote:
> At Sun, 25 Jan 2004 08:35:29 -0600, Robby Findler wrote:
>>   If I were to put four of those chinese characters into string (eg by
>>   calling `string' with four arguments), why wouldn't the resulting
>>   string have a `string-length' of four?
>
> If you create a string by calling `string' with four arguments, then
> `string-length' reports 4. Each of the four arguments to `string' is a
> "char" (and therefore a code unit or #\377 or #\376).
>
> But you can't create a string containing four Chinese characters by
> calling `string' with four arguments, because each Chinese character
> requires three "char"s.
> 
> Matthew



Posted on the users mailing list.