[plt-scheme] Unicode on the cheap

From: Paul Schlie (schlie at comcast.net)
Date: Sun Jan 25 11:19:45 EST 2004

[tweak of earlier message]

Possibly (string ...) could or should be extended to accept arbitrary
Scheme values, and produce a string by concatenating their string
equivalents; e.g.

(define s (string #\a "bc" (uc 3332) (uc 12431) (uc 3423) 1 "2" (+ 1 2)))

s -> "abc???123" ; assuming ? represents a non-displayable character.

Where (uc N) returns a string of the UTF-8 code units representing the
Unicode code point N, which can in turn be concatenated with the other
arguments by (string ...), making (string-append ...) redundant.

Strings of Chinese characters could then be formed by concatenating
strings that each hold the UTF-8 code units of one individual
character.
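
A minimal sketch of the idea, not a proposal for the actual primitive.
It assumes a Scheme whose characters are full Unicode code points (so
integer->char accepts values such as 12431), which is not the
one-byte-per-char model discussed in this thread. The names string*,
->string and the body of uc are hypothetical and exist only for
illustration:

```scheme
;; Hypothetical (uc N): a one-character string for code point N.
;; Assumes integer->char accepts arbitrary Unicode scalar values.
(define (uc n)
  (string (integer->char n)))

;; Hypothetical helper: the "string equivalent" of a single value.
(define (->string x)
  (cond ((string? x) x)
        ((char? x) (string x))
        ((number? x) (number->string x))
        (else (error "no string equivalent for value" x))))

;; Hypothetical extended constructor; named string* here so the
;; standard (string ...) is left untouched.
(define (string* . args)
  (apply string-append (map ->string args)))

(define s
  (string* #\a "bc" (uc 3332) (uc 12431) (uc 3423) 1 "2" (+ 1 2)))

;; Length in code points on an implementation as assumed above.
(string-length s)
```

Under that assumption (string-length s) counts code points, so the
byte count would need a separate operation.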

Then, possibly:

(string-length s) -> 9  ; length in logical characters (Unicode code-points)
(string-UTF-8 s) -> 15  ; length in physical UTF-8 code-units (bytes)
                        ; or maybe 16 if including a terminal null marker
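
The two counts above can be checked by hand with the standard UTF-8
ranges. The following is only a sketch; utf-8-width and code-points are
hypothetical names, and the list simply spells out the code points of
the example string (a, b, c, the three (uc N) values, then 1, 2, 3):

```scheme
;; Number of UTF-8 code units (bytes) needed to encode one code point.
(define (utf-8-width cp)
  (cond ((< cp #x80) 1)      ; U+0000..U+007F
        ((< cp #x800) 2)     ; U+0080..U+07FF
        ((< cp #x10000) 3)   ; U+0800..U+FFFF
        (else 4)))           ; U+10000..U+10FFFF

;; Code points of the example string s.
(define code-points (list 97 98 99 3332 12431 3423 49 50 51))

(length code-points)                      ; logical characters: 9
(apply + (map utf-8-width code-points))   ; UTF-8 code units: 15
```

The six ASCII characters take one byte each and the three others take
three bytes each, which gives 6 + 9 = 15.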

-paul-

At Sun, 25 Jan 2004, Matthew Flatt wrote:
> At Sun, 25 Jan 2004 08:35:29 -0600, Robby Findler wrote:
>>   If I were to put four of those chinese characters into string (eg by
>>   calling `string' with four arguments), why wouldn't the resulting
>>   string have a `string-length' of four?
>
> If you create a string by calling `string' with four arguments, then
> `string-length' reports 4. Each of the four arguments to `string' is a
> "char" (and therefore a code unit or #\377 or #\376).
>
> But you can't create a string containing four Chinese characters by
> calling `string' with four arguments, because each Chinese character
> requires three "char"s.
> 
> Matthew

------ End of Forwarded Message



Posted on the users mailing list.