[plt-scheme] Unicode on the cheap
[tweak of earlier message]
Possibly (string ...) could be extended to accept arbitrary Scheme
expressions and to produce a string by concatenating their string
equivalents; for example:
(define s (string #\a "bc" (uc 3332) (uc 12431) (uc 3423) 1 "2" (+ 1 2)))
s -> "abc???123" ; assuming ? represents a non-displayable character.
Here (uc N) returns a string of n UTF-8 code units encoding the Unicode
code point N, which (string ...) can then concatenate with its other
arguments, indirectly making (string-append ...) redundant.
Strings of Chinese characters could then be formed by concatenating
strings that each hold the UTF-8 code units of one individual
character.
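A minimal sketch of the idea, assuming a Racket-style system in which a
char is a full Unicode code point (unlike the byte-sized chars discussed
in the quoted thread below). The names uc and string* are hypothetical;
string* stands in for the proposed extended (string ...), and here uc
returns a one-character string rather than a run of UTF-8 code units.

```scheme
#lang racket/base

;; Hypothetical helper: a one-character string holding code point n.
(define (uc n)
  (string (integer->char n)))

;; Hypothetical stand-in for the proposed extended (string ...):
;; turn each argument into its display form, then concatenate.
(define (string* . args)
  (apply string-append
         (map (lambda (x) (format "~a" x)) args)))

(define s (string* #\a "bc" (uc 3332) (uc 12431) (uc 3423) 1 "2" (+ 1 2)))
```

Because string* accepts strings as well as other values, it covers every
use of (string-append ...), which is the redundancy suggested above.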
Then possibly:
(string-length s) -> 9 ; length in logical characters (Unicode code-points)
(string-UTF-8 s) -> 15 ; length in physical UTF-8 code-units (bytes)
; or maybe 16 if including a terminal null marker
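Both counts can be checked with a short sketch, again assuming a
Racket-style system with code-point chars. It builds the example string
directly and uses the existing string->bytes/utf-8 and bytes-length
functions in place of the proposed (string-UTF-8 ...).

```scheme
#lang racket/base

;; The example string: "abc", three non-ASCII code points, then "123".
(define s
  (string-append "abc"
                 (string (integer->char 3332)
                         (integer->char 12431)
                         (integer->char 3423))
                 "123"))

;; Length in code points.
(define code-points (string-length s))

;; Length in UTF-8 code units (bytes), with no terminating null.
(define utf-8-bytes (bytes-length (string->bytes/utf-8 s)))
```

Each of the three non-ASCII code points lies between U+0800 and U+FFFF
and so takes three bytes in UTF-8, giving 3 + 9 + 3 = 15 bytes for 9
code points.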
-paul-
At Sun, 25 Jan 2004, Matthew Flatt wrote:
> At Sun, 25 Jan 2004 08:35:29 -0600, Robby Findler wrote:
>> If I were to put four of those Chinese characters into a string (e.g. by
>> calling `string' with four arguments), why wouldn't the resulting
>> string have a `string-length' of four?
>
> If you create a string by calling `string' with four arguments, then
> `string-length' reports 4. Each of the four arguments to `string' is a
> "char" (and therefore a code unit or #\377 or #\376).
>
> But you can't create a string containing four Chinese characters by
> calling `string' with four arguments, because each Chinese character
> requires three "char"s.
>
> Matthew