[plt-scheme] FW: Unicode on the cheap

From: Paul Schlie (schlie at comcast.net)
Date: Sun Jan 25 13:59:06 EST 2004

Or vice versa (as originally planned, to preserve present code semantics
               when using strings to manipulate arbitrary raw byte data,
               which I now understand may include 0 bytes, since mzscheme
               strings aren't null-terminated, unlike C strings), yielding:

(string-length s) -> 15 ; length in raw bytes (UTF-8 character code-units).

(unicode-length s) -> 9 ; length in Unicode code points (logical characters)

(what-ever-length s) -> ? ; length in whatever other encoding form the data bytes use

Resulting in: 
               
- strings being more properly considered to be: "raw-byte-strings"
- characters being more properly considered to be: "raw-bytes"

(which isn't bad, as it indirectly gives Scheme a more formal raw-data
storage type than it could arguably be said to have had before).
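
As a rough illustration of the two lengths discussed above, here is a minimal
sketch in Python rather than Scheme, since the procedures named in this message
are proposals and not existing mzscheme functions. The helper names
(as_raw_bytes, byte_length, code_point_length) and the sample text are
hypothetical, chosen only so that the counts come out to 15 and 9.

```python
# Hypothetical helper names; they are not part of mzscheme or any library.

def as_raw_bytes(text: str) -> bytes:
    """Encode text as UTF-8, giving the raw-byte view of the string."""
    return text.encode("utf-8")

def byte_length(raw: bytes) -> int:
    """Length in UTF-8 code units (bytes)."""
    return len(raw)

def code_point_length(raw: bytes) -> int:
    """Length in Unicode code points, found by decoding the raw bytes."""
    return len(raw.decode("utf-8"))

# Three characters that take 3 bytes each in UTF-8, then six ASCII characters.
sample = as_raw_bytes("\u65e5\u672c\u8a9e hello")

print(byte_length(sample))            # 15
print(code_point_length(sample))      # 9
print(byte_length(sample + b"\x00"))  # 16, if a terminal null marker is counted

# A raw-byte string may contain 0 anywhere; it does not end the data.
with_zero = b"ab\x00cd"
print(byte_length(with_zero))         # 5
```

The byte count needs no knowledge of the encoding, while the code-point count
requires decoding, which is why keeping string-length as the raw count leaves
existing byte-oriented code unchanged.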

-paul-

> From: Paul Schlie <schlie at comcast.net>
> 
> Where then possibly:
> 
> (string-length s) -> 9
> (string-UTF-8 s) -> 15  ; length in physical UTF-8 code-units (bytes)
>                         ; or maybe 16 if including a terminal null marker

------ End of Forwarded Message



Posted on the users mailing list.