[plt-scheme] FW: Unicode on the cheap
Or vice versa (as originally planned, to preserve present code semantics
when using strings to manipulate arbitrary raw byte data,
which I now understand may include 0, since mzscheme strings
aren't null-terminated, unlike C strings), yielding:
(string-length s) -> 15 ; length in raw bytes (UTF-8 character code-units).
(unicode-length s) -> 9 ; length in unicode code-points (logical characters)
(what-ever-length s) -> ? ; length in whatever encoding form the data bytes use
Resulting in:
- strings more properly being considered to be: "raw-byte-strings"
- characters being more properly considered to be: "raw-bytes"
(which isn't bad, as it indirectly gives Scheme a more formal raw-data
storage type than it could arguably be said to have had before).
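
To make the distinction between the lengths above concrete, here is a rough
sketch in Python rather than Scheme, purely as an illustration. The sample
string and the helper names byte_length and code_point_length are hypothetical
and chosen only so that the counts match the 15 and 9 used above; nothing here
is mzscheme's actual API.

```python
# Hypothetical sample: 3 CJK characters (3 bytes each in UTF-8) + 6 ASCII letters.
s = "\u65e5\u672c\u8a9eabcdef"


def byte_length(text, encoding="utf-8"):
    """Length in raw bytes of the text under the given encoding form."""
    return len(text.encode(encoding))


def code_point_length(text):
    """Length in Unicode code points (Python's str is indexed by code point)."""
    return len(text)


# Analogue of the proposed (string-length s): raw UTF-8 bytes.
print(byte_length(s))

# Analogue of the proposed (unicode-length s): logical code points.
print(code_point_length(s))

# Analogue of (what-ever-length s): the same text in another encoding form.
print(byte_length(s, "utf-16-le"))

# A raw byte sequence may contain 0 without it ending the data,
# since the length is stored separately instead of being marked by a terminator.
raw_with_nul = b"ab\x00cd"
print(len(raw_with_nul))
```

The point of the sketch is only that a single piece of text has several
defensible "lengths", so whichever one string-length reports, the others
need their own names.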
-paul-
> From: Paul Schlie <schlie at comcast.net>
>
> Where then possibly:
>
> (string-length s) -> 9
> (string-UTF-8 s) -> 15 ; length in physical UTF-8 code-units (bytes)
> ; or maybe 16 if including a terminal null marker
------ End of Forwarded Message