[plt-scheme] strings?
On Apr 16, Mike wrote:
> On Fri, 15 Apr 2005, Matthew Flatt might have said:
>
> > SCHEME_BYTE_STR_VAL() should be used only on a value for which
> > SCHEME_BYTE_STRINGP() produces true.
> >
> > SCHEME_CHAR_STR_VAL() should be used only on a value for which
> > SCHEME_CHAR_STRINGP() produces true.
> >
> > The result of SCHEME_CHAR_STR_VAL() is a mzchar*, so I can see why
> > you're trying to use SCHEME_BYTE_STR_VAL() to get a char*, but it
> > doesn't work. (The actual layout of byte string and char string values
> > means that, on a little-endian machine, you end up with one byte when
> > trying to use a char string as a byte string. But that's just an
> > artifact of the current data layout.)
> >
> > To turn a char string into a byte string, you can use
> > scheme_char_string_to_byte_string() or
> > scheme_char_string_to_byte_string_locale(), depending on whether you
> > want a UTF-8 or locale-based encoding of the string.
>
> Thanks for the explanation. I'm still lost.
> I accept that mzscheme uses a multibyte representation internally,
> and that the use of scheme strings works internally. I want to
> extract a string from inside mzscheme and give that string
> as a null-terminated string to a C function. That function
> could be (and is) sqlite_open() or it could be fopen() for
> opening a file for custom processing, etc.
>
> How do I extract the string from mzscheme to give to the C function?
The internal representation is UCS-4: it's a Unicode encoding that
uses 4 bytes for each character. Most C library functions will expect
a simple NUL-terminated string, either plain ASCII or using an
encoding like UTF-8 or something based on your locale. You need to
somehow convert Scheme strings into a NUL-terminated sequence of
bytes. There are three options for that:
1. MzScheme has a `bytes' (or byte string) type which is similar to
plain C strings. This is used in places where you want a simple
NUL-terminated sequence of bytes -- it corresponds to a C `char*'.
The syntax for these things on the Scheme side is #"blah blah". If
you use these things from Scheme, then on the C side you should use
SCHEME_BYTE_STRINGP to test for these values and
SCHEME_BYTE_STR_VAL to extract the contained (NUL-terminated)
char*.
2. If you don't want to deal with byte strings on the Scheme side, you
can provide a Scheme interface that will do the conversion. For
example, you implement a `foo-bytes' function in C, then you write
a Scheme wrapper function that looks like:

    (define (foo str)
      (foo-bytes (string->bytes/utf-8 str)))

This wrapper will convert the string to UTF-8 encoded bytes.
There are also `string->bytes/locale' and `string->bytes/latin-1'
for other encodings. If you only deal with ASCII, they will all
produce the same bytes.
3. If you want to do this conversion in C so you never deal with byte
strings in Scheme, then you should use the C functions that Matthew
pointed at: `scheme_char_string_to_byte_string' or
`scheme_char_string_to_byte_string_locale'. These convert a Scheme
string to a byte string (so you still use SCHEME_BYTE_STR_VAL on the
result to get the NUL-terminated char*) -- they correspond to
`string->bytes/utf-8' and `string->bytes/locale'.
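
To make the UCS-4 point concrete, here is a rough standalone sketch of
what a UTF-8 conversion boils down to. This is not MzScheme's code:
`ucs4_to_utf8' is a hypothetical helper name made up for illustration,
and it assumes the input is a plain array of 32-bit code points. In a
real extension you would use one of the three options listed here
rather than rolling your own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of MzScheme: encode `len' UCS-4 code
   points from `src' as a NUL-terminated UTF-8 string in `dst'.
   Returns the number of bytes written (not counting the NUL), or
   (size_t)-1 if `dst' is too small or a code point is invalid. */
size_t ucs4_to_utf8(const uint32_t *src, size_t len,
                    char *dst, size_t dst_size)
{
  unsigned char *p = (unsigned char *)dst;
  size_t i, out = 0;

  assert(src != NULL && dst != NULL);

  for (i = 0; i < len; i++) {
    uint32_t c = src[i];
    size_t need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : (c < 0x10000) ? 3 : 4;

    /* Reject surrogates and values beyond the Unicode range. */
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
      return (size_t)-1;
    /* Always leave room for the terminating NUL. */
    if (dst_size < need + 1 || out > dst_size - need - 1)
      return (size_t)-1;

    switch (need) {
    case 1:
      p[out++] = (unsigned char)c;
      break;
    case 2:
      p[out++] = (unsigned char)(0xC0 | (c >> 6));
      p[out++] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    case 3:
      p[out++] = (unsigned char)(0xE0 | (c >> 12));
      p[out++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      p[out++] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    default:
      p[out++] = (unsigned char)(0xF0 | (c >> 18));
      p[out++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
      p[out++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      p[out++] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    }
  }

  /* Covers the empty-input case with a zero-sized buffer. */
  if (out >= dst_size)
    return (size_t)-1;
  p[out] = '\0';
  return out;
}
```

The buffer filled in by such a function is the kind of thing you can
hand to fopen() or similar. Note that each non-ASCII character takes
more than one byte, so the byte length is generally not the same as
the number of characters in the Scheme string.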
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!