[plt-scheme] strings?

From: Eli Barzilay (eli at barzilay.org)
Date: Sat Apr 16 13:24:16 EDT 2005

On Apr 16, Mike wrote:
> On Fri, 15 Apr 2005, Matthew Flatt might have said:
> > SCHEME_BYTE_STR_VAL() should be used only on a value for which
> > SCHEME_BYTE_STRINGP() produces true.
> > 
> > SCHEME_CHAR_STR_VAL() should be used only on a value for which
> > SCHEME_CHAR_STRINGP() produces true.
> > 
> > The result of SCHEME_CHAR_STR_VAL() is a mzchar*, so I can see why
> > you're trying to use SCHEME_BYTE_STR_VAL() to get a char*, but it
> > doesn't work. (The actual layout of byte string and char string values
> > means that, on a little-endian machine, you end up with one byte when
> > trying to use a char string as a byte string. But that's just an
> > artifact of the current data layout.)
> > 
> > To turn a char string into a byte string, you can use
> > scheme_char_string_to_byte_string() or
> > scheme_char_string_to_byte_string_locale(), depending on whether you
> > want a UTF-8 or locale-based encoding of the string.
> Thanks for the explanation. I'm still lost.
> I accept that mzscheme uses a multibyte representation internally,
> and that the use of scheme strings works internally. I want to
> extract a string from inside mzscheme and give that string
> as a null-terminated string to a C function. That function
> could be (and is) sqlite_open() or it could be fopen() for
> opening a file for custom processing, etc.
> How do I extract the string from mzscheme to give to the C function?

The internal representation is UCS-4: it's a Unicode encoding that
uses 4 bytes for each character.  Most C library functions will expect
a simple NUL-terminated string, either plain ASCII or using an
encoding like UTF-8 or something based on your locale.  You need to
convert each Scheme string into a NUL-terminated sequence of bytes.
There are three options for that:

1. MzScheme has a `bytes' (or byte string) type which is similar to
   plain C strings.  This is used in places where you want a simple
   NUL-terminated sequence of bytes -- it corresponds to a C
   `char*'.  The syntax for these things on the Scheme side is #"blah
   blah".  If you use these things from Scheme, then on the C side you
   should use SCHEME_BYTE_STRINGP to test for these values and
   SCHEME_BYTE_STR_VAL to extract the contained (NUL-terminated)
   string.

2. If you don't want to deal with byte-strings on the Scheme side, you
   can provide a Scheme interface that will do the conversion.  For
   example, you implement a `foo-bytes' function in C, then you write
   a Scheme wrapper function that looks like:

     (define (foo str)
       (foo-bytes (string->bytes/utf-8 str)))

   This wrapper will convert the string to a UTF-8 encoded byte
   string.  There are also `string->bytes/locale' and
   `string->bytes/latin-1' for other encodings; if you only deal with
   ASCII, they all produce the same bytes.

3. If you want to do this conversion in C so you never deal with byte
   strings in Scheme, then you should use the C functions that Matthew
   pointed at: `scheme_char_string_to_byte_string' or
   `scheme_char_string_to_byte_string_locale', which convert a Scheme
   string to a byte string; SCHEME_BYTE_STR_VAL on the result then
   gives you the NUL-terminated char*.  They correspond to
   `string->bytes/utf-8' and `string->bytes/locale'.
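
To make the layout issue concrete, here is a small standalone C sketch.
It is not MzScheme code and does not use the MzScheme headers: the names
`ucs4_char', `misread_length' and `ucs4_to_utf8' are hypothetical
stand-ins made up for illustration, with `ucs4_char' assumed to be a
32-bit code point like mzchar.  It shows why reading a 4-byte-per-character
string through a `char*' stops after at most one byte, and roughly what a
UTF-8 conversion has to do instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for mzchar: one 32-bit code point per character. */
typedef uint32_t ucs4_char;

/* What a C function sees if it is handed the UCS-4 buffer as a char*.
   For ASCII text this is 1 on a little-endian machine and 0 on a
   big-endian one, because the zero bytes of the first code point
   look like a NUL terminator. */
size_t misread_length(const ucs4_char *s)
{
    return strlen((const char *)s);
}

/* Encode a 0-terminated UCS-4 string as NUL-terminated UTF-8.
   Returns the number of bytes written (not counting the NUL), or
   (size_t)-1 if `out' is too small or a code point is invalid. */
size_t ucs4_to_utf8(const ucs4_char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return (size_t)-1;

    for (; *in != 0; in++) {
        uint32_t c = *in;
        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;

        /* Reject surrogates and values beyond the Unicode range. */
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return (size_t)-1;
        /* Leave room for the terminating NUL. */
        if (n + need + 1 > out_size)
            return (size_t)-1;

        switch (need) {
        case 1:
            out[n++] = (char)c;
            break;
        case 2:
            out[n++] = (char)(0xC0 | (c >> 6));
            out[n++] = (char)(0x80 | (c & 0x3F));
            break;
        case 3:
            out[n++] = (char)(0xE0 | (c >> 12));
            out[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (c & 0x3F));
            break;
        default:
            out[n++] = (char)(0xF0 | (c >> 18));
            out[n++] = (char)(0x80 | ((c >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (c & 0x3F));
            break;
        }
    }
    out[n] = '\0';
    return n;
}
```

In a real extension you would not write this encoder yourself:
`scheme_char_string_to_byte_string' does the conversion, and
SCHEME_BYTE_STR_VAL on its result is the `char*' to pass to
sqlite_open() or fopen().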

          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!

Posted on the users mailing list.