[plt-scheme] bytes vs u8vector

From: Eli Barzilay (eli at barzilay.org)
Date: Sat Jan 28 13:32:06 EST 2006

On Jan 28, Lauri Alanko wrote:
> On Sat, Jan 28, 2006 at 11:47:17AM -0500, Eli Barzilay wrote:
> > Byte strings are neither more nor less primitive than u8vectors.  They
> > are different facilities.
> 
> I think we may be talking about slightly different things. When I
> say "u8vector", the intension is "the data type that srfis 4 and 66
> define".  It is only an accidental fact that the implementation of
> srfi-4 in mzscheme is based on the FFI.

OK.


> To you, on the other hand, I surmise that "u8vector" means primarily
> the data type used in the FFI to communicate binary data to and from
> C code and it's only an accidental fact that it's also used to
> implement an SRFI.
> 
> So, in a sense you are wrong: the functionality that the
> srfi-u8vector provides is exactly the functionality that byte
> strings provide.

Except for the extra zero that must be there.  One of the reasons for
srfi-4's existence is to be able to talk to foreign code.  This
includes getting such vector values from foreign code, which means
that you cannot put an extra zero there (or copy the vector).


> And in a sense you are right: the u8vector type in the FFI serves a
> different purpose from the byte strings.

Right -- the u8vector is pretty much exactly what srfi-4 talks about.
It just happens that byte strings stand for a C char* which can be
viewed as a byte vector, but it is more common to use them as strings.


> > The problem is not when a byte string is allocated, it's when a
> > foreign function returns a "u8vector" -- it cannot be made into a
> > byte string, so there should be two types at the low-level
> > implementation.
> 
> Do you mean that the FFI guarantees that a byte string's contents
> can be accessed directly and it will be null-terminated? And there's
> already existing code that relies on this? That would indeed be a
> problem.

That's not part of the foreign interface, it is an assumption that is
built into mzscheme.  (Unfortunate or not, the reason it is there is
obvious.)  So, if you're only concerned about allocating u8vectors, it
would be fine to grab one more byte for a terminating zero.  OTOH, you
run into problems if you have a foreign function that returns one of
these things.
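
To make the difference concrete, here is a minimal C sketch.  It is
not mzscheme's code: all three function names are hypothetical, and
foreign_make_bytes only stands in for some foreign function that hands
back exactly n bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator: n visible bytes plus one hidden byte that
   holds the terminating zero a byte string is assumed to have. */
unsigned char *alloc_terminated(size_t n) {
    return calloc(n + 1, 1);   /* zero-filled, so byte n is already 0 */
}

/* Hypothetical stand-in for a foreign function: the buffer is exactly
   n bytes long, so there is no room for a terminator. */
unsigned char *foreign_make_bytes(size_t n) {
    unsigned char *p = malloc(n);
    if (p != NULL)
        memset(p, 0xFF, n);
    return p;
}

/* The only way to give a foreign buffer a terminator is to copy it
   into a buffer that is one byte larger. */
unsigned char *copy_terminated(const unsigned char *src, size_t n) {
    assert(src != NULL);
    unsigned char *p = malloc(n + 1);
    if (p == NULL)
        return NULL;
    memcpy(p, src, n);
    p[n] = 0;
    return p;
}
```

The first case costs one byte; the last one costs a copy, and the
copy no longer shares memory with the foreign side.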


> Hm, I've never really looked at the FFI before... do I gather
> correctly that it's possible to create byte strings whose data
> actually resides at some pointer that's been returned from the C
> world?

Yes -- but that's part of mzscheme, see the Inside MzScheme manual.


> I don't see how memory management can work with such an arrangement.

The GC just ignores all pointers to memory that it does not manage.
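
As a rough illustration of the idea only (a toy sketch, not how
mzscheme's collector is written; managed_heap, gc_manages and
gc_count_traced are made-up names), a collector can test each pointer
against the memory it owns and skip everything else:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy "managed heap": the only memory this pretend collector owns. */
static unsigned char managed_heap[1024];

/* True if p points into the managed heap. */
bool gc_manages(const void *p) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)managed_heap;
    return a >= lo && a < lo + sizeof managed_heap;
}

/* Hypothetical mark step: counts the roots it would trace.  Pointers
   outside the managed heap (e.g. memory owned by C code) are skipped. */
size_t gc_count_traced(void *const *roots, size_t n) {
    size_t traced = 0;
    for (size_t i = 0; i < n; i++)
        if (gc_manages(roots[i]))
            traced++;
    return traced;
}
```

A foreign pointer is therefore never followed, moved, or freed by the
collector; releasing that memory stays the job of whoever allocated it.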

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the users mailing list.