[plt-scheme] bytes vs u8vector
On Jan 29, Lauri Alanko wrote:
> On Sat, Jan 28, 2006 at 05:19:12PM -0500, Eli Barzilay wrote:
> > > To me it is not at all obvious why a byte string should have a zero
> > > at the end.
> >
> > Historical reason: strings in v20x turned to byte-strings in v300.
>
> Yes, but v20x strings were octet sequences, too, and you could have
> a zero inside them. Hence you couldn't reliably use C-style string
> operations on arbitrary Scheme strings even then.
Sure you could -- if there's a zero in them, then most C functions
would not use the whole string, but the main danger that the
terminating nul protects against is referencing memory you should not
reference.
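
To make the point above concrete, here is a minimal C sketch. The
helper name `visible_length` is made up for illustration and is not
part of mzscheme or its foreign interface. It assumes a buffer of
`len` payload bytes followed by one terminating nul: an embedded zero
only shortens what `strlen` sees, and the scan can never run past the
terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an mzscheme API): report how many bytes a
   C string function sees in a buffer holding `len` payload bytes
   followed by one terminating nul. */
size_t visible_length(const char *buf, size_t len)
{
    assert(buf != NULL);
    /* The guarantee under discussion: byte `len` is always zero. */
    assert(buf[len] == '\0');
    /* strlen stops at the first zero byte, which sits at an index
       no greater than `len`, so it never reads outside the buffer. */
    return strlen(buf);
}
```

For the six bytes 'a' 'b' 0 'c' 'd' 0 with `len` 5 this returns 2:
the result is truncated, but nothing outside the buffer is read.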
> > You're putting things upside down -- IIUC, you're saying that
> > mzscheme byte-strings should not be nul-terminated, and the
> > foreign interface should provide such a type in addition.
>
> Not exactly. I don't really care about the internal implementation
> of byte strings as such. But I want them to be identified with
> u8vectors, so if the FFI says that u8vectors don't need to be
> null-terminated, then byte strings shouldn't need to be, either.
Sure you care about them -- you care about not being able to use byte
strings as u8vectors.
> > Currently, byte-strings are nul-terminated and the foreign
> > interface adds a type for generic byte vectors.
>
> And _that_ is upside-down. Generic byte vectors are useful to the
> casual scheme programmer. NUL-terminated char arrays are relevant
> only when interfacing with C.
I don't think that there is anything I can add to this discussion. I
will shut up about this issue now.
> > See the paper that describes the foreign system -- when you write
> > code that uses this library, you must write `(unsafe!)' to get the
> > full power of the library. This is equivalent to a statement that
> > you know that the Scheme code you're writing is equivalent to C
> > code, and as such it is exposed to the usual low-level/C dangers.
>
> There are many kinds of "usual low-level dangers". Of course unsafe
> code can in principle break anything at all.
And Scheme code is safe from certain things like segfaults, unless it
uses the foreign interface.
> But with the current foreign byte strings,
The term "byte strings" describes a data type that is part of
mzscheme, not the foreign interface. "Foreign byte strings" is
therefore a bogus term.
> it is possible that a completely ordinary module that uses no unsafe
> features whatsoever operates on an ordinary-looking byte string and
> this will wreak havoc since the buffer had been freed and
> reallocated in the meanwhile. Tracking this kind of a problem
> becomes just as hard as it is in C.
It is not. When you write an interface to a foreign library, you
should protect your users against such problems -- if they get
writeable strings that can be freed, or any other kind of pointers
that can be invalidated, then it's your bug -- you should copy such
objects, or wrap them in new types with operations that are safe.
The only exception is if you write a library for glue code libraries,
and in that case you should define your own `unsafe!'-like form.
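
As a rough illustration of the "copy such objects" advice, here is a
minimal sketch in C rather than in the Scheme glue code itself. The
function name `copy_foreign_bytes` is hypothetical, and the sketch
assumes the foreign library hands back a pointer plus a length and may
free that memory later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical glue helper (not an mzscheme API): return a private,
   nul-terminated copy of `len` bytes owned by a foreign library.
   The caller owns the result and releases it with free().
   Returns NULL if the allocation fails. */
char *copy_foreign_bytes(const char *foreign, size_t len)
{
    assert(foreign != NULL);
    if (len == SIZE_MAX)          /* len + 1 would overflow */
        return NULL;
    char *copy = malloc(len + 1); /* one extra byte for the nul */
    if (copy == NULL)
        return NULL;
    memcpy(copy, foreign, len);   /* embedded zeros are copied too */
    copy[len] = '\0';
    return copy;
}
```

Once the copy exists, the library can free or reuse its own buffer
without affecting what the user holds.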
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!