[plt-scheme] FFI and pointer manipulations
At Sun, 11 Feb 2007 07:45:25 -0800, "Jim Blandy" wrote:
> Many systems with copying GC's represent "large" objects (big strings;
> long vectors) as small header objects that hold a pointer to the large
> body of the object. The header is allocated in the ordinary heap, and
> does get copied by collections; the body is allocated elsewhere, and
> stays put.
Yes, the 3m GC does this, and the 3m GC API includes support for large
objects that don't move and for which interior pointers are allowed.
The GC also supports "immobile boxes", which hold a single pointer and
must be explicitly released.
> This is usually implemented as a speed optimization for the GC, but
> once you've gone to the trouble of handling that representation, it
> almost certainly doesn't matter whether the object is actually large
> or not;
In our current GC implementation, it does matter, since a non-moving
object has to have its own page. (This is more related to allowing
interior pointers, I suppose, but it also means that we don't have to
worry about fragmentation below page granularity.)
> the programmer could explicitly request that any string or
> vector be represented this way, making an object's "mobility" a
> programmer-visible aspect of its type.
>
> I'll agree immediately that this is not beautiful; you now have a
> special subtype of strings and vectors, with a new class of errors to
> go along with it. But once you have chosen to interact with C, it
> inevitably follows that whether an object is mobile or not is a
> necessary property to attend to. It's as important to writing correct
> code as its length or its address.
I agree with the last sentence; C programmers have to know that all
Scheme objects move. I don't think it necessarily follows that the GC
must support non-moving objects. I especially don't think it follows
that the Scheme language must support a non-moving class of values.
> If the programmer can't designate particular objects as immobile, then
> it seems to me she must depend on some even-harder-to-use property,
> like "GC never runs while in C code, unless you (or any of your
> callees) ever calls any of these functions: ...".
In my experience, you end up relying on hard-to-use properties when
connecting C and Scheme with any GC, including the conservative
collector.
Meanwhile, let's go back to Jens Axel's original problem: not that the
object can move, but that pointers into the middle of an object are
disallowed. This problem is not solved by simply having non-moving
objects. Indeed, in the CGC variant of MzScheme, objects don't move, but
interior pointers are still disallowed.
As it turns out, Eli's example works with the conservative
collector, because the top-level definition of `buf' holds onto the
byte array while it's being used from places that the GC can't see.
That's sometimes an easy-to-use property, but it's sometimes a
hard-to-use property.
It's not an immediate property for Jens Axel's example, I think,
because a call to a foreign function wouldn't naturally hold on to an
argument byte string supplied by the client. That is,
  (define (scheme-md5 buf offset)
    (define tmp (malloc (max (ctype-sizeof _int)
                             (ctype-sizeof _pointer))))
    (ptr-set! tmp _pointer buf)
    (ptr-set! tmp _int (+ offset (ptr-ref tmp _int)))
    (let ([buf-with-offset (ptr-ref tmp _bytes)])
      (free tmp)
      (C-md5 buf-with-offset)))
would be broken for both 3m and CGC.
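One way to avoid the interior pointer altogether, at the cost of a copy,
is to hand the foreign function a fresh byte string that starts at the
offset. The following is only a sketch under stated assumptions:
`scheme-md5/copy' is a name made up here, `C-md5' is replaced by a
hypothetical Scheme stand-in so that the code runs without the C
library, and the callee is assumed to only read the data during the
call and not to retain the pointer.

```scheme
;; Hypothetical stand-in for the foreign function C-md5 from the example
;; above. It only reports how many bytes it was handed, so the sketch
;; runs without any C library.
(define (C-md5 bs)
  (bytes-length bs))

;; Copy the tail of buf into a fresh byte string and pass that, so the
;; callee receives a pointer to the start of an object instead of a
;; pointer into the middle of one.
(define (scheme-md5/copy buf offset)
  (C-md5 (subbytes buf offset)))

(scheme-md5/copy #"hello world" 6)
```

This sidesteps the problem rather than solving it: it does nothing for a
callee that must write into the middle of the caller's buffer.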
I'm sure that the 3m GC API could be improved, and I'm sure that
someone else has already solved this problem in another
language/implementation. Unfortunately, it's not as simple as adding
more support for immobile objects (and it's definitely not as simple
for us as piggy-backing on some existing support for large objects plus
adding a parallel set of types to Scheme :).
Matthew