[racket] make-sized-byte-string and GC

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Sun Jan 26 08:18:18 EST 2014

At Sat, 25 Jan 2014 14:37:30 -0500, Ryan Culpepper wrote:
> On 01/25/2014 01:28 PM, Roman Klochkov wrote:
> > Does making a bytestring from a pointer add the pointer to the GC?
> > 
> >
> >  > (define x (malloc 'raw 10))
> >  > x
> > #<cpointer>
> >  > (define b (make-sized-byte-string x 10))
> >  > (cpointer-gcable? b)
> > #t
> >  > (cpointer-gcable? x)
> > #f
> >  > (cast x _pointer _int32)
> > 173726656
> >  > (cast b _pointer _int32)
> > 173726656
> >
> > So b and x point to the same block of 10 bytes, but the value of b is
> > GCable and the value of x is not.
> > I assume that when b is changed, the bytestring will be
> > collected and accessing x will give a segfault. Am I right?
> 
> I think it's a bug that (cpointer-gcable? b) returns true, since the FFI 
> generally treats bytestrings as pointers to the memory that stores their 
> contents, and in this case that memory is not managed by the GC.
> 
> The bytestring object itself (which consists of a header, a pointer, and 
> a length, IIRC) is collectible, but then so is the cpointer object 
> (which consists of a header, a pointer, and some other stuff, like a tag 
> list).
> 
> So no, you should not expect a segfault. On the other hand, if you free 
> x and use b afterwards, then you should expect a segfault or some other 
> form of memory corruption.

It's slightly worse: if you free `x` while `b` remains reachable (even
if you never look inside it), then there's a chance of a segfault.

The `cpointer-gcable?` predicate does not check whether the referenced
memory is GCable. In fact, it's not a property of the referenced memory
at all, but a property of the *reference*. The `cpointer-gcable?`
predicate reports whether the address in a reference should be
considered by the GC in determining live objects, and whether the
reference should be updated if the GC-managed object at the address
moves.
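
As a minimal sketch of that distinction (the names `raw` and `gc-ptr`
are only illustrative, and the results noted in the comments are what
the description above leads one to expect, not captured output), the
predicate can be asked about three different kinds of reference:

```racket
#lang racket/base
(require ffi/unsafe)

;; Memory from the C allocator; the cpointer holds a non-GCable reference.
(define raw (malloc 'raw 16))
;; Memory allocated by the GC (the default `malloc` mode for a byte count).
(define gc-ptr (malloc 16))

;; Expected #f: the GC ignores the address held in this reference.
(define raw-gcable? (cpointer-gcable? raw))
;; Expected #t: the GC traces the address and updates it if the object moves.
(define gc-gcable? (cpointer-gcable? gc-ptr))
;; Expected #t: a byte string always refers to its bytes as a GCable address.
(define bytes-gcable? (cpointer-gcable? (make-bytes 16)))

;; No other reference to the raw block exists, so freeing it is safe.
(free raw)
```

In each case the answer describes how the reference is treated, not
where the memory came from.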

A byte string always references the memory containing its bytes as a
GCable address, so the `#t` result for `(cpointer-gcable? b)` above
is correct.

Normally, an address referenced as GCable really is GC-allocated, and
things work as expected. If the address falls outside of the pages that
the GC manages, then that's ok, too, because the GC can detect
"foreign" pointers at the page level. There's a problem only if the
address refers to a GC-managed page but does not actually point to the
start of an object.

In the above example, if `x` is freed, then it's possible that the page
containing the address is later taken over by the GC. At that point,
`b` references an address that is managed by the GC, but the address is
unlikely to point to the start of an allocated object. That's why a
crash is possible if `x` is freed and `b` is still reachable.

In short, you can treat a raw-malloc()ed pointer as a "GCable" address
(and using an address as a byte string implies that treatment), but you
must ensure that the reference is no longer reachable by the time that
you free() the address.
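
One way to stay within that rule, sketched here as an alternative
rather than as anything suggested in the thread, is never to create the
aliasing byte string at all: copy the bytes into a fresh GC-allocated
byte string and then free the raw block. The helper name `raw->bytes`
is hypothetical, and the fill value is arbitrary.

```racket
#lang racket/base
(require ffi/unsafe)

;; Copy `len` bytes from a raw-malloc()ed block into a fresh byte string.
;; The result owns GC-allocated storage and does not alias `ptr`.
(define (raw->bytes ptr len)
  (define bs (make-bytes len))
  (memcpy bs ptr len)
  bs)

(define x (malloc 'raw 10))
(memset x 65 10)                 ; fill the block with the byte for #\A
(define copy (raw->bytes x 10))
(free x)                         ; no GCable reference to x's address remains
```

The cost is one extra copy; in exchange, the lifetime of `copy` is
independent of when `x` is freed.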



Posted on the users mailing list.