[racket-dev] FFI: pointer to an array in a C struct type

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Mon Dec 3 14:10:19 EST 2012

At Mon, 03 Dec 2012 11:05:10 -0700, Neil Toronto wrote:
> On 12/03/2012 07:31 AM, Matthew Flatt wrote:
> > At Mon, 3 Dec 2012 12:31:37 +0100, Tobias Hammer wrote:
> >> On Mon, 03 Dec 2012 11:45:08 +0100, Neil Toronto <neil.toronto at gmail.com>
> >> wrote:
> >>
> >>> This error seems wrong:
> >>>
> >>>
> >>> #lang racket
> >>>
> >>> (require ffi/unsafe
> >>>            ffi/unsafe/cvector)
> >>>
> >>> (define-cstruct _mpz ([alloc _int]
> >>>                         [size _int]
> >>>                         [limbs (_gcable _cvector)]))
> >>>
> >>> (define z (make-mpz 1 1 (list->cvector '(1) _long)))
> >>> (mpz-limbs z)
> >>>
> >>>   >>> _cvector: cannot automatically convert a C pointer to a cvector
> >>>
> >>>
> >>> The error might be correct, though late and confusing, if a cvector's
> >>> "base representation" isn't a pointer type. Is it?
> >>
> >> The base representation should be a pointer but the length and type
> >> information to convert it back to a cvector is missing. What's stored
> >> inside the struct is only a plain pointer, without the information you
> >> supplied to make-mpz, and because of this _cvector only has a
> >> racket->c-conversion (see ffi/unsafe/cvector.rkt).
> >>
> >>> If that error is correct, then how are FFI users meant to define a C
> >>> struct that has a "long *" field that points to an array? I've not yet
> >>> managed to define one whose instances survive a garbage collection cycle
> >>> without using (_gcable _cvector). Here's one of my desperate attempts:
> >>>
> >>>
> >>> #lang racket
> >>>
> >>> (require ffi/unsafe
> >>>            ffi/unsafe/cvector)
> >>>
> >>> (define-cstruct _mpz ([alloc _int] [size _int] [limbs _gcpointer]))
> >>>
> >>> (define z (make-mpz 1 1 (cvector-ptr (list->cvector '(1) _long))))
> >>>
> >>>   > (ptr-ref (mpz-limbs z) _long 0)
> >>> 1
> >>>   > (collect-garbage)
> >>>   > (ptr-ref (mpz-limbs z) _long 0)
> >>> 139856348920568
> >>>
> >>>
> >>> I mean to be complain-y this time. This shouldn't be this hard to figure
> >>> out.
> >>
> >> I guess the problem here is that the gc only knows memory with only
> >> pointers (to gcable memory) inside and (atomic) memory without any
> >> pointers it should follow. You try to mix them up. make-mpz should malloc
> >> (see ffi docs) atomic memory and you put a pointer managed by the gc into
> >> it. The pointer gets relocated on gc, but the ref inside the struct is not
> >> known and not updated.
> >>
> >> I unfortunately don't know a simple, gc- and typesafe way to put pointers
> >> inside structs. My attempts to get around this were pretty hacky (using
> >> a raw malloc'd pointer, embedding a struct with plain values as a pointer
> >> inside an interior-malloc'd struct together with all other pointers, etc).
> >> I would be really interested in a way to accomplish this, too.
> >
> > Yes. I've thought about this, but for the libraries I've written, the
> > cases where the GC can really solve the problem automatically seem
> > relatively rare, so I haven't been sufficiently motivated for my own
> > uses.
> >
> > In the `_mpz' example above, the presence of both `alloc' and `size'
> > suggests that a callee might be responsible for malloc()ing `limbs' if
> > there's not already enough room. If so, `_gcpointer' isn't the right
> > type. Maybe this example doesn't actually go that way, but it's a
> > typical problem.
> >
> > Neil, can you say more about how `_mpz' instances are used with foreign
> > functions?
> 
> They represent GMP's bignums, and they're used as both input and output 
> arguments.
> 
> When used as input arguments, GMP's functions never mutate them. It 
> should be safe to pass an `mpz' that contains a pointer to memory that 
> Racket manages, as long as it doesn't get relocated during the function 
> call. In this case, I need to be able to set the `limbs' array's 
> elements in Racket code.
> 
> When used as output arguments, GMP expects to be able to free the array 
> pointed at by the `limbs' field, allocate a new array for the result, 
> and set the `limbs' field to point to it. I use the GMP function 
> "mpz_init" to initialize the fields of an `mpz' instance before using it 
> as an output argument. In this case, I need to be able to read the 
> `limbs' array's elements in Racket code.

Overall, it sounds to me like you should `malloc' a `limbs' array in
'raw mode, use `free' to free it, use `_pointer' as the ctype for the
`limbs' field, and so on. If you need something like finalization, use
`ffi/unsafe/alloc' to make sure that a freeing function for a `_mpz' is
paired with an `_mpz' allocation.

Just to be clear, even if it somehow worked to use `_gcpointer' as the
ctype of a field in an `_mpz' that you allocate, that would be the
wrong ctype for an `_mpz' that was filled in by `mpz_init' or updated
as a result.
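
As a rough illustration of the approach described above, here is a minimal
sketch, not a tested GMP binding. The helper names `make-raw-mpz' and
`free-raw-mpz!' are hypothetical, and the field layout is simply the one
from the `_mpz' example earlier in the thread. The limbs are allocated in
'raw mode, so the GC neither traces nor moves them, and `ffi/unsafe/alloc'
pairs the allocation with a freeing function. It assumes that any foreign
code replacing `limbs' uses the same C `malloc'/`free' as Racket's 'raw
mode.

```racket
#lang racket

(require ffi/unsafe
         ffi/unsafe/alloc
         ffi/unsafe/cvector)

;; `limbs' is a plain `_pointer': the memory it refers to is outside
;; the GC's control and is never relocated.
(define-cstruct _mpz ([alloc _int]
                      [size _int]
                      [limbs _pointer]))

;; Hypothetical helper: frees the raw limbs array. Wrapping it with
;; `deallocator' cancels the finalizer that `allocator' registers below,
;; so an explicit free is not followed by a second, automatic one.
(define free-raw-mpz!
  ((deallocator)
   (lambda (z) (free (mpz-limbs z)))))

;; Hypothetical helper: copies a non-empty list of integers into a
;; raw-malloc'd array of `_long' and stores the pointer in a fresh `_mpz'.
(define make-raw-mpz
  ((allocator free-raw-mpz!)
   (lambda (limb-list)
     (define n (length limb-list))
     (define limbs (malloc n _long 'raw))
     (for ([v (in-list limb-list)]
           [i (in-naturals)])
       (ptr-set! limbs _long i v))
     (make-mpz n n limbs))))

;; Read the limbs back by supplying the element type and length that
;; the bare pointer does not carry.
(define (mpz-limbs->list z)
  (cvector->list (make-cvector* (mpz-limbs z) _long (mpz-size z))))

(define z (make-raw-mpz '(1 2 3)))
(collect-garbage)
(mpz-limbs->list z)
```

Because the array lives outside the GC, its contents are unchanged after
`collect-garbage'. Calling `(free-raw-mpz! z)' releases the limbs early;
otherwise the finalizer registered by `allocator' frees them once `z'
becomes unreachable. After `free-raw-mpz!', the `limbs' pointer must not be
dereferenced again.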


Posted on the dev mailing list.