[plt-scheme] FFI and pointer manipulations

From: Eli Barzilay (eli at barzilay.org)
Date: Sat Feb 10 23:49:11 EST 2007

On Feb 11, Jens Axel Søgaard wrote:
> Eli Barzilay wrote:
> 
> >On Feb  9, Jens Axel Søgaard wrote:
> >  
> >It's possible to do arbitrary conversions using the foreign interface,
> >just like you do in C -- except that you have to put something in a
> >malloced space, and reference it as something else.  No "typecast"
> >function yet.  (Partly because foreign types are not completely first
> >class, which is yet to be hacked.)  But:
> >
> I'm not sure I follow. Here is an example of what I want to do:
> 
> (let ([bs (bytes 0 1 2 3 4 5)])
>   (md5 <cpointer-to-bs+2> 3))
> 
> should calculate the md5 of the 3 bytes 2 3 4.
> 
> I can't figure out what to write instead of <cpointer-to-bs+2>.
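
For comparison, in plain C the requested pointer is simply `bs + 2`, a pointer into the middle of the byte array. This is a minimal sketch. `sum_bytes` is a hypothetical stand-in for the md5 function, and like md5 it takes a pointer and a length. `sum_of_middle_three` is also only illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for md5(): like it, it takes a (pointer, length)
   pair. It just sums the bytes, which is enough to see which bytes it got. */
unsigned sum_bytes(const unsigned char *p, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* In C, <cpointer-to-bs+2> is ordinary pointer arithmetic: bs + 2 points
   at the byte holding 2, and a length of 3 limits the callee to 2, 3, 4. */
unsigned sum_of_middle_three(void)
{
    unsigned char bs[] = {0, 1, 2, 3, 4, 5};
    return sum_bytes(bs + 2, 3);
}
```

Calling `sum_of_middle_three()` returns 9, which is the sum of the bytes 2, 3 and 4.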

Here's an example of what you want -- I have this function in x.so:

  void foo(char *a) { printf("received %d: \"%s\"\n", (int)a, a); }

and I do this interaction in MzScheme which should make it clear how
to play with pointers:

  > (define p (get-ffi-obj "foo" "~/tmp/x.so" (_fun _pointer -> _void)))
  > (define buf #"0123456789")
  > (p buf)
  received 180307632: "0123456789"
  > (define tmp (malloc (max (ctype-sizeof _int) (ctype-sizeof _pointer))))
  > (ptr-set! tmp _pointer buf)
  > (ptr-ref tmp _int)
  180307632
  > (ptr-set! tmp _int (+ 3 (ptr-ref tmp _int)))
  > (ptr-ref tmp _int)
  180307635
  > (ptr-ref tmp _bytes)
  #"3456789"
  > (p (ptr-ref tmp _pointer))
  received 180307635: "3456789"

The "intentional" part I talked about is that it is difficult to get
an arbitrary pointer: it has to be done through the ptr-ref loophole,
which can convert anything to anything else.
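
The ptr-set!/ptr-ref round-trip in the transcript above does roughly the following, written as a hypothetical C sketch. The sketch uses `uintptr_t` in place of `int`, because the transcript's `_int` assumes 32-bit pointers. It also assumes that a pointer and a `uintptr_t` are the same size. This holds on mainstream platforms, and a static assertion checks it. `offset_via_slot` is an illustrative name and does not correspond to any FFI function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: a pointer and a uintptr_t have the same size, as on
   mainstream platforms. (The transcript's _int makes the stronger,
   32-bit-only assumption that an int is pointer-sized.) */
_Static_assert(sizeof(uintptr_t) == sizeof(char *),
               "pointer and uintptr_t sizes differ");

/* Hypothetical helper mirroring the transcript step by step: write a
   pointer into a malloc'd slot, reread the same bytes as an integer,
   add an offset, write the integer back, and reread the slot as a
   pointer. Reinterpreting the slot's bytes is the "loophole". */
char *offset_via_slot(char *buf, uintptr_t offset)
{
    void *tmp = malloc(sizeof(uintptr_t));
    uintptr_t addr;
    char *result;

    if (tmp == NULL)
        return NULL;
    memcpy(tmp, &buf, sizeof buf);        /* (ptr-set! tmp _pointer buf)   */
    memcpy(&addr, tmp, sizeof addr);      /* (ptr-ref tmp _int)            */
    addr += offset;
    memcpy(tmp, &addr, sizeof addr);      /* (ptr-set! tmp _int (+ 3 ...)) */
    memcpy(&result, tmp, sizeof result);  /* (ptr-ref tmp _pointer)        */
    free(tmp);
    return result;
}
```

With `char buf[] = "0123456789"`, the call `offset_via_slot(buf, 3)` gives the same address as `buf + 3`, and that address points at "3456789".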


> >This is intentional.  Having a pointer to the middle of a malloced
> >object will not work right with 3m.  In fact, I think that it may
> >cause damage even if you use it in some harmless way.  One way to
> >deal with this would be to provide a kind of pointer-with-offset,
> >and another would be to add a new kind of "dangerous pointer" type.
> >Neither will be easy to understand or use, and nothing is
> >implemented yet.
> >
> It is the pointer-with-offset I need. In my case the C code doesn't
> store the pointer, so it ought to be safe.
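
The pointer-with-offset idea quoted above can be sketched in C. The base pointer and the offset are stored separately. A moving collector would only have to recognize and update the base, and the interior address is computed only at the moment it is used. `offset_ptr` and `offset_ptr_resolve` are hypothetical names. Nothing like them was implemented in the FFI at the time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-with-offset: the base pointer is the only part a
   moving collector would need to recognize and update; the offset is a
   plain number it can ignore. */
typedef struct {
    unsigned char *base;
    size_t offset;
} offset_ptr;

/* Form the interior address only when it is needed (e.g. right before a
   foreign call), the idea being that nothing can move the object between
   computing the address and using it. */
unsigned char *offset_ptr_resolve(offset_ptr p)
{
    return p.base + p.offset;
}
```

Suppose a collector copied the object and rewrote `base`. Pointing `base` at a second array simulates this. `offset_ptr_resolve` would still return the third byte of the object, now at its new location.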

The fact that the C code doesn't store the pointer is not very
relevant.  You have to be very careful about everything that is done
in Scheme code, because a GC can happen at any point while you're
there, and since the 3m collector moves objects, pointers are likely
to change.  Here's an example that demonstrates this:

  > (define buf #"0123456789")
  > (define tmp (malloc (max (ctype-sizeof _int) (ctype-sizeof _pointer))))
  > (ptr-set! tmp _pointer buf)
  > (ptr-ref tmp _int)
  26725428
  > (collect-garbage)
  > (ptr-ref tmp _int)
  26725428
  > (ptr-set! tmp _pointer buf)
  > (ptr-ref tmp _int)
  20266980

Note that buf was moved, but the saved value was not updated.  This is
because memory from (malloc 4) or (malloc _int) is different from
(malloc _pointer), whose contents the 3m GC scans and updates:

  > (define buf #"0123456789")
  > (define tmp1 (malloc _int))
  > (define tmp2 (malloc _pointer))
  > (ptr-set! tmp1 _pointer buf)
  > (ptr-set! tmp2 _pointer buf)
  > (ptr-ref tmp1 _int)
  16796724
  > (ptr-ref tmp2 _int)
  16796724
  > (collect-garbage)
  > (ptr-ref tmp1 _int)
  16796724
  > (ptr-ref tmp2 _int)
  22052836
  ;; to make sure that the new pointer is really 22052836:
  > (ptr-set! tmp1 _pointer buf)
  > (ptr-ref tmp1 _int)
  22052836
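
The following toy C model shows the difference between the two kinds of memory described above. It is a sketch of the idea and does not reflect how 3m is actually implemented. Each slot is one word that holds an address. The collector rewrites only the slots marked as traced. `slot`, `traced` and `toy_move` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model, not 3m's implementation. A slot is one word of foreign
   memory holding an address; `traced` marks memory the collector scans,
   like (malloc _pointer), versus memory it ignores, like (malloc _int). */
typedef struct {
    uintptr_t word;
    int traced;
} slot;

/* Toy moving collection of a single object: copy it from `from` to `to`,
   then rewrite every traced slot that held the old start address.
   Untraced slots keep the stale address, like tmp1 in the transcript. */
void toy_move(const char *from, char *to, size_t size,
              slot *slots, size_t nslots)
{
    memcpy(to, from, size);
    for (size_t i = 0; i < nslots; i++)
        if (slots[i].traced && slots[i].word == (uintptr_t)from)
            slots[i].word = (uintptr_t)to;
}
```

After a move, the untraced slot still holds the old address, just as tmp1 does. The traced slot holds the new one, just as tmp2 does.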

Now, the problem with arbitrary pointers is that they confuse the 3m
GC.  In this sequence tmp comes from (malloc _pointer), yet the
modified pointer value does *not* change after a collection, even
though it should:

  > (define buf #"0123456789")
  > (define tmp (malloc _pointer))
  > (ptr-set! tmp _pointer buf)
  > (ptr-ref tmp _int)
  16796804
  > (ptr-set! tmp _int (+ 3 (ptr-ref tmp _int)))
  > (ptr-ref tmp _int)
  16796807
  > (collect-garbage)
  > (ptr-ref tmp _int)
  16796807

Moreover, according to what Matthew told me (which may no longer be
true), it can even make the GC crash, because something it takes to be
a pointer to a block of memory actually points into the middle of a
real object.
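
A toy C sketch can model the stale offset value in the last transcript. It uses hypothetical functions and is not 3m's code. A collector that recognizes a reference only when it exactly matches an object's start address leaves `start + 3` unchanged. Handling such a value correctly would need a range check on every word the collector examines. Now suppose a collector instead assumed that every such word pointed at an object's start. It would then misread the object, which is one plausible route to the crash described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not 3m's implementation. Exact matching: a word is
   updated only if it equals the moved object's start address, so an
   address like start + 3 is left pointing at the old location. */
uintptr_t update_exact(uintptr_t word, uintptr_t old_start,
                       uintptr_t new_start)
{
    return word == old_start ? new_start : word;
}

/* What supporting interior pointers would take: a range check, so that
   old_start + k becomes new_start + k for every k inside the object. */
uintptr_t update_interior(uintptr_t word, uintptr_t old_start,
                          size_t size, uintptr_t new_start)
{
    if (word >= old_start && word - old_start < size)
        return new_start + (word - old_start);
    return word;
}
```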

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the users mailing list.