[plt-scheme] Fast Byte Vector Traversal (200MB+ file)

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Tue Feb 10 07:11:24 EST 2009

At Tue, 10 Feb 2009 06:46:11 -0500, Robert Winslow wrote:
> [...] I am writing software
> that loops many times over a file's contents in memory. The file will
> not change over the lifetime of the program.
> 
> I have considered a few solutions to get C-like speed in this project,
> while using as much PLT Scheme as possible:
> 1) Write a simple TCP/IP server to do the iterating over the array of
> unsigned chars. MzScheme will send and receive requests from this
> daemon program.
> 2) Do something with C inside of MzScheme, possibly using c-lambda.
> How fast can this be?
> 3) Write a small C library, and use FFI. Will contracts slow it down?
> 4) Use u8vectors/cvectors/malloc/sequence generators in a way that I
> haven't anticipated to get speedy results using only Scheme code.

Option 2 will be fastest if the operation to be written in C is fairly
small. If it's large enough that the overhead of the FFI call is small,
then I recommend option 3.

To get a sense of the costs, I tried the extreme example of a function
that takes a byte string and index and extracts an integer from the
byte string:

 #lang scheme/base
 (require scheme/foreign)
 (unsafe!)

 ;; Scheme:
 (define (f0 s i)
   (bytes-ref s i))

 ;; option 2, a C-implemented extension, "f_ext.c" below:
 (define f2 (load-extension "f_ext.dylib"))

 ;; option 3, FFI, "f.c" below:
 (define l (ffi-lib "f.dylib"))
 (define f3 (get-ffi-obj 'f l (_fun _pointer _int -> _int)))

 (define (go f)
   (time
    (for ([i (in-range 1000000)])
      (f #"apple" 2))))

 (go f0)
 (go f2)
 (go f3)

The output on my machine:

 cpu time: 50 real time: 49 gc time: 0
 cpu time: 31 real time: 30 gc time: 0
 cpu time: 426 real time: 427 gc time: 0

If I remove the type checks in "f_ext.c", the second one goes to

 cpu time: 24 real time: 24 gc time: 0

I bet that option 3 is going to bet the best for you. Despite its
overhead relative to an extension, if you're putting enough work into
the C function to get much better performance relative to Scheme, then
the overhead of a foreign call is likely small. For pure speed, it's
difficult to beat option 2, though.

----------------------------------------
f_ext.c:

 #include "escheme.h"

 static Scheme_Object *f(int argc, Scheme_Object **argv)
 {
   if (SCHEME_BYTE_STRINGP(argv[0])
       && SCHEME_INTP(argv[1])) {
     int v= SCHEME_BYTE_STR_VAL(argv[0])[SCHEME_INT_VAL(argv[1])];
     return scheme_make_integer(v);
   } else
     return scheme_false;
 }

 Scheme_Object *scheme_initialize(Scheme_Env *env) {
   return scheme_make_prim_w_arity(f, "f", 2, 2);
 }
 Scheme_Object *scheme_reload(Scheme_Env *env) {
   return scheme_initialize(env); /* Nothing special for reload */
 }
 Scheme_Object *scheme_module_name() {
   return scheme_false;
 }

----------------------------------------
f.c:

 int f(char *s, int v) { return s[v]; }



Posted on the users mailing list.