[plt-scheme] Fast Byte Vector Traversal (200MB+ file)
At Tue, 10 Feb 2009 06:46:11 -0500, Robert Winslow wrote:
> [...] I am writing software
> that loops many times over a file's contents in memory. The file will
> not change over the lifetime of the program.
>
> I have considered a few solutions to get C-like speed in this project,
> while using as much PLT Scheme as possible:
> 1) Write a simple TCP/IP server to do the iterating over the array of
> unsigned chars. MzScheme will send and receive requests from this
> daemon program.
> 2) Do something with C inside of MzScheme, possibly using c-lambda.
> How fast can this be?
> 3) Write a small C library, and use FFI. Will contracts slow it down?
> 4) Use u8vectors/cvectors/malloc/sequence generators in a way that I
> haven't anticipated to get speedy results using only Scheme code.
Option 2 will be fastest if the operation to be written in C is fairly
small. If it's large enough that the overhead of the FFI call is small,
then I recommend option 3.
To get a sense of the costs, I tried the extreme example of a function
that takes a byte string and index and extracts an integer from the
byte string:
#lang scheme/base
(require scheme/foreign)
(unsafe!)
;; Scheme:
(define (f0 s i)
(bytes-ref s i))
;; option 2, a C-implemented extension, "f_ext.c" below:
(define f2 (load-extension "f_ext.dylib"))
;; option 3, FFI, "f.c" below:
(define l (ffi-lib "f.dylib"))
(define f3 (get-ffi-obj 'f l (_fun _pointer _int -> _int)))
(define (go f)
(time
(for ([i (in-range 1000000)])
(f #"apple" 2))))
(go f0)
(go f2)
(go f3)
The output on my machine:
cpu time: 50 real time: 49 gc time: 0
cpu time: 31 real time: 30 gc time: 0
cpu time: 426 real time: 427 gc time: 0
If I remove the type checks in "f_ext.c", the second one goes to
cpu time: 24 real time: 24 gc time: 0
I bet that option 3 is going to bet the best for you. Despite its
overhead relative to an extension, if you're putting enough work into
the C function to get much better performance relative to Scheme, then
the overhead of a foreign call is likely small. For pure speed, it's
difficult to beat option 2, though.
----------------------------------------
f_ext.c:
#include "escheme.h"
static Scheme_Object *f(int argc, Scheme_Object **argv)
{
if (SCHEME_BYTE_STRINGP(argv[0])
&& SCHEME_INTP(argv[1])) {
int v= SCHEME_BYTE_STR_VAL(argv[0])[SCHEME_INT_VAL(argv[1])];
return scheme_make_integer(v);
} else
return scheme_false;
}
Scheme_Object *scheme_initialize(Scheme_Env *env) {
return scheme_make_prim_w_arity(f, "f", 2, 2);
}
Scheme_Object *scheme_reload(Scheme_Env *env) {
return scheme_initialize(env); /* Nothing special for reload */
}
Scheme_Object *scheme_module_name() {
return scheme_false;
}
----------------------------------------
f.c:
int f(char *s, int v) { return s[v]; }