[plt-scheme] Effect of caching using hash-tables vs vectors

From: Daniel Yoo (dyoo at cs.wpi.edu)
Date: Sat Mar 17 16:17:43 EDT 2007

Hi everyone,


Just for fun, I did a solution for the 3n+1 problem from Programming 
Challenges:

     http://acm.uva.es/p/v1/100.html

I'll try to avoid spoiling the problem for other people who want to try it
out, and will just talk about an interesting performance issue I hit.
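For context, here is the rule itself, restated from the problem page (this
naive, unmemoized version is just the problem statement, not my solution):

```scheme
;; Naive chain length, straight from the problem statement: halve even
;; numbers, map odd n to 3n+1, and count the terms (including n and the
;; final 1).
(define (chain-length n)
  (if (= n 1)
      1
      (+ 1 (chain-length (if (even? n)
                             (/ n 2)
                             (+ (* 3 n) 1))))))

(chain-length 22)  ; => 16, matching the example on the problem page
```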


One of the things I did to make the solution fast was to memoize results,
to avoid the cost of recomputation.  I initially tried using dherman's
memoize module (for its DEFINE/MEMO*), but found that it did a little too
much caching.

I needed finer control over the domain of inputs that I wanted to memoize, 
so I defined the following macro:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
   ;; Memoize a unary function with a hash-table, but only cache
   ;; inputs smaller than *MAXIMUM-CACHED*.
   (define *MAXIMUM-CACHED* 1000000)
   (define-syntax (define/memo* stx)
     (syntax-case stx ()
       [(_ (name var) body)
        (syntax/loc stx
          (define name
            (let* ([table (make-hash-table 'equal)]
                   [in-domain? (lambda (n) (< n *MAXIMUM-CACHED*))]
                   [lookup (lambda (n)
                             (hash-table-get table n #f))]
                   [cached? (lambda (n) (and (lookup n) #t))]
                   [store! (lambda (n val)
                             (when (in-domain? n)
                               (hash-table-put! table n val)))])
              (lambda (var)
                (if (cached? var)
                    (lookup var)
                    (let ([result body])
                      (store! var result)
                      result))))))]))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
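As an illustration (not part of my solution), the macro can memoize any
unary function over small exact integers; for example, a hypothetical
memoized Fibonacci:

```scheme
;; Hypothetical usage of define/memo*.  Recursive calls to FIB go
;; through the memoized top-level binding, so each argument below
;; *MAXIMUM-CACHED* is computed at most once, turning the naive
;; exponential-time recursion into a linear-time one.
(define/memo* (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

(fib 30)  ; => 832040
```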

With this, computing the maximum Collatz chain length between 1 and 
1000000 took about 20 seconds on my machine.  Not too bad.  (If I didn't 
control the domain of cached inputs, the runtime ballooned so badly that I 
didn't have the patience to wait it out.)
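For reference, the timings came from something like the following driver,
where CHAIN-LENGTH stands in for the memoized function (left out to avoid
spoilers); PLT's TIME form reports cpu, real, and gc milliseconds:

```scheme
;; Sketch of a timing driver.  CHAIN-LENGTH is assumed to be the
;; memoized chain-length function (not shown here).
(define (max-chain-length i j)
  (let loop ([n i] [best 0])
    (if (> n j)
        best
        (loop (+ n 1) (max best (chain-length n))))))

(time (max-chain-length 1 1000000))
```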


But this wasn't actually my original macro: my first shot at the problem
had used a vector instead:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
   ;; Same macro, but with a vector as the cache.  Slot value 0 means
   ;; "not yet computed", which is safe here because the cached values
   ;; (chain lengths) are always at least 1.
   (define *MAXIMUM-CACHED* 1000000)
   (define-syntax (define/memo* stx)
     (syntax-case stx ()
       [(_ (name var) body)
        (syntax/loc stx
          (define name
            (let* ([table (make-vector *MAXIMUM-CACHED* 0)]
                   [in-domain? (lambda (n) (< n *MAXIMUM-CACHED*))]
                   [lookup (lambda (n)
                             (if (in-domain? n)
                                 (vector-ref table n)
                                 0))]
                   [cached? (lambda (n) (> (lookup n) 0))]
                   [store! (lambda (n val)
                             (when (in-domain? n)
                               (vector-set! table n val)))])
              (lambda (var)
                (if (cached? var)
                    (lookup var)
                    (let ([result body])
                      (store! var result)
                      result))))))]))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Using a vector as my lookup structure, I timed about 3 seconds here!  It
was just out of curiosity that I redid the macro to use hash-tables, and
that's where the performance results surprised me: the vector lookup went
much faster than the hash-table approach.

I was wondering whether this is consistent with what other people would
expect of hash-table performance: I did expect some overhead from using
hash-tables rather than vectors as lookup structures, but not such a large
one.
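One guess on my part (untested): the 'equal flavor of table pays for
equal?-based hashing and comparison on every lookup, even though the keys
here are all small exact integers.  A plain (make-hash-table), which
compares keys with eq?, might close some of the gap:

```scheme
;; Assumption to test: with fixnum keys, an eq?-based table (the
;; default) avoids equal? hashing on every lookup.  Keys stored by the
;; macro are all below *MAXIMUM-CACHED*, so they stay in fixnum range.
(define table (make-hash-table))   ; eq?-based, vs. (make-hash-table 'equal)
(hash-table-put! table 42 8)
(hash-table-get table 42 #f)       ; => 8
```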


Happy St. Patrick's Day!


Posted on the users mailing list.