[racket-dev] Racket hash tables vs. Python dicts - Performance

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Thu Jul 24 06:09:11 EDT 2014

In Racket, it looks like a lot of the time is still in generic
dispatch: directing hash operations to `equal?`-based hashing,
directing `equal?`-based hashing operation to the string-specific
operation, directing equality comparison to string-specific equality
comparison, and so on.

I'm not sure if that's the full story, but my guess is that such paths
are more streamlined in Python, given the central role that
dictionaries play there.

At Wed, 23 Jul 2014 16:54:47 +0100, Pedro Ramos wrote:
> Hi,
> 
> I've been developing an implementation of Python in Racket, where I'm
> implementing Python's dictionaries over Racket custom hash tables.
> 
> While my occasional benchmarks typically show better performance on Racket
> programs than their Python equivalents, Racket's hash tables generally seem
> to be slower than Python's dicts.
> 
> I've set up this benchmark in Racket:
> 
> 
> #lang racket
> 
> (define alphabet "abcdefghijklmnopqrstuvwxyz")
> (define (random-word n)
>   (build-string n (lambda (x) (string-ref alphabet (random 23)))))
> 
> (define words (for/list ([k 1000000])
>                 (random-word 3)))
> (define d (make-hash))
> 
> (time (for ([w words])
>         (if (hash-has-key? d w)
>             (hash-set! d w (add1 (hash-ref d w)))
>             (hash-set! d w 1))))
> 
> 
> And its equivalent in Python:
> 
> 
> import random
> import time
> 
> alphabet = "abcdefghijklmnopqrstuvwxyz"
> def random_word(n):
>   return ''.join([random.choice(alphabet) for i in range(n)])
> 
> words = [random_word(3) for k in xrange(1000000)]
> d = {}
> 
> a = time.time()
> for w in words:
>   if w in d:
>     d[w] = d[w] + 1
>   else:
>     d[w] = 1
> b = time.time()
> print b-a, 'seconds'
> 
> 
> The Racket example yields running times of around 500 ms (running on Racket
> v6.0.1) while the Python example yields running times of around 330 ms
> (running on Python 2.7.3).
> 
> I find this unusual because Python is somewhat more dynamic than Racket,
> since
> (a) Python's equality and hashing functions have to dispatched at runtime
> for each key;
> (b) referencing and setting values in a Python dict is done using a very
> general operator, [], whose behaviour also has to be dispatched at runtime,
> unlike the more specific hash-ref and hash-set! Racket functions.
> 
> Is there something I'm missing about Racket's hash tables which explains
> this slower speed?
> 
> Thanks,
> Pedro Ramos
> _________________________
>   Racket Developers list:
>   http://lists.racket-lang.org/dev

Posted on the dev mailing list.