[racket-dev] Racket hash tables vs. Python dicts - Performance

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Thu Jul 24 09:34:17 EDT 2014

Well, I couldn't leave this alone --- especially after I realized that
some of the dispatching overhead has to do with C functions that
cooperate specially with the GC for more complex cases.

I've streamlined various paths and cut about 1/3 of Racket's time on
the example.

At Thu, 24 Jul 2014 11:09:11 +0100, Matthew Flatt wrote:
> In Racket, it looks like a lot of the time is still in generic
> dispatch: directing hash operations to `equal?`-based hashing,
> directing `equal?`-based hashing operation to the string-specific
> operation, directing equality comparison to string-specific equality
> comparison, and so on.
> 
> I'm not sure if that's the full story, but my guess is that such paths
> are more streamlined in Python, given the central role that
> dictionaries play there.
> 
> At Wed, 23 Jul 2014 16:54:47 +0100, Pedro Ramos wrote:
> > Hi,
> > 
> > I've been developing an implementation of Python in Racket, where I'm
> > implementing Python's dictionaries over Racket custom hash tables.
> > 
> > While my occasional benchmarks typically show better performance on Racket
> > programs than their Python equivalents, Racket's hash tables generally seem
> > to be slower than Python's dicts.
> > 
> > I've set up this benchmark in Racket:
> > 
> > 
> > #lang racket
> > 
> > (define alphabet "abcdefghijklmnopqrstuvwxyz")
> > (define (random-word n)
> >   (build-string n (lambda (x) (string-ref alphabet (random 23)))))
> > 
> > (define words (for/list ([k 1000000])
> >                 (random-word 3)))
> > (define d (make-hash))
> > 
> > (time (for ([w words])
> >         (if (hash-has-key? d w)
> >             (hash-set! d w (add1 (hash-ref d w)))
> >             (hash-set! d w 1))))
> > 
> > 
> > And its equivalent in Python:
> > 
> > 
> > import random
> > import time
> > 
> > alphabet = "abcdefghijklmnopqrstuvwxyz"
> > def random_word(n):
> >   return ''.join([random.choice(alphabet) for i in range(n)])
> > 
> > words = [random_word(3) for k in xrange(1000000)]
> > d = {}
> > 
> > a = time.time()
> > for w in words:
> >   if w in d:
> >     d[w] = d[w] + 1
> >   else:
> >     d[w] = 1
> > b = time.time()
> > print b-a, 'seconds'
> > 
> > 
> > The Racket example yields running times of around 500 ms (running on Racket
> > v6.0.1) while the Python example yields running times of around 330 ms
> > (running on Python 2.7.3).
> > 
> > I find this unusual because Python is somewhat more dynamic than Racket,
> > since
> > (a) Python's equality and hashing functions have to dispatched at runtime
> > for each key;
> > (b) referencing and setting values in a Python dict is done using a very
> > general operator, [], whose behaviour also has to be dispatched at runtime,
> > unlike the more specific hash-ref and hash-set! Racket functions.
> > 
> > Is there something I'm missing about Racket's hash tables which explains
> > this slower speed?
> > 
> > Thanks,
> > Pedro Ramos
> > _________________________
> >   Racket Developers list:
> >   http://lists.racket-lang.org/dev
> _________________________
>   Racket Developers list:
>   http://lists.racket-lang.org/dev

Posted on the dev mailing list.