[racket-dev] Racket hash tables vs. Python dicts - Performance

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Thu Jul 24 05:26:51 EDT 2014

Not an answer to your direction question, but this is the more
idiomatic way to write that and it seems to be a bit faster:

(time (for ([w (in-list words)])
        (hash-set! d w (add1 (hash-ref d w 0)))))

On Wed, Jul 23, 2014 at 10:54 AM, Pedro Ramos
<pedropramos at tecnico.ulisboa.pt> wrote:
> Hi,
>
> I've been developing an implementation of Python in Racket, where I'm
> implementing Python's dictionaries over Racket custom hash tables.
>
> While my occasional benchmarks typically show better performance on Racket
> programs than their Python equivalents, Racket's hash tables generally seem
> to be slower than Python's dicts.
>
> I've set up this benchmark in Racket:
>
>
> #lang racket
>
> (define alphabet "abcdefghijklmnopqrstuvwxyz")
> (define (random-word n)
>   (build-string n (lambda (x) (string-ref alphabet (random 23)))))
>
> (define words (for/list ([k 1000000])
>                 (random-word 3)))
> (define d (make-hash))
>
> (time (for ([w words])
>         (if (hash-has-key? d w)
>             (hash-set! d w (add1 (hash-ref d w)))
>             (hash-set! d w 1))))
>
>
> And its equivalent in Python:
>
>
> import random
> import time
>
> alphabet = "abcdefghijklmnopqrstuvwxyz"
> def random_word(n):
>   return ''.join([random.choice(alphabet) for i in range(n)])
>
> words = [random_word(3) for k in xrange(1000000)]
> d = {}
>
> a = time.time()
> for w in words:
>   if w in d:
>     d[w] = d[w] + 1
>   else:
>     d[w] = 1
> b = time.time()
> print b-a, 'seconds'
>
>
> The Racket example yields running times of around 500 ms (running on Racket
> v6.0.1) while the Python example yields running times of around 330 ms
> (running on Python 2.7.3).
>
> I find this unusual because Python is somewhat more dynamic than Racket,
> since
> (a) Python's equality and hashing functions have to dispatched at runtime
> for each key;
> (b) referencing and setting values in a Python dict is done using a very
> general operator, [], whose behaviour also has to be dispatched at runtime,
> unlike the more specific hash-ref and hash-set! Racket functions.
>
> Is there something I'm missing about Racket's hash tables which explains
> this slower speed?
>
> Thanks,
> Pedro Ramos
>
> _________________________
>   Racket Developers list:
>   http://lists.racket-lang.org/dev
>

Posted on the dev mailing list.