[racket-dev] Racket hash tables vs. Python dicts - Performance

From: Jake Eakle (jseakle at gmail.com)
Date: Fri Jul 25 08:39:40 EDT 2014

This is probably not responsible for much of the performance difference,
but the racket code has (random 23) where it looks like it ought to be
(random 26), resulting in a smaller space of possible random keys. This
will cause the branch with the lookup to happen somewhat more often than in
the python version.


On Fri, Jul 25, 2014 at 1:34 AM, Laurent <laurent.orseau at gmail.com> wrote:

> Even more idiomatic:
> (time (for ([w words])
>         (hash-update! d w add1 0)))
>
> But it seems to be slightly slower that Robby's version.
>
> Laurent
>
>
> On Thu, Jul 24, 2014 at 11:26 AM, Robby Findler <
> robby at eecs.northwestern.edu> wrote:
>
>> Not an answer to your direction question, but this is the more
>> idiomatic way to write that and it seems to be a bit faster:
>>
>> (time (for ([w (in-list words)])
>>         (hash-set! d w (add1 (hash-ref d w 0)))))
>>
>> On Wed, Jul 23, 2014 at 10:54 AM, Pedro Ramos
>> <pedropramos at tecnico.ulisboa.pt> wrote:
>> > Hi,
>> >
>> > I've been developing an implementation of Python in Racket, where I'm
>> > implementing Python's dictionaries over Racket custom hash tables.
>> >
>> > While my occasional benchmarks typically show better performance on
>> Racket
>> > programs than their Python equivalents, Racket's hash tables generally
>> seem
>> > to be slower than Python's dicts.
>> >
>> > I've set up this benchmark in Racket:
>> >
>> >
>> > #lang racket
>> >
>> > (define alphabet "abcdefghijklmnopqrstuvwxyz")
>> > (define (random-word n)
>> >   (build-string n (lambda (x) (string-ref alphabet (random 23)))))
>> >
>> > (define words (for/list ([k 1000000])
>> >                 (random-word 3)))
>> > (define d (make-hash))
>> >
>> > (time (for ([w words])
>> >         (if (hash-has-key? d w)
>> >             (hash-set! d w (add1 (hash-ref d w)))
>> >             (hash-set! d w 1))))
>> >
>> >
>> > And its equivalent in Python:
>> >
>> >
>> > import random
>> > import time
>> >
>> > alphabet = "abcdefghijklmnopqrstuvwxyz"
>> > def random_word(n):
>> >   return ''.join([random.choice(alphabet) for i in range(n)])
>> >
>> > words = [random_word(3) for k in xrange(1000000)]
>> > d = {}
>> >
>> > a = time.time()
>> > for w in words:
>> >   if w in d:
>> >     d[w] = d[w] + 1
>> >   else:
>> >     d[w] = 1
>> > b = time.time()
>> > print b-a, 'seconds'
>> >
>> >
>> > The Racket example yields running times of around 500 ms (running on
>> Racket
>> > v6.0.1) while the Python example yields running times of around 330 ms
>> > (running on Python 2.7.3).
>> >
>> > I find this unusual because Python is somewhat more dynamic than Racket,
>> > since
>> > (a) Python's equality and hashing functions have to dispatched at
>> runtime
>> > for each key;
>> > (b) referencing and setting values in a Python dict is done using a very
>> > general operator, [], whose behaviour also has to be dispatched at
>> runtime,
>> > unlike the more specific hash-ref and hash-set! Racket functions.
>> >
>> > Is there something I'm missing about Racket's hash tables which explains
>> > this slower speed?
>> >
>> > Thanks,
>> > Pedro Ramos
>> >
>> > _________________________
>> >   Racket Developers list:
>> >   http://lists.racket-lang.org/dev
>> >
>> _________________________
>>   Racket Developers list:
>>   http://lists.racket-lang.org/dev
>>
>
>
> _________________________
>   Racket Developers list:
>   http://lists.racket-lang.org/dev
>
>


-- 
A warb degombs the brangy. Your gitch zanks and leils the warb.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/dev/archive/attachments/20140725/bb32f173/attachment.html>

Posted on the dev mailing list.