[racket] Question about hash table efficiency

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Thu Feb 14 15:31:04 EST 2013

Last time I looked at that benchmark, I changed the hash table to hold
boxed values, so that the updating `hash-ref' plus `hash-set!' could be
replaced with `hash-ref', `unbox' (which is fast), and `set-box!'
(which is also fast).

To confirm other comments: `hash-update' is implemented as just
`hash-ref' plus `hash-set', and so it's a little faster when you
effectively inline the function --- but the idea was that `hash-update'
might be made more built-in and faster, one day.

At Thu, 14 Feb 2013 14:20:35 -0500, James Bergstra wrote:
> I was actually trying to understand the performance bottlenecks of the
> "knucleotide" challenge of the language shootout:
> 
> http://benchmarksgame.alioth.debian.org/u64/program.php?test=knucleotide&lang=ra
> cket&id=4
> 
> Most of the time seems to be spent in the loop that builds the hash table.
> 
> (Incidentally, in that same loop, I tried replacing the set! key with
> a for/fold and that helped on my home computer but not the one at
> work. Maybe I had different versions of racket, I'm not sure)
> 
> - James
> 
> On Thu, Feb 14, 2013 at 2:15 PM, Robby Findler
> <robby at eecs.northwestern.edu> wrote:
> > hash-update is defined in Racket and, besides the error checking, is
> > basically the composition of those two functions. I expect it was written as
> > something to abstract over a common pattern, not speed things up.
> >
> > Is this a bottleneck in some real program? Perhaps you'd share that (or some
> > inner loop of it)? I wonder if there is a better way to speed it up than
> > this.
> >
> > Robby
> >
> >
> > On Thu, Feb 14, 2013 at 1:09 PM, J. Ian Johnson <ianj at ccs.neu.edu> wrote:
> >>
> >> If I am right, then it would make sense to use hash-update when the hash
> >> is big enough that a log-time traversal takes longer than a general
> >> procedure call's set-up and tear-down.
> >> -Ian
> >> ----- Original Message -----
> >> From: James Bergstra <james.bergstra at gmail.com>
> >> To: J. Ian Johnson <ianj at ccs.neu.edu>
> >> Cc: users at racket-lang.org
> >> Sent: Thu, 14 Feb 2013 13:47:51 -0500 (EST)
> >> Subject: Re: [racket] Question about hash table efficiency
> >>
> >> Thanks, I suppose I just had my hopes up. I thought the more compact
> >> form might involve fewer expression evaluations and require just one
> >> hash lookup instead of two, and therefore might run something like
> >> twice as fast, rather than slightly slower.
> >>
> >> On Thu, Feb 14, 2013 at 1:36 PM, J. Ian Johnson <ianj at ccs.neu.edu> wrote:
> >> > My guess would be the higher-orderness on hash-update causes the (minor)
> >> > slowdown.
> >> > -Ian
> >> > ----- Original Message -----
> >> > From: James Bergstra <james.bergstra at gmail.com>
> >> > To: users at racket-lang.org
> >> > Sent: Thu, 14 Feb 2013 13:16:47 -0500 (EST)
> >> > Subject: [racket] Question about hash table efficiency
> >> >
> >> > Hi list, just discovered racket and having fun learning about scheme
> >> > again. Haven't used it since undergrad. Thanks!
> >> >
> >> > I was wondering about the efficiency of hash-update vs. hash-set and
> >> > hash-ref. I thought the former would be faster, otherwise why have it?
> >> > A little benchmark script reveals the same surprising result as my
> >> > actual application though:
> >> >
> >> > """
> >> > #lang racket
> >> >
> >> > ;; warm-up
> >> > (time (void
> >> >  (for/fold ([table (make-immutable-hasheq '())])
> >> >   ([ii (in-range 100000)])
> >> >   (hash-update table (modulo ii 100) add1 ii))))
> >> >
> >> > ;; update
> >> > (newline)
> >> > (time (void
> >> >  (for/fold ([table (make-immutable-hasheq '())])
> >> >   ([ii (in-range 100000)])
> >> >   (hash-update table (modulo ii 100) add1 ii))))
> >> >
> >> > ;; set/ref
> >> > (newline)
> >> > (time (void
> >> >  (for/fold ([table (make-immutable-hasheq '())])
> >> >   ([ii (in-range 100000)])
> >> >   (hash-set table (modulo ii 100) (add1 (hash-ref table (modulo ii 100)
> >> > ii))))))
> >> >
> >> > """
> >> >
> >> > This prints out:
> >> >
> >> > cpu time: 112 real time: 114 gc time: 32
> >> >
> >> > cpu time: 80 real time: 82 gc time: 0
> >> >
> >> > cpu time: 76 real time: 75 gc time: 4
> >> >
> >> >
> >> > My original application used mutable hash tables and the same effect
> >> > was observed (if not more dramatically). Any thoughts on why the
> >> > update form is slower? I'm running this in Racket v5.1.1.
> >> >
> >> > - James
> >> > ____________________
> >> >   Racket Users list:
> >> >   http://lists.racket-lang.org/users
> >> >
> >>
> >> ____________________
> >>   Racket Users list:
> >>   http://lists.racket-lang.org/users
> >
> >
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users

Posted on the users mailing list.