[racket] Extremely slow reading a hash

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Mar 9 08:56:35 EST 2011

It sounds like something is wrong. Can you send us code to replicate
the problem?

Here's my attempt:

 #lang scheme

 (define N 1000000)

 (define ht (make-hash))
  (for ([i (in-range N)])
    (let ([b (bytes (random 256) (random 256) (random 256) (random 256))])
      (hash-set! ht b i))))

 (define-values (in out) (make-pipe))

 (write ht out)
 (define ht2 (time (read in)))
 (file-position in)

 (hash-count ht)
 (hash-count ht2)

It takes about 2 seconds to create the table and 16 seconds to read the
table back in. The "file" created in this example is about 25MB --- not
as big as 167MB, but within a factor of 10.

So, my example is still missing some crucial ingredient. I wonder, for
example, if your hash table might map multiple keys to the same value,
so that printing the hash table prints the value multiple times. Or
maybe there are cycles in the hash table that are not handled correctly
when reading.

At Thu, 10 Mar 2011 00:23:08 +1100, Jeremy Price wrote:
> I tried it with a more recent version (Welcome to Racket v5.1.), using
> my original code and the fasl calls.  In both cases, reading in the
> hash caused racket to quit with an Out of Virtual Memory error, even
> though the hash had been created by racket (the reader program was run
> separately, I didn't accidentally try to hold the same hash twice).
> I'll fall back to the custom file format, especially since I see a
> planet library for sorting files too large to hold in memory.
> Thanks.
> On 9 March 2011 15:39, Eli Barzilay <eli at barzilay.org> wrote:
> > Earlier today, Jeremy Price wrote:
> >>
> >> I saved and loaded using:
> >>
> >> (with-output-to-file "/home/user/bayes-data" (lambda () (write
> >> trained_categories) ))
> >> (with-input-from-file "/home/user/bayes-data" (lambda () (set!
> >> trained_categories (read))))
> >
> > When you read a hash table you get an immutable one, which is
> > intenally very different from mutable ones.  I don't know the details,
> > but it might be that it's less efficient when it gets to a huge table
> > (IIRC, it's implemented as a balanced tree?).  But even if it wasn't,
> > it might be that the reader is reading the list and then making a
> > hash table out of it, which can be a problem at such sizes.  So it's
> > probably better to just write the entries one by one, and on reading
> > it write some small code that will read them and put them in a new
> > table.
> >
> > There's also the `racket/fasl' module (`scheme/fasl' in your version),
> > which might be worth trying, but since it also leads to getting an
> > immutable hash it might have the same problem.
> >
> > Four minutes ago, Robby Findler wrote:
> >> I don't know if this has been fixed or not, but it is not difficult
> >> to try a more recent version. You can build it in your own home
> >> directory pretty easily once you get the basic source development
> >> packages installed. Under linux this is a pretty painless progress
> >> so I'd encourage you to give it a go.
> >
> > There's no need to compile -- just grab the ubuntu installer from
> > racket-lang.org.  It's a simple .sh installer -- just run it, and for
> > the first question (use a unix-style distribution) answer "no".
> > You'll get the complete racket installed in a single directory that
> > you can keep in your home directory or moved around, and if you want
> > to uninstall it, you just remove the whole thing.
> >
> > --
> >          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
> >                    http://barzilay.org/                   Maze is Life!
> >
> _________________________________________________
>   For list-related administrative tasks:
>   http://lists.racket-lang.org/listinfo/users

Posted on the users mailing list.