[racket] Extremely slow reading a hash

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Wed Mar 9 08:40:13 EST 2011

Sorry: my advice was bad. It probably is the mutable->immutable problem.

Below is a snippet of code you might find useful. Also, if your hashes
are embedded in some data structure, you might try wrapping them in a
struct and then using custom readers and writers on that struct to
avoid having to reimplement more of read.

Robby

#lang racket
(define (wr a-hash port)
  (for ([(k v) (in-hash a-hash)])
    (fprintf port "~s ~s\n" k v)))

(define (rd port)
  (define hash (make-hash))
  (let loop ()
    (let ([next (read port)])
      (cond
        [(eof-object? next) hash]
        [else
         (hash-set! hash next (read port))
         (loop)]))))



;; hash -> boolean
;; sees if rd/wr work together properly
(define (try-it hash)
  (define sp (open-output-string))
  (wr hash sp)
  (equal? hash (rd (open-input-string (get-output-string sp)))))

(try-it (make-hash))

(let ([h (make-hash)])
  (hash-set! h 'x 'y)
  (try-it h))

(let ([h (make-hash)])
  (hash-set! h 'x 'y)
  (hash-set! h 11 12)
  (try-it h))

On Wed, Mar 9, 2011 at 7:19 AM, Jeremy Price <donomii at gmail.com> wrote:
> I tried it with a more recent version (Welcome to Racket v5.1.), using
> my original code and the fasl calls.  In both cases, reading in the
> hash caused racket to quit with an Out of Virtual Memory error, even
> though the hash had been created by racket (the reader program was run
> separately, I didn't accidentally try to hold the same hash twice).
>
> I'll fall back to the custom file format, especially since I see a
> planet library for sorting files too large to hold in memory.
>
> Thanks.
>
> On 9 March 2011 15:39, Eli Barzilay <eli at barzilay.org> wrote:
>> Earlier today, Jeremy Price wrote:
>>>
>>> I saved and loaded using:
>>>
>>> (with-output-to-file "/home/user/bayes-data" (lambda () (write
>>> trained_categories) ))
>>> (with-input-from-file "/home/user/bayes-data" (lambda () (set!
>>> trained_categories (read))))
>>
>> When you read a hash table you get an immutable one, which is
>> intenally very different from mutable ones.  I don't know the details,
>> but it might be that it's less efficient when it gets to a huge table
>> (IIRC, it's implemented as a balanced tree?).  But even if it wasn't,
>> it might be that the reader is reading the list and then making a
>> hash table out of it, which can be a problem at such sizes.  So it's
>> probably better to just write the entries one by one, and on reading
>> it write some small code that will read them and put them in a new
>> table.
>>
>> There's also the `racket/fasl' module (`scheme/fasl' in your version),
>> which might be worth trying, but since it also leads to getting an
>> immutable hash it might have the same problem.
>>
>> Four minutes ago, Robby Findler wrote:
>>> I don't know if this has been fixed or not, but it is not difficult
>>> to try a more recent version. You can build it in your own home
>>> directory pretty easily once you get the basic source development
>>> packages installed. Under linux this is a pretty painless progress
>>> so I'd encourage you to give it a go.
>>
>> There's no need to compile -- just grab the ubuntu installer from
>> racket-lang.org.  It's a simple .sh installer -- just run it, and for
>> the first question (use a unix-style distribution) answer "no".
>> You'll get the complete racket installed in a single directory that
>> you can keep in your home directory or moved around, and if you want
>> to uninstall it, you just remove the whole thing.
>>
>> --
>>          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>>                    http://barzilay.org/                   Maze is Life!
>>
>



Posted on the users mailing list.