[racket] Extremely slow reading a hash
Sorry: my advice was bad. It probably is the mutable->immutable problem.
Below is a snippet of code you might find useful. Also, if your hashes
are embedded in some data structure, you might try wrapping them in a
struct and then using custom readers and writers on that struct to
avoid having to reimplement more of read.
Robby
#lang racket
(define (wr a-hash port)
(for ([(k v) (in-hash a-hash)])
(fprintf port "~s ~s\n" k v)))
(define (rd port)
(define hash (make-hash))
(let loop ()
(let ([next (read port)])
(cond
[(eof-object? next) hash]
[else
(hash-set! hash next (read port))
(loop)]))))
;; hash -> boolean
;; sees if rd/wr work together properly
(define (try-it hash)
(define sp (open-output-string))
(wr hash sp)
(equal? hash (rd (open-input-string (get-output-string sp)))))
(try-it (make-hash))
(let ([h (make-hash)])
(hash-set! h 'x 'y)
(try-it h))
(let ([h (make-hash)])
(hash-set! h 'x 'y)
(hash-set! h 11 12)
(try-it h))
On Wed, Mar 9, 2011 at 7:19 AM, Jeremy Price <donomii at gmail.com> wrote:
> I tried it with a more recent version (Welcome to Racket v5.1.), using
> my original code and the fasl calls. In both cases, reading in the
> hash caused racket to quit with an Out of Virtual Memory error, even
> though the hash had been created by racket (the reader program was run
> separately, I didn't accidentally try to hold the same hash twice).
>
> I'll fall back to the custom file format, especially since I see a
> planet library for sorting files too large to hold in memory.
>
> Thanks.
>
> On 9 March 2011 15:39, Eli Barzilay <eli at barzilay.org> wrote:
>> Earlier today, Jeremy Price wrote:
>>>
>>> I saved and loaded using:
>>>
>>> (with-output-to-file "/home/user/bayes-data" (lambda () (write
>>> trained_categories) ))
>>> (with-input-from-file "/home/user/bayes-data" (lambda () (set!
>>> trained_categories (read))))
>>
>> When you read a hash table you get an immutable one, which is
>> intenally very different from mutable ones. I don't know the details,
>> but it might be that it's less efficient when it gets to a huge table
>> (IIRC, it's implemented as a balanced tree?). But even if it wasn't,
>> it might be that the reader is reading the list and then making a
>> hash table out of it, which can be a problem at such sizes. So it's
>> probably better to just write the entries one by one, and on reading
>> it write some small code that will read them and put them in a new
>> table.
>>
>> There's also the `racket/fasl' module (`scheme/fasl' in your version),
>> which might be worth trying, but since it also leads to getting an
>> immutable hash it might have the same problem.
>>
>> Four minutes ago, Robby Findler wrote:
>>> I don't know if this has been fixed or not, but it is not difficult
>>> to try a more recent version. You can build it in your own home
>>> directory pretty easily once you get the basic source development
>>> packages installed. Under linux this is a pretty painless progress
>>> so I'd encourage you to give it a go.
>>
>> There's no need to compile -- just grab the ubuntu installer from
>> racket-lang.org. It's a simple .sh installer -- just run it, and for
>> the first question (use a unix-style distribution) answer "no".
>> You'll get the complete racket installed in a single directory that
>> you can keep in your home directory or moved around, and if you want
>> to uninstall it, you just remove the whole thing.
>>
>> --
>> ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
>> http://barzilay.org/ Maze is Life!
>>
>