[racket] Extremely slow reading a hash

From: Jeremy Price (donomii at gmail.com)
Date: Tue Mar 8 03:32:34 EST 2011

Hi,
I've been working on a Bayes file categoriser, which requires scanning
files and putting a (very) large number of keys into a hash.  In an
attempt to speed things up, I tried to save the hash to disk and then
read it back later.  Creating and saving the hash took less than 5
minutes, but reading it back from disk had not finished after 10
hours.  It seems
like I may have missed a switch or option, but reading the manual for
(read) and Data Types doesn't give me any clues.  What did I miss?

I saved and loaded using:

(with-output-to-file "/home/user/bayes-data"
  (lambda () (write trained_categories)))
(with-input-from-file "/home/user/bayes-data"
  (lambda () (set! trained_categories (read))))


The saved hash has a size of 167M and looks like:
#hash( ("files" . #hash((#"\0\0\0\324" . 1) (#"\6\372\370\233" . 1) ... )  ... )

and I'm using:
Welcome to MzScheme v4.2.4 [3m], Copyright (c) 2004-2010 PLT Scheme Inc.

That version is the latest Ubuntu is shipping, so apologies if the
problem has already been addressed.


Posted on the users mailing list.