[plt-scheme] (fast) reading of data files into a hash-table - how?

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Sat Dec 31 16:10:58 EST 2005

At Sat, 31 Dec 2005 19:03:27 +0200, Yoav Goldberg wrote:
> I need to load a large amount of precomputed data.
> It is basically a list of key:value pairs, where each key is a list
> of symbols and each value is a number. I have about 14000 such keys.
> 
> I tried to just read the data line by line and put it into a
> hash-table, but it turns out creating a hash-table with 'equal, and
> then filling it with 14000 values takes way too much processing time.
> Is there any way to speed things up?
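For reference, the setup being described looks roughly like the sketch
below (the data layout and names here are hypothetical, not from the
original post). In the MzScheme v300-era API, a hash table that compares
keys with `equal?' is created by passing the 'equal flag:

```scheme
;; A minimal sketch, assuming each record pairs a list of symbols
;; (the key) with a number (the value).
(define table (make-hash-table 'equal))  ; keys compared with equal?

;; Insert one precomputed entry; real code would loop over ~14000
;; records read from the data file.
(hash-table-put! table '(the quick brown fox jumps over) 42)

;; Look up a key, returning #f when it is absent.
(hash-table-get table '(the quick brown fox jumps over) (lambda () #f))
```

With a working `equal-hash-code', filling such a table with 14000 list
keys should be fast; the slowdown described above came from all symbol
keys hashing to the same bucket.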

As I tried to answer later parts of this thread, I started to wonder
how hashing a mere 14000 keys could be the bottleneck.

The answer is that `equal-hash-code' was broken for symbols. It always
returned the same number.

This is now fixed in SVN. Hashing 14000 lists of 6 randomly generated
symbols now takes about 200 msec on my machine. (Generating the
interned symbols takes about 600 msec.)

My guess is that your program will run fine now, but let me know if
not.

Matthew



Posted on the users mailing list.