[plt-scheme] (fast) reading of data files into a hash-table - how?

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Sat Dec 31 16:39:33 EST 2005

At Sat, 31 Dec 2005 21:29:43 +0200, Yoav Goldberg wrote:
> On 12/31/05, Chongkai Zhu <mathematica at citiz.net> wrote:
> > If your hash table doesn't need any modification, edit a source file as:
> >
> > (module a-hash-table mzscheme
> >   (provide a-hash-table)
> >   (define a-hash-table
> >     (make-immutable-hash-table
> >      ;your data line by line here
> >      ;in assoc-list form
> >      'equal)))
> >
> > Then compile this file. Every time you need the hash-table, require the
> > compiled module.
> 
> I tried something quite similar:
> (module data mzscheme
>    (require (lib "etc.ss"))
>    (provide get-data)
>    (define *hashtable*
>        (hash-table 'equal
>        ;; my data
>        ))
>    (define (get-data k) (hash-table-get ...)))
> 
> And then I tried to compile it (using mzc --extension --auto-dir data.scm).
> mzc worked for several minutes(!), producing a ~600k .dll file. When I
> then tried to require "data.scm", I got a Windows runtime error...
> 
> Any idea why?

Your code expands to an expression with 28000 sub-expressions in it for
the data. That translates to a very large C function, since mzc doesn't
work hard to break it up. Maybe mzc should do better, but hopefully
this isn't an issue for you anymore.

With `make-immutable-hash-table', in contrast, the data gets quoted,
and mzc turns the quoted data into bytecode. With my randomly generated
6-symbol keys, that still works out to about 600k of bytecode, but I'll
bet your lists are less random.
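
For illustration, here is a minimal sketch of the quoted form. The two
string keys, their values, and the `get-data' accessor are made up for
the example; the real data would be the full assoc list. Because the
list is a single quoted literal, it reaches the compiler as one constant
rather than as one sub-expression per entry.

```scheme
(module a-hash-table mzscheme
  (provide a-hash-table get-data)

  ;; The whole assoc list is one quoted constant (placeholder entries).
  (define a-hash-table
    (make-immutable-hash-table
     '(("apple" . 1)
       ("banana" . 2))
     'equal))

  ;; Hypothetical accessor: returns #f when the key is missing.
  (define (get-data k)
    (hash-table-get a-hash-table k (lambda () #f))))
```

After requiring the module, (get-data "apple") looks the key up with
`equal?' comparison, which is what string keys need.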

Matthew



Posted on the users mailing list.