[plt-scheme] reading a whole file

From: Eli Barzilay (eli at barzilay.org)
Date: Tue Nov 4 14:28:36 EST 2008

On Nov  4, Stephen De Gabrielle wrote:
> I'm working with the Enron email collection; uncompressed it is 2.54
> GB (across 500k files), so it should be possible to play with the
> whole thing in RAM.

Just in case you plan to actually do that: at these sizes, multiplier
factors become things that you should be aware of:

* In general, the GC requires more memory than you actually use.  I
  think that, generally speaking, you should plan on it holding twice
  the RAM that you actually need (even though the overhead can be
  smaller with generational collection).

* MzScheme holds strings in UCS-4 format, so each character is 4
  bytes.
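To make the 4-bytes-per-character point concrete, here is a small
illustration in Python rather than Scheme.  It uses Python's "utf-32-le"
codec as a stand-in for UCS-4 (for ordinary text the two give the same
4-byte-per-character layout); it shows the size ratio only, not how
MzScheme itself lays out strings internally.  The sample line is made up.

```python
# Illustration only: UCS-4 / UTF-32 stores every character in exactly
# 4 bytes, while plain ASCII text on disk uses 1 byte per character.
text = "Subject: meeting moved to 3pm\n"  # hypothetical sample email line

on_disk = text.encode("ascii")       # 1 byte per character
as_ucs4 = text.encode("utf-32-le")   # 4 bytes per character, no BOM

print(len(text), len(on_disk), len(as_ucs4))
```

So every byte of ASCII mail that is read into a string costs four bytes
of memory before any GC overhead is counted.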

In other words, you might need around 20 GB of RAM just to read it all
in (2.54 GB x 4 bytes per character x 2 for the GC).
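The back-of-the-envelope estimate can also be written as a tiny Python
function so other corpus sizes or GC factors are easy to try.  This is a
sketch under stated assumptions: the mail is mostly single-byte (ASCII)
text, so one byte on disk becomes one character, "GB" means 10^9 bytes,
and the factor of 2 for the GC is the rule of thumb above, not a measured
value.  The function name is hypothetical.

```python
def estimate_ram_bytes(bytes_on_disk: float,
                       bytes_per_char: int = 4,
                       gc_overhead: float = 2.0) -> float:
    """Rough RAM needed to hold a text corpus as in-memory strings.

    Assumes one byte on disk == one character (ASCII text), each
    character widened to `bytes_per_char` bytes (4 for UCS-4), and a
    garbage collector needing `gc_overhead` times the live data.
    """
    return bytes_on_disk * bytes_per_char * gc_overhead


GB = 10**9                 # assumption: decimal gigabytes
corpus = 2.54 * GB         # uncompressed size quoted in the thread
print(f"{estimate_ram_bytes(corpus) / GB:.1f} GB")
```

Dropping either multiplier (for example, keeping the data as raw bytes
instead of 4-byte characters) scales the result down proportionally.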

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the users mailing list.