[plt-scheme] reading a whole file

Tue Nov 4 15:03:03 EST 2008

You're right. Even if I partition my data (say, into 2 GB chunks), I'm
probably not going to be much faster than disk (based on Robby's data).
I think I'd better start reading the ports library docs (or stick to document
sets under 100 MB).

s.

On Tue, Nov 4, 2008 at 7:28 PM, Eli Barzilay <eli at barzilay.org> wrote:

> On Nov  4, Stephen De Gabrielle wrote:
> > I'm working with the Enron email collection. Uncompressed, it is 2.54
> > GB (across 500k files), so it should be possible to play with the
> > whole thing in RAM.
>
> Just in case you plan to actually do that: at these sizes, multiplicative
> factors become things that you should be aware of:
>
> * In general, the GC requires more memory than you actually use.  I
>  think that generally speaking you should plan on it holding twice
>  the RAM that you actually need.  (Though it can be less with a
>  generational collector.)
>
> * MzScheme holds strings in UCS-4 format, so each character is 4
>  bytes.
>
> In other words, you might need around 20 GB of RAM just to read it all
> in.
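
For reference, the arithmetic behind that figure can be sketched as below.
This is only a back-of-the-envelope estimate in Python, not anything from
MzScheme itself: the function name and parameters are made up for
illustration, and it assumes roughly one character per byte on disk (mostly
ASCII mail text), 4 bytes per character in memory, and a 2x allowance for the
GC, as described above.

```python
# Rough RAM estimate for holding a text corpus in memory as strings.
# All names here are hypothetical; the factors come from the message above.

def estimated_ram_gb(corpus_gb, bytes_per_char=4, gc_factor=2):
    """Estimate the RAM needed (in GB) to hold a corpus of corpus_gb.

    Assumes about one character per byte on disk (mostly ASCII text),
    bytes_per_char bytes per character in memory (4 for UCS-4), and a
    rule-of-thumb multiplier gc_factor for garbage-collector headroom.
    """
    return corpus_gb * bytes_per_char * gc_factor


# The whole Enron collection at 2.54 GB uncompressed:
print(round(estimated_ram_gb(2.54), 2))   # 20.32

# A document set of about 100 MB:
print(round(estimated_ram_gb(0.1), 2))    # 0.8
```

Under the same assumptions, a 100 MB document set would come to a little
under 1 GB, which is why smaller sets are much more comfortable to hold in
memory.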
>
> --
>          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                  http://www.barzilay.org/                 Maze is Life!