You're right. Even if I partition my data (say, into 2 GB chunks), I'm probably not much faster than disk (based on Robby's data).<br>I think I'd better start reading the ports library docs (or stick to document sets under 100 MB).<br>
<br>s.<br><br><br><div class="gmail_quote">On Tue, Nov 4, 2008 at 7:28 PM, Eli Barzilay <span dir="ltr">&lt;<a href="mailto:eli@barzilay.org" target="_blank">eli@barzilay.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>On Nov 4, Stephen De Gabrielle wrote:<br>
> I'm working with the Enron email collection; uncompressed, it is 2.54<br>
> GB (across 500k files), so it should be possible to play with the<br>
> whole thing in RAM.<br>
<br>
</div>Just in case you plan to actually do that: at these sizes, multiplier<br>
factors become something that you should be aware of:<br>
<br>
* In general, the GC requires more memory than you actually use. I<br>
think that, generally speaking, you should plan on it holding twice<br>
the RAM that you actually need. (The overhead can be smaller with a<br>
generational collector.)<br>
<br>
* MzScheme holds strings in UCS-4 format, so each character is 4<br>
bytes.<br>
<br>
In other words, you might need around 20 GB of RAM (2.54 GB x 4 bytes<br>
per character x 2 for the GC) just to read it all in.<br>
<div><br>
--<br>
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:<br>
<a href="http://www.barzilay.org/" target="_blank">http://www.barzilay.org/</a> Maze is Life!<br>
</div><div><div></div><div>
</div></div></blockquote></div><br>
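The estimate in the quoted message can be reproduced with a few lines of arithmetic. This is only a rough sketch in Python, not a measurement of MzScheme itself: the function name <code>estimate_ram_gb</code> is made up for illustration, the two factors come straight from the quoted message, and it assumes that each byte on disk decodes to exactly one character (true for plain ASCII mail, not for multi-byte encodings).<br>

```python
# Back-of-the-envelope estimate of the RAM needed to hold a text corpus
# as in-memory strings, using the two factors from the quoted message.

BYTES_PER_CHAR_UCS4 = 4  # quoted message: UCS-4 strings use 4 bytes per character
GC_HEADROOM = 2.0        # quoted message: plan on roughly twice what you actually use


def estimate_ram_gb(corpus_gb, bytes_per_char=BYTES_PER_CHAR_UCS4,
                    gc_factor=GC_HEADROOM):
    """Estimate RAM in GB.

    Assumption (hypothetical, not from the thread): one on-disk byte
    decodes to one character, as with plain ASCII text.
    """
    return corpus_gb * bytes_per_char * gc_factor


if __name__ == "__main__":
    # Whole Enron collection, 2.54 GB uncompressed.
    print(f"Enron corpus: about {estimate_ram_gb(2.54):.1f} GB of RAM")

    # A document set at the 100 MB limit mentioned in the reply.
    print(f"100 MB set:   about {estimate_ram_gb(0.1):.1f} GB of RAM")

    # The fixed 4-byte cost per character, illustrated with Python's
    # UTF-32 codec (same fixed-width 4-byte units as UCS-4).
    assert len("hello".encode("utf-32-le")) == 5 * BYTES_PER_CHAR_UCS4
```

By the same arithmetic, a 100 MB document set would need somewhat under 1 GB, which is why staying below that size looks comfortable.<br>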