[racket] Need some help for my first real experiment with scheme

From: Neil Van Dyke (neil at neilvandyke.org)
Date: Wed Apr 18 16:37:44 EDT 2012

Pedro wrote at 04/18/2012 02:54 PM:
> So to put it in a simple way, I need to tokenize all my data and
> create an index which I load into memory...?
>    

That's a simple way that might do everything you want.
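
As a rough sketch of that approach, assuming each document is just a string paired with an ID: the names `tokenize`, `build-index`, and `lookup` below are made up for illustration and do not come from any library, and the tokenizer is deliberately naive (lowercase ASCII letters and digits only).

```racket
#lang racket

;; Split text into lowercase tokens made of ASCII letters and digits.
(define (tokenize text)
  (regexp-match* #px"[a-z0-9]+" (string-downcase text)))

;; Build an inverted index: term -> set of IDs of documents containing it.
;; docs is a list of (cons id text) pairs.
(define (build-index docs)
  (for*/fold ([index (hash)])
             ([doc (in-list docs)]
              [term (in-list (tokenize (cdr doc)))])
    (hash-update index term
                 (lambda (ids) (set-add ids (car doc)))
                 (set))))

;; Return the set of document IDs that contain every term in the query.
(define (lookup index query)
  (define postings
    (for/list ([term (in-list (tokenize query))])
      (hash-ref index term (set))))
  (if (null? postings)
      (set)
      (apply set-intersect postings)))

;; Hypothetical sample data, just to show the calls.
(define sample-index
  (build-index (list (cons 1 "Racket is a Scheme")
                     (cons 2 "Scheme and Lisp"))))

(lookup sample-index "scheme racket")
```

The whole index here is one immutable hash held in memory, which is the simple design being discussed; anything fancier (ranking, stemming, on-disk storage) is left out.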

If you do this, and then find you want it to work better, then I suggest 
consulting an information retrieval (IR) textbook.

Regarding whether keeping everything in memory will work: You can do the 
arithmetic on how much memory you'll need, once you know how many terms 
and documents you need to support.  Then see whether you'll have enough 
free RAM for twice that number; if you exhaust RAM and the GC'd heap 
starts getting swapped to disk at random, you're going to have a bad time.
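
That arithmetic might look something like the sketch below. The function names and every per-item size here are hypothetical placeholders, not measurements of Racket's actual memory layout, so plug in numbers from your own data.

```racket
#lang racket

;; Rough estimate of the bytes needed for an in-memory inverted index.
;; Every size passed in is a guess supplied by the caller.
(define (estimate-index-bytes #:terms n-terms
                              #:postings-per-term avg-postings
                              #:bytes-per-term term-bytes
                              #:bytes-per-posting posting-bytes)
  (* n-terms (+ term-bytes (* avg-postings posting-bytes))))

;; Apply the "twice that number" rule of thumb for free RAM.
(define (ram-needed index-bytes)
  (* 2 index-bytes))

;; Made-up example: 100,000 terms, 50 postings per term on average,
;; 64 bytes of overhead per term, 8 bytes per posting.
(define example-bytes
  (estimate-index-bytes #:terms 100000
                        #:postings-per-term 50
                        #:bytes-per-term 64
                        #:bytes-per-posting 8))

(ram-needed example-bytes)
```

With those made-up inputs the index comes to 46,400,000 bytes, so you would want roughly 92,800,000 bytes of free RAM.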

> Is this how it is usually done? For example, does my browser (firefox)
> keep an index of all the words present in urls and page titles on
> memory at any given time?
>    

I would guess so, though that might be indirectly, such as through an 
SQLite cache.

Neil V.

-- 
http://www.neilvandyke.org/

Posted on the users mailing list.