[racket] Need some help for my first real experiment with scheme

From: Danny Yoo (dyoo at cs.wpi.edu)
Date: Wed Apr 18 16:09:15 EDT 2012

On Wed, Apr 18, 2012 at 2:54 PM, Pedro <pedro100 at gmail.com> wrote:
> So to put it in a simple way, I need to tokenize all my data and
> create an index which I load into memory...?
> Is this how it is usually done? For example, does my browser (firefox)
> keep an index of all the words present in urls and page titles on
> memory at any given time?

Yes; features like "autocomplete" use data structures to index and
quickly retrieve matches.  For example, tries or suffix trees allow
for quick searches given partial keywords.  There's a question on
Stack Overflow that talks about this:

http://stackoverflow.com/questions/2426583/data-structure-for-auto-completion


I recently refreshed my suffix trees implementation with scribbled
documentation (http://planet.racket-lang.org/package-source/dyoo/suffixtree.plt/1/2/planet-docs/manual/index.html).

I do I need to make a revised release that does not produce a spurious
error message at compile-time; my apologies for that.


The library might come in handy.  For example:

######################################
> (require (planet dyoo/suffixtree:1:2))
> (define tree (make-tree))
> (tree-add! tree (string->label "apple sauce"))
> (tree-add! tree (string->label "fruit juice"))
> (tree-add! tree (string->label "mango"))
######################################

At this point, we can ask the tree if a substring exists in any suffix
of the strings we've added to the tree:

######################################
> (tree-contains? tree (string->label "juic"))
#t
> (tree-contains? tree (string->label "moo"))
#f
######################################


We can augment the suffix tree structure so that we can pick out the
original terms that matched the given substring.  It shouldn't be too
bad, though it'll take a little more work.

Posted on the users mailing list.