[racket] Need some help for my first real experiment with scheme
On Wed, Apr 18, 2012 at 2:54 PM, Pedro <pedro100 at gmail.com> wrote:
> So to put it in a simple way, I need to tokenize all my data and
> create an index which I load into memory...?
> Is this how it is usually done? For example, does my browser (firefox)
> keep an index of all the words present in urls and page titles on
> memory at any given time?
Yes; features like "autocomplete" use data structures to index and
quickly retrieve matches. For example, tries or suffix trees allow
for quick searches given partial keywords. There's a question on
Stack Overflow that talks about this:
http://stackoverflow.com/questions/2426583/data-structure-for-auto-completion
I recently refreshed my suffix trees implementation with scribbled
documentation (http://planet.racket-lang.org/package-source/dyoo/suffixtree.plt/1/2/planet-docs/manual/index.html).
I do I need to make a revised release that does not produce a spurious
error message at compile-time; my apologies for that.
The library might come in handy. For example:
######################################
> (require (planet dyoo/suffixtree:1:2))
> (define tree (make-tree))
> (tree-add! tree (string->label "apple sauce"))
> (tree-add! tree (string->label "fruit juice"))
> (tree-add! tree (string->label "mango"))
######################################
At this point, we can ask the tree if a substring exists in any suffix
of the strings we've added to the tree:
######################################
> (tree-contains? tree (string->label "juic"))
#t
> (tree-contains? tree (string->label "moo"))
#f
######################################
We can augment the suffix tree structure so that we can pick out the
original terms that matched the given substring. It shouldn't be too
bad, though it'll take a little more work.