[racket] Need some help for my first real experiment with scheme
On Mon, Apr 23, 2012 at 10:18 AM, Pedro <pedro100 at gmail.com> wrote:
> Ok, thank you all for the input, however I'm still missing an important detail.
> So I build a suffix tree, but how exactly do I refer to the target documents?
> Should I tie a reference to each document in which the string occurs
> to each node? I can't think of other way to do it.
One way to do this is keep a reference to the node at the end of the string.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
#lang racket
(require (planet dyoo/suffixtree))
;; String reference
(struct sref (str))
;; string->label*: string -> label
;; Creates a label out of a string, but with a reference
;; to the string at the end.
(define (string->label* s)
(vector->label
(list->vector (append (string->list s) (list (sref s))))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Once we have these special characters at the end, we have a way to
cull them when we search through a tree. The leaves of our tree will
contain those special sref values, and we can then use them to recover
the matched strings:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; leaves: node -> (listof node)
;; Get the leaves of the tree rooted at node.
(define (leaves a-node)
(let loop ([a-node a-node]
[acc '()])
(define children (node-children a-node))
(cond [(eq? children '())
(cons a-node acc)]
[else
(foldl loop acc children)])))
;; leaf-str: node -> string
;; Given a leaf node, get back the string stored in the
;; terminating sref element.
(define (leaf-str a-leaf)
(define a-label (node-up-label a-leaf))
(sref-str (label-ref a-label (sub1 (label-length a-label)))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
When we walk the tree, if we find a matching node, we then can walk
the children to the leaves.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; find-matches: tree string -> (listof string)
(define (find-matches a-tree str)
(define (on-success node up-label-offset)
(map leaf-str (leaves node)))
(define (on-fail node up-label-offset input-label-offset)
'())
(tree-walk a-tree
(string->label str)
on-success
on-fail))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
For example:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(define a-tree (make-tree))
(tree-add! a-tree (string->label* "peter"))
(tree-add! a-tree (string->label* "piper"))
(tree-add! a-tree (string->label* "pepper"))
(find-matches a-tree "er")
(find-matches a-tree "per")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Note that find-matches isn't checking for duplicates, so if we search
for the single string "e", we'll see several matches because there are
several suffixes of the keywords that contain "e".