[plt-scheme] Natural Language parsing in CS1
On Jun 1, Todd O'Bryan wrote:
>
> I also thought I could do a short unit on parsing, in order to
> introduce the idea of context-free grammars. What I'd like to do is
> create a teachpack that would take code like this:
>
> (define-grammar G
> ((S -> NP VP)
> (NP -> "Bob")
> (VP -> "ate")))
>
> and then
>
> (parse G S "Bob ate")
>
> would produce a parse tree
>
> (S (NP ("Bob"))
> (VP ("ate")))
> [...]
Just a quick warning on this: the world of natural language processing
has moved away from this kind of symbolic approach to parsing texts,
so it would be better to leave such exercises for a formal language
instead (like writing a parser for a mini language, for example).
Another thing that I did in the past that might be interesting is an
"ambiguous" construct that chooses a random sub-form to evaluate and
does not evaluate the rest. For example,
(define (foo)
(choose (list 'y)
(cons 'x (foo))))
would return a list of a random number of `x's, terminated by a `y'.
This is a different aspect of NLP -- text generation -- which
generally works better with such a symbolic approach.
> Eventually, I'd like to do some kind of features:
>
> (define-grammar G
> ((S -> NP[number: X] VP[number: X])
> (NP[number: sing] -> "he")
> (NP[number: plur] -> "they")
> (VP[number: sing] -> "sleeps")
> (VP[number: plur] -> "sleep")))
>
> So that "He sleeps" and "They sleep" parse, but *"They sleeps" and
> *"He sleep" don't.
If you do want to go with this, then a good place to start would be to
look at a system called "Fuf", which was written by Michael Elhadad a
while back. (That system was written in Lisp, but the code should be
very "schemy".)
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!