[plt-scheme] Natural Language parsing in CS1

From: Eli Barzilay (eli at barzilay.org)
Date: Mon Jun 1 09:56:14 EDT 2009

On Jun  1, Todd O'Bryan wrote:
> 
> I also thought I could do a short unit on parsing, in order to
> introduce the idea of context-free grammars. What I'd like to do is
> create a teachpack that would take code like this:
> 
> (define-grammar G
>  ((S -> NP VP)
>   (NP -> "Bob")
>   (VP -> "ate")))
> 
> and then
> 
> (parse G S "Bob ate")
> 
> would produce a parse tree
> 
> (S (NP ("Bob"))
>    (VP ("ate")))
> [...]

Just a quick warning on this: the world of natural language processing
has moved away from this kind of symbolic approach to parsing texts,
so it would be better to leave such exercises for a formal language
instead (like writing a parser for a mini language, for example).
Another thing that I did in the past that might be interesting is an
"ambiguous" construct that chooses a random sub-form to evaluate and
does not evaluate the rest.  For example,

  (define (foo)
    (choose (list 'y)
            (cons 'x (foo))))

would return a list of a random number of `x's, terminated by a `y'.
This is a different aspect of NLP -- text generation -- which
generally works better with such a symbolic approach.


> Eventually, I'd like to do some kind of features:
> 
> (define-grammar G
>   ((S -> NP[number: X] VP[number: X])
>    (NP[number: sing] -> "he")
>    (NP[number: plur] -> "they")
>    (VP[number: sing] -> "sleeps")
>    (VP[number: plur] -> "sleep")))
> 
> So that "He sleeps" and "They sleep" parse, but *"They sleeps" and
> *"He sleep" don't.

If you do want to go with this, then a good place to start would be to
look at a system called "Fuf", which was written by Michael Elhadad a
while back.  (That system was written in Lisp, but the code should be
very "schemy".)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!


Posted on the users mailing list.