[plt-scheme] Natural Language parsing in CS1

From: Todd O'Bryan (toddobryan at gmail.com)
Date: Mon Jun 1 09:14:47 EDT 2009

 I'm starting to think about my courses for next year, and one thing
that I really want to do with my intro classes is introduce topics
that require the same kinds of thinking as programming, but feel
different enough that students will see them as something different.

For example, I think I'm going to do a short unit on finite state
automata and regular expressions (and I'll use something like the
macros Shriram presents in his "The Swine Before Perl" talk
http://www.cs.brown.edu/~sk/Publications/Talks/SwineBeforePerl/ ).

I also thought I could do a short unit on parsing, in order to
introduce the idea of context-free grammars. What I'd like to do is
create a teachpack that would take code like this:

(define-grammar G
 ((S -> NP VP)
  (NP -> "Bob")
  (VP -> "ate")))

and then

(parse G S "Bob ate")

would produce a parse tree

(S (NP ("Bob"))
   (VP ("ate")))

and ideally, I could write a function that would display this as text
(this looks good in monospace):

NP        VP
|         |
"Bob"    "ate"

and maybe even display the tree graphically.

Eventually, I'd like to do some kind of features:

(define-grammar G
  ((S -> NP[number: X] VP[number: X])
   (NP[number: sing] -> "he")
   (NP[number: plur] -> "they")
   (VP[number: sing] -> "sleeps")
   (VP[number: plur] -> "sleep")))

So that "He sleeps" and "They sleep" parse, but *"They sleeps" and
*"He sleep" don't.

So, the question...

Where do I start? I can write a parser that produces a list of all
possible parses given a grammar and a list of terminals, but I'm not
sure how to approach this in Scheme. After the recent comment that OO
doesn't scale down, I'm wondering if Grammar and Tree are worth my
first attempt at classes in Scheme. I think I should probably use
strings for the terminals, but should I use symbols for the symbols of
the grammar or is there something built in that would handle
attribute-value pairs nicely if I get that far?

Any suggestions appreciated. I'll try to publish back what I come up
with, along with whatever assignments I come up with.


Posted on the users mailing list.