[plt-scheme] Natural Language parsing in CS1
Todd wrote:
> I think I'm going to do a short unit on finite state
> automata and regular expressions (and I'll use something like the
> macros Shriram presents in his "The Swine Before Perl" talk
> http://www.cs.brown.edu/~sk/Publications/Talks/SwineBeforePerl/ ).
May I also suggest the "derivatives" approach to regular expressions? It
dates back to 1961, but is nicely described in a paper by Owens, Reppy,
and Turon in the March 2009 issue of the Journal of Functional
Programming (Owens used this approach in the parser-tools library that
ships with PLT Scheme). The idea is simple: a regular expression is a
tree. So write a function that consumes a regular expression E and a
character c, and produces the regular expression E' describing all
strings described by E that start with c. It's easy to program in Scheme
(at least a basic version without lots of efficiency hacks).
> I also thought I could do a short unit on parsing, in order to
> introduce the idea of context-free grammars.
Recursive-descent parsing with just recognition is not bad, though you
have to worry about left recursion. If you want to produce the parse
tree or trees, or produce meaningful error messages, you need to develop
some machinery. It may be too much in an introductory course. In our
second course, I show them how to implement Scheme's `read' (for lists
of symbols and natural numbers), and then point out how it can be
generalized to parse XML and (well-formed) HTML. That might be another
direction to pursue.
I'll send you some slides from a (too-jam-packed) talk on functional
lexing and parsing I gave to the Computer Science Club here at UW. (The
slides are available on their server, but it seems to be down at the
moment.) --PR