[plt-scheme] Natural Language parsing in CS1

From: Prabhakar Ragde (plragde at uwaterloo.ca)
Date: Mon Jun 1 11:04:46 EDT 2009

Todd wrote:

> I think I'm going to do a short unit on finite state
> automata and regular expressions (and I'll use something like the
> macros Shriram presents in his "The Swine Before Perl" talk
> http://www.cs.brown.edu/~sk/Publications/Talks/SwineBeforePerl/ ).

May I also suggest the "derivatives" approach to regular expressions? It 
dates back to 1961, but is nicely described in a paper by Owens, Reppy, 
and Turon in the March 2009 issue of the Journal of Functional 
Programming (Owens used this approach in the parser-tools library that 
ships with PLT Scheme). The idea is simple: a regular expression is a 
tree. So write a function that consumes a regular expression E and a 
character c, and produces the regular expression E' describing all 
strings described by E that start with c. It's easy to program in Scheme 
(at least a basic version without lots of efficiency hacks).

> I also thought I could do a short unit on parsing, in order to
> introduce the idea of context-free grammars.

Recursive-descent parsing with just recognition is not bad, though you 
have to worry about left recursion. If you want to produce the parse 
tree or trees, or produce meaningful error messages, you need to develop 
some machinery. It may be too much in an introductory course. In our 
second course, I show them how to implement Scheme's `read' (for lists 
of symbols and natural numbers), and then point out how it can be 
generalized to parse XML and (well-formed) HTML. That might be another 
direction to pursue.

I'll send you some slides from a (too-jam-packed) talk on functional 
lexing and parsing I gave to the Computer Science Club here at UW. (The 
slides are available on their server, but it seems to be down at the 
moment.) --PR

Posted on the users mailing list.