[plt-scheme] Natural Language parsing in CS1

From: hendrik at topoi.pooq.com (hendrik at topoi.pooq.com)
Date: Mon Jun 1 11:27:32 EDT 2009

On Mon, Jun 01, 2009 at 11:04:46AM -0400, Prabhakar Ragde wrote:
> Todd wrote:
> >I think I'm going to do a short unit on finite state
> >automata and regular expressions (and I'll use something like the
> >macros Shriram presents in his "The Swine Before Perl" talk
> >http://www.cs.brown.edu/~sk/Publications/Talks/SwineBeforePerl/ ).
> May I also suggest the "derivatives" approach to regular expressions? It 
> dates back to 1961, but is nicely described in a paper by Owens, Reppy, 
> and Turon in the March 2009 issue of the Journal of Functional 
> Programming (Owens used this approach in the parser-tools library that 
> ships with PLT Scheme). The idea is simple: a regular expression is a 
> tree. So write a function that consumes a regular expression E and a 
> character c, and produces the regular expression E' describing all 
> strings described by E that start with c. It's easy to program in Scheme 
> (at least a basic version without lots of efficiency hacks).
> >I also thought I could do a short unit on parsing, in order to
> >introduce the idea of context-free grammars.
> Recursive-descent parsing with just recognition is not bad, though you 
> have to worry about left recursion. If you want to produce the parse 
> tree or trees, or produce meaningful error messages, you need to develop 
> some machinery. It may be too much in an introductory course. In our 
> second course, I show them how to implement Scheme's `read' (for lists 
> of symbols and natural numbers), and then point out how it can be 
> generalized to parse XML and (well-formed) HTML. That might be another 
> direction to pursue.
> I'll send you some slides from a (too-jam-packed) talk on functional 
> lexing and parsing I gave to the Computer Science Club here at UW. (The 
> slides are available on their server, but it seems to be down at the 
> moment.) --PR

In a first course on compiling, the first assignment was to take a very 
simple language (explicitly specified as part of the assignment) and to 
translate it to object code (also explicitly specified -- I specified 
exactly what code every source-language construct was to be translated 

I provided them with a sample program to translate.

The trick to the assignment was that the language was a notation for 
grammars with explicit output commands to be performed during the parse, 
its semantics was a straight recursive-descent parse, and the sample 
program was itself the solution to the assignment (except that it was 
written in a language not available on their computer (until they 
finished the assignment, of course))

I gave them two weeks to do it.

It worked.  Afterwards they knew a fair amount of the pragmatics of 
recursive-descent parsing.

-- hendrik

Posted on the users mailing list.