[plt-scheme] Natural Language parsing in CS1
On Mon, Jun 01, 2009 at 11:04:46AM -0400, Prabhakar Ragde wrote:
> Todd wrote:
>
> >I think I'm going to do a short unit on finite state
> >automata and regular expressions (and I'll use something like the
> >macros Shriram presents in his "The Swine Before Perl" talk
> >http://www.cs.brown.edu/~sk/Publications/Talks/SwineBeforePerl/ ).
>
> May I also suggest the "derivatives" approach to regular expressions? It
> dates back to 1961, but is nicely described in a paper by Owens, Reppy,
> and Turon in the March 2009 issue of the Journal of Functional
> Programming (Owens used this approach in the parser-tools library that
> ships with PLT Scheme). The idea is simple: a regular expression is a
> tree. So write a function that consumes a regular expression E and a
> character c, and produces the regular expression E' describing all
> strings described by E that start with c. It's easy to program in Scheme
> (at least a basic version without lots of efficiency hacks).
>
> >I also thought I could do a short unit on parsing, in order to
> >introduce the idea of context-free grammars.
>
> Recursive-descent parsing with just recognition is not bad, though you
> have to worry about left recursion. If you want to produce the parse
> tree or trees, or produce meaningful error messages, you need to develop
> some machinery. It may be too much in an introductory course. In our
> second course, I show them how to implement Scheme's `read' (for lists
> of symbols and natural numbers), and then point out how it can be
> generalized to parse XML and (well-formed) HTML. That might be another
> direction to pursue.
>
> I'll send you some slides from a (too-jam-packed) talk on functional
> lexing and parsing I gave to the Computer Science Club here at UW. (The
> slides are available on their server, but it seems to be down at the
> moment.) --PR
In a first course on compiling, the first assignment was to take a very
simple language (explicitly specified as part of the assignment) and to
translate it to object code (also explicitly specified -- I specified
exactly what code every source-language construct was to be translated
to).
I provided them with a sample program to translate.
The trick to the assignment was that the language was a notation for
grammars with explicit output commands to be performed during the parse,
its semantics was a straight recursive-descent parse, and the sample
program was itself the solution to the assignment (except that it was
written in a language not available on their computer (until they
finished the assignment, of course))
I gave them two weeks to do it.
It worked. Afterwards they knew a fair amount of the pragmatics of
recursive-descent parsing.
-- hendrik