[plt-scheme] Natural Language parsing in CS1

From: Eli Barzilay (eli at barzilay.org)
Date: Tue Jun 2 05:32:52 EDT 2009

On Jun  2, David Brooks wrote:
> I think Todd's exercise is a decent introduction to parsing, and an
> understanding of this is vital to see why strict symbolic
> grammar-and- parser approaches do not scale. I've used similar
> exercises in the past, and I believe them to be helpful. There's no
> harm in showing a broken system, because students can discover for
> themselves where it breaks down.

Certainly presenting such an exercise as a way to make you understand
where this approach fails is valid.  But IMO that pretty requires
committing more time to the specific subject (of parsing NL in this
case).  In an NLP course students can get the point of such an
exercise -- but take a student in a PL course that is toying around
with that as a side-exercise, and you're likely to see students who
believe that this is actually a practical approach, and/or something
that is actually in use.  But (symbolic) parsing in itself is still
extremely useful -- just not to the kind of vagueness that NLs make,
so an exercise will of course be useful, you just need to be careful
about the conclusions...

(BTW, the book that you mention is kind of on the border of the
paradigm shift (chronologically, it's past that, but books & textbooks
usually lag) -- what I do know is that there's a problem with finding
good textbooks on the subject.)

> The phrase "reranking parser" will also get you somewhere useful. I
> saw Michael Collins give a very interesting talk on the subject, so
> his name might be a useful term.

Yes, Collins is very related to this shift -- his statistical parser
was a big success.  Working with it is an easy way to "get" why this
is so far from the symbolic corner of CS that we're usually buzzing
around here -- it starts by loading a huge piece of data (IIRC, it was
somewhere around 200-300MB) that contains the training results.  (In
fact, one thing I did with it was wrap it in some server code (using
PLT, of course) so that you don't pay the huge startup time for each
sentence.)  When dealing with such a parser, the difference from the
symbolic approach becomes painfully obvious: when there's a parsing
mistake, you don't have a specific rule in your code that you need to
debug -- an in fact, the whole debug-rerun cycle is very different
since you measure the relative correctness of your code by running
lots of experiments (in this case, parse many sentences) and
inspecting the overall results.  IOW, if you're writing a parser for a
programming language that fails on one program out of a thousand, then
you have a buggy parser -- but a parser for a natural language that
fails on one sentence out of a thousand would be a huge success.

> > [BTW, just to clarify -- I'm not an expert in NLP by any measure,
> > I just happen to live with one so I have extensive second-hand
> > knowledge, whatever that may be worth...]
> You've accurately summarised most of what I knew. I wish I had that  
> kind of second-hand knowledge...

Heh...  I still need to stress that I'm not at all an expert...  My
usual joke is that both me and my wife are doing CS, and we're both
doing languages -- only I'm doing formal ones, and she's doing natural
ones.  Many people would conclude that we're doing very close things,
but in practice it would be difficult to find two areas of CS that are
more different.

On Jun  2, Chung-chieh Shan wrote:
> [...]
> Thus, "the kind of actual parsing that is based on rules -- the kind
> that works nicely for parsing formal languages" -- is not at all
> limited to deterministic rules.  As Shriram wants (and so do I), it
> is linked by a "smooth continuum" to practical NLP.  It is not only
> compatible with symbolic processing but in fact presupposes symbolic
> processing.

Well, you can obviously speak about non-determinism formally...
That's the kind of stuff that you need to do when you analyze
something like a statistical parser.  But the nature of working on a
"symbolic program" (hand waving here) and on a "statistical program"
(more waving) is very different.  It's essentially related to what I
wrote above: in the former you get precise rules for what your code is
supposed to do, and your program is "usually" either correct or not
(and there are exceptions to this, of course).  In the latter, you
measure success based on a statistical comparison with imprecise
machines (aka humans) -- so the measure itself is statistical (and
again, there are exceptions here too).  To clarify, what I'm saying is
that it's not only the code itself that uses statistics, it's also the
analysis of whether it's correct or not which is measured in
percentages.  In yet other words -- if I'll add a function to PLT, and
I'll document it as correct for 76% of all inputs, then it will look
as ridiculous as someone writing a natural language parser that *is*
correct period.

But we're getting dangerously close to subjective opinions here, so
I'll try to avoid taking this further...

          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                  http://www.barzilay.org/                 Maze is Life!

Posted on the users mailing list.