[plt-scheme] Natural Language parsing in CS1

From: Chung-chieh Shan (ccshan at rutgers.edu)
Date: Tue Jun 2 10:29:41 EDT 2009

On 2009-06-02T05:32:52-0400, Eli Barzilay wrote:
> But the nature of working on a
> "symbolic program" (hand waving here) and on a "statistical program"
> (more waving) is very different.  It's essentially related to what I
> wrote above: in the former you get precise rules for what your code is
> supposed to do, and your program is "usually" either correct or not
> (and there are exceptions to this, of course).  In the latter, you
> measure success based on a statistical comparison with imprecise
> machines (aka humans) -- so the measure itself is statistical (and
> again, there are exceptions here too).  To clarify, what I'm saying is
> that it's not only the code itself that uses statistics -- the
> analysis of whether it's correct or not is also measured in
> percentages.  In yet other words -- if I add a function to PLT and
> document it as correct for 76% of all inputs, it will look as
> ridiculous as someone writing a natural language parser that *is*
> correct, period.

You seem to be drawing a distinction between two kinds of programs by
examining how they are specified and tested.  I certainly agree with
you that NLP has moved away from the "symbolic approach" to specifying
and testing programs that process text.  The typical ACL paper doesn't
revolve around a select few problematic examples anymore unless they are
representative of many test cases.  However, that is not the same as
saying, as you did earlier, that NLP has moved away from the "symbolic
approach" to parsing (or more generally, processing) texts.  Symbols are
alive and well inside modern NLP applications such as Babelfish, whether
or not they are represented inside containers such as probability
distributions (a functor and a monad, like all good containers are :).
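
To make that concrete, here is a minimal sketch in PLT Scheme of what I
mean -- a toy distribution represented as a list of (value . probability)
pairs, with names like dist-map and dist-bind made up just for this
message, and nothing resembling actual Babelfish internals:

#lang scheme

;; A toy sketch only: a "distribution" is a list of
;; (value . probability) pairs, and the values are ordinary symbolic
;; S-expressions -- here, two competing analyses of "I saw her duck".
(define parses
  (list (cons '(S (NP I) (VP (V saw) (NP (Det her) (N duck)))) 0.6)
        (cons '(S (NP I) (VP (V saw) (NP her) (VP (V duck))))  0.4)))

;; Functor: map a function over the values, keeping the weights.
(define (dist-map f d)
  (map (lambda (vp) (cons (f (car vp)) (cdr vp))) d))

;; Monad: unit injects a value with probability 1; bind threads the
;; weights through, multiplying as it goes.
(define (dist-return x) (list (cons x 1.0)))
(define (dist-bind d f)
  (apply append
         (map (lambda (vp)
                (map (lambda (wq) (cons (car wq) (* (cdr vp) (cdr wq))))
                     (f (car vp))))
              d)))

;; E.g., (dist-map caddr parses) keeps the weights but maps each whole
;; parse to its VP subtree -- the symbols never go away.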

So, it depends on what the goal of the exercise is.  If the goal is to
make the student appreciate how modern NLP applications are specified
and tested, then having them write CFG productions for a toy fragment of
English would not be the right way to go.  But if the goal is to make
the student "appreciate how Babelfish works", then there's nothing wrong
with a CFG exercise.
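
For concreteness, such an exercise might amount to little more than the
following sketch -- a toy grammar I just made up, with random expansion
standing in for a real parser:

#lang scheme

;; CFG productions for a tiny fragment of English.  Nonterminals are
;; symbols, terminals are strings, and each nonterminal maps to a list
;; of possible right-hand sides.
(define grammar
  '((S   ((NP VP)))
    (NP  ((Det N)))
    (VP  ((V NP) (V)))
    (Det (("the") ("a")))
    (N   (("student") ("parser")))
    (V   (("wrote") ("ran")))))

;; Expand a symbol into a list of words, picking productions at random.
(define (expand sym)
  (if (string? sym)
      (list sym)
      (let* ([rhss (cadr (assq sym grammar))]
             [rhs  (list-ref rhss (random (length rhss)))])
        (apply append (map expand rhs)))))

;; > (expand 'S)
;; might yield '("the" "student" "wrote" "a" "parser")

Replacing the random expansion with a small recognizer is the natural
next step, and the fragment grows as far as the student's patience does.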

Shriram Krishnamurthi wrote:
> Otherwise you will have
> another generation of students who think this is how NL is parsed and
> who will be in no position whatsoever to appreciate how something like
> Babelfish works.  (YC's message, where he effectively says he has read
> multiple AI books that mentioned NLP and yet they all left him
> unprepared to understand Eli's message, is telling.)

Production rules *are* how real NL is processed, as much as continuation
passing is how real Web applications work and closure conversion is how
real compilers work.  Whether someone who learns only about CFGs or
continuation passing or closure conversion is in a position to
appreciate how Babelfish or Amazon or gcc works is a different issue.
(I'm sure you'll have no trouble finding an email message that
describes how programming languages are implemented nowadays and that
leaves someone who has read multiple PL books unprepared to understand
it.  Even if the message and the books are written in the same NL. :)
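
To put it yet another way, here is the same toy grammar with weights
bolted on -- a sketch of a PCFG whose numbers I made up rather than
estimated from any treebank.  The productions stay symbolic; only the
choice among them is statistical:

#lang scheme

;; Toy PCFG: each right-hand side carries a made-up probability.
(define pcfg
  '((S   (((NP VP) . 1.0)))
    (NP  (((Det N) . 1.0)))
    (Det ((("the") . 0.7) (("a") . 0.3)))
    (N   ((("student") . 0.5) (("parser") . 0.5)))
    (VP  (((V NP) . 0.6) ((V) . 0.4)))
    (V   ((("wrote") . 0.5) (("ran") . 0.5)))))

;; Pick a right-hand side with probability proportional to its weight.
(define (choose rhss)
  (let loop ([r (random)] [rhss rhss])
    (if (or (null? (cdr rhss)) (< r (cdar rhss)))
        (caar rhss)
        (loop (- r (cdar rhss)) (cdr rhss)))))

;; Generate a sentence: the rules rewrite symbols exactly as before;
;; the probabilities only bias which rule fires.
(define (generate sym)
  (if (string? sym)
      (list sym)
      (apply append (map generate (choose (cadr (assq sym pcfg)))))))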

-- 
Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig
We want our revolution, and we want it now! -- Marat/Sade
We want our revolution, and we'll take it at such time as  
 you've gotten around to delivering it      -- Haskell programmer