[plt-scheme] Natural Language parsing in CS1

From: Todd O'Bryan (toddobryan at gmail.com)
Date: Tue Jun 2 11:01:34 EDT 2009

On Tue, Jun 2, 2009 at 9:01 AM, David Brooks <djb at untyped.com> wrote:
>
> On 2 Jun 2009, at 13:51, Shriram Krishnamurthi wrote:
>>
>> ...if that is the point to the exercise.  Otherwise you will have
>> another generation of students thinking this is how NL is parsed, and
>> will be in no position whatsoever to appreciate how something like
>> Babelfish works.  (YC's message, where he effectively says he has read
>> multiple AI books that mentioned NLP and yet they all left him
>> unprepared to understand Eli's message, is telling.)

As Eli mentioned, statistical models for generation are not nearly as
successful as models for parsing.

One of my personal prejudices (from way too many classes in MIT-style
syntax, a few in Cognitive Grammar, and semantics of both the
truth-conditional and cognitivist sorts) is that humans are so good at
parsing language because they're also good at generating it. As we
understand language, I think we're constantly asking, "What would I
mean if I said what the person talking to me is saying?"

"Garden path" sentences such as

The horse raced past the barn fell.
Spiro conjectures Ex-Lax.

are actually fairly easy for symbolic parsers to parse, but very hard
for human beings, largely (I think) because it's hard for humans to
create contexts in which they make sense. With the right context,
they're fairly easy, even for humans.
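To make the "easy for symbolic parsers" point concrete, here is a minimal CKY chart recognizer, sketched in Python (CKY is an illustrative stand-in, not a parser anyone in this thread proposed). The grammar and lexicon are hypothetical toys in Chomsky normal form that let "raced" be either a main verb or a past participle heading a reduced relative clause. Because a chart parser records every analysis of every substring, it never commits early to the main-verb reading that misleads human readers. The three nested loops over span length, start, and split point give the familiar O(n^3) behavior (times the grammar size).

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form, for illustration only.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("NP", "RC"): {"NP"},     # reduced relative: "the horse [raced past the barn]"
    ("VPart", "PP"): {"RC"},
    ("V", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}
LEXICON = {
    "the": {"Det"},
    "horse": {"N"},
    "barn": {"N"},
    "raced": {"V", "VPart"},  # main verb OR past participle
    "past": {"P"},
    "fell": {"VP"},           # intransitive verb, treated directly as a VP
}

def cky_recognize(words, start="S"):
    """Return True if `words` can be derived from `start` under the toy grammar."""
    n = len(words)
    # chart[i][j] holds every nonterminal that spans words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(w, ()))
    for length in range(2, n + 1):          # shorter spans are filled first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):       # every split point
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY.get((b, c), set())
    return start in chart[0][n]

print(cky_recognize("the horse raced past the barn fell".split()))  # True
print(cky_recognize("the horse raced past the barn".split()))       # True
print(cky_recognize("the horse fell the barn".split()))             # False
```

The parser accepts both the garden-path sentence and its ordinary prefix without any backtracking; the hard part for a human, finding a context where the reduced-relative reading makes sense, simply doesn't arise.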

One of my computational linguistics professors said that the
statistical revolution of the 1990s was incredibly important, but he
worried that it was the result of competitive systems that encouraged
people to create something that worked, not necessarily something
grounded in good research. His view was that people had kind of hit a
wall and the field needed to go back to doing some basic research to
figure out how to get past the limitations people had hit.

As for my project, I think I'm going to implement a GLR/Tomita parser,
since it works on ambiguous grammars, runs in O(n^3) worst-case time
(at least in its refined variants; Tomita's original formulation can
be slower on grammars with long rules), and will give me a chance to
program some really interesting data structures and get more
comfortable with idiomatic Scheme.
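The central data structure in Tomita's parser is the graph-structured stack (GSS). When several LR parses are alive at once, stack tops that reach the same parser state at the same input position are merged into one node, and each node keeps links to all of its predecessors, so shared material is stored only once. Below is a minimal Python sketch of just that structure. The class and method names are hypothetical, and no LR table drives it; a full GLR parser would call `push` from its shift and reduce steps, use `paths` to find what a reduction pops, and would also label edges with nodes of a shared parse forest.

```python
class GSSNode:
    """One stack node: an LR state reached at a given input position."""
    def __init__(self, state, pos):
        self.state = state
        self.pos = pos
        self.preds = []  # edges to predecessor nodes (more than one after a merge)

class GSS:
    """Minimal graph-structured stack (hypothetical API, for illustration)."""
    def __init__(self, start_state=0):
        self.root = GSSNode(start_state, 0)
        # For each input position, map state -> node so equal tops get merged.
        self.levels = {0: {start_state: self.root}}

    def push(self, pred, state, pos):
        """Push `state` above `pred`, reusing an existing node if one matches."""
        level = self.levels.setdefault(pos, {})
        node = level.get(state)
        if node is None:
            node = GSSNode(state, pos)
            level[state] = node
        if pred not in node.preds:
            node.preds.append(pred)
        return node

    def paths(self, node, k):
        """All paths of k edges down from `node` (what a reduce of length k pops)."""
        if k == 0:
            return [[node]]
        result = []
        for p in node.preds:
            for path in self.paths(p, k - 1):
                result.append([node] + path)
        return result

gss = GSS()
a = gss.push(gss.root, 1, 1)   # two different analyses of the first word...
b = gss.push(gss.root, 2, 1)
top1 = gss.push(a, 5, 2)       # ...both reach state 5 after the second word
top2 = gss.push(b, 5, 2)
print(top1 is top2)             # True: the two stack tops were merged
print(len(gss.paths(top1, 2)))  # 2: one path down through each analysis
```

Merging on (state, position) is what keeps the number of stack nodes bounded by the grammar and the input length, instead of growing with the number of competing parses.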

Todd


Posted on the users mailing list.