[plt-scheme] Re: Natural Language parsing in CS1

From: Chung-chieh Shan (ccshan at rutgers.edu)
Date: Tue Jun 2 05:07:16 EDT 2009

On 2009-06-01T20:03:41-0400, Eli Barzilay wrote:
> On Jun  1, Chung-chieh Shan wrote:
> > I'm not sure what you mean by "this kind of symbolic approach".  The
> > Penn Treebank, for example, is full of symbols like NP and VP and
> > their hierarchical composition.
> 
> I'm talking about the kind of actual parsing that is based on
> (deterministic, usually) rules -- the kind that works nicely for
> parsing formal languages.

I see.  I would definitely second the advice (to the original message on
this thread) that the parser take into account the nondeterminism (i.e.,
ambiguity) of natural language.  For example, the parser should allow
and produce multiple parses for the same string.  A parsing algorithm
that does not require finite look-ahead, as simple as Earley or even
CYK, would do nicely.
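
To make "multiple parses for the same string" concrete, here is a minimal CYK sketch in Python (my choice of language, not something this thread prescribes). The toy grammar, the lexicon, and the name cyk_all_parses are all hypothetical, and the grammar is assumed to be in Chomsky normal form. The chart keeps every tree for every span instead of committing to one.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (not from this thread).
# Keys are right-hand sides (B, C); values are the left-hand sides A of A -> B C.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],
    ("NP", "PP"): ["NP"],
    ("P", "NP"): ["PP"],
    ("Det", "N"): ["NP"],
}
# Hypothetical lexicon: word -> categories that can rewrite to it.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def cyk_all_parses(words, start="S"):
    """Return every parse tree of `words` rooted in `start`, as nested tuples."""
    n = len(words)
    # chart[(i, j)][A] holds all trees for category A spanning words[i:j].
    chart = defaultdict(lambda: defaultdict(list))
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[(i, i + 1)][cat].append((cat, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point of the span
                left, right = chart[(i, k)], chart[(k, j)]
                for b, c in product(list(left), list(right)):
                    for a in BINARY.get((b, c), []):
                        # Keep all combinations: this is where ambiguity survives.
                        for lt in left[b]:
                            for rt in right[c]:
                                chart[(i, j)][a].append((a, lt, rt))
    return chart[(0, n)][start]


if __name__ == "__main__":
    for tree in cyk_all_parses("I saw the man with the telescope".split()):
        print(tree)
```

With this grammar the sentence gets two trees, one attaching "with the telescope" to the verb phrase and one attaching it to "the man". Storing whole trees in each cell is fine for a classroom example; a packed chart with back-pointers would be the usual choice once the number of parses grows.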

But nondeterminism is the bedrock of both modern NLP and classical
formal-language parsing.  The role of nondeterminism in formal languages
has been established since Rabin and Scott's Turing-award work (1959)
and canonized in most introductory undergraduate courses on formal
languages and theory of computation.  Weighting is a useful kind of
nondeterminism; a production in a probabilistic context-free grammar
is a useful kind of rule; and learning a grammar from data using
Bayesian statistics is a useful way to generate code -- some would say
to compile.  Also, the familiar slogan "programs as data" is a useful
way to understand hierarchical Bayesian models and some recent work on
"adaptor grammars" coming from Brown.
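
As a sketch of how a weighted production is still just a rule, the same CYK recurrence can carry probabilities and keep the best-scoring tree per cell (Viterbi parsing). Everything here is an assumption for illustration: the Python rendering, the name best_parse, the toy grammar, and especially the probabilities, which are made up rather than learned from any corpus.

```python
# Hypothetical PCFG in Chomsky normal form. Each entry maps a production
# (lhs, rhs) to a probability; the numbers are invented for illustration
# and sum to 1 for each left-hand side.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("I",)): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("saw",)): 1.0,
    ("Det", ("the",)): 1.0,
    ("N", ("man",)): 0.5,
    ("N", ("telescope",)): 0.5,
    ("P", ("with",)): 1.0,
}


def best_parse(words, start="S"):
    """Return (probability, tree) for the most probable parse, or None."""
    n = len(words)
    best = {}  # (i, j, A) -> (probability, tree) for the best A over words[i:j]
    for i, word in enumerate(words):
        for (lhs, rhs), p in RULES.items():
            if rhs == (word,):
                best[(i, i + 1, lhs)] = (p, (lhs, word))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (lhs, rhs), p in RULES.items():
                    if len(rhs) != 2:
                        continue  # lexical rules were handled above
                    left = best.get((i, k, rhs[0]))
                    right = best.get((k, j, rhs[1]))
                    if left is None or right is None:
                        continue
                    score = p * left[0] * right[0]
                    # Same rule application as before; the weight only
                    # decides which alternative to keep.
                    if score > best.get((i, j, lhs), (0.0, None))[0]:
                        best[(i, j, lhs)] = (score, (lhs, left[1], right[1]))
    return best.get((0, n, start))


if __name__ == "__main__":
    print(best_parse("I saw the man with the telescope".split()))
```

Replacing "keep the maximum" with "keep everything" recovers the unweighted, fully nondeterministic parser, which is the sense in which the two sit on one continuum.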

Thus, "the kind of actual parsing that is based on rules -- the kind
that works nicely for parsing formal languages" -- is not at all limited
to deterministic rules.  As Shriram wants (and so do I), it is linked by
a "smooth continuum" to practical NLP.  It is not only compatible with
symbolic processing but in fact presupposes symbolic processing.

-- 
Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig
We want our revolution, and we want it now! -- Marat/Sade
We want our revolution, and we'll take it at such time as  
 you've gotten around to delivering it      -- Haskell programmer

Posted on the users mailing list.