[plt-scheme] Natural Language parsing in CS1

From: David Brooks (davidjamesbrooks at gmail.com)
Date: Tue Jun 2 04:36:25 EDT 2009

On 1 Jun 2009, at 20:22, Eli Barzilay wrote:
> On Jun  1, YC wrote:
>> The AI books that I had read before took the parsing approach,
> Yes, this was a typical example in generic AI books.

... and still is in textbooks of computational linguistics. See, for  
example, Christopher Manning and Hinrich Schütze's "Foundations of  
Statistical Natural Language Processing". Chapter 3 of the above book  
deals with symbolic parsing of the nature described by Todd.

I think Todd's exercise is a decent introduction to parsing, and an  
understanding of this is vital to see why strict symbolic grammar-and- 
parser approaches do not scale. I've used similar exercises in the  
past, and I believe them to be helpful. There's no harm in showing a  
broken system, because students can discover for themselves where it  
breaks down.

>> so I am not familiar with the alternative - would you happen to have
>> links or googling phrases for finding the alternatives?
> Modern NL parsers are all statistical tools.  They usually begin with
> a large corpus of human-parsed text, and learn how to parse more data.

Some still have foundations in context-free grammars, while others  
have favoured different grammatical formalisms (dependency grammars  
have become popular in the last few years; categorial grammars are  
also widespread). Simpler models (in grammar terms) have also been  
employed with some success. Search terms "shallow parsing" or "text  
chunking" may get you somewhere.

However, Shriram's earlier point about CFGs not scaling well, while  
correct, is not the major limiting factor here. It has long been known  
that CFGs are not capable of expressing some of the features of  
natural language (for English, sentences involving "respectively"),  
and that some degree of context is required. Again, this doesn't mean  
Todd's exercise is futile, since it can be used to show /why/ CFGs are  

> As for a google phrase, a simple "natural language parsing" should
> work.  You can also try google scholar if you're really interested
> (but in that case pay attention to the dates, since post 2000 papers
> will be very different than pre-1990).

The phrase "reranking parser" will also get you somewhere useful. I  
saw Michael Collins give a very interesting talk on the subject, so  
his name might be a useful term.

A more general overview can be found in chapter 12 of the  
aforementioned book. There was a bit of an explosion in research in  
the field as Eli said, so an overview may be a better starting point  
than the huge numbers of papers published at ACL each year...

> [BTW, just to clarify -- I'm not an expert in NLP by any measure, I
> just happen to live with one so I have extensive second-hand
> knowledge, whatever that may be worth...]

You've accurately summarised most of what I knew. I wish I had that  
kind of second-hand knowledge...

David Brooks

Posted on the users mailing list.