[plt-scheme] Natural Language parsing in CS1
On 1 Jun 2009, at 20:22, Eli Barzilay wrote:
> On Jun 1, YC wrote:
>>
>> The AI books that I had read before took the parsing approach,
> Yes, this was a typical example in generic AI books.
... and still is in textbooks of computational linguistics. See, for
example, Christopher Manning and Hinrich Schütze's "Foundations of
Statistical Natural Language Processing". Chapter 3 of the above book
deals with symbolic parsing of the nature described by Todd.
I think Todd's exercise is a decent introduction to parsing, and an
understanding of this is vital to see why strict symbolic grammar-and-
parser approaches do not scale. I've used similar exercises in the
past, and I believe them to be helpful. There's no harm in showing a
broken system, because students can discover for themselves where it
breaks down.
>> so I am not familiar with the alternative - would you happen to have
>> links or googling phrases for finding the alternatives?
>
> Modern NL parsers are all statistical tools. They usually begin with
> a large corpus of human-parsed text, and learn how to parse more data.
Some still have foundations in context-free grammars, while others
have favoured different grammatical formalisms (dependency grammars
have become popular in the last few years; categorial grammars are
also widespread). Simpler models (in grammar terms) have also been
employed with some success. Search terms "shallow parsing" or "text
chunking" may get you somewhere.
However, Shriram's earlier point about CFGs not scaling well, while
correct, is not the major limiting factor here. It has long been known
that CFGs are not capable of expressing some of the features of
natural language (for English, sentences involving "respectively"),
and that some degree of context is required. Again, this doesn't mean
Todd's exercise is futile, since it can be used to show /why/ CFGs are
inadequate.
> As for a google phrase, a simple "natural language parsing" should
> work. You can also try google scholar if you're really interested
> (but in that case pay attention to the dates, since post 2000 papers
> will be very different than pre-1990).
The phrase "reranking parser" will also get you somewhere useful. I
saw Michael Collins give a very interesting talk on the subject, so
his name might be a useful term.
A more general overview can be found in chapter 12 of the
aforementioned book. There was a bit of an explosion in research in
the field as Eli said, so an overview may be a better starting point
than the huge numbers of papers published at ACL each year...
> [BTW, just to clarify -- I'm not an expert in NLP by any measure, I
> just happen to live with one so I have extensive second-hand
> knowledge, whatever that may be worth...]
You've accurately summarised most of what I knew. I wish I had that
kind of second-hand knowledge...
Cheers,
D
--
David Brooks
http://www.untyped.com