[plt-scheme] Re: Natural Language parsing in CS1

From: Damir Cavar (Damir.Cavar at gmail.com)
Date: Wed Jun 10 12:55:53 EDT 2009

A simple implementation of a chart parser (an Earley-type parser
with an agenda, etc.) can be found on the Schemers page at the
University of Zadar:

http://ling.unizd.hr/~schemers/

in the code section. It is purely CFG-based, without unification.
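
For readers who have not seen the technique, a minimal sketch of an
Earley-style recognizer with a per-position agenda might look like
the following. It is written in Python rather than Scheme, it is not
the code from the Schemers page, and the toy grammar, lexicon, and all
names are made up for illustration. It only recognizes (it builds no
trees) and assumes the grammar has no empty productions.

```python
from collections import namedtuple

# An Earley item: a dotted rule plus the input position where it started.
Item = namedtuple("Item", "lhs rhs dot start")

# Hypothetical toy grammar: nonterminal -> possible right-hand sides.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("N",)],
    "VP": [("V", "NP"), ("V",)],
}

# Hypothetical toy lexicon: word -> set of preterminal categories.
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
    "sleeps": {"V"},
}


def recognize(words, start="S"):
    """Return True if `words` can be derived from `start`."""
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add(Item(start, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])  # items still to be processed at position i
        while agenda:
            item = agenda.pop()
            if item.dot < len(item.rhs):
                nxt = item.rhs[item.dot]
                if nxt in GRAMMAR:
                    # Predict: expand the nonterminal after the dot.
                    for rhs in GRAMMAR[nxt]:
                        new = Item(nxt, rhs, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < n and nxt in LEXICON.get(words[i], ()):
                    # Scan: the next word has the expected category.
                    chart[i + 1].add(item._replace(dot=item.dot + 1))
            else:
                # Complete: advance every item that was waiting for item.lhs.
                for parent in list(chart[item.start]):
                    if (parent.dot < len(parent.rhs)
                            and parent.rhs[parent.dot] == item.lhs):
                        new = parent._replace(dot=parent.dot + 1)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)

    return any(it.lhs == start and it.start == 0 and it.dot == len(it.rhs)
               for it in chart[n])
```

Calling recognize("the dog sees the cat".split()) walks the chart
left to right; a word missing from the lexicon simply fails to scan.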

I have a first version of a graphical tree-drawing tool that
visualizes s-structures (the results of parses on the chart) as
graphical trees (as in XLE, for example), using only the PLT Scheme
GUI toolkit. I'll post it in a few days (once the coordinates for
drawing such trees are fixed so that the trees look nice and
well-balanced :-) ).

We are currently extending the chart parser with a unification
formalism for grammars of the LFG or HPSG type. So, if somebody wants
to participate, we can share the code base via svn. This is definitely
not just a toy: one can seriously generate very good parse trees, and
even handle unknowns and partial parses easily for all kinds of text
types and constructions, even for Croatian. :-)
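
To give an idea of what unification adds on top of a plain CFG, here
is a minimal sketch of feature-structure unification over nested
dictionaries. It is written in Python, is not our implementation, and
the feature names (NUM, PERS, CASE, AGR) are just illustrative. It
deliberately ignores reentrancy (structure sharing), which real LFG
or HPSG grammars need.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash.

    Feature structures are nested dicts; anything else is an atomic value.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs stay untouched
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None  # conflicting values somewhere below `key`
                result[key] = merged
            else:
                result[key] = value
        return result
    # Atomic values (or atom vs. structure) unify only if they are equal.
    return a if a == b else None
```

For example, unifying {"NUM": "sg", "PERS": 3} with
{"NUM": "sg", "CASE": "nom"} merges the two, while {"AGR": {"NUM": "sg"}}
and {"AGR": {"NUM": "pl"}} clash and give None.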

As far as parsing and the question of whether "such symbolic
approaches" scale are concerned, don't get confused. Such grammars and
approaches do scale, as XLE and the broad-coverage LFG grammars show,
and as some HPSG grammars and parsers show too. Statistical methods
can be integrated easily, but purely statistical methods without real
grammars are not what I'd put my trust in (as a syntactician), at
least if we are talking about serious syntactic parsing and not
shallow dependency trees (and I don't even see a real use for my
syntax approach in those kinds of models).

It would be good to have some code and tools in Scheme for that. We're
trying to get things together, and I was actually already planning to
set up a (code) site for something like a Scheme NLTK (see the Python
NLTK). I might fire this page up soon. Help is welcome!

Damir


Posted on the users mailing list.