[plt-scheme] Re: from hell to paradise
On Wed, Feb 18, 2009 at 11:12:00AM -0500, Prabhakar Ragde wrote:
> Hendrik Boom wrote:
>
> >> This week it's "Why the hell did I have to learn the theory of LR
> >> parsing with explicit stacks and state transitions instead of
> >> mutually-recursive functions?" --PR
> >
> >It's a completely different algorithm, with quite different
> >capabilities. I think, if you're going to write compilers, you should
> >know both, so you'll be able to use the one that suits the language.
>
> See, you're a victim of the conspiracy, too. They are not completely
> different algorithms. There is a thread of continuity between them
> explored in a series of papers, the central one being Leermakers,
> Augusteijn, and Kruseman Aretz (TCS 1992).
>
> Here it is in a nutshell. If we take a context-free grammar G and create
> a function A() for each nonterminal A whose job it is to consume (from
> an input list provided as a parameter to the function A) an initial
> sequence of terminals derivable from A, it is easy to write what we call
> a naive recursive-descent parser. It can even handle ambiguous grammars
> if we have it produce a list of input suffixes (that is, make it
> nondeterministic). But it takes exponential time on some grammars, and
> gets into infinite loops with left-recursive productions (e.g. A -> Ab).
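[As a concrete illustration -- my own sketch, not from the original post -- here is such a naive nondeterministic recursive-descent parser in Python for a toy grammar. Each parser consumes an initial derivable sequence and returns the list of input suffixes left over; an ambiguous parse would simply yield several suffixes.]

```python
# Naive nondeterministic recursive descent for the toy grammar (my choice):
#   S -> 'a' S 'b' | 'a' 'b'
# A parser maps an input string to the list of suffixes remaining after
# consuming a prefix derivable from the construct it recognizes.

def term(t):
    """Parser for a single terminal t."""
    def parse(s):
        return [s[1:]] if s[:1] == t else []
    return parse

def seq(*ps):
    """Sequence: feed every suffix produced by one parser into the next."""
    def parse(s):
        suffixes = [s]
        for p in ps:
            suffixes = [rest for suf in suffixes for rest in p(suf)]
        return suffixes
    return parse

def alt(*ps):
    """Alternation: the union of the suffixes of each branch."""
    def parse(s):
        return [rest for p in ps for rest in p(s)]
    return parse

def S(s):
    # S -> 'a' S 'b' | 'a' 'b'
    return alt(seq(term('a'), S, term('b')),
               seq(term('a'), term('b')))(s)

# The whole input parses iff the empty suffix is among the results.
print('' in S('aabb'))   # True
print('' in S('aab'))    # False
```

[Note that making S left-recursive (S -> S 'b' | 'a') would send this program into the infinite loop described above, and some grammars make the list of suffixes blow up exponentially.]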
>
> So we transform G to ensure there are at most two nonterminals on the
> right-hand side of every rule. There are many related ways to do this
> (cf Chomsky normal form) but we choose one that introduces items as
> nonterminals. An item is a grammar rule with a dot in the right-hand
> side to indicate partial progress. The rules in the transformed grammar
> have an item on the left side and on the right side, the nonterminal
> after the dot followed by the item with the dot moved over one position.
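[A small Python sketch of the item construction -- the rule and the representation are my own, for illustration:]

```python
# An item is a rule with a dot marking progress, represented here as
# (lhs, done, todo) where done + todo is the rule's right-hand side.

def items(lhs, rhs):
    """All dotted versions of the rule lhs -> rhs (dot at positions 0..len(rhs))."""
    return [(lhs, rhs[:k], rhs[k:]) for k in range(len(rhs) + 1)]

def transformed_rule(item, nonterminals):
    """For an item [A -> a . B b] with B a nonterminal, the transformed
    grammar has the rule  [A -> a . B b]  ->  B [A -> a B . b]."""
    lhs, done, todo = item
    if todo and todo[0] in nonterminals:
        return (item, (todo[0], (lhs, done + todo[:1], todo[1:])))
    return None

# Example: the item [S -> . A c] rewrites to A followed by [S -> A . c].
print(transformed_rule(('S', (), ('A', 'c')), {'A'}))
```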
>
> If we apply recursive descent to the transformed grammar (with a
> function for each item), it still has the same problems. But if we
> memoize the computation, it's not hard to show that it takes O(n^3)
> time. [Aside: up to this point should be core material for any reputable
> CS curriculum, but it is not.] We can cut this down by introducing
> lookahead (first and follow sets), giving LL(k).
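[To illustrate the memoization step, here is a hedged Python sketch for a toy grammar of my own choosing (S -> 'a' S S | empty). Memoizing each parse function on the input position bounds the work polynomially, where the naive list-of-suffixes version re-explores the same suffixes exponentially often:]

```python
from functools import lru_cache

# Memoized nondeterministic descent for the toy grammar (my example):
#   S -> 'a' S S | (empty)
# A parse function takes a start position and returns the set of positions
# reachable after consuming one S; lru_cache memoizes it on that position.

INPUT = 'a' * 18

@lru_cache(maxsize=None)
def S(i):
    results = {i}                            # S -> (empty)
    if i < len(INPUT) and INPUT[i] == 'a':   # S -> 'a' S S
        for j in S(i + 1):
            results |= S(j)
    return frozenset(results)

print(len(INPUT) in S(0))  # True: the whole input is derivable from S
```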
>
> But we still have the problem with left recursion, which tends to be
> natural in grammars for programming languages (and, I am told, for
> natural language grammars as well). There are techniques for avoiding
> the infinite loops, but the resulting parse trees "slant the wrong way".
> We can handle that by transforming the parse tree, but that involves
> operations on trees, not to mention building the parse tree in the first
> place. (Such things are hard for students instructed using Java.)
>
> So we transform the grammar again to eliminate left recursion. There are
> many related ways to do this (cf Greibach normal form), but we choose
> one that involves left corners of the symbol after the dot (a left
> corner being any initial symbol generated by rewriting the leftmost
> symbol zero or more times). Then we apply recursive descent to the newly
> transformed grammar.
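[Computing the left-corner relation itself is simple; a minimal Python sketch, with a grammar representation of my own assumption (nonterminal mapped to a list of right-hand-side tuples):]

```python
# Left corners of a symbol: every symbol obtainable by rewriting the
# leftmost symbol zero or more times.

def left_corners(grammar, symbol):
    corners, todo = set(), [symbol]
    while todo:
        x = todo.pop()
        if x in corners:
            continue
        corners.add(x)                    # zero rewrites: x itself is a left corner
        for rhs in grammar.get(x, []):    # one more leftmost rewrite
            if rhs:
                todo.append(rhs[0])
    return corners

# For the left-recursive grammar  A -> A 'b' | 'c' :
print(left_corners({'A': [('A', 'b'), ('c',)]}, 'A'))
```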
>
> Expressed in terms of the original grammar, this involves not only a
> function for each item, but a function for each (left corner, item)
> pair. This is nondeterministic recursive ascent parsing, so named
> because the recursive functions "ascend" from a terminal which is a left
> corner through successive undone leftmost rewrites.
>
> To make this more general, we use the idea from NFA->DFA conversion, and
> replace functions on items by functions on sets of items (called
> states). Where before the natural successor of an item I with respect to
> a nonterminal A just after a dot was the item with the dot moved over
> the A, now the successor of a state is the collection of such plus the
> introduction of new items with A on the left and some right-hand side
> with a dot all the way to the left. This is the LR(0) automaton, and
> what I've described is its transition or "goto" function.
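[A small sketch of the closure and goto (transition) functions on item sets, in Python, for a toy grammar of my own choosing:]

```python
# Closure and goto for LR(0) item sets, for the toy grammar (my example):
#   S -> S '+' 'x' | 'x'   augmented with  S' -> S.
# An item is (lhs, dot_position, rhs).

GRAMMAR = {"S'": [('S',)], 'S': [('S', '+', 'x'), ('x',)]}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    """Add [A -> . alpha] for every nonterminal A appearing just after a dot."""
    items = set(items)
    work = list(items)
    while work:
        lhs, dot, rhs = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], 0, prod)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

def goto(state, symbol):
    """The transition function: move the dot over `symbol` where possible."""
    return closure({(lhs, dot + 1, rhs)
                    for (lhs, dot, rhs) in state
                    if dot < len(rhs) and rhs[dot] == symbol})

start = closure({("S'", 0, ('S',))})
print(sorted(goto(start, 'S')))
```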
>
> If we apply recursive ascent to this, and work out the details, we get
> an LR(0) parser. (Pepper at TU-Berlin explains it entirely in terms of
> grammar transformations.) Shift-reduce and reduce-reduce conflicts are
> explained by being exactly the points in the function definitions where
> nondeterminism is possible. Adding lookahead as we did for LL(k) gives
> LR(k). Cutting down on the number of states and the complexity of the
> transition function by including various restrictions gives SLR or LALR.
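[Worked out for the toy grammar S -> S '+' 'x' | 'x' -- my example, not from the post -- a recursive-ascent LR(0) parser is a set of mutually recursive functions, one per state; a reduction returns the nonterminal together with a count of stack frames still to unwind. A minimal sketch:]

```python
# Recursive-ascent LR(0) parser for  S -> S '+' 'x' | 'x'  (toy example).
# Each state function returns (nonterminal, frames_left_to_pop, rest_of_input);
# returning from a function is itself one "pop" of the implicit stack.

def parse(tokens):
    try:
        return state0(list(tokens) + ['$'])
    except SyntaxError:
        return False

def state0(ts):
    # items: S' -> . S,   S -> . S + x,   S -> . x
    if ts[:1] != ['x']:
        raise SyntaxError("expected 'x'")
    nt, k, ts = state2(ts[1:])           # shift 'x'
    while True:                          # k is 0 here: this state has the goto
        if nt == 'accept':
            return True
        nt, k, ts = state1(ts)           # goto(0, S) = state 1

def state1(ts):
    # items: S' -> S .,   S -> S . + x
    if ts[:1] == ['$']:
        return ('accept', 0, ts)
    if ts[:1] == ['+']:
        nt, k, ts = state3(ts[1:])       # shift '+'
        return (nt, k - 1, ts)           # pop this frame on the way down
    raise SyntaxError("expected '+' or end of input")

def state2(ts):
    # item: S -> x .   -- reduce; rhs length 1, no further frames to pop
    return ('S', 0, ts)

def state3(ts):
    # item: S -> S + . x
    if ts[:1] != ['x']:
        raise SyntaxError("expected 'x'")
    nt, k, ts = state4(ts[1:])           # shift 'x'
    return (nt, k - 1, ts)

def state4(ts):
    # item: S -> S + x .   -- reduce; rhs length 3, pop two more frames
    return ('S', 2, ts)

print(parse(['x', '+', 'x', '+', 'x']))  # True
print(parse(['x', 'x']))                 # False
```

[Note how the left-recursive production makes control "ascend": state4 announces a reduction, the pop count unwinds two callers, and state0's loop performs the goto back up to state1.]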
>
> These sets of mutually-recursive functions don't look much like the
> traditional LR(0) or LR(k) parser. To get there, we make the recursion
> stack explicit, and treat the input list as a stack. Then the core of
> the algorithm becomes an iterative loop, popping and pushing. Finally,
> we notice that we can combine the two stacks into one, processing the
> input sequentially, and we have the conventional presentation (which,
> unlike everything I have described up to this point, cannot handle
> nondeterminism caused by grammars that don't conform to the restrictions).
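[For concreteness, a sketch of that conventional presentation in Python, with the state stack explicit and the tables hand-built for the toy grammar S -> S '+' 'x' | 'x' -- my example, not from the post:]

```python
# Conventional table-driven LR parsing loop with an explicit state stack,
# for  S -> S '+' 'x' | 'x'.  Tables are hand-built for this toy grammar.

ACTION = {
    (0, 'x'): ('shift', 2),
    (1, '+'): ('shift', 3),
    (1, '$'): ('accept',),
    (2, '+'): ('reduce', 'S', 1), (2, '$'): ('reduce', 'S', 1),
    (3, 'x'): ('shift', 4),
    (4, '+'): ('reduce', 'S', 3), (4, '$'): ('reduce', 'S', 3),
}
GOTO = {(0, 'S'): 1}

def parse(tokens):
    stack = [0]                          # the now-explicit state stack
    tokens = list(tokens) + ['$']
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False                 # syntax error
        if act[0] == 'accept':
            return True
        if act[0] == 'shift':
            stack.append(act[1])
            i += 1
        else:                            # reduce A -> beta: pop |beta| states
            _, lhs, n = act
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])

print(parse('x+x+x'))  # True
print(parse('x+'))     # False
```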
>
> But all the intuition has been drained out of it, as the blood has been
> drained out of a slaughtered animal that ends up as shrink-wrapped
> pieces of meat in a supermarket. Yes, they're convenient, and we adults
> make the connection with living animals if we think about it enough,
> though usually we don't. The traditional LR(k) algorithm is efficient in
> time and space (to an extent not really needed in many applications any
> more, and certainly not needed in a pedagogical presentation), but
> students don't make the connection with more natural or simpler
> techniques. Instead, they either use tools (e.g. Lex/YACC) that they
> only dimly understand, flailing away at shift-reduce or reduce-reduce
> conflicts by randomly rewriting their grammars, or they learn the
> algorithm by rote but glean what intuition they can from one or two
> worked examples.
>
> I went further, learning the theories of LL and LR parsing from
> researchers in formal languages at two universities, but until recently
> I also thought these were "completely different" algorithms that could
> only be expressed iteratively using mutation. To be fair, I learned said
> theories before the above connections were worked out; the first papers
> in this series (Roberts and Kruseman Aretz) appeared in 1988. Twenty
> years later, we no longer have an excuse. --PR
> _________________________________________________
> For list-related administrative tasks:
> http://list.cs.brown.edu/mailman/listinfo/plt-scheme
I know about some of this stuff. I made a presentation at IFIP Working
Group 2.1 in which I transformed a nonworking recursive-descent parser
into a working one -- what Charles Lindsey called a "recursive-descent
bottom-up parser." Later, others went on from there, formalizing the
method, developing it further, and writing it up in papers, including
some of the papers you mentioned.
The key thing I introduced was having a production (i.e., a
recursive-descent parsing function) deliver one of *several* nonterminals.
Instead of having to commit in advance to parsing a Statement or a
Declaration, you call a function that parses either, and reports what
it found. You then decide which it was after you've seen it.
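A rough Python sketch of that idea (my reconstruction, not Hendrik's actual code; the grammar fragment is invented): one function consumes either a Statement or a Declaration, which deliberately share their first token so the caller cannot commit up front, and reports which it found.

```python
# Invented fragment: a Declaration is  NAME NAME ';'  (e.g. "int x ;")
# and a Statement is  NAME ';'.  Both begin with a NAME, so the caller
# asks this one function and inspects the reported kind afterwards.

def stmt_or_decl(tokens, i):
    """Consume either form starting at index i.
    Return (kind, parse_tree, next_index), or None on failure."""
    if i + 1 < len(tokens) and tokens[i + 1] == ';':
        return ('Statement', ('stmt', tokens[i]), i + 2)
    if i + 2 < len(tokens) and tokens[i + 2] == ';':
        return ('Declaration', ('decl', tokens[i], tokens[i + 1]), i + 3)
    return None

kind, tree, nxt = stmt_or_decl(['int', 'x', ';'], 0)
print(kind)                             # Declaration
print(stmt_or_decl(['x', ';'], 0)[0])   # Statement
```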
The resulting parser still has the feel and flavour of a
recursive-descent parser: you can use all the coding methods you've
learned as a regular user of a powerful programming language, interface
with so-called "static semantics" in a natural way, and retain
intellectual control of the parser.
I suspect the technique would be easy to teach.
I've used this technique on all my recent parsers. When I write a
recursive-descent parser generator, I allow a production to specify what
constructor is to be used to build its piece of parse tree. Then where
the nonterminal is called, I use a notation that pattern-matches on
the bit of parse tree built to determine what to do next.
Perhaps I shouldn't have said you need to know both the LR and the
recursive-descent methods. That is no longer true. You certainly did
need to know both in the 70's and 80's, when I did most of my compiler
writing (Algol 68 and C++, in case you're interested).
But I maintain that they are different algorithms. Performing
program transformations on an algorithm, especially ones as
nontrivial as you describe, really does transform it into a different
algorithm.
-- hendrik