[racket] Introduction to Parser Tools library

From: Neil Van Dyke (neil at neilvandyke.org)
Date: Tue Jan 21 23:29:51 EST 2014

Harry Spier wrote at 01/21/2014 10:17 PM:
> The documentation to the Parser Tools library  assumes familiarity 
> with lex and yacc style lexer and parser generators.
>
> Can someone recommend a good on-line tutorial to lex and yacc. 
>  Something I could go through but using Racket Parser Tools instead of 
> lex and yacc.  Or alternatively an inexpensive book.

I bet there's a satisfactory free tutorial, but I have to defer to other 
people on that.

FWIW, what I used (before the Web) was probably a combination of the 
parsing concepts from the Red Dragon Book 
("http://www.amazon.com/Compilers-Principles-Techniques-Alfred-Aho/dp/0201100886") 
and practical info in the 1st edition ORA book 
("http://shop.oreilly.com/product/9781565920002.do").  The Dragon Book 
used to be *the* textbook for compilers, so lots of libraries (and older 
CS colleagues) should have a copy to lend.  There should also be a good 
number copies of the ORA book floating around.

You don't need to know much of the theory to write a parser for a 
nontrivial language using lex&yacc-ish tools.  (Although working through 
textbook chapters like NFA->DFA and LL/LR/LALR/etc. algorithms will make 
you better at using lex&yacc well, you usually don't *need* to know any 
of that to make working parsers.)  What you might need to know:
(1) that lex&yacc strongly distinguish a tokenization/lexing level from 
a grammar-based parsing level;
(2) simple grep-like regexps, for simple tasks, nothing fancy;
(3) the parsing production grammars and how to use them in general, 
including the code patterns for optional and repeating syntax, and how 
to hook up AST-building (which will be different in Racket than in C) or 
other actions to the productions;
(4) depending on your language, you might need to know the class of the 
parser and limitations like whether you're limited on symbols of 
lookahead, what is a reduce-reduce conflict, etc.;
(5) be aware that you might run into some complication like C's need for 
symbol table lookup during parsing, and generally avoid this kind of 
complication unless you know what you're doing; and
(6) perhaps know how to handle syntax errors gracefully in a particular 
parsing toolkit.

This might sound like a lot, but it's a small subset of what's in the 
books, and it doesn't take long to learn, if you just do a little 
reading and a few hours of practical exercises/using.

The Racket "parser-tools" are very much inspired by C lex&yacc, and 
conceptually very similar, but the Racket ones are much easier to use in 
the practical details. I wouldn't spend time trying to use original C 
lex&yacc as a learning exercise at this point, but just jump directly to 
using the Racket tools.

Also, if you're parsing programming languages for which you would like 
to report errors well and be a "#lang" in DrRacket, I strongly recommend 
learning Racket syntax objects soon.  Don't let learning syntax objects 
slow your learning of parser-tools, though -- I might do the initial 
learning of AST-building using simpler Racket types, like lists and 
structs, since using syntax objects could obscure some basic ideas, and 
introduce more bug hurdles (unless you had a tutorial that gave you code 
patterns of how to use syntax objects in all the ways that you needed).

That's about learning how to use this particular toolkit well.  If, 
however, I wanted to learn compilers in general, today I would start 
with (our own) Shriram Krishnamurthi's PLAI book 
("http://cs.brown.edu/~sk/Publications/Books/ProgLangs/").

Neil V.


Posted on the users mailing list.