[racket] Survey of parsing libraries for Racket?

From: Neil Van Dyke (neil at neilvandyke.org)
Date: Sat Mar 10 16:23:58 EST 2012

Danny Yoo wrote at 03/10/2012 03:06 PM:
> For people who have used these libraries, how was the experience?
> Basically, I'm trying to find something powerful and stable to work
> with.

I used "parser-tools" successfully the other day, to implement a parser 
for a subset of PDF.  It seemed great as a Lex&Yacc replacement (you 
can't beat doing a parser toolkit as syntax extension), but not quite 
the be-all and end-all.  I haven't looked at the other Racket-based 
parser tools.

Details follow...

I have used a bunch of different parser tools with other languages in 
the past, and I'd say that "parser-tools" is a Lex&Yacc in Racket, with 
a few frills added for tokens, and it did give me all the hooks I needed 
to help things along with arbitrary code.

The parser grammar wasn't quite as readable as it would be if one could 
put keyword lexemes literally in the grammar (some other parser toolkits 
map literals to tokens directly, and you may or may not define the 
keywords separately).  This is something you could layer atop with your 
own pretty simple syntax extension, of course.

As I mentioned the other day, it lacked some EBNF shorthand.  I missed
that while writing the grammar, but found that the shorthand would have
gotten in the way when I built my AST.  Again, EBNF is something you
could layer atop with your own macro, especially if you have your macro
implement your own particular way of AST-building.
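
To make the EBNF point concrete, here is a minimal sketch of writing a
repetition by hand in "parser-tools".  It assumes the missing shorthand
is something like "NUM*", and it uses a made-up bracketed-array grammar;
the token and function names are all hypothetical, not taken from my
PDF parser.

```racket
#lang racket/base
(require parser-tools/lex
         parser-tools/yacc
         (prefix-in : parser-tools/lex-sre))

;; Hypothetical token set for a tiny bracketed-array language.
(define-tokens value-tokens (NUM))
(define-empty-tokens punct-tokens (LBRACKET RBRACKET EOF))

(define array-lexer
  (lexer
   [(eof) (token-EOF)]
   [whitespace (array-lexer input-port)]   ; skip whitespace, keep lexing
   ["[" (token-LBRACKET)]
   ["]" (token-RBRACKET)]
   [(:+ (:/ "0" "9")) (token-NUM (string->number lexeme))]))

(define array-parser
  (parser
   (tokens value-tokens punct-tokens)
   (start array)
   (end EOF)
   (error (lambda (tok-ok? tok-name tok-value)
            (error 'array-parser "unexpected token: ~a" tok-name)))
   (grammar
    (array [(LBRACKET items RBRACKET) (reverse $2)])
    ;; EBNF would write this as NUM*.  In plain BNF it becomes a
    ;; left-recursive rule, and the actions decide the AST shape.
    (items [() '()]
           [(items NUM) (cons $2 $1)]))))

;; Convenience wrapper: parse an array from a string.
(define (parse-array str)
  (define in (open-input-string str))
  (array-parser (lambda () (array-lexer in))))
```

With these definitions, (parse-array "[1 2 3]") should give back the
list of the three numbers in order.  The actions on "items" are where
an automatic EBNF expansion would have to guess how you want the
repeated elements collected, which is the part a macro of your own
could pin down.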

Some toolkits don't make the lexer/parser distinction that Lex and Yacc
do, and you instead use the same metalanguage to build up from
characters to full grammars.  If you want to do that, I suppose you
might be able to layer that atop "parser-tools" reasonably.

Language class is of course a consideration in whatever parser tool you 
use.  I don't recall for certain what class I needed, but it might have 
been only LL(1).  At a glance, it looked doable in Yacc without any 
conflicts, so I didn't have to look further.  For the lexer, I needed to 
scan literals that involved balancing arbitrarily-nested parens, which 
is not a job for regexps, but "parser-tools" gave me an easy hook to 
code that part of the lexer manually.
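
As an illustration of that lexer hook, here is a small sketch rather
than my actual code.  It relies on "input-port" being bound inside a
"lexer" action, so the action can keep reading from the port by hand
once the regexp has matched the opening paren.  The names
"read-balanced" and "string-lexer" are made up for this example, and
backslash handling is simplified.

```racket
#lang racket/base
(require parser-tools/lex)

(define-tokens value-tokens (STRING))
(define-empty-tokens empty-tokens (EOF))

;; Hand-written scanner, called after the opening "(" has been consumed.
;; Returns the text up to the matching ")", keeping nested parens.
(define (read-balanced in)
  (let loop ([depth 1] [acc '()])
    (define c (read-char in))
    (cond
      [(eof-object? c) (error 'read-balanced "unterminated literal")]
      [(char=? c #\\)
       ;; Simplification: keep the escaped character as-is, undecoded.
       (define next (read-char in))
       (when (eof-object? next)
         (error 'read-balanced "unterminated literal"))
       (loop depth (cons next acc))]
      [(char=? c #\() (loop (add1 depth) (cons c acc))]
      [(char=? c #\))
       (if (= depth 1)
           (list->string (reverse acc))
           (loop (sub1 depth) (cons c acc)))]
      [else (loop depth (cons c acc))])))

(define string-lexer
  (lexer
   [(eof) (token-EOF)]
   [whitespace (string-lexer input-port)]
   ;; The regexp only recognizes the opening paren; the action takes over.
   ["(" (token-STRING (read-balanced input-port))]))
```

Calling string-lexer on a port holding "(a (b) c)" should produce one
STRING token whose value is the text between the outermost parens,
nested parens included.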

"parser-tools" seems to have a lot of support for syntax position, which 
I did not use for this project, but would for most projects.

I didn't look at whether "parser-tools" has error reporting/recovery 
features, but that's another thing that I've had some toolkits help with 
when parsing really nasty languages.

I also did not measure performance of "parser-tools", but it didn't seem 
bad for what I was doing.

Holistically, the combination of "parser-tools" with Racket makes it the 
best overall parsing toolkit I've used for a project, even though 
"parser-tools" didn't have all the conveniences I've found in some 
toolkits that pair with much less nice languages.

Incidentally, I'm not sure of the performance implications, but I like 
the idea of having the parser for a programming language translate the 
syntax objects to sexp-like syntax objects promptly, and then 
"syntax-parse" the heck out of that newly sexp-encoded language to turn 
it into Racket code.  I'm also doing a related, syntax-object-heavy
thing in the McFly embedded documentation tool.  McFly fills out things
like Scribble "defproc" signatures by parsing bits of information from
"lambda" argument forms, contracts, explicitly-provided pieces of
"defproc", and so on (later I'll add Typed Racket, too).  It translates
all that info to a normalized form, does a simple unification of the
various info, and then runs the unified normalized form through another
syntax transformer to output a Scribble "defproc".  Surely not the
fastest way to do it, but I suspect it's in the noise when we consider
how much crunching Scribble already does.
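
A rough sketch of the first idea in the paragraph above, assuming the
parser has already produced a sexp-like syntax object.  The "add" and
"mul" forms and the "compile-expr" function are invented for this
example; a real language would have many more cases.

```racket
#lang racket/base
(require syntax/parse)

;; Translate a sexp-encoded expression (as a syntax object) into the
;; corresponding Racket expression, also as a syntax object.
(define (compile-expr stx)
  (syntax-parse stx
    #:datum-literals (add mul)
    [(add a b) #`(+ #,(compile-expr #'a) #,(compile-expr #'b))]
    [(mul a b) #`(* #,(compile-expr #'a) #,(compile-expr #'b))]
    [n:number #'n]))
```

For example, (syntax->datum (compile-expr #'(add 1 (mul 2 3)))) should
come out as the datum (+ 1 (* 2 3)).  Anything that matches none of the
patterns raises a syntax error from "syntax-parse", which is much of the
appeal of doing the translation this way.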

Neil V.


Posted on the users mailing list.