[racket] Survey of parsing libraries for Racket?
Danny Yoo wrote at 03/10/2012 03:06 PM:
> For people who have used these libraries, how was the experience?
> Basically, I'm trying to find something powerful and stable to work
> with.
I used "parser-tools" successfully the other day, to implement a parser
for a subset of PDF. It seemed great as a Lex&Yacc replacement (you
can't beat doing a parser toolkit as syntax extension), but not quite
the be-all and end-all. I haven't looked at the other Racket-based
parser tools.
Details follow...
I have used a bunch of different parser tools with other languages in
the past, and I'd say that "parser-tools" is essentially Lex&Yacc in
Racket, with a few frills added for tokens. It gave me all the hooks I
needed to help things along with arbitrary code.
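To give a flavor of what I mean, here's a toy arithmetic grammar in the
same shape (an untested sketch typed from memory; the names are all
mine, not anything from my PDF parser):

  #lang racket/base
  (require parser-tools/lex
           (prefix-in : parser-tools/lex-sre)
           parser-tools/yacc)

  (define-tokens value-tokens (NUM))
  (define-empty-tokens op-tokens (PLUS TIMES LPAREN RPAREN EOF))

  (define lex
    (lexer
     [(:+ numeric) (token-NUM (string->number lexeme))]
     ["+"          (token-PLUS)]
     ["*"          (token-TIMES)]
     ["("          (token-LPAREN)]
     [")"          (token-RPAREN)]
     [whitespace   (lex input-port)]   ; skip whitespace, keep lexing
     [(eof)        (token-EOF)]))

  (define parse
    (parser
     (tokens value-tokens op-tokens)
     (start expr)
     (end EOF)
     (error (lambda (ok? name val)
              (error 'parse "unexpected token: ~a" name)))
     (grammar
      (expr   [(expr PLUS term)     (+ $1 $3)]   ; actions are arbitrary
              [(term)               $1])          ;  Racket code
      (term   [(term TIMES factor)  (* $1 $3)]
              [(factor)             $1])
      (factor [(NUM)                $1]
              [(LPAREN expr RPAREN) $2]))))

  ;; (let ([in (open-input-string "1 + 2 * 3")])
  ;;   (parse (lambda () (lex in))))  ; => 7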
The parser grammar wasn't quite as readable as it would be if one could
put keyword lexemes literally in the grammar (some other parser toolkits
map literals to tokens directly, so you don't necessarily have to define
the keywords separately). This is something you could layer atop
"parser-tools" with your own pretty simple syntax extension, of course.
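Concretely, with plain "parser-tools" each keyword becomes a named
token, and the grammar productions mention the token names rather than
the literal lexemes "begin"/"end". A toy, untested sketch of that shape
(again, all names here are made up):

  #lang racket/base
  (require parser-tools/lex
           parser-tools/yacc)

  ;; Every keyword gets its own named token...
  (define-empty-tokens block-tokens (BEGIN-KW END-KW SEMI STMT EOF))

  ;; ...and the grammar says BEGIN-KW/END-KW instead of "begin"/"end":
  (define parse-block
    (parser
     (tokens block-tokens)
     (start block)
     (end EOF)
     (error (lambda (ok? name val)
              (error 'parse-block "unexpected token: ~a" name)))
     (grammar
      (block [(BEGIN-KW stmts END-KW) (reverse $2)])
      (stmts [(STMT)            (list 'stmt)]
             [(stmts SEMI STMT) (cons 'stmt $1)]))))

  ;; Hand-feeding tokens, just to exercise it without a lexer:
  ;; (parse-block
  ;;  (let ([toks (list (token-BEGIN-KW) (token-STMT) (token-SEMI)
  ;;                    (token-STMT) (token-END-KW) (token-EOF))])
  ;;    (lambda () (begin0 (car toks) (set! toks (cdr toks))))))
  ;; ;; => '(stmt stmt)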
As I mentioned the other day, it doesn't have EBNF shorthand, which I
missed when I was writing the grammar but found would have gotten in
the way when I built my AST. Again, EBNF is something you could layer
atop with your own macro, especially if you have that macro implement
your own particular way of AST-building.
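For example, where EBNF would let you write something like
args ::= NUM ("," NUM)*, here you spell the repetition out as recursion
and do the list/AST building by hand in the actions. Another toy,
untested sketch:

  #lang racket/base
  (require parser-tools/lex
           (prefix-in : parser-tools/lex-sre)
           parser-tools/yacc)

  (define-tokens value-tokens (NUM))
  (define-empty-tokens delim-tokens (COMMA EOF))

  (define lex-args
    (lexer
     [(:+ numeric) (token-NUM (string->number lexeme))]
     [","          (token-COMMA)]
     [whitespace   (lex-args input-port)]
     [(eof)        (token-EOF)]))

  ;; The repetition is explicit left recursion, and the actions build
  ;; whatever AST (here just a list) you want:
  (define parse-args
    (parser
     (tokens value-tokens delim-tokens)
     (start args)
     (end EOF)
     (error (lambda (ok? name val)
              (error 'parse-args "unexpected token: ~a" name)))
     (grammar
      (args [(NUM)            (list $1)]
            [(args COMMA NUM) (append $1 (list $3))]))))

  ;; (let ([in (open-input-string "1, 2, 3")])
  ;;   (parse-args (lambda () (lex-args in))))  ; => '(1 2 3)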
Some toolkits don't make the lexer/parser distinction that Lex and Yacc
do; instead, you use the same metalanguage to build up from characters
to full grammars. If you want to do that, I suppose you might be able
to layer that atop "parser-tools" reasonably.
Language class is of course a consideration in whatever parser tool you
use. I don't recall for certain what class I needed, but it might have
been only LL(1). At a glance, it looked doable in Yacc without any
conflicts, so I didn't have to look further. For the lexer, I needed to
scan literals that involved balancing arbitrarily-nested parens, which
is not a job for regexps, but "parser-tools" gave me an easy hook to
code that part of the lexer manually.
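Roughly, the hook is just that a lexer action is arbitrary Racket code
and can keep reading from the input port itself. A toy, untested sketch
of the idea, ignoring escapes and the other real-world details of the
literal syntax:

  #lang racket/base
  (require parser-tools/lex)

  (define-tokens literal-tokens (STRING))
  (define-empty-tokens literal-empty-tokens (EOF))

  ;; Read the rest of a "(...)"-style literal, tracking nesting depth,
  ;; starting just after the opening paren:
  (define (read-balanced in)
    (let loop ([depth 1] [chars '()])
      (define c (read-char in))
      (cond [(eof-object? c) (error 'read-balanced "unterminated literal")]
            [(char=? c #\()  (loop (add1 depth) (cons c chars))]
            [(char=? c #\))  (if (= depth 1)
                                 (list->string (reverse chars))
                                 (loop (sub1 depth) (cons c chars)))]
            [else            (loop depth (cons c chars))])))

  ;; When the lexer sees "(", hand the port to ordinary Racket code:
  (define lex-literal
    (lexer
     ["("        (token-STRING (read-balanced input-port))]
     [whitespace (lex-literal input-port)]
     [(eof)      (token-EOF)]))

  ;; (lex-literal (open-input-string "(a (nested (deeper)) literal)"))
  ;; ;; => a STRING token whose value is "a (nested (deeper)) literal"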
"parser-tools" seems to have a lot of support for syntax position, which
I did not use for this project, but would for most projects.
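From a skim of the docs, the shape seems to be roughly this (untested
sketch): "lexer-src-pos" wraps each token in a position-token carrying
start/end positions, and the "(src-pos)" parser option threads those
positions through to the grammar actions and the error callback.

  #lang racket/base
  (require parser-tools/lex
           (prefix-in : parser-tools/lex-sre)
           parser-tools/yacc)

  (define-tokens value-tokens (NUM))
  (define-empty-tokens op-tokens (PLUS EOF))

  (define lex-pos
    (lexer-src-pos
     [(:+ numeric) (token-NUM (string->number lexeme))]
     ["+"          (token-PLUS)]
     ;; return-without-pos avoids double-wrapping the recursive call:
     [whitespace   (return-without-pos (lex-pos input-port))]
     [(eof)        (token-EOF)]))

  (define parse-sum
    (parser
     (tokens value-tokens op-tokens)
     (src-pos)
     (start sum)
     (end EOF)
     (error (lambda (ok? name val start end)
              (error 'parse-sum "unexpected token ~a at offset ~a"
                     name (position-offset start))))
     (grammar
      (sum [(sum PLUS NUM) (+ $1 $3)]
           [(NUM)          $1]))))

  ;; (let ([in (open-input-string "1 + 2 + 3")])
  ;;   (parse-sum (lambda () (lex-pos in))))  ; => 6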
I didn't look at whether "parser-tools" has error reporting/recovery
features, but that's another thing that I've had some toolkits help with
when parsing really nasty languages.
I also did not measure performance of "parser-tools", but it didn't seem
bad for what I was doing.
Holistically, the combination of "parser-tools" with Racket makes it the
best overall parsing toolkit I've used for a project, even though
"parser-tools" didn't have all the conveniences I've found in some
toolkits that pair with much less nice languages.
Incidentally, I'm not sure of the performance implications, but I like
the idea of having the parser for a programming language promptly
translate the syntax objects it produces into sexp-like syntax objects,
and then "syntax-parse" the heck out of that newly sexp-encoded
language to turn it into Racket code.
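A toy sketch of what I mean, pretending an earlier parsing pass has
already turned, say, "if x then 1 else 2" into the sexp-encoded syntax
(my-if x 1 2) (all names here are made up):

  #lang racket/base
  (require (for-syntax racket/base syntax/parse))

  ;; From the sexp-encoded form, syntax-parse does the rest of the
  ;; work of producing ordinary Racket code:
  (define-syntax (my-if stx)
    (syntax-parse stx
      [(_ c:expr t:expr e:expr)
       #'(if c t e)]))

  ;; (my-if #t 1 2)  ; => 1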
I'm also doing a related, syntax-object-heavy thing in the McFly
embedded documentation tool. McFly fills out things like Scribble
"defproc" signatures by parsing bits of information from "lambda"
argument forms, contracts, explicitly-provided pieces of "defproc",
etc. (later I'll add Typed Racket, too); it translates all that info to
a normalized form, does a simple unification of the various pieces, and
then runs the unified, normalized form through another syntax
transformer to output a Scribble "defproc". Surely not the fastest way
to do it, but I suspect it's in the noise when we consider how much
crunching Scribble already does.
Neil V.
--
http://www.neilvandyke.org/