[racket] Parser-tools: `nesting' parsers?

From: Richard Lawrence (richard.lawrence at berkeley.edu)
Date: Sun May 6 20:52:30 EDT 2012

Hi everyone,

I have a question about the parser-tools library (which is great!).  I'm
hoping maybe someone has run into this use-case before and figured out a
solution.  What I'd like to be able to do is `nest' parsers --
i.e., have one parser's grammar include terminal symbols which are not
token-ids from an associated lexer, but rather the starting symbol of
some other parser's grammar.  For example, I'd like to be able to write
something like:

(define a-formula-parser
    (tokens sentence-letters connectives)
    (start formula)
    ; ...
      [formula ;...

(define proof-line-parser
    (tokens numbers rules)
    (sub-parser-terminals formula) ; this is what's not possible currently
    (start proof-line)
    ; ...
        [(line-number formula rule) ; this clause hands off to a-formula-parser
         (make-proof-line #:line $1 #:formula $2 #:rule $3)]]
      ; ...

; the formula-sub-parser parameter would be generated by the
; sub-parser-terminals clause above
(parameterize ([formula-sub-parser a-formula-parser])
  (proof-line-parser (lambda () (some-lexer some-input-stream))))      

The idea here is that a proof-line-parser, upon reaching a symbol in the
input stream that must be part of a formula, calls a formula-parser on
the input stream to get a value (here, $2), then continues on its merry
way.  Ideally, the associated sub-parser could be determined via a
parameter at run time, as I've indicated here.

As maybe you can guess from my example, the reason I want to do this is
that I am writing code to parse formal proofs in first-order logic.
These proofs generally have structure at two levels: the expressions of
the proof system, and the expressions of an associated first-order
language.  I want to be able to write separate parsers for the proofs
and the languages, and then combine them in this fashion, so that
multiple language parsers can be used with the same proof parser (and
vice versa).  If I have to write a separate parser for every
proof-system-and-language combination, there will be a *lot* of
redundant code.

I'm pretty new to the whole idea of shift-reduce parsing, so I'm not
sure if something like this is even possible (much less implementable
atop the parser-tools library).  But if anyone has any pointers here,
I'd be very glad to hear them!



Posted on the users mailing list.