[racket] Announcement for ragg: a Racket AST Generator Generator

From: Danny Yoo (dyoo at hashcollision.org)
Date: Fri Jan 18 15:42:44 EST 2013

[CCing Racket mailing list for feedback]


> A couple questions about Ragg:
> 1. Why not just use regular expressions to define tokens?

ragg will probably do this in the future.  I haven't nailed down the
right design yet, but I think it'll involve ragg automatically
producing a "tokenize" binding as well, one that knows how to
recognize the definitions for the uppercased tokens.



> 2. Have you considered adding "reductions" to the grammar, i.e., something
> analogous to red in the gll link you sent me?  Are there any good precedents
> for incorporating reductions into the sort of standard CFG syntax that Ragg
> knows how to process?  I think it would  be useful, but I'm having trouble
> visualizing what a clean syntax for that would look like.

I've been thinking about this.  Let me sketch out the idea in public
to make sure it isn't completely crazy. :)

My current plan is to adopt Yacc's syntax for semantic actions, but
co-opt it to do something functional.  In this case, it's not
functional because that's what the Cool Kids do, but because I don't
think I have a choice: I have to deal with ambiguous grammars and the
way the parser can backtrack.


This ties in with a feature that Jens Axel Soegaard requested: a
mechanism for sending values from the parser back to the lexer, to
do things such as the "lexer hack".  I think this can work if we
thread an accumulator through the parse.  The idea is to embrace the
spirit of foldl: rather than folding across a list, we fold across
the parse.

This means we should extend "token sources" and allow them to also be
functions of one argument, whose value will be the accumulator at that
point in the parsed stream.  On the parser side, we'll provide a
semantic "action" that allows the accumulator to be updated.
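
To make the token-source half concrete, here is a minimal sketch in
plain Racket.  It is not ragg's API.  Everything in it is made up for
illustration: the names make-token-source and drive, the choice of a
set of type names as the accumulator, and the toy "typedef" rule.  The
token source is a function of one argument (the accumulator), and a
stand-in for the parser threads the accumulator from token to token,
which is roughly what the lexer hack needs.

```racket
#lang racket
;; Hypothetical sketch; none of these names exist in ragg.
;; The accumulator is a set of words the "parser" has declared as types.

;; make-token-source : (listof string) -> (set -> (or/c pair? #f))
;; Returns a one-argument function.  Each call consumes one word and
;; classifies it using the accumulator handed in by the caller.
(define (make-token-source words)
  (define remaining words)
  (lambda (acc)
    (cond
      [(null? remaining) #f]
      [else
       (define w (car remaining))
       (set! remaining (cdr remaining))
       (cons (if (set-member? acc w) 'TYPE-NAME 'ID) w)])))

;; drive : token-source -> (listof pair?)
;; Stand-in for the parser: asks for tokens one at a time and folds the
;; accumulator along.  Toy rule: the word after "typedef" becomes a type.
(define (drive next-token)
  (let loop ([acc (set)] [seen '()] [prev #f])
    (define tok (next-token acc))
    (cond
      [(not tok) (reverse seen)]
      [else
       (define new-acc
         (if (equal? prev "typedef") (set-add acc (cdr tok)) acc))
       (loop new-acc (cons tok seen) (cdr tok))])))

(define tokens
  (drive (make-token-source '("typedef" "foo" "foo" "bar"))))
;; tokens is
;; '((ID . "typedef") (ID . "foo") (TYPE-NAME . "foo") (ID . "bar"))
```

The second "foo" comes back as TYPE-NAME only because the accumulator
changed between the two calls.  Note that this closure still consumes
its input statefully, much like a thunk reading from a port; only the
classification depends on the accumulator.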


So, donning my wishing-makes-it-so hat, maybe something like the following:

---
    #lang ragg

    thing: "(" thing ")"            { #:result (second (syntax->list $stx)) }
           | ID                         { #:update-acc (add1 $acc) }

    ID: #px"\\w+"                  { #:result (list $acc $stx) }
---

The fuzzy example here is meant to describe a parser that associates
each ID with the current count of identifiers it has seen.  The
variables $stx and $acc stand for the auto-generated syntax object and
the current state of the accumulator.  #:result would override the
return value of a rule, and #:update-acc would update the
accumulator.
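
One possible reading of those two keywords, sketched as ordinary
Racket functions.  This is an assumption about how the actions could
desugar, not something ragg does: each action takes the syntax object
and the accumulator, and returns two values, the rule's result and the
next accumulator.  The function names are hypothetical.

```racket
#lang racket
;; Hypothetical desugaring of the three actions in the grammar sketch.
;; An action maps (stx, acc) to (values result next-acc).

;; { #:result (second (syntax->list $stx)) }
;; Overrides the result; the accumulator passes through unchanged.
(define (thing-paren-action stx acc)
  (values (second (syntax->list stx)) acc))

;; { #:update-acc (add1 $acc) }
;; Keeps the default result; only the accumulator changes.
(define (thing-id-action stx acc)
  (values stx (add1 acc)))

;; { #:result (list $acc $stx) }
;; Pairs the current count with whatever was matched.
(define (id-action stx acc)
  (values (list acc stx) acc))

;; Example: the parenthesized rule picks out the middle element.
(define-values (inner inner-acc)
  (thing-paren-action #'("(" x ")") 0))
;; (syntax->datum inner) is 'x, and inner-acc is still 0
```

Under this reading, a rule with no action would behave like
(values stx acc).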


If I allowed general semantic actions, it'd be too tempting to use
side effects, and side effects in an environment with backtracking
will probably cause confusion.  But if we pass along this accumulator
during the parse, I think we should be ok.
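
A small illustration of the concern, using hand-rolled toy parsers
rather than ragg (all names here are hypothetical).  The grammar is
start: ID ";" | ID "." and the input is a single ID followed by ".".
The first alternative consumes the ID and then fails, so the parser
backtracks.  A side-effecting counter keeps the increment from the
abandoned branch; a threaded accumulator does not.

```racket
#lang racket
;; Toy parsers for illustration only.  Tokens: symbols are IDs,
;; strings are literals.  Grammar:  start: ID ";" | ID "."

;; Side-effecting version: a parser maps tokens to the remaining
;; tokens, or #f on failure, and counts IDs in a global.
(define counter 0)
(define (id! toks)
  (and (pair? toks) (symbol? (car toks))
       (begin (set! counter (add1 counter)) (cdr toks))))
(define ((lit! s) toks)
  (and (pair? toks) (equal? (car toks) s) (cdr toks)))
(define ((seq! p q) toks)
  (let ([r (p toks)]) (and r (q r))))
(define ((alt! p q) toks)
  (or (p toks) (q toks)))

;; Accumulator version: a parser maps tokens and acc to
;; (list remaining-tokens new-acc), or #f on failure.
(define (id/acc toks acc)
  (and (pair? toks) (symbol? (car toks))
       (list (cdr toks) (add1 acc))))
(define ((lit/acc s) toks acc)
  (and (pair? toks) (equal? (car toks) s)
       (list (cdr toks) acc)))
(define ((seq/acc p q) toks acc)
  (match (p toks acc)
    [(list toks2 acc2) (q toks2 acc2)]
    [#f #f]))
;; On failure, the second branch restarts from the ORIGINAL acc.
(define ((alt/acc p q) toks acc)
  (or (p toks acc) (q toks acc)))

(define start!
  (alt! (seq! id! (lit! ";")) (seq! id! (lit! "."))))
(define start/acc
  (alt/acc (seq/acc id/acc (lit/acc ";"))
           (seq/acc id/acc (lit/acc "."))))

(define effect-result (start! '(x ".")))
(define pure-result (start/acc '(x ".") 0))
;; counter is now 2, although the input holds one ID.
;; pure-result is (list '() 1): the failed branch's count was dropped.
```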


This is my current sketchy, vague idea.  I haven't implemented
anything yet, but just wanted to see if that made any kind of sense.
:)

Posted on the users mailing list.