[racket] syntax-parse, macros, and literal-sets

From: Ryan Culpepper (ryanc at ccs.neu.edu)
Date: Thu May 30 06:49:43 EDT 2013

On 05/29/2013 03:30 AM, Eric Dobson wrote:
> I was writing a macro that generated a literal-set and ran into some
> confusing behavior which I have distilled to the following program.
>
> #lang racket
> (require syntax/parse
>           (for-syntax syntax/parse))
>
>
> (define-syntax (define-ls1 stx)
>    (syntax-parse stx
>      ((_ name:id (lit:id ...))
>       #'(define-literal-set name (lit ...)))))
>
> (define-ls1 ls1 (+))
> (define-syntax-class sc1
>    #:literal-sets (ls1)
>    (pattern +))
>
> (for/list ((x (list #'+ #'*)))
>    (syntax-parse x
>      (x:sc1 #t)
>      (_ #f)))
>
>
> (define-syntax (define-sc2 stx)
>    (syntax-parse stx
>      ((_ name:id (lit:id ...))
>       #'(begin
>           (define-literal-set inner (lit ...))
>           (define-syntax-class name
>             #:literal-sets (inner)
>             (pattern lit) ...)))))
>
> (define-sc2 sc2 (+))
> (for/list ((x (list #'+ #'*)))
>    (syntax-parse x
>      (x:sc2 #t)
>      (_ #f)))
>
> (define-syntax (define-sc3 stx)
>    (syntax-parse stx
>      ((_ name:id inner:id (lit:id ...))
>       #'(begin
>           (define-literal-set inner (lit ...))
>           (define-syntax-class name
>             #:literal-sets (inner)
>             (pattern lit) ...)))))
>
> (define-sc3 sc3 inner3 (+))
> (for/list ((x (list #'+ #'*)))
>    (syntax-parse x
>      (x:sc3 #t)
>      (_ #f)))
>
>
> (define-syntax (define-sc4 stx)
>    (syntax-parse stx
>      ((_ name:id (lit:id ...))
>       #'(begin
>           (define-literal-set inner (lit ...))
>           (define-syntax-class name
>             #:literal-sets ((inner #:at name))
>             (pattern lit) ...)))))
>
> (define-sc4 sc4 (+))
> (for/list ((x (list #'+ #'*)))
>    (syntax-parse x
>      (x:sc4 #t)
>      (_ #f)))
>
> This produces the output:
> '(#t #f)
> '(#t #t)
> '(#t #f)
> '(#t #f)
>
> I would have expected the second one to return '(#t #f) like the first
> but it doesn't.

The issue is how syntax-parse decides whether an identifier in a pattern 
is a pattern variable or a literal. Let's take the simple case, where we 
have just a literals list. From the standpoint of hygiene, the literals 
list is "binding-like" because it sets the interpretation for 
"references" in the patterns. That means the relevant comparison is 
bound-identifier=? (as for all binding forms). So, at a minimum, an 
identifier in a pattern is considered a literal only if it has the same 
marks as the corresponding identifier in the literals list. Consider the 
following program:

(define-syntax-rule
   (define-syntax-rule/arith (macro . pattern) template)
   (define-syntax macro
     (syntax-rules (+ - * /)
       [(macro . pattern) template])))

(define-syntax-rule/arith (sum-left-arg (+ x y)) x)

One might expect sum-left-arg to raise a syntax error if given a term 
with something other than + in operator position. But in fact:

(sum-left-arg (* 1 2))
;; => 1

The reason is that the expansion of define-syntax-rule/arith puts 
marks on the + in the literals list. So only +'s with the same mark in 
the pattern are considered literals; all others are pattern variables. 
In particular, the + in the pattern in the definition of sum-left-arg is 
unmarked, so it's a pattern variable.
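
The effect of a mark on bound-identifier=? can be observed directly. The 
following is only a sketch; same-binder? and with-introduced-plus are 
hypothetical helper names made up for illustration, not part of any library. 
The first macro reports whether its two identifier arguments are 
bound-identifier=? at expansion time; the second hands it a + introduced by 
a macro alongside a + written at the use site.

```racket
#lang racket

;; Hypothetical helper: expands to #t or #f depending on whether its two
;; identifier arguments are bound-identifier=? at expansion time.
(define-syntax (same-binder? stx)
  (syntax-case stx ()
    [(_ a b)
     #`(quote #,(bound-identifier=? #'a #'b))]))

;; Hypothetical helper: calls macro m with a macro-introduced + followed
;; by the identifier supplied at the use site.
(define-syntax-rule (with-introduced-plus m id)
  (m + id))

;; Both identifiers are written here, so they carry the same marks.
(define both-from-use-site (same-binder? + +))                ; #t

;; The first + comes from with-introduced-plus, so it carries a mark
;; that the second + lacks.
(define one-introduced (with-introduced-plus same-binder? +)) ; #f
```

Both identifiers refer to the same + binding (they are free-identifier=?), 
but only the first pair would act as binder and reference for each other.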

Now back to literal sets. A #:literal-sets clause is essentially like 
an unhygienic binding form. The identifiers it "binds" must be given 
some lexical context; it defaults to the lexical context of the literal 
set name. In define-sc2 from your example, that is inner, which is 
introduced by the macro and thus has a mark on it. So it's as if you 
had a literals list consisting of a marked +. But the + in the pattern 
is unmarked, since it comes from the original program (via the lit 
pattern variable). They don't match, so the identifier in the pattern is 
interpreted as a pattern variable.

> The third and fourth are ways to get it to, but I'm
> not sure why they work. The third seems to show that the meaning of
> how a literal set binds variables depends on the context of the
> literal-set name and not the actual literals.

In define-sc3, inner also comes from the original program, so it's 
unmarked, so the literals consist of an unmarked +, which matches the + 
in the pattern.
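
The same remedy can be sketched for the earlier syntax-rules example. In 
this variant the literals list is supplied at the use site instead of being 
written inside the macro, so the + in the literals list and the + in the 
pattern carry the same marks. The name define-syntax-rule/lits is 
hypothetical, chosen only for this illustration.

```racket
#lang racket

;; Hypothetical variant of define-syntax-rule/arith: the literals come
;; from the macro use, not from the macro definition.
(define-syntax-rule
  (define-syntax-rule/lits (lit ...) (macro . pattern) template)
  (define-syntax macro
    (syntax-rules (lit ...)
      [(macro . pattern) template])))

;; The + in the literals list and the + in the pattern are both written
;; here, so the + in the pattern is treated as a literal.
(define-syntax-rule/lits (+) (sum-left-arg (+ x y)) x)

(define left (sum-left-arg (+ 1 2))) ; 1

;; With + now a literal, this use no longer matches the pattern and is
;; rejected at expansion time:
;; (sum-left-arg (* 1 2))
```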

> The fourth way uses #:at
> which is documented as being useful to macros, but not much else. Can
> someone explain how literal-sets are supposed to determine whether or
> not a variable in the pattern is treated as a literal or as a pattern
> variable?

An #:at clause changes the lexical context given to literals. In 
define-sc4, inner is back to being marked, but that doesn't matter, 
because the literals are given the lexical context of the syntax bound 
to name. In the example, that's sc4, which comes from the macro use and 
is thus unmarked.

All unhygienic binding forms have similar difficulties. Macros that 
expand into require forms are another common source of confusion.
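
As a rough illustration of the require case (a sketch only; 
require-list/here is a hypothetical name): names imported by a require form 
take their lexical context from the module path, so a module path 
introduced by a macro makes the imports visible only to code that the macro 
itself introduces. One common workaround is to give the module path the 
lexical context of the macro use with datum->syntax.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hygienic version, left commented out for comparison: racket/list is
;; introduced by the macro, so names such as first would not be visible
;; to code written at the use site.
;; (define-syntax-rule (require-list) (require racket/list))

;; Hypothetical helper: build the module path with the lexical context
;; of the macro use, so the imports are visible where the macro is used.
(define-syntax (require-list/here stx)
  (syntax-case stx ()
    [(_)
     (with-syntax ([mod-path (datum->syntax stx 'racket/list)])
       #'(require mod-path))]))

(require-list/here)

;; first and last come from racket/list; racket/base does not provide them.
(define ends (list (first '(1 2 3)) (last '(1 2 3))))
```

This plays the same role as #:at in a #:literal-sets clause: it chooses 
explicitly which lexical context the unhygienically introduced names receive.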

Ryan


Posted on the users mailing list.