[racket] syntax-parse, macros, and literal-sets

From: Eric Dobson (eric.n.dobson at gmail.com)
Date: Thu May 30 12:25:36 EDT 2013

Why do literal-sets have to be unhygienic? I expected them to work
like literal lists, which you say are hygienic. Also, literal-sets are
not documented as being unhygienic, which was the confusing part.

In my second example (define-sc2), both the literals given to the
literal set and the ids used to match the syntax have the same marks,
so it is very unintuitive that the lexical context of the literal-set
name should matter. Also, can literal sets be used to abstract over
identifiers with different lexical contexts? Intuitively, since other
forms can do that, I would expect that they should be able to, but
given that they're unhygienic I'm not so sure now.

On Thu, May 30, 2013 at 3:49 AM, Ryan Culpepper <ryanc at ccs.neu.edu> wrote:
> On 05/29/2013 03:30 AM, Eric Dobson wrote:
>>
>> I was writing a macro that generated a literal-set and ran into some
>> confusing behavior which I have distilled to the following program.
>>
>> #lang racket
>> (require syntax/parse
>>           (for-syntax syntax/parse))
>>
>>
>> (define-syntax (define-ls1 stx)
>>    (syntax-parse stx
>>      ((_ name:id (lit:id ...))
>>       #'(define-literal-set name (lit ...)))))
>>
>> (define-ls1 ls1 (+))
>> (define-syntax-class sc1
>>    #:literal-sets (ls1)
>>    (pattern +))
>>
>> (for/list ((x (list #'+ #'*)))
>>    (syntax-parse x
>>      (x:sc1 #t)
>>      (_ #f)))
>>
>>
>> (define-syntax (define-sc2 stx)
>>    (syntax-parse stx
>>      ((_ name:id (lit:id ...))
>>       #'(begin
>>           (define-literal-set inner (lit ...))
>>           (define-syntax-class name
>>             #:literal-sets (inner)
>>             (pattern lit) ...)))))
>>
>> (define-sc2 sc2 (+))
>> (for/list ((x (list #'+ #'*)))
>>    (syntax-parse x
>>      (x:sc2 #t)
>>      (_ #f)))
>>
>> (define-syntax (define-sc3 stx)
>>    (syntax-parse stx
>>      ((_ name:id inner:id (lit:id ...))
>>       #'(begin
>>           (define-literal-set inner (lit ...))
>>           (define-syntax-class name
>>             #:literal-sets (inner)
>>             (pattern lit) ...)))))
>>
>> (define-sc3 sc3 inner3 (+))
>> (for/list ((x (list #'+ #'*)))
>>    (syntax-parse x
>>      (x:sc3 #t)
>>      (_ #f)))
>>
>>
>> (define-syntax (define-sc4 stx)
>>    (syntax-parse stx
>>      ((_ name:id (lit:id ...))
>>       #'(begin
>>           (define-literal-set inner (lit ...))
>>           (define-syntax-class name
>>             #:literal-sets ((inner #:at name))
>>             (pattern lit) ...)))))
>>
>> (define-sc4 sc4 (+))
>> (for/list ((x (list #'+ #'*)))
>>    (syntax-parse x
>>      (x:sc4 #t)
>>      (_ #f)))
>>
>> This produces the output:
>> '(#t #f)
>> '(#t #t)
>> '(#t #f)
>> '(#t #f)
>>
>> I would have expected the second one to return '(#t #f) like the first
>> but it doesn't.
>
>
> The issue is how syntax-parse decides whether an identifier in a pattern is
> a pattern variable or a literal. Let's take the simple case, where we have
> just a literals list. From the standpoint of hygiene, the literals list is
> "binding-like" because it sets the interpretation for "references" in the
> patterns. That means the relevant comparison is bound-identifier=? (as for
> all binding forms), so at a minimum, if you want an identifier in a pattern
> to be considered a literal, it must have the same marks as the corresponding
> identifier in the literals list. Consider the following program:
>
> (define-syntax-rule
>   (define-syntax-rule/arith (macro . pattern) template)
>   (define-syntax macro
>     (syntax-rules (+ - * /)
>       [(macro . pattern) template])))
>
> (define-syntax-rule/arith (sum-left-arg (+ x y)) x)
>
> One might expect sum-left-arg to raise a syntax error if given a term with
> something other than + in operator position. But in fact:
>
> (sum-left-arg (* 1 2))
> ;; => 1
>
> The reason is that the expansion of define-syntax-rule/arith puts marks
> on the + in the literals list. So only +'s with the same mark in the pattern
> are considered literals; all others are pattern variables. In particular,
> the + in the pattern in the definition of sum-left-arg is unmarked, so it's
> a pattern variable.
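>
> To make that concrete, here is a quick sketch built on the definitions
> above (compare-plus is just an illustrative helper I'm making up here,
> assuming everything is in a #lang racket module). Since that + is a
> pattern variable, the operator position is not constrained at all, and
> a small macro can check that the use-site + and a macro-introduced +
> refer to the same binding but carry different marks:
>
> (sum-left-arg (anything 1 2))
> ;; => 1, even though anything is unbound
>
> ;; illustrative helper, not part of the original example
> (define-syntax (compare-plus stx)
>   (syntax-case stx ()
>     [(_ user-plus)
>      #`(list #,(free-identifier=? #'+ #'user-plus)      ; same binding?
>              #,(bound-identifier=? #'+ #'user-plus))])) ; same marks?
>
> (compare-plus +)
> ;; => '(#t #f): the use-site + is Racket's +, but its marks differ from
> ;; the macro-introduced +, and bound-identifier=? is what matters here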
>
> Now back to literal sets. A #:literal-sets clause is essentially like an
> unhygienic binding form. The identifiers it "binds" must be given some
> lexical context; it defaults to the lexical context of the literal set name.
> In define-sc2 from your example, that is inner, which is introduced by the
> macro and thus has a mark on it. So it's just like you had a literals list
> consisting of a marked +. But the + in the pattern is unmarked, since it
> comes from the original program (via the lit pattern variable). They don't
> match, so the identifier in the pattern is interpreted as a pattern
> variable.
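>
> Concretely (just a quick check against your original program): because
> that + ends up as a pattern variable, sc2 matches any term at all, which
> is also why the second result is '(#t #t) rather than '(#t #f):
>
> (syntax-parse #'(1 2 3)
>   (x:sc2 #t)
>   (_ #f))
> ;; => #t, because the + in (pattern +) is a pattern variable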
>
>
>> The third and fourth are ways to get it to do so, but I'm not sure
>> why they work. The third seems to show that how a literal set binds
>> variables depends on the context of the literal-set name and not on
>> the actual literals.
>
>
> In define-sc3, inner also comes from the original program, so it's unmarked,
> so the literals consist of an unmarked +, which matches the + in the
> pattern.
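>
> As a quick contrast with sc2 (again, not part of your program), sc3's +
> really is a literal here, so it only matches the identifier +:
>
> (syntax-parse #'(1 2 3)
>   (x:sc3 #t)
>   (_ #f))
> ;; => #f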
>
>
>> The fourth way uses #:at, which is documented as being useful to
>> macros, but not much else. Can someone explain how literal-sets are
>> supposed to determine whether an identifier in the pattern is treated
>> as a literal or as a pattern variable?
>
>
> An #:at clause changes the lexical context given to literals. In define-sc4,
> inner is back to being marked, but that doesn't matter, because the literals
> are given the lexical context of the syntax bound to name. In the example,
> that's sc4, which comes from the macro use and is thus unmarked.
>
> All unhygienic binding forms have similar difficulties. Macros that expand
> into require forms are another common source of confusion.
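>
> For instance, a sketch like this (require-list is a made-up example)
> fails for the same reason: the bindings introduced by the expanded
> require carry the macro's mark, so the unmarked reference at the use
> site doesn't see them.
>
> #lang racket/base  ;; racket/base, so first isn't already provided
>
> (define-syntax-rule (require-list)
>   (require racket/list))
>
> (require-list)
> (first '(1 2 3))  ;; error: first is unbound at this reference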
>
> Ryan
>

Posted on the users mailing list.