[racket] Basic Questions Regarding Macros

From: Marco Maggi (marco.maggi-ipsu at poste.it)
Date: Thu Sep 1 05:23:12 EDT 2011

Todd Bittner wrote:
> In  regards  to   syntax-rules  and  syntax-id-rules  there  is
> 'literal-id' parameter that I don't understand.

There was a thread on this subject recently:

<http://www.mail-archive.com/users@racket-lang.org/msg07145.html>

in which I tried to give an explanation with examples:

<http://www.mail-archive.com/users@racket-lang.org/msg07147.html>

and then I turned it into an annotation to the R6RS standard
(scroll  down  to  the  section  entitled  "How  to  use  literal
arguments"):

<http://marcomaggi.github.com/docs/nausicaa.html/baselib-transformers.html>

> Finally, reading  through the reference, #'  serves, I believe,
> as shorthand for (syntax),

From now on I will use a Scheme language compliant with the R6RS
standard.  Yes, the following two forms are equivalent:

   (syntax (a b c))
   #'(a b c)

in the same way as the following two forms are equivalent:

   (quote (a b c))
   '(a b c)

  Notice  that it  is the  source  code *reader*  (the lexer  and
parser) which builds the symbolic expression:

   (quote (a b c))

from the sequence of characters:

   '(a b c)

and this happens before any library is loaded, so the source code
expander only sees the  symbolic expression with the QUOTE symbol
in  it.  While the  following program  works because  the library
(rnrs) exports the QUOTE identifier:

   #!r6rs
   (import (rnrs))
   (write '(a b c))

the following program will fail:

   #!r6rs
   (import (except (rnrs) quote))
   (write '(a b c))

in exactly the same way as the following program will fail:

   #!r6rs
   (import (except (rnrs) quote))
   (write (quote (a b c)))

because we  have explicitly excluded  QUOTE from the  import set,
showing:

  $ racket proof.sps
  proof.sps:8:0: compile: unbound identifier in module in: quote

  The same happens with the SYNTAX identifier:

   #!r6rs
   (import (except (rnrs) syntax))
   #'(a b c)

showing:

  $ racket proof.sps
  proof.sps:6:0: compile: unbound identifier in module in: syntax

  Summary: once our eyes have adapted to reading '--- and #'--- as
abbreviations for (quote ---) and (syntax ---), which may take a
while, they are convenient to use; but we have to remember to
import libraries exporting the QUOTE and SYNTAX identifiers.
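
  The reader-level abbreviations can be checked directly.  The
following is a minimal sketch, assuming an R6RS implementation such
as the "racket" executable used above; it quotes the abbreviated
forms so that they can be inspected as ordinary lists:

```scheme
#!r6rs
(import (rnrs))

;; ''a is read as (quote (quote a)), so its value is the list (quote a).
(write (car ''a))                 ; prints: quote
(newline)
(write (equal? ''a '(quote a)))   ; prints: #t
(newline)

;; '#'a is read as (quote (syntax a)), so its value is the list (syntax a).
(write (car '#'a))                ; prints: syntax
(newline)
```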

> but in what practical situations would I then call it?

The  SYNTAX identifier  is bound  to a  special, very  low level,
macro  integrated  in the  source  code  expander  (which is  the
"preprocessor").

  It is  impossible to fully understand  its implementation using
the mental model of a Scheme program being executed at run-time.

  In  particular: such  a  syntax *cannot*  be implemented  using
DEFINE-SYNTAX  and  it  does  *not*  expand  to  code  using,  at
run-time,  some technique involving  the dynamic  environment and
the DYNAMIC-WIND function.  Rather, when the expander walking the
code finds a form like:

   (syntax . stuff)

it  invokes an internal  function having  access to  its internal
data structures (or something to that effect).

  The gist of  it is: we must use the SYNTAX  macro every time we
hand-write  the  transformer  function  for a  macro,  whose  use
expands into  a form containing identifiers.  Example:  we do not
need  SYNTAX if  the macro  use expands  into a  datum,  like the
string "ciao":

   #!r6rs
   (import (rnrs))

   (define-syntax ciao
     (lambda (stx)
       "ciao"))

   (write (ciao))
   (newline)

but  we  need  SYNTAX if  we  want  to  use  the binding  of  the
identifier NEWLINE in the output form:

   #!r6rs
   (import (rnrs))

   (define-syntax return
     (lambda (stx)
       (syntax (newline))))

   (display "first line")
   (return)
   (display "second line\n")

  Notice that we can use the SYNTAX macro to return a datum, too:

   #!r6rs
   (import (rnrs))

   (define-syntax ciao
     (lambda (stx)
       (syntax "ciao")))

   (write (ciao))
   (newline)

we can think of:

   (syntax "ciao")

as equivalent to "ciao" by itself.

  In what follows, I will  try to describe the basic mechanics of
SYNTAX omitting  a lot  about what the  expander needs to  do its
thing (which  is: to conduct the "expansion  process").  There is
so much  I have to omit that  I hope the result  still makes some
sense. :)

  The  expander  walks  a  symbolic expression  representing  the
source  code recursively  (somewhat)  as shown  by the  following
program you can run using "racket":

   #!r6rs
   (import (rnrs))

   (define (%log which sexp)
     (display which)
     (display ": ")
     (display sexp)
     (newline))

   (define (%log-enter sexp)
     (%log "Enter" sexp))

   (define (%log-exit sexp)
     (%log "Exit" sexp))

   (define (%log-processing sexp)
     (%log "Processing" sexp))

   (define (simulate-expand-recursion sexp)
     (cond ((null? sexp)
            ;; The empty list has no CAR: just log it.
            (%log-processing sexp))
           ((list? sexp)
            (%log-enter sexp)
            (simulate-expand-recursion (car sexp))
            (unless (null? (cdr sexp))
              (simulate-expand-recursion (cdr sexp)))
            (%log-exit sexp))
           ((symbol? sexp)
            (%log-processing sexp))
           (else
            (%log-processing sexp))))

   (simulate-expand-recursion
      '(let ((a 1))
         (let ((b 2))
           (write a)
           (write b))))

as  you can see  from the  output it  "enters" and  "exits" every
subexpression.

  From now on I will use pseudo-code, unless otherwise specified.

  Internally, the expander constructs a collection data structure
handled somewhat  like a stack, let's call  it "lexical context";
record values are  pushed on this stack.  The  lexical context is
initialised as follows:

   (define-record-type mark
     (fields (immutable name)))

   (define top-mark
     (make-mark "top"))

   (define lexical-context
     (list top-mark))

   (define (push! obj)
     (set! lexical-context (cons obj lexical-context)))

   (define (pop!)
     (set! lexical-context (cdr lexical-context)))

  With reference to the symbolic expression:

   (let ((a 1))
     (let ((b 2))
       (write a)
       (write b)))

when the  expander enters the outer  LET it pushes a  new mark on
the stack:

  (push! (make-mark "outer-let"))

and  then it  pushes a  record representing  the binding  for the
variable A:

  (define-record-type binding
    (fields (immutable name)
            #| other fields here |#))

  (push! (make-binding 'a))

so that the lexical context looks like:

   lexical-context => (#<binding name=a>
                       #<mark name="outer-let">
                       #<mark name="top">)

  When the expander enters the inner  LET it pushes a new mark on
the stack, followed  by a record representing the  binding for B,
so that the lexical context looks like:

   lexical-context => (#<binding name=b>
                       #<mark name="inner-let">
                       #<binding name=a>
                       #<mark name="outer-let">
                       #<mark name="top">)

  When the expander finds the reference to A in the form:

   (write a)

it  searches  the lexical  context  left-to-right  for a  BINDING
record whose name  is A and it finds it: we  say that the binding
"captures" the reference.

  When the expander  exits the inner LET it  removes its bindings
and its mark:

   lexical-context => (#<binding name=a>
                       #<mark name="outer-let">
                       #<mark name="top">)

and when it  exits the outer LET it removes  its bindings and its
mark:

   lexical-context => (#<mark name="top">)

  You get the picture: every binding form like LAMBDA, LET, LET*,
... causes the expander to push on the lexical context a new MARK
record followed by BINDING records; these records stay there only
while  the  expander  is  processing the  corresponding  symbolic
expression.
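
  The pseudo-code fragments above can be assembled into a program
that actually runs.  This is only a sketch of the simplified model
described here, not of what a real expander does; the LOOKUP
procedure is a hypothetical helper added for illustration:

```scheme
#!r6rs
(import (rnrs))

(define-record-type mark
  (fields (immutable name)))

(define-record-type binding
  (fields (immutable name)))

(define lexical-context
  (list (make-mark "top")))

(define (push! obj)
  (set! lexical-context (cons obj lexical-context)))

(define (pop!)
  (set! lexical-context (cdr lexical-context)))

;; Hypothetical helper: search the context left-to-right for a BINDING
;; record with the given name; return it, or #f when none is found.
(define (lookup name)
  (find (lambda (obj)
          (and (binding? obj)
               (eq? name (binding-name obj))))
        lexical-context))

;; Entering the outer LET.
(push! (make-mark "outer-let"))
(push! (make-binding 'a))

;; Entering the inner LET.
(push! (make-mark "inner-let"))
(push! (make-binding 'b))

(write (binding? (lookup 'a)))    ; prints: #t, the reference to A is captured
(newline)

;; Exiting the inner LET: remove its binding and its mark.
(pop!)
(pop!)

(write (lookup 'b))               ; prints: #f, B is no longer visible
(newline)
```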

  Now enter the SYNTAX macro.  It can inspect the lexical context
in  search of  bindings and  other  records.  Let's  look at  the
following program:

   #!r6rs
   (import (rnrs))

   (let ((a 1))

     (define-syntax reference-to-a
       (lambda (stx)
         (syntax a)))

     (write (reference-to-a))
     (newline))

when the expander  processes the LET form it pushes  a MARK and a
BINDING on the lexical context:

   lexical-context => (#<binding name=a>
                       #<mark name="let-form">
                       #<mark name="top">)

let's omit what  it does to process the  DEFINE-SYNTAX form, what
matters here is that when the macro use:

   (reference-to-a)

is expanded, the transformer function:

   (lambda (stx)         ;STX is ignored here
     (syntax a))

is called and the SYNTAX macro searches the lexical context
left-to-right for a BINDING record whose name is A; it finds it
and so it returns what is needed to cause the macro to expand to
a reference to A.
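
  One detail the simplified stack model glosses over can be checked
with a small variation of the program.  In this sketch the inner
LET rebinding A and the variable RESULT are additions for
illustration; with a hygienic R6RS expander the A in the template
still refers to the binding visible where (syntax a) is written,
not to the one surrounding the macro use:

```scheme
#!r6rs
(import (rnrs))

(define result
  (let ((a 1))

    (define-syntax reference-to-a
      (lambda (stx)
        (syntax a)))

    ;; This inner binding of A does not capture the A
    ;; inserted by the macro.
    (let ((a 2))
      (reference-to-a))))

(write result)                    ; prints: 1
(newline)
```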

  Let's step back because we  have omitted an important fact.  To
write a program  we have to start the source  code with an IMPORT
form  listing  imported  libraries,   else  we  can  do  nothing.
Whenever the expander processes an  import set (a set of bindings
exported from imported libraries),  it pushes all the bindings on
the lexical context; for example:

   #!r6rs
   (import (only (rnrs)
                 write
                 newline))

causes the  bindings for  WRITE and NEWLINE  to be pushed  on the
stack:

   lexical-context => (#<binding name=write>
                       #<binding name=newline>
                       #<mark name="top">)

and:

   #!r6rs
   (import (rnrs))

causes all  the bindings exported by  (rnrs) to be  pushed on the
stack:

   lexical-context => (#<binding name=display>
                       #<binding name=write>
                       #<binding name=newline>
                       ...
                       #<binding name=sin>
                       #<binding name=cos>
                       #<binding name=tan>
                       ...
                       #<mark name="top">)

too many  to be  listed.  This is  why the following  program can
work:

   #!r6rs
   (import (rnrs))

   (define-syntax return
     (lambda (stx)
       (syntax (newline))))

   (return)

whenever the expander processes the macro use:

   (return)

it calls the transformer function:

   (lambda (stx)
     (syntax (newline)))

and the  SYNTAX macro visits the  lexical context in  search of a
BINDING  record whose  name is  NEWLINE, it  finds it  and  so it
returns what is needed to cause  the macro to expand to a call to
NEWLINE.

  Now we can understand why the following program fails:

   #!r6rs
   (import (rnrs))

   (define-syntax hurt-me
     (lambda (stx)
       (syntax sword)))

   (write (hurt-me))

while expanding the macro use:

   (hurt-me)

the  SYNTAX macro  searches  the lexical  context  for a  BINDING
record whose name is SWORD, it  does not find it and so it causes
the program to abort with:

   $ racket proof.sps
   proof.sps:9:12: compile: unbound identifier in module in: sword

  Now enter  the SYNTAX-CASE  macro; we have  to step  back again
because  we  have omitted  another  important fact.   SYNTAX-CASE
provides two main features:

* It deconstructs  the input  form of a  macro use  using pattern
  matching.  I am not going to describe it here in detail.

* It  pushes  records of  type  PATTERN-VARIABLE  on the  lexical
  context, which later  can be searched by the  SYNTAX macro.  We
  want to understand this.

  Let's look at this program:

   #!r6rs
   (import (rnrs))

   (define-syntax the-second-among
     (lambda (stx)
       (syntax-case stx ()
         ((_ ?a ?b ?c)
          (syntax ?b)))))

   (write (the-second-among 1 2 3))
   (newline)

let's fast forward to when the expander processes the macro use:

   (the-second-among 1 2 3)

the transformer function:

   (define transformer
     (lambda (stx)
       (syntax-case stx ()
         ((_ ?a ?b ?c)
          (syntax ?b)))))

is applied to a record instance of type SYNTAX-OBJECT:

   (define-record-type syntax-object
     (fields (immutable sexp)
             (immutable current-lexical-context)
             #| other fields here |#))

   (define stx
     (make-syntax-object '(the-second-among 1 2 3)
                         lexical-context))

   (transformer stx)

the use of SYNTAX-CASE:

   (syntax-case stx ()
     ((_ ?a ?b ?c)
      (syntax ?b)))

decomposes the symbolic expression:

   (the-second-among 1 2 3)

and  pushes on  the lexical  context PATTERN-VARIABLE  records as
follows:

   (define-record-type pattern-variable
     (fields (immutable name)
             (immutable sexp)))

   (push! (make-pattern-variable '?a 1))
   (push! (make-pattern-variable '?b 2))
   (push! (make-pattern-variable '?c 3))

so that:

   lexical-context => (#<pattern-variable name=?c sexp=3>
                       #<pattern-variable name=?b sexp=2>
                       #<pattern-variable name=?a sexp=1>
                       ...
                       #<mark name="top">)

later the SYNTAX macro visits the lexical context searching
left-to-right for a BINDING record *or* a PATTERN-VARIABLE record
whose name is ?B; it finds it and so it returns what is needed to
cause the macro to expand to the symbolic expression 2.
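
  Both kinds of lookup can appear in the same template.  As a
sketch (the macro name ADD-100-TO-SECOND is made up for this
example), the template below mentions the pattern variable ?B and
the identifier + imported from (rnrs):

```scheme
#!r6rs
(import (rnrs))

(define-syntax add-100-to-second
  (lambda (stx)
    (syntax-case stx ()
      ((_ ?a ?b ?c)
       ;; ?B is replaced by the matched subform; + refers to
       ;; the binding imported from (rnrs).
       (syntax (+ ?b 100))))))

(write (add-100-to-second 1 2 3))  ; prints: 102
(newline)
```

the macro use expands to (+ 2 100).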

  Got it?
-- 
Marco Maggi


Posted on the users mailing list.