[racket] phases

From: Jon Rafkind (rafkind at cs.utah.edu)
Date: Thu Mar 1 15:31:52 EST 2012

Recent problems with phases have led me to investigate how they work in more detail. Here is a brief tutorial on what phases are and how they interact with macros. The Guide and the Reference have something to say about phases, but I don't think they go into enough detail.

Every binding exists in a phase, and each phase is identified by an integer. Phase 0 is the phase used for "plain" definitions, so

(define x 5)

will put a binding for 'x' into phase 0. 'x' can just as easily be defined at a higher phase:

(begin-for-syntax
  (define x 5))

Now 'x' is defined at phase 1. We can mix these two definitions in the same module; there is no clash between the two x's because they are defined at different phases.

(define x 3)
(begin-for-syntax
  (define x 9))

'x' at phase 0 has a value of 3 and 'x' at phase 1 has a value of 9.
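A minimal single-file sketch of the two x's, assuming a current Racket. It is written in #lang racket/base, which (unlike #lang racket) provides nothing at phase 1, so it needs an explicit (require (for-syntax racket/base)) before macros can be written. The macro name 'x-at-phase-1' is made up for illustration: its body runs at phase 1, so the 'x' it mentions is the phase-1 'x', and it copies that value into the phase-0 code as a literal.

```racket
#lang racket/base
;; racket/base provides nothing at phase 1, so transformers need this require.
(require (for-syntax racket/base))

(define x 3)                       ; x at phase 0
(begin-for-syntax
  (define x 9))                    ; a separate x at phase 1; no clash

;; A transformer body runs at phase 1, so this x is the phase-1 x.
;; datum->syntax turns its value into a literal in the phase-0 program.
(define-syntax (x-at-phase-1 stx)
  (datum->syntax stx x))

(define phase-0-x x)               ; 3
(define phase-1-x (x-at-phase-1))  ; expands to the literal 9

(printf "phase 0: ~a, phase 1: ~a\n" phase-0-x phase-1-x)  ; prints phase 0: 3, phase 1: 9
```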

Syntax objects can refer to these bindings; essentially, they capture a binding as a value that can be passed around.

#'x

is a syntax object that represents the 'x' binding. But which 'x' binding? In the last example there are two x's, one at phase 0 and one at phase 1. Racket imbues #'x with lexical information for all phases, so the answer is both!

Racket knows which 'x' to use when the syntax object is used. I'll use eval just for a second to prove a point.

First, we bind #'x to a pattern variable so we can use it in a template, and then we just print it:
(eval (with-syntax ([x #'x])
        #'(printf "~a\n" x)))

This will print 3 because x at phase 0 is bound to 3.

(eval (with-syntax ([x #'x])
        #'(begin-for-syntax
            (printf "~a\n" x))))

This will print 9 because we are using x at phase 1 instead of phase 0. How does Racket know we wanted to use x at phase 1 instead of phase 0? Because of the 'begin-for-syntax'. So you can see that we started with the same syntax object, #'x, and were able to use it in two different ways: at phase 0 and at phase 1.
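An eval-free variant of the same experiment, as a sketch (the names 'x-both-ways' and 'phase-1-value' are hypothetical). The macro creates one syntax object, #'x, and puts it in two places in its output. It uses let-syntax rather than begin-for-syntax because let-syntax can sit inside an expression, which lets the phase-1 result be handed back to phase 0 and checked.

```racket
#lang racket/base
(require (for-syntax racket/base))

(define x 3)                ; x at phase 0
(begin-for-syntax
  (define x 9))             ; x at phase 1

;; One syntax object, #'x, placed in two positions of the macro's output:
;;  - the right-hand side of let-syntax is compiled and run at phase 1,
;;    so there #'x refers to the phase-1 x;
;;  - the (list ...) expression is ordinary phase-0 code,
;;    so there #'x refers to the phase-0 x.
(define-syntax (x-both-ways stx)
  (with-syntax ([id #'x])
    #'(let-syntax ([phase-1-value (lambda (s) (datum->syntax s id))])
        (list id (phase-1-value)))))

(define result (x-both-ways))
(printf "~a\n" result)      ; prints (3 9)
```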

When a syntax object is created, its lexical context is set up immediately. When a syntax object is provided from a module, its lexical context still refers to the bindings that were available in the module it came from.

This module defines 'foo' at phase 0, bound to the value 0, and 'sfoo' at phase 1, bound to the syntax object #'foo. (Each file below, like x.rkt and y.rkt at the end, is assumed to start with #lang racket.)

;; a.rkt
(define foo 0)
(provide (for-syntax sfoo))
(define-for-syntax sfoo #'foo)
;; why not (define sfoo #'foo) ? I will explain later

;; b.rkt
(require "a.rkt")
(define foo 8)
(define-syntax (m stx)
  sfoo)
(m)

The result of the (m) macro is whatever value 'sfoo' is bound to, which is #'foo. That #'foo knows that 'foo' is bound by the a.rkt module at phase 0, so even though b.rkt defines another 'foo', Racket is not confused: (m) evaluates to 0, not 8.

Note that 'sfoo' is bound at phase 1. This is because 'm' is a macro, so its body executes one phase higher than the phase it is defined at. Since 'm' is defined at phase 0, its body executes at phase 1, so any bindings it refers to must also be bound at phase 1.
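The same a.rkt/b.rkt setup as a single runnable file. This sketch swaps the two files for submodules named 'a' and 'b' and adds a hypothetical 'result' export so the value of (m) can be inspected:

```racket
#lang racket/base
;; Submodules stand in for a.rkt and b.rkt.

(module a racket/base
  (require (for-syntax racket/base))
  (define foo 0)
  (provide (for-syntax sfoo))
  (define-for-syntax sfoo #'foo))   ; sfoo lives at phase 1; #'foo names a's phase-0 foo

(module b racket/base
  (require (for-syntax racket/base)
           (submod ".." a))         ; plain require: sfoo arrives at phase 1
  (define foo 8)                    ; a different foo, which does not confuse the expander
  (define-syntax (m stx)
    sfoo)                           ; the transformer runs at phase 1, where sfoo is bound
  (provide result)
  (define result (m)))              ; (m) expands to a's foo, not b's

(require 'b)
(printf "~a\n" result)              ; prints 0
```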

What I really want to show, though, is how bindings can get confused when modules are imported at different phases. Racket allows us to import a module at an arbitrary phase using require:

(require "a.rkt") ;; import at phase 0
(require (for-syntax "a.rkt")) ;; import at phase 1
(require (for-template "a.rkt")) ;; import at phase -1
(require (for-meta 5 "a.rkt")) ;; import at phase 5

What does it mean to 'import at phase 1'? Effectively, it means that all the bindings provided by that module have their phase increased by one.

;; c.rkt
(provide x)
(define x 0) ;; x is defined at phase 0

;; d.rkt
(require (for-syntax "c.rkt"))

Now in d.rkt there will be a binding for 'x' at phase 1 instead of phase 0.
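A single-file sketch of c.rkt and d.rkt, again using submodules in place of files. The macro 'x-plus-one' and the 'bound-at-0?'/'bound-at-1?' flags are illustrative names; identifier-binding (from racket/base) reports the binding of an identifier at a given phase and returns #f when the identifier is unbound there.

```racket
#lang racket/base
;; Submodules stand in for c.rkt and d.rkt.

(module c racket/base
  (provide x)
  (define x 0))                           ; x is defined at phase 0 in c

(module d racket/base
  (require (for-syntax racket/base
                       (submod ".." c)))  ; c's bindings shift up to phase 1
  ;; x can be used inside a transformer, which runs at phase 1 ...
  (define-syntax (x-plus-one stx)
    (datum->syntax stx (+ x 1)))
  (provide result bound-at-0? bound-at-1?)
  (define result (x-plus-one))            ; expands to the literal 1
  ;; ... but d has no phase-0 binding for x at all.
  (define bound-at-0? (and (identifier-binding #'x 0) #t))
  (define bound-at-1? (and (identifier-binding #'x 1) #t)))

(require 'd)
(printf "~a ~a ~a\n" result bound-at-0? bound-at-1?)  ; prints 1 #f #t
```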

So let's go back to a.rkt and see what happens if we bind the #'foo syntax object at phase 0 instead of phase 1.

;; a.rkt
(define foo 0)
(define sfoo #'foo)
(provide sfoo)

Now both 'foo' and 'sfoo' are defined at phase 0. The lexical context of #'foo knows that there is a binding for 'foo' at phase 0. In fact, things seem to work just fine: if we eval sfoo in a.rkt, we get 0.

(eval sfoo)
--> 0

But now let's use sfoo in a macro, still inside a.rkt:

(define-syntax (m stx)
  sfoo)
(m)

We get the error 'reference to an identifier before its definition: sfoo'. 'sfoo' is not defined at phase 1, so we cannot refer to it inside the macro. Let's try using 'sfoo' in another module by importing a.rkt at phase 1, which gives us 'sfoo' at phase 1.

;; b.rkt
(require (for-syntax "a.rkt")) ;; now we have sfoo at phase 1
(define-syntax (m stx)
  sfoo)
(m)

$ racket b.rkt
compile: unbound identifier (and no #%top syntax transformer is bound) in: foo

Racket now says that 'foo' is unbound. When a.rkt is imported at phase 1, we have the following bindings:

foo at phase 1
sfoo at phase 1

So the macro 'm' can see sfoo and returns the #'foo syntax object, which knows that 'foo' was bound at phase 0. But there is no 'foo' at phase 0 in b.rkt, only a 'foo' at phase 1, so we get an error. That is why 'sfoo' needed to be bound at phase 1 in a.rkt. With that version of a.rkt (using define-for-syntax), doing (require "a.rkt") would have given us the following bindings:

foo at phase 0
sfoo at phase 1

So we can still use 'sfoo' in the macro, since it's bound at phase 1, and the #'foo it produces refers to a 'foo' binding at phase 0.
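Because these failures happen at compile time, a single file cannot contain the broken version and still run. One way to compare the two versions of a.rkt side by side, as a sketch, is to declare them in a fresh namespace and call expand on module forms that use them; 'a0', 'a1', 'b0', 'b1' and 'expands-ok?' are made-up names. The handler catches any exn:fail rather than a specific message, since the wording of the error in current Racket versions may differ from the one quoted above.

```racket
#lang racket/base
;; Declare both versions of a.rkt in a fresh namespace, then only expand
;; the modules that use them, turning any failure into #f.
(define ns (make-base-namespace))

(define (expands-ok? form)
  (parameterize ([current-namespace ns])
    (with-handlers ([exn:fail? (lambda (e) #f)])
      (expand form)
      #t)))

(parameterize ([current-namespace ns])
  ;; a.rkt with sfoo at phase 0
  (eval '(module a0 racket/base
           (define foo 0)
           (define sfoo #'foo)
           (provide sfoo)))
  ;; a.rkt with sfoo at phase 1, via define-for-syntax
  (eval '(module a1 racket/base
           (require (for-syntax racket/base))
           (define foo 0)
           (define-for-syntax sfoo #'foo)
           (provide (for-syntax sfoo)))))

;; b.rkt importing a0 at phase 1: the macro can see sfoo, but the #'foo
;; it returns is looked up at phase 0, where b0 has no such foo.
(define phase-0-sfoo-ok?
  (expands-ok? '(module b0 racket/base
                  (require (for-syntax racket/base 'a0))
                  (define-syntax (m stx) sfoo)
                  (m))))

;; b.rkt importing a1 normally: sfoo is at phase 1 and #'foo still names
;; a1's phase-0 foo, so expansion succeeds.
(define phase-1-sfoo-ok?
  (expands-ok? '(module b1 racket/base
                  (require (for-syntax racket/base) 'a1)
                  (define-syntax (m stx) sfoo)
                  (m))))

(printf "~a ~a\n" phase-0-sfoo-ok? phase-1-sfoo-ok?)  ; prints #f #t
```

The second check also guards against the first one failing for an unrelated reason, such as the namespace setup itself being wrong.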

Even when a.rkt is imported at phase 1, we can still manage to use 'sfoo'. The trick is to create a syntax object that will be evaluated at phase 1 instead of phase 0, which we can do with 'begin-for-syntax'.

;; a.rkt
(define foo 0)
(define sfoo #'foo)
(provide sfoo)

;; b.rkt
(require (for-syntax "a.rkt"))
(define-syntax (m stx)
  (with-syntax ([x sfoo])
    #'(begin-for-syntax
        (printf "~a\n" x))))
(m)

b.rkt has 'foo' and 'sfoo' bound at phase 1. The output of the macro will be

(begin-for-syntax
  (printf "~a\n" foo))

This is because the pattern variable 'x' is replaced by the value of 'sfoo', the identifier 'foo', when the template is filled in. This expression works because 'foo' is bound at phase 1.
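A variant that can be checked at run time, sketched under two substitutions: submodules stand in for a.rkt and b.rkt, and let-syntax replaces begin-for-syntax so that the phase-1 value of 'foo' is handed back to phase 0 instead of being printed during compilation. 'get' and 'result' are hypothetical names.

```racket
#lang racket/base
;; Submodules stand in for a.rkt and b.rkt.

(module a racket/base
  (define foo 0)
  (define sfoo #'foo)                     ; both foo and sfoo are at phase 0 in a
  (provide sfoo))

(module b racket/base
  (require (for-syntax racket/base
                       (submod ".." a)))  ; in b, a's foo and sfoo are at phase 1
  (define-syntax (m stx)
    (with-syntax ([x sfoo])
      ;; The right-hand side of let-syntax is phase-1 code, so the foo that
      ;; replaces x there resolves at phase 1, where a's foo now lives.
      #'(let-syntax ([get (lambda (s) (datum->syntax s x))])
          (get))))
  (provide result)
  (define result (m)))                    ; (get) expands to the literal 0

(require 'b)
(printf "~a\n" result)                    ; prints 0
```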

Now you might try to cheat the phase system by importing a.rkt at both phase 0 and phase 1. Then you would have the following bindings

foo at phase 0
sfoo at phase 0
foo at phase 1
sfoo at phase 1

So it seems that just using sfoo in a macro should work:

;; b.rkt
(require "a.rkt"
         (for-syntax "a.rkt"))
(define-syntax (m stx)
  sfoo)
(m)

The 'sfoo' inside the 'm' macro comes from (for-syntax "a.rkt"). For this macro to work there must be a 'foo' bound at phase 0, and there is one from the plain "a.rkt" import at phase 0. But in fact this macro doesn't work: Racket says 'foo' is unbound. The key is that "a.rkt" and (for-syntax "a.rkt") are different instantiations of the same module. The 'sfoo' at phase 1 only knows about the 'foo' at phase 1; it does not know about the 'foo' bound at phase 0 by a different instantiation, even though both come from the same file.
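A sketch of the both-phases experiment (all names are hypothetical). Since the failure happens at compile time, 'a' is declared in a fresh namespace and expand is called on module forms that use it, reporting whether expansion succeeds. Here 'a' even exports 'foo', so b gets a phase-0 'foo' from a, and the macro still fails because the #'foo inside the phase-1 'sfoo' belongs to the phase-1 instantiation.

```racket
#lang racket/base
;; Declare 'a' in a fresh namespace, then only expand modules that use it.
(define ns (make-base-namespace))

(define (expands-ok? form)
  (parameterize ([current-namespace ns])
    (with-handlers ([exn:fail? (lambda (e) #f)])
      (expand form)
      #t)))

(parameterize ([current-namespace ns])
  (eval '(module a racket/base
           (define foo 0)
           (define sfoo #'foo)
           (provide foo sfoo))))   ; foo is exported too, for the phase-0 import below

;; Control: a plain phase-0 import of a expands fine.
(define control-ok?
  (expands-ok? '(module b-control racket/base
                  (require 'a)
                  (define y foo))))

;; a imported at phase 0 AND at phase 1 means two instantiations. The macro
;; sees the phase-1 sfoo, whose #'foo belongs to the phase-1 instantiation,
;; so the phase-0 foo imported from the other instantiation does not help.
(define both-phases-ok?
  (expands-ok? '(module b racket/base
                  (require 'a
                           (for-syntax racket/base 'a))
                  (define-syntax (m stx) sfoo)
                  (m))))

(printf "~a ~a\n" control-ok? both-phases-ok?)  ; prints #t #f
```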

This means that if a module has two functions, one that produces a syntax object and one that matches on it (say, using syntax/parse), the module needs to be imported once, at the proper phase. It can't be imported once at phase 0 and again at phase 1 and be expected to work.

;; x.rkt
#lang racket

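(require (for-syntax syntax/parse)
         (for-template racket/base)) ; quote at phase -1, for the 'ok that process's output expands into

(provide (all-defined-out))

(define foo 0)
(define (make) #'foo)                ; make is at phase 0, like foo
(define-syntax (process stx)
  (define-literal-set locals (foo))  ; unused: the pattern below uses ~literal
  (syntax-parse stx
    [(_ (n (~literal foo))) #'#''ok]))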

;; y.rkt
#lang racket

(require (for-meta 1 "x.rkt")              ; process (and foo) at phase 1
         (for-meta 2 "x.rkt" racket/base)) ; make at phase 2, plus racket/base for m's transformer

(begin-for-syntax
  (define-syntax (m stx)
    (with-syntax ([out (make)])
      #'(process (0 out)))))

(define-syntax (p stx)
  (m))

(p)

$ racket y.rkt
process: expected the identifier `foo' at: foo in: (process (0 foo))

'make' is used in y.rkt at phase 2, and it returns a #'foo syntax object that knows foo is bound at phase 0 inside x.rkt, which is phase 2 in y.rkt because of (for-meta 2 "x.rkt"). The 'process' macro is imported at phase 1 from (for-meta 1 "x.rkt") and expects foo to be bound at phase 1. So when syntax-parse runs inside 'process', it is looking for a 'foo' bound at phase 1, sees a phase 2 binding instead, and doesn't match.

To fix this, we can define and provide 'make' at phase 1 in x.rkt and then import x.rkt only at phase 1 in y.rkt, which puts 'make' at phase 2 there:

;; x.rkt
#lang racket

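(require (for-syntax syntax/parse)
         (for-template racket/base)) ; quote at phase -1, for the 'ok that process's output expands into

(provide (all-defined-out))

(define foo 0)
(provide (for-syntax make))
(define-for-syntax (make) #'foo)     ; make now lives at phase 1, like process's transformer
(define-syntax (process stx)
  (define-literal-set locals (foo))  ; unused: the pattern below uses ~literal
  (syntax-parse stx
    [(_ (n (~literal foo))) #'#''ok]))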

;; y.rkt
#lang racket

(require (for-meta 1 "x.rkt")       ; process at phase 1, make at phase 2
         (for-meta 2 racket/base))  ; m's transformer runs at phase 2

(begin-for-syntax
  (define-syntax (m stx)
    (with-syntax ([out (make)])
      #'(process (0 out)))))

(define-syntax (p stx)
  (m))

(p)

$ racket y.rkt
'ok

Posted on the users mailing list.