[racket-dev] syntax taints instead of syntax certificates

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Jun 29 21:19:31 EDT 2011

I've pushed a change to Racket's macro system that throws out the
syntax-certificate system and adds a syntax-taint system.


Syntax taints, like syntax certificates before, are intended to
protect macro expansions from abuse. "Abuse" means using
`local-expand' to extract a piece of an expansion, then putting the
piece in a new context or using `datum->syntax' to access unexported
module bindings using the piece's lexical context. Meanwhile,
program-processing tools like `errortrace' or languages like
`typed/racket' are supposed to pull apart expansions and reorganize
them; code inspectors remain the way to distinguish trusted tools and
languages from potential abusers.

Things you need to know:

 * When writing a macro (without `syntax-rules', `define-syntax-rule',
   or `syntax-id-rules'), apply `syntax-protect' to the macro's result
   syntax object to protect it from abuse. The `syntax-rules', etc.,
   forms use `syntax-protect' automatically.

 * If you write program-processing tools or languages that use
   `syntax-recertify', you'll need to change them to use
   `syntax-disarm' and possibly `syntax-rearm'.


More Background
---------------

Combining extensibility with security is a difficult problem in most
any domain. Macros are an important tool for language extensibility,
and `local-expand' enables an important class of language extensions
in Racket, but `local-expand' also opens the door to macro bindings
that break abstractions --- reaching into a macro expansion to access
bindings that are intended to be private.

Syntax certificates were designed to protect private bindings in the
presence of `local-expand'. The idea was that a macro expansion can be
certified as originating from a particular module, and the use of
unexported identifiers would be allowed based on that
certification. Pulling apart a syntax object loses certificates, so
unexported bindings cannot be extracted and used in contexts which the
defining module would not approve.

Certificates didn't work well enough:

 * The distinction between unexported bindings and other bindings
   meant that certificate-maintenance problems were often detected
   late, since detecting problems required an example that spans
   module boundaries and involves unexported identifier.

 * Since certificates are module-specific, identifiers introduced in a
   macro expansion must be distinguished from those present in a macro
   use; certificates thus get tangled up in the macro system's hygiene
   machinery. Macro-generating macros create particular complexities
   when the generator macro lives in one module and the generated
   macro lives in another module. The system of "lifting" certificates
   and "inactive" versus "active" certificates, which evolved to
   address this problem, was difficult to reason about. In the case of
   recent reported problems with certificates, it's still not clear
   whether the certificate system was buggy or misused.

 * We found two holes in the certificate system: 

    - Grabbing a macro transformer by `syntax-local-value' and
      applying it directly could provide access to identifiers with
      "inactive" certificates. The inactive certificates could then be
      lifted to any use context of the macro, defeating the intended
      protection of certificates.

    - Using `local-expand' on method names within a class could expose
      locally bound variables that were intended as private (i.e.,
      they had no protection because they were not module-bound
      variables).

Ryan and I have been looking for a better solution for a long time. I
think Ryan has solved the problem with the design of the taint
system. Proving that syntax taints provide security as intended
remains a steep challenge, but the design closes the known hole with
certificates, and our intuition is that they should work better in
practice:

 * Taints apply uniformly to all identifiers, including local
   identifiers, identifiers in the same module, exported identifiers,
   and unexported identifiers. Taint-management problems in syntax
   tools should therefore become apparent more readily.

 * Taints are not module-specific, which detangles them from hygiene
   machinery and avoids many other design and implementation problems.

   One indicator of this improvement is that the total size of ".zo"
   files in the Racket installation is 20% smaller with taints in
   place of certificates. (DrRacket's footprint is about 5% smaller,
   and start-up time for `racket' is slightly shorter. Interestingly,
   `raco setup' time is not significantly affected --- perhaps because
   certificate tracking had been such a bottleneck in the past that I
   spent a lot of time making it fast.) The "syntax.c" file is 1000
   lines shorter.

   The trigger for a taint (a "dye pack" --- see below) is
   inspector-specific, so macro-generating macros still require some
   care in case the generator and the generated macro are under
   different code inspectors; this extra work is still easier to
   reason about than inactive certificates.

 * The two known holes are fixed:

     - Using `syntax-local-value' to access a transformer directly no
       longer circumvents the system, because the transformer itself
       arms a syntax object, rather than making protection the
       responsibility of the macro expander that invokes the
       transformer. To some degree, this change also allows a macro
       implementor to opt out of the taint system.

       A drawback of the change is that a macro implementor must
       explicitly opt in to protection of its enclosing module --- at
       least when not using `syntax-rules', `define-syntax-rule', or
       `syntax-id-rules', all of which arm the resulting syntax object
       automatically. This drawback is similar to requiring module
       implementors to use `protect-out' on identifiers that should
       not be accessible to untrusted modules (so it's not a new
       problem, in some sense).

     - Taints protect all variables, including the private local
       bindings in the expansion of `class'.

Overall, Ryan and I expect that ease of reasoning will be the main
advantage of traits over certificates.


Taints API Overview
-------------------

"Tainted" identifiers cannot be compiled or expanded in either binding
or use positions. A syntax object is not usually tainted directly
directly. Instead, a "dye pack" is added to a syntax object with
`syntax-arm', which produces an "armed" syntax object; if `stx' is
armed, then `(syntax-e stx)' or `(datum->syntax stx ....)'  produces a
tainted syntax object. A syntax object can be "disarmed" using a
suitable code inspector.


 (syntax-arm stx [insp use-mode?]) -> syntax?
   stx : syntax?
   insp : (or/c inspector? #f)
   use-mode? : any/c = #f

  Arms `new-stx' with a dye pack that is controlled by `insp'.  If
  `insp' is #f, then during the dynamic extent of a transformer, it is
  replaced with the instantiation-time inspector of the module where a
  macro transformer is defined; if no transformation is in progress,
  `(current-code-inspector)' is used.

  If `use-mode?' is a true value, then a 'taint-mode property on `stx'
  can push dye packs to nested syntax objects, and certain forms like
  `begin' and `define-values' also push the dye packs down. This is
  the same as the way that 'certify-mode and the macro expander worked
  before, and 'certify-mode is supported as an alias for 'taint-mode
  for backward compatibility. If a 'taint-mode specifies 'transparent
  for a syntax object, then the syntax object's lexical information is
  removed as the dye pack is pushed to its children, so no lexical
  information is left unprotected.


 (syntax-disarm stx [insp])
   stx : syntax?
   insp : (or/c inspector? #f) = #f

  Removes immediate dye packs (not taints) from `stx' (not from nested
  syntax objects in `stx') for which `insp' is powerful enough.


 (syntax-rearm stx from-stx use-mode?)
   stx : syntax?
   from-stx : syntax?
   use-mode? : any/c = #f

  Copies immediate dye packs and taints from `from-stx' to `stx'.  If
  `use-mode?' is true, the copies are pushed to internal syntax
  object, as in `syntax-arm'.


 (syntax-protect stx) -> syntax?
   stx : syntax?

  An alias for `(syntax-arm stx #f #t)'. A macro typically uses this
  function to protect its result; it's built into `syntax-rules',
  `syntax-id-rules', and `define-syntax-rule'.


 (syntax-tainted? stx) -> boolean?
   stx : syntax?

  Returns #t if `stx' is tainted, #f otherwise.


 (syntax-taint stx) -> syntax?
   stx : syntax?

  Equivalent to `(datum->syntax (syntax-arm stx) (syntax-e stx))', or
  just `stx' if it is already tainted.


The macro expander disarms a syntax object (using an all-powerful
inspector) before it applies a macro transformer to the syntax object.
The macro expander rearms the result of a transformer using the
original syntax object (i.e., any dye packs that were stripped from
the transfomer input are copied to the transformer output). The
'taint-mode (or 'certify-mode, for compatibility) property is used for
rearming.

Only expressions and `#%module-begin' forms should be armed. In
particular, the macro expander disarms only expressions and module
bodies before pulling them apart, and so other tools only need to
worry about taint armings at those levels.

See the Guide section 16.2.5 for more information.



Posted on the dev mailing list.