[racket-dev] syntax taints instead of syntax certificates

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Wed Jun 29 22:20:00 EDT 2011

Excellent! Thanks, Matthew and Ryan for figuring out what appears to
be a promising approach to a gaping hole in our security picture.

Robby

On Thu, Jun 30, 2011 at 9:19 AM, Matthew Flatt <mflatt at cs.utah.edu> wrote:
> I've pushed a change to Racket's macro system that throws out the
> syntax-certificate system and adds a syntax-taint system.
>
>
> Syntax taints, like syntax certificates before, are intended to
> protect macro expansions from abuse. "Abuse" means using
> `local-expand' to extract a piece of an expansion, then putting the
> piece in a new context or using `datum->syntax' to access unexported
> module bindings using the piece's lexical context. Meanwhile,
> program-processing tools like `errortrace' or languages like
> `typed/racket' are supposed to pull apart expansions and reorganize
> them; code inspectors remain the way to distinguish trusted tools and
> languages from potential abusers.
>
> Things you need to know:
>
>  * When writing a macro (without `syntax-rules', `define-syntax-rule',
>   or `syntax-id-rules'), apply `syntax-protect' to the macro's result
>   syntax object to protect it from abuse. The `syntax-rules', etc.,
>   forms use `syntax-protect' automatically.
>
>  * If you write program-processing tools or languages that use
>   `syntax-recertify', you'll need to change them to use
>   `syntax-disarm' and possibly `syntax-rearm'.
>
>
> More Background
> ---------------
>
> Combining extensibility with security is a difficult problem in most
> any domain. Macros are an important tool for language extensibility,
> and `local-expand' enables an important class of language extensions
> in Racket, but `local-expand' also opens the door to macro bindings
> that break abstractions --- reaching into a macro expansion to access
> bindings that are intended to be private.
>
> Syntax certificates were designed to protect private bindings in the
> presence of `local-expand'. The idea was that a macro expansion can be
> certified as originating from a particular module, and the use of
> unexported identifiers would be allowed based on that
> certification. Pulling apart a syntax object loses certificates, so
> unexported bindings cannot be extracted and used in contexts which the
> defining module would not approve.
>
> Certificates didn't work well enough:
>
>  * The distinction between unexported bindings and other bindings
>   meant that certificate-maintenance problems were often detected
>   late, since detecting problems required an example that spans
>   module boundaries and involves unexported identifier.
>
>  * Since certificates are module-specific, identifiers introduced in a
>   macro expansion must be distinguished from those present in a macro
>   use; certificates thus get tangled up in the macro system's hygiene
>   machinery. Macro-generating macros create particular complexities
>   when the generator macro lives in one module and the generated
>   macro lives in another module. The system of "lifting" certificates
>   and "inactive" versus "active" certificates, which evolved to
>   address this problem, was difficult to reason about. In the case of
>   recent reported problems with certificates, it's still not clear
>   whether the certificate system was buggy or misused.
>
>  * We found two holes in the certificate system:
>
>    - Grabbing a macro transformer by `syntax-local-value' and
>      applying it directly could provide access to identifiers with
>      "inactive" certificates. The inactive certificates could then be
>      lifted to any use context of the macro, defeating the intended
>      protection of certificates.
>
>    - Using `local-expand' on method names within a class could expose
>      locally bound variables that were intended as private (i.e.,
>      they had no protection because they were not module-bound
>      variables).
>
> Ryan and I have been looking for a better solution for a long time. I
> think Ryan has solved the problem with the design of the taint
> system. Proving that syntax taints provide security as intended
> remains a steep challenge, but the design closes the known hole with
> certificates, and our intuition is that they should work better in
> practice:
>
>  * Taints apply uniformly to all identifiers, including local
>   identifiers, identifiers in the same module, exported identifiers,
>   and unexported identifiers. Taint-management problems in syntax
>   tools should therefore become apparent more readily.
>
>  * Taints are not module-specific, which detangles them from hygiene
>   machinery and avoids many other design and implementation problems.
>
>   One indicator of this improvement is that the total size of ".zo"
>   files in the Racket installation is 20% smaller with taints in
>   place of certificates. (DrRacket's footprint is about 5% smaller,
>   and start-up time for `racket' is slightly shorter. Interestingly,
>   `raco setup' time is not significantly affected --- perhaps because
>   certificate tracking had been such a bottleneck in the past that I
>   spent a lot of time making it fast.) The "syntax.c" file is 1000
>   lines shorter.
>
>   The trigger for a taint (a "dye pack" --- see below) is
>   inspector-specific, so macro-generating macros still require some
>   care in case the generator and the generated macro are under
>   different code inspectors; this extra work is still easier to
>   reason about than inactive certificates.
>
>  * The two known holes are fixed:
>
>     - Using `syntax-local-value' to access a transformer directly no
>       longer circumvents the system, because the transformer itself
>       arms a syntax object, rather than making protection the
>       responsibility of the macro expander that invokes the
>       transformer. To some degree, this change also allows a macro
>       implementor to opt out of the taint system.
>
>       A drawback of the change is that a macro implementor must
>       explicitly opt in to protection of its enclosing module --- at
>       least when not using `syntax-rules', `define-syntax-rule', or
>       `syntax-id-rules', all of which arm the resulting syntax object
>       automatically. This drawback is similar to requiring module
>       implementors to use `protect-out' on identifiers that should
>       not be accessible to untrusted modules (so it's not a new
>       problem, in some sense).
>
>     - Taints protect all variables, including the private local
>       bindings in the expansion of `class'.
>
> Overall, Ryan and I expect that ease of reasoning will be the main
> advantage of traits over certificates.
>
>
> Taints API Overview
> -------------------
>
> "Tainted" identifiers cannot be compiled or expanded in either binding
> or use positions. A syntax object is not usually tainted directly
> directly. Instead, a "dye pack" is added to a syntax object with
> `syntax-arm', which produces an "armed" syntax object; if `stx' is
> armed, then `(syntax-e stx)' or `(datum->syntax stx ....)'  produces a
> tainted syntax object. A syntax object can be "disarmed" using a
> suitable code inspector.
>
>
>  (syntax-arm stx [insp use-mode?]) -> syntax?
>   stx : syntax?
>   insp : (or/c inspector? #f)
>   use-mode? : any/c = #f
>
>  Arms `new-stx' with a dye pack that is controlled by `insp'.  If
>  `insp' is #f, then during the dynamic extent of a transformer, it is
>  replaced with the instantiation-time inspector of the module where a
>  macro transformer is defined; if no transformation is in progress,
>  `(current-code-inspector)' is used.
>
>  If `use-mode?' is a true value, then a 'taint-mode property on `stx'
>  can push dye packs to nested syntax objects, and certain forms like
>  `begin' and `define-values' also push the dye packs down. This is
>  the same as the way that 'certify-mode and the macro expander worked
>  before, and 'certify-mode is supported as an alias for 'taint-mode
>  for backward compatibility. If a 'taint-mode specifies 'transparent
>  for a syntax object, then the syntax object's lexical information is
>  removed as the dye pack is pushed to its children, so no lexical
>  information is left unprotected.
>
>
>  (syntax-disarm stx [insp])
>   stx : syntax?
>   insp : (or/c inspector? #f) = #f
>
>  Removes immediate dye packs (not taints) from `stx' (not from nested
>  syntax objects in `stx') for which `insp' is powerful enough.
>
>
>  (syntax-rearm stx from-stx use-mode?)
>   stx : syntax?
>   from-stx : syntax?
>   use-mode? : any/c = #f
>
>  Copies immediate dye packs and taints from `from-stx' to `stx'.  If
>  `use-mode?' is true, the copies are pushed to internal syntax
>  object, as in `syntax-arm'.
>
>
>  (syntax-protect stx) -> syntax?
>   stx : syntax?
>
>  An alias for `(syntax-arm stx #f #t)'. A macro typically uses this
>  function to protect its result; it's built into `syntax-rules',
>  `syntax-id-rules', and `define-syntax-rule'.
>
>
>  (syntax-tainted? stx) -> boolean?
>   stx : syntax?
>
>  Returns #t if `stx' is tainted, #f otherwise.
>
>
>  (syntax-taint stx) -> syntax?
>   stx : syntax?
>
>  Equivalent to `(datum->syntax (syntax-arm stx) (syntax-e stx))', or
>  just `stx' if it is already tainted.
>
>
> The macro expander disarms a syntax object (using an all-powerful
> inspector) before it applies a macro transformer to the syntax object.
> The macro expander rearms the result of a transformer using the
> original syntax object (i.e., any dye packs that were stripped from
> the transfomer input are copied to the transformer output). The
> 'taint-mode (or 'certify-mode, for compatibility) property is used for
> rearming.
>
> Only expressions and `#%module-begin' forms should be armed. In
> particular, the macro expander disarms only expressions and module
> bodies before pulling them apart, and so other tools only need to
> worry about taint armings at those levels.
>
> See the Guide section 16.2.5 for more information.
>
> _________________________________________________
>  For list-related administrative tasks:
>  http://lists.racket-lang.org/listinfo/dev
>



Posted on the dev mailing list.