[racket-dev] syntax taints instead of syntax certificates
Excellent! Thanks, Matthew and Ryan for figuring out what appears to
be a promising approach to a gaping hole in our security picture.
Robby
On Thu, Jun 30, 2011 at 9:19 AM, Matthew Flatt <mflatt at cs.utah.edu> wrote:
> I've pushed a change to Racket's macro system that throws out the
> syntax-certificate system and adds a syntax-taint system.
>
>
> Syntax taints, like syntax certificates before, are intended to
> protect macro expansions from abuse. "Abuse" means using
> `local-expand' to extract a piece of an expansion, then putting the
> piece in a new context or using `datum->syntax' to access unexported
> module bindings using the piece's lexical context. Meanwhile,
> program-processing tools like `errortrace' or languages like
> `typed/racket' are supposed to pull apart expansions and reorganize
> them; code inspectors remain the way to distinguish trusted tools and
> languages from potential abusers.
>
> Things you need to know:
>
> * When writing a macro (without `syntax-rules', `define-syntax-rule',
> or `syntax-id-rules'), apply `syntax-protect' to the macro's result
> syntax object to protect it from abuse. The `syntax-rules', etc.,
> forms use `syntax-protect' automatically.
>
> * If you write program-processing tools or languages that use
> `syntax-recertify', you'll need to change them to use
> `syntax-disarm' and possibly `syntax-rearm'.
>
>
> More Background
> ---------------
>
> Combining extensibility with security is a difficult problem in most
> any domain. Macros are an important tool for language extensibility,
> and `local-expand' enables an important class of language extensions
> in Racket, but `local-expand' also opens the door to macro bindings
> that break abstractions --- reaching into a macro expansion to access
> bindings that are intended to be private.
>
> Syntax certificates were designed to protect private bindings in the
> presence of `local-expand'. The idea was that a macro expansion can be
> certified as originating from a particular module, and the use of
> unexported identifiers would be allowed based on that
> certification. Pulling apart a syntax object loses certificates, so
> unexported bindings cannot be extracted and used in contexts which the
> defining module would not approve.
>
> Certificates didn't work well enough:
>
> * The distinction between unexported bindings and other bindings
> meant that certificate-maintenance problems were often detected
> late, since detecting problems required an example that spans
> module boundaries and involves unexported identifier.
>
> * Since certificates are module-specific, identifiers introduced in a
> macro expansion must be distinguished from those present in a macro
> use; certificates thus get tangled up in the macro system's hygiene
> machinery. Macro-generating macros create particular complexities
> when the generator macro lives in one module and the generated
> macro lives in another module. The system of "lifting" certificates
> and "inactive" versus "active" certificates, which evolved to
> address this problem, was difficult to reason about. In the case of
> recent reported problems with certificates, it's still not clear
> whether the certificate system was buggy or misused.
>
> * We found two holes in the certificate system:
>
> - Grabbing a macro transformer by `syntax-local-value' and
> applying it directly could provide access to identifiers with
> "inactive" certificates. The inactive certificates could then be
> lifted to any use context of the macro, defeating the intended
> protection of certificates.
>
> - Using `local-expand' on method names within a class could expose
> locally bound variables that were intended as private (i.e.,
> they had no protection because they were not module-bound
> variables).
>
> Ryan and I have been looking for a better solution for a long time. I
> think Ryan has solved the problem with the design of the taint
> system. Proving that syntax taints provide security as intended
> remains a steep challenge, but the design closes the known hole with
> certificates, and our intuition is that they should work better in
> practice:
>
> * Taints apply uniformly to all identifiers, including local
> identifiers, identifiers in the same module, exported identifiers,
> and unexported identifiers. Taint-management problems in syntax
> tools should therefore become apparent more readily.
>
> * Taints are not module-specific, which detangles them from hygiene
> machinery and avoids many other design and implementation problems.
>
> One indicator of this improvement is that the total size of ".zo"
> files in the Racket installation is 20% smaller with taints in
> place of certificates. (DrRacket's footprint is about 5% smaller,
> and start-up time for `racket' is slightly shorter. Interestingly,
> `raco setup' time is not significantly affected --- perhaps because
> certificate tracking had been such a bottleneck in the past that I
> spent a lot of time making it fast.) The "syntax.c" file is 1000
> lines shorter.
>
> The trigger for a taint (a "dye pack" --- see below) is
> inspector-specific, so macro-generating macros still require some
> care in case the generator and the generated macro are under
> different code inspectors; this extra work is still easier to
> reason about than inactive certificates.
>
> * The two known holes are fixed:
>
> - Using `syntax-local-value' to access a transformer directly no
> longer circumvents the system, because the transformer itself
> arms a syntax object, rather than making protection the
> responsibility of the macro expander that invokes the
> transformer. To some degree, this change also allows a macro
> implementor to opt out of the taint system.
>
> A drawback of the change is that a macro implementor must
> explicitly opt in to protection of its enclosing module --- at
> least when not using `syntax-rules', `define-syntax-rule', or
> `syntax-id-rules', all of which arm the resulting syntax object
> automatically. This drawback is similar to requiring module
> implementors to use `protect-out' on identifiers that should
> not be accessible to untrusted modules (so it's not a new
> problem, in some sense).
>
> - Taints protect all variables, including the private local
> bindings in the expansion of `class'.
>
> Overall, Ryan and I expect that ease of reasoning will be the main
> advantage of traits over certificates.
>
>
> Taints API Overview
> -------------------
>
> "Tainted" identifiers cannot be compiled or expanded in either binding
> or use positions. A syntax object is not usually tainted directly
> directly. Instead, a "dye pack" is added to a syntax object with
> `syntax-arm', which produces an "armed" syntax object; if `stx' is
> armed, then `(syntax-e stx)' or `(datum->syntax stx ....)' produces a
> tainted syntax object. A syntax object can be "disarmed" using a
> suitable code inspector.
>
>
> (syntax-arm stx [insp use-mode?]) -> syntax?
> stx : syntax?
> insp : (or/c inspector? #f)
> use-mode? : any/c = #f
>
> Arms `new-stx' with a dye pack that is controlled by `insp'. If
> `insp' is #f, then during the dynamic extent of a transformer, it is
> replaced with the instantiation-time inspector of the module where a
> macro transformer is defined; if no transformation is in progress,
> `(current-code-inspector)' is used.
>
> If `use-mode?' is a true value, then a 'taint-mode property on `stx'
> can push dye packs to nested syntax objects, and certain forms like
> `begin' and `define-values' also push the dye packs down. This is
> the same as the way that 'certify-mode and the macro expander worked
> before, and 'certify-mode is supported as an alias for 'taint-mode
> for backward compatibility. If a 'taint-mode specifies 'transparent
> for a syntax object, then the syntax object's lexical information is
> removed as the dye pack is pushed to its children, so no lexical
> information is left unprotected.
>
>
> (syntax-disarm stx [insp])
> stx : syntax?
> insp : (or/c inspector? #f) = #f
>
> Removes immediate dye packs (not taints) from `stx' (not from nested
> syntax objects in `stx') for which `insp' is powerful enough.
>
>
> (syntax-rearm stx from-stx use-mode?)
> stx : syntax?
> from-stx : syntax?
> use-mode? : any/c = #f
>
> Copies immediate dye packs and taints from `from-stx' to `stx'. If
> `use-mode?' is true, the copies are pushed to internal syntax
> object, as in `syntax-arm'.
>
>
> (syntax-protect stx) -> syntax?
> stx : syntax?
>
> An alias for `(syntax-arm stx #f #t)'. A macro typically uses this
> function to protect its result; it's built into `syntax-rules',
> `syntax-id-rules', and `define-syntax-rule'.
>
>
> (syntax-tainted? stx) -> boolean?
> stx : syntax?
>
> Returns #t if `stx' is tainted, #f otherwise.
>
>
> (syntax-taint stx) -> syntax?
> stx : syntax?
>
> Equivalent to `(datum->syntax (syntax-arm stx) (syntax-e stx))', or
> just `stx' if it is already tainted.
>
>
> The macro expander disarms a syntax object (using an all-powerful
> inspector) before it applies a macro transformer to the syntax object.
> The macro expander rearms the result of a transformer using the
> original syntax object (i.e., any dye packs that were stripped from
> the transfomer input are copied to the transformer output). The
> 'taint-mode (or 'certify-mode, for compatibility) property is used for
> rearming.
>
> Only expressions and `#%module-begin' forms should be armed. In
> particular, the macro expander disarms only expressions and module
> bodies before pulling them apart, and so other tools only need to
> worry about taint armings at those levels.
>
> See the Guide section 16.2.5 for more information.
>
> _________________________________________________
> For list-related administrative tasks:
> http://lists.racket-lang.org/listinfo/dev
>