[racket-dev] syntax taints instead of syntax certificates
I've pushed a change to Racket's macro system that throws out the
syntax-certificate system and adds a syntax-taint system.
Syntax taints, like syntax certificates before, are intended to
protect macro expansions from abuse. "Abuse" means using
`local-expand' to extract a piece of an expansion, then putting the
piece in a new context or using `datum->syntax' to access unexported
module bindings using the piece's lexical context. Meanwhile,
program-processing tools like `errortrace' or languages like
`typed/racket' are supposed to pull apart expansions and reorganize
them; code inspectors remain the way to distinguish trusted tools and
languages from potential abusers.
Things you need to know:
* When writing a macro (without `syntax-rules', `define-syntax-rule',
or `syntax-id-rules'), apply `syntax-protect' to the macro's result
syntax object to protect it from abuse. The `syntax-rules', etc.,
forms use `syntax-protect' automatically.
* If you write program-processing tools or languages that use
`syntax-recertify', you'll need to change them to use
`syntax-disarm' and possibly `syntax-rearm'.
More Background
---------------
Combining extensibility with security is a difficult problem in most
any domain. Macros are an important tool for language extensibility,
and `local-expand' enables an important class of language extensions
in Racket, but `local-expand' also opens the door to macro bindings
that break abstractions --- reaching into a macro expansion to access
bindings that are intended to be private.
Syntax certificates were designed to protect private bindings in the
presence of `local-expand'. The idea was that a macro expansion can be
certified as originating from a particular module, and the use of
unexported identifiers would be allowed based on that
certification. Pulling apart a syntax object loses certificates, so
unexported bindings cannot be extracted and used in contexts which the
defining module would not approve.
Certificates didn't work well enough:
* The distinction between unexported bindings and other bindings
meant that certificate-maintenance problems were often detected
late, since detecting problems required an example that spans
module boundaries and involves unexported identifier.
* Since certificates are module-specific, identifiers introduced in a
macro expansion must be distinguished from those present in a macro
use; certificates thus get tangled up in the macro system's hygiene
machinery. Macro-generating macros create particular complexities
when the generator macro lives in one module and the generated
macro lives in another module. The system of "lifting" certificates
and "inactive" versus "active" certificates, which evolved to
address this problem, was difficult to reason about. In the case of
recent reported problems with certificates, it's still not clear
whether the certificate system was buggy or misused.
* We found two holes in the certificate system:
- Grabbing a macro transformer by `syntax-local-value' and
applying it directly could provide access to identifiers with
"inactive" certificates. The inactive certificates could then be
lifted to any use context of the macro, defeating the intended
protection of certificates.
- Using `local-expand' on method names within a class could expose
locally bound variables that were intended as private (i.e.,
they had no protection because they were not module-bound
variables).
Ryan and I have been looking for a better solution for a long time. I
think Ryan has solved the problem with the design of the taint
system. Proving that syntax taints provide security as intended
remains a steep challenge, but the design closes the known hole with
certificates, and our intuition is that they should work better in
practice:
* Taints apply uniformly to all identifiers, including local
identifiers, identifiers in the same module, exported identifiers,
and unexported identifiers. Taint-management problems in syntax
tools should therefore become apparent more readily.
* Taints are not module-specific, which detangles them from hygiene
machinery and avoids many other design and implementation problems.
One indicator of this improvement is that the total size of ".zo"
files in the Racket installation is 20% smaller with taints in
place of certificates. (DrRacket's footprint is about 5% smaller,
and start-up time for `racket' is slightly shorter. Interestingly,
`raco setup' time is not significantly affected --- perhaps because
certificate tracking had been such a bottleneck in the past that I
spent a lot of time making it fast.) The "syntax.c" file is 1000
lines shorter.
The trigger for a taint (a "dye pack" --- see below) is
inspector-specific, so macro-generating macros still require some
care in case the generator and the generated macro are under
different code inspectors; this extra work is still easier to
reason about than inactive certificates.
* The two known holes are fixed:
- Using `syntax-local-value' to access a transformer directly no
longer circumvents the system, because the transformer itself
arms a syntax object, rather than making protection the
responsibility of the macro expander that invokes the
transformer. To some degree, this change also allows a macro
implementor to opt out of the taint system.
A drawback of the change is that a macro implementor must
explicitly opt in to protection of its enclosing module --- at
least when not using `syntax-rules', `define-syntax-rule', or
`syntax-id-rules', all of which arm the resulting syntax object
automatically. This drawback is similar to requiring module
implementors to use `protect-out' on identifiers that should
not be accessible to untrusted modules (so it's not a new
problem, in some sense).
- Taints protect all variables, including the private local
bindings in the expansion of `class'.
Overall, Ryan and I expect that ease of reasoning will be the main
advantage of traits over certificates.
Taints API Overview
-------------------
"Tainted" identifiers cannot be compiled or expanded in either binding
or use positions. A syntax object is not usually tainted directly
directly. Instead, a "dye pack" is added to a syntax object with
`syntax-arm', which produces an "armed" syntax object; if `stx' is
armed, then `(syntax-e stx)' or `(datum->syntax stx ....)' produces a
tainted syntax object. A syntax object can be "disarmed" using a
suitable code inspector.
(syntax-arm stx [insp use-mode?]) -> syntax?
stx : syntax?
insp : (or/c inspector? #f)
use-mode? : any/c = #f
Arms `new-stx' with a dye pack that is controlled by `insp'. If
`insp' is #f, then during the dynamic extent of a transformer, it is
replaced with the instantiation-time inspector of the module where a
macro transformer is defined; if no transformation is in progress,
`(current-code-inspector)' is used.
If `use-mode?' is a true value, then a 'taint-mode property on `stx'
can push dye packs to nested syntax objects, and certain forms like
`begin' and `define-values' also push the dye packs down. This is
the same as the way that 'certify-mode and the macro expander worked
before, and 'certify-mode is supported as an alias for 'taint-mode
for backward compatibility. If a 'taint-mode specifies 'transparent
for a syntax object, then the syntax object's lexical information is
removed as the dye pack is pushed to its children, so no lexical
information is left unprotected.
(syntax-disarm stx [insp])
stx : syntax?
insp : (or/c inspector? #f) = #f
Removes immediate dye packs (not taints) from `stx' (not from nested
syntax objects in `stx') for which `insp' is powerful enough.
(syntax-rearm stx from-stx use-mode?)
stx : syntax?
from-stx : syntax?
use-mode? : any/c = #f
Copies immediate dye packs and taints from `from-stx' to `stx'. If
`use-mode?' is true, the copies are pushed to internal syntax
object, as in `syntax-arm'.
(syntax-protect stx) -> syntax?
stx : syntax?
An alias for `(syntax-arm stx #f #t)'. A macro typically uses this
function to protect its result; it's built into `syntax-rules',
`syntax-id-rules', and `define-syntax-rule'.
(syntax-tainted? stx) -> boolean?
stx : syntax?
Returns #t if `stx' is tainted, #f otherwise.
(syntax-taint stx) -> syntax?
stx : syntax?
Equivalent to `(datum->syntax (syntax-arm stx) (syntax-e stx))', or
just `stx' if it is already tainted.
The macro expander disarms a syntax object (using an all-powerful
inspector) before it applies a macro transformer to the syntax object.
The macro expander rearms the result of a transformer using the
original syntax object (i.e., any dye packs that were stripped from
the transfomer input are copied to the transformer output). The
'taint-mode (or 'certify-mode, for compatibility) property is used for
rearming.
Only expressions and `#%module-begin' forms should be armed. In
particular, the macro expander disarms only expressions and module
bodies before pulling them apart, and so other tools only need to
worry about taint armings at those levels.
See the Guide section 16.2.5 for more information.