[plt-scheme] module security, part 2

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Aug 25 11:17:06 EDT 2004

At Wed, 11 Aug 2004 09:50:47 -0700, Matthew Flatt wrote:
> Program-Processing Tools
> ------------------------
> 
> Program-processing tools, such as Errortrace, pull apart macro
> expansions and assemble them in a different way. These
> program-processing programs break in v299.13, because certificates get
> lost by re-assembly.
> 
> [...] This part is not complete in v299.13. For now, Errortrace (or anyone
> else) can use `syntax-recertify' to transfer an arbitrary certificate
> from an old expression to one that it constructs.
> 
> In v299.14, `syntax-recertify' will require more proof that the
> re-certification is ok.

This part is now complete.

Is It Really Secure?
--------------------

I don't know. Nothing that takes two long messages to explain is as
simple as anyone would like, and there is an awful lot of room for
error. So far, though, it seems to work, and I'm optimistic. We'll have
to try it for a while before we can draw any meaningful conclusions.

One source of insecurity is that MzScheme will have bugs, but I'd like
to draw out the other possible source: the new constructs might not
provide enough security, even when they work as defined.

For example, in the module

  (module m mzscheme
    (provide t)
    (define hidden 5)
    (define t 
      (syntax-rules ()
        [(_ f) (f 17)])))

the `hidden' binding is accessible because the `t' macro expands to an
application of the given `f'. The `f' provided to `t' might be a macro,
in which case it is given a `17' that has the lexical context of `m'.
Also, the result from `f' will be certified as if it came from `m',
since `t' is in `m'. Thus, `f' can use `datum->syntax-object' to get at
`hidden'.

In general, a macro must not expand to a use of another macro that it
doesn't trust. If a macro's intent is to expand to a function
application using a given function expression, the result must be
constructed to ensure that a macro identifier isn't given in place of
a function expression. For example, the above could be changed to

  (module m mzscheme
    (provide t)
    (define hidden 5)
    (define t 
      (syntax-rules ()
        [(_ f) (let ([the-func f]) (the-func 17))])))

to ensure that `hidden' stays hidden.

Fortunately, it appears that few macros place parts of their input into
application positions, or otherwise expand into uses of untrusted
macros. The only case that I can find is in the `match' macro for
patterns that use `?', and this case could be easily changed in the
same way as `t'. Another mitigating factor is that the `match' module
doesn't seem to have anything to hide, so the problem doesn't matter.
Still, there may be more cases like `match', and maybe --- just maybe
--- one of the them matters.

A second example is related to recertifying syntax objects. As
explained later in this message, recertification is allowed on objects
where the inputs to a macro have been changed, as long as the only
parts that changed are ones that were supplied to some macro.

In the following example, `go' checks that its sub-expression is a
literal number, and in that case, generates a direct call to the
`unchecked-go' function:

  (module m mzscheme
    (provide go)
    (define (unchecked-go n x) 
      ;; to avoid disaster, `n' must be a number
      (+ n 17))
    (define-syntax (go stx)
      (syntax-case stx ()
       [(_ n x)
        (number? (syntax-e #'n))
        #'(unchecked-go n x)])))

There's no way to extract the `unchecked-go' identifier and place it
into a completely new context where it is applied to a non-number.
However, since the `n' sub-expression was supplied to the macro, the
syntax-object result of `go' can be used to recertify a a syntax object
that is exactly the same, except that the `n' sub-expression is
replaced by something else --- say #f --- leading to disaster.

Again, there's a solution in this case, but only if someone notices the
problem:

    (define-syntax (go stx)
      (syntax-case stx ()
       [(_ n x)
        (number? (syntax-e #'n))
        (with-syntax ([n (syntax-e #'n)]) ; create a new syntax object
          #'(unchecked-go n x))]))

I have not seen any examples of this problem in practice. But I may
have overlooked one, and I don't know whether there will always be a
workaround when we find a real example.

Those are the two problems that I see, and there may well be more. Even
with these caveats, I think that MzScheme might now provide a practical
level of security for module bindings. We'll see.

Meanwhile, back to explaining those new constructs...


Recertifying
------------

The `syntax-recertify', etc. procedures are all documented in the
MzScheme manual, but I'll give an overview in this message to show how
they are meant to be used.

The idea behind re-certifying is that a syntax object contains some
sub-expressions that can be replaced, and some that cannot. The
sub-expressions that cannot be replaced are ones that are introduced by
a macro that is not under the program processor's control (e.g., not
under errortrace's control). The sub-expressions that can be replaced
correspond to the inputs of uncontrolled macros.

Thus, MzScheme is willing to transfer a certificate from a syntax
object A to a syntax object B, as long as B differs from A only in
those sub-expressions that can be replaced. And the ones that can be
replaced are the ones that do not have a uncontrolled certificate's
mark in A (i.e., the sub-expression was a input to the macro) or B
(i.e., no one tried to slip in a reference to be certified).

Useless Certificates
--------------------

Some certificates are not useful, and therefore not worth trying to
transfer. For example

 (define (f x) 10)

expands into

 (define f (lambda (x) 10))

and the `(lambda (...) ...)' part comes from the `define' macro, which
a program like errortrace can't control. So there's a certificate that
will be lost of the `lambda' is rearranged. Nevertheless, the macro
doesn't introduce any references to unexported or protected variables,
so losing the certificate doesn't matter.

Certificate Environments
------------------------

Keeping in mind that some certificates are useless, a program processor
must figure out what sub-expressions can be modified, and then modify
sub-expressions and reconstruct so that the result can be recertified.
The processor can do this in one pass by keeping an environment of
certificates, based on the following two MzScheme procedures:

 syntax-make-certificate-env :  -> certificate-env
 syntax-extend-certificate-env : stx certificate-env -> certificate-env

The `syntax-extend-certificate-env' procedure can add multiple
certificates to the environment, because the given syntax object can
have multiple certificates (where a macro produced a use of a macro,
and so on). However, `syntax-extend-certificate-env' only adds
certificates that are used by some protected or unexported identifier
in the given syntax object, so that useless certificates are ignored.

Changing Sub-expressions
------------------------

To help determine how much of an expression can be modified, MzScheme
provides the following predicate:

 syntax-recertify-constrained? : stx certificate-env inspector -> bool

For this procedure, the inspector represents the program processor. If
there's a certificate in the environment that is controlled by this
inspector, it is ignored (because the program processor will be able to
move that certificate any way that it wants). For any other certificate
in the environment, if the certificate's mark is used in the given
syntax object, then the syntax object is contrained, and the result will
be true.

Note that the mark doesn't have to be used for a protected or
unexported identifier for the syntax object to be constrained. If a
certificate is in the environment, then it has been deemed useful
already by `syntax-extend-certificate-env'. Consequently, the program
processor is obligated to preserve exactly the marked subexpressions
that surround the protected/unexported identifier. (Surrounding
sub-expressions without the mark must have originated as inputs to the
macro, so we assume that they were arbitrary, and thus subject to
change by the processor.)

If an expression is unconstrained, then the processor can make
arbitrary changes --- no sub-expressions are constrained either. If an
expression is constrained, then some sub-expression might still be
changed, but the overall expression must be reconstructed to keep
precisely the marks/renames (i.e., lexical information) of unchanged
parts.

Errortrace uses the following `rebuild' function to reconstruct
correctly, given a mapping from the changed sub-objects to their
replacements.

   ; rebuild : stx assoc-of-stx-to-stx -> stx
   (define (rebuild expr replacements)
     (let ([a (assq expr replacements)])
       (if a
           (cdr a)
           (cond
            [(pair? expr) (cons (rebuild (car expr) replacements)
                                (rebuild (cdr expr) replacements))]
            [(vector? expr) (list->vector
                             (map (lambda (expr)
                                    (rebuild expr replacements))
                                  (vector->list expr)))]
            [(box? expr) (box (rebuild (unbox expr) replacements))]
            [(syntax? expr) (if (identifier? expr)
                                expr
                                (datum->syntax-object
                                 expr
                                 (rebuild (syntax-e expr) replacements)
                                 expr))]
            [else expr]))))

Recertifying
------------

Once an expression is correctly rebuilt, it can be recertified using

 syntax-recertify : stx stx inspector -> stx

If a certificate on the first syntax object is controlled by the given
inspector, then it is transferred to the second object in the result.

If a certificate is not controlled by the given inspector, but not used
by any protected or unexported identifier (i.e., it would not be
included in the environment by `syntax-extend-certificate-env'), then
it is ignored and not transfered.

Any other certificate is transferred as long as the first syntax object
and the second syntax object differ only in sub-objects that do not
contain the certificate's mark; if they differ in a sub-object with a
mark, an exception is raised.

Putting It All Together
-----------------------

Mostly, Errortrace changes a program by wrapping some expressions with
`with-continuation-mark'. Even some constrained sub-expression can be
wrapped; a constrained expression cannot be wrapped in the case that
it's an immediate sub-expression of another constrained expression.

Overall, Errortrace's instrumentation function is roughly

 (let loop ([expr-stx program-stx]
            [env (syntax-make-certificate-env)]
            [can-wrap? #t])
   (let* ([env (syntax-extend-certificate-env expr-stx env)]
          [constr? (syntax-recertify-constrained? expr env orig-inspector)]
          [can-wrap? (or can-wrap? (not constr?))]
          [can-wrap-next? (not constr?)])
      (kernel-syntax-case expr-stx 
        ...
	[(....  sub-expr ...) ; match something with sub-expressions
         ((if can-wrap? wrap-for-trace (lambda (x) x))
          (syntax-recertify
	   expr-stx
           (rebuild
            expr-stx
            (map (lambda (sub-expr) 
                   (cons sub 
                         (loop sub-expr env can-wrap-next?)))
                 (syntax->list #'(sub-expr ...))))
           trace-inspector))]
        ....)))

If you load Errortrace and then run

  (module m mzscheme
    (provide go)
    (define (i n) n)
    (define-syntax go
      (syntax-rules ()
        [(_ arg) (i (i (i (i arg))))])))

  (require m)
  (go (+ 1 'a))

in the error trace prints as

 +: expects type <number> as 2nd argument, given: a; other arguments were: 1
 /home/mflatt/tmp/ex.ss:10:4: (+ 1 (quote a))
 /home/mflatt/tmp/ex.ss:10:0: (i (i (i (i (+ (....) ....)))))
 
But if you add `i' to the exports of `m', it prints as

 +: expects type <number> as 2nd argument, given: a; other arguments were: 1
 /home/mflatt/tmp/ex.ss:10:4: (+ 1 (quote a))
 /home/mflatt/tmp/ex.ss:7:24: (i (+ 1 (quote a)))
 /home/mflatt/tmp/ex.ss:7:21: (i (i (+ 1 (quote a))))
 /home/mflatt/tmp/ex.ss:7:18: (i (i (i (+ 1 (quote ....)))))
 /home/mflatt/tmp/ex.ss:10:0: (i (i (i (i (+ (....) ....)))))

In the first case, Errortrace is not allowed to break apart the result
of `go', because the introduced `i' is unexported. In the second case,
Errortrace is allowed to wrap each `i' call and provide a more precise
trace.

[Actually, since Errortrace was loaded before `m', it should have
 somehow given itself control over `m', and therefore produced the
 second trace in both cases. That's the next planned improvement for
 Errortrace.]

If you have a v299 program-processing tool and it's built on `(lib
"stacktrace.ss" "errortrace")', then probably it works fine already,
since I have updated Errortrace. If you have your own recursive
processing function, though, then you will need to use
`syntax-recertify', etc. in a way that similar to Errortrace,


Matthew



Posted on the users mailing list.