[plt-scheme] module security

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Aug 11 12:50:47 EDT 2004
Previous message: [plt-scheme] 299.13
Next message: [plt-scheme] module security, part 2
Messages sorted by: [date] [thread] [subject] [author]
Security has always been a design goal for MzScheme and MrEd. Since
version 200, however, the module system has provided a false sense of
security:

 1. Certain modules reside in "private" subdirectories, with the idea
    that only trusted clients use the modules' exports. But there's no
    enforcement of this intent.

 2. Using `expand', a program can gain access to a module's unexported
    variables that appear in a macro expansion. For example, by
    expanding a `send' form, a program can gain access to the
    `find-method' function, obtain a direct reference to an object's
    method, and apply the method to a value other than the object
    itself.

 3. Using namespaces and carefully constructed dummy modules, a program
    can gain access to unexported variables that never appear in any
    macro expansion. For example, a program can gain access to accessor
    functions in `(lib "class-internal.ss" "mzlib" "private")' to pull
    apart the struct representing any object.

We knew about these problems, but a solution wasn't immediately
apparent, and it's taken a long time to find one.

As usual, macros are a big source of difficulty. In particular, if you
look at a piece of code, how do you know that a reference to an
unexported variable was placed there by a macro with proper access to
the variable?

Even preventing direct access to "private" exports is surprisingly
difficult. One obvious idea is to have a module identify which other
modules should be allowed to use its exports. This strategy doesn't
work if everyone can install a new module-name resolver, though;
someone could install a resolver to map an arbitrary module to the one
of the modules that is granted access, and thus gain access to the
private exports.

Solution Overview
-----------------

To address these problems, we have devised a two-level strategy:

 - Inspectors guard access to unexported and "protected" identifiers.

   In particular, the current code inspector (as determined by the
   `current-code-inspector' parameter) is associated with each new
   module declaration. When the module is instantiated for a `require',
   a sub-inspector is generated to guard access to the module's
   bindings. [So there are two inspectors per module, and the first is
   always a superior of the second. But there's an operation to change
   the second one, as explained later, so we need to distinguish them.]

   To load compiled code that refers to an unexported variable `x' in
   module `m', the current code inspector must be more senior than the
   instantiation-time inspector of `m'. Similarly, `module->namespace'
   can only be used when the current code inspector is more senior. [If
   you never change the code inspector, this is the same as the current
   behavior.]

   A module can export an identifier as "protected" by using the new
   `protect' sub-form of provide. For example,

      (provide (protect x))

   exports `x' as protected. Much like an unexported identifier, a
   protected identifier can only be referenced using an inspector that
   is more senior than the module's instantiation-time inspector.

   For example, `(lib "kernel.ss" "mred" private")' exports everything
   with `protect', so that access can be sealed off after `(lib
   "mred.ss" "mred")' is loaded.

 - Macro expansions are tracked through "certificates" to validate
   uses of unexported and protected identifiers.

   When a use of a macro `s' from module `m' is expanded, the expansion
   result gets a certificate that is based on the declaration-time
   inspector for `m'. As long as the certificate remains intact, any
   references to unexported identifiers from `m' will be allowed in the
   expansion, because the certificate's inspector is more senior than
   the instantiation-time inspector of `m'.

   For example, when
     (send obj m 5)
   expands to
     (let ([this obj])
       ((find-method this 'm) this 5))
   a certificate on the result allows the access to `find-method'. The
   certificate is attached to the the expression as a whole, so if
   someone extracts just
       (find-method this 'm)
   and tries to evaluate it, the certificate is lost, and the reference
   to `find-method' is blocked by the expander/compiler.

   With rare exceptions, certificate tracking is transparent to macro
   implementors. Meanwhile, a macro will get caught if it uses
   `local-expand' and rearranges the result of the expansion in a bad
   way (as above). [When a macro expands to `define' and `begin', the
   macro implementor usually wants to allow destructing for
   internal-definition contexts. More on this below.]

The first level in the above strategy is much like security through
Java's class loaders. The second level is obviously specific to
expressive macros.

Changing Code Inspectors
------------------------

If you don't change `current-code-inspector', the new machinery treats
`protect' exports just like any other export, and it does not restrict
the loading of compiled code. It does prohibit extracting unexported
identifiers from macro expansions, but this prohibition merely prevents
accidents; a motivated intruder can still get to internal bindings
with, say, `module->namespace'.

If you change the current code inspector, in contrast, existing module
bindings become secure. A `protect' export becomes no more accessible
than an unexported binding, and an unexported binding is accessible
only through expansions of macros that are defined in the now-secure
modules.

Changing the code inspector also means that marshaled bytecode (e.g.,
read from a file) cannot access unexported variables of the secure
modules. Unfortunately, I see no way to validate unmarshaled bytecode,
except to recompile it. If the process that changed the code inspector
trusts some marshaled bytecode, however, it can revert the code
inspector while unmarshaling.

Securing an Environment
-----------------------

Different levels of security are defined by different sets of loads,
with the current code inspector changed in between.

MrEd doesn't change the code inspector on startup, so after starting
MrEd, you can still get to exports from `(lib "kernel.ss" "mred"
private")':

 > (require (lib "kernel.ss" "mred" "private"))
 > in-atomic-region
 #<primitive:in-atomic-region>

But if you change the code inspector in MrEd, then the export becomes
inaccessible, because it's exported with `protect':

 > (current-code-inspector (make-inspector))
 > in-atomic-region
 compile: access from an uncertified context to protected variable from
 module: |,/home/mflatt/proj/plt/collects/mred/private/kernel| at:
 in-atomic-region in: kernel:in-atomic-region

DrScheme, for example, should change the code inspector before running
programs in the definitions window. Otherwise, you can break DrScheme
by accessing `(lib "kernel.ss" "mred" "private")' --- or worse --- in
the definitions window.

When to Protect Exports
-----------------------

Roughly, use `protect' like an `unsafe' declaration. All of the exports
of `(lib "foreign.ss")' should be protected, for example.

If a helper module exports an identifier whose use can break invariants
of the main module, the export should be protected. For example, the
`(lib "class-internal.ss" "mzlib" "private")' module mostly exports
things to be re-exported by `(lib "class.ss")', but it also exports
some functions for use only by `(lib "contract.ss")'; the latter set is
protected.

Don't protect an identifier that is "safe" and that will be re-exported
by a public module. The protection mode of an identifier is determined
by the defining module, and cannot be changed by re-exporting modules.
That's why most of the exports from `(lib "class-internal.ss" "mzlib"
"private")' are unprotected.

Attaching and Unprotecting Modules
----------------------------------

In previous versions, when a module was attached to a target namespace
with `namespace-attach-module', then `module->namespace' would not work
for the module in the target namespace, and the module could not be
re-declared. This was a poor man's version of the protection that we
now have through code inspectors. As of v299.13, attaching a module
does not prevent it from being redeclared in the target namespace or
accessed via `module->namespace'. To get the old effect, change the
code inspector.

In some cases, you may want to protect exports of certain modules but
not others. The new `namespace-unprotect-module' function takes a
module name and an inspector; if the given inspector is more senior
than the module's current instance inspector, the module's instance
inspector is changed to a fresh sub-inspector of the current code
inspector. The instance inspector, furthermore, is namespace-specific.
So you can use `namespace-attach-module' followed by
`namespace-unprotect-module' to allow access to a particular module's
protected exports in a particular namespace.

DrScheme, for example, will need to unprotect `(lib "foriegn.ss")' in a
suitable language level (assuming that DrScheme also changes the code
inspector). Otherwise, no one would be able to implement foreign-based
modules in DrScheme.

Expansion Certificates and `local-expand'
-----------------------------------------

As explained above, when a macro is expanded, the result is effectively
certified as originating from the macro. Any certificates that were on
the original macro call are also propagated to the result.

Suppose, for example, that `(s)' expands to `(class object% ... foo
...)' where `foo' is not exported from the module that defines `s'. 

 - So far, the access of `foo' is allowed, because the result is
   certified.

 - The `class' macro further expands to something large with references
   to private functions of the class system. Those references are also
   allowed, because the `class' expansion result is certified as being
   from `class'.

 - The large result from `class' also contains an embedded reference to
   `foo'. It's still ok, because the `class' expansion result inherits
   the certification from the `(s)' expansion. [The idea is that `s'
   expands to a use of `class', so it must trust the `class' macro.]

But what if `(s)' expands to 
  (class object% ... (define-use-foo id) ...)
where `(define-use-foo id)' expands to `(define (id) foo)'? In this
case, `class' will `local-expand' the `(define-use-foo)' to get
`(define (id) foo)'. It will then pull apart the definition, extracting
the use of `foo' from the definition.

This extraction would normally lose the certificate indicating that the
`foo' reference came from `define-use-foo'. In practice, this would be
a disaster, because many macros expand to `begin' and `define' forms
where `class' and other internal-definition forms should be able to
pull them apart.

Our solution is to treat `begin', `define-values', and
`define-syntaxes' specially in the result of a macro expansion. When
the result starts with one of these identifiers (based on
`module-identifier=?' with the `mzscheme' export), the expansion
certificate is attached to the elements inside the outer parens,
instead of the expression as a whole. This special handling is applied
recursively, in case a macro produces a `begin' form that contains
definitions, etc.

Note that our example above uses `define' instead of `define-values',
and `define' is not handled specially. So, initally, then the expansion
certificate for `(s)' is attached to the entire result `(define (id)
foo)'. But `define' expands to a use of `define-values', and at that
point the new and inherited certificates get propagated to sub-parts.

The special handing of `begin', `define-values', and `define-syntaxes'
sounds awkward, but it seems to work perfectly in practice.

In case it ever fails to work perfectly, a macro implementor can
override the default placement of certificates by attaching a a
'certify-mode property to the result syntax object. I haven't found a
real use of this so far, but by adding `(public id)' to the above
example, we get a contrived example:

  (module m mzscheme
    (define foo 11)
    (define-syntax (define-use-foo stx)
      (syntax-case stx ()
        [(_ id) #'(define (id) foo)]))  
    (provide define-use-foo))

  (require (lib "class.ss") m)

  (class object%
    (public id) ;;; Make id a method instead of a private field
    (define-use-foo id))

The problem, in this case, is that `class' further fulls apart the
right-hand side of a method to add an implicit `this' to the procedure.
But the certificate allowing the `foo' reference ends up attached to
the procedure that results from the `define' expansion, so it's lost
when the procedure is reconstructed with an extra argument.

The problem can be fixed by forcing the certificate further down in the
original `define':

    (define-syntax (define-use-foo stx)
      (syntax-case stx ()
        [(_ id) (syntax-property
                 #'(define (id) foo)
                 'certify-mode
                 'transparent)]))

This is uncomfortably subtle, but I expect this kind of case to be
extremely rare. (My first several attempts to construct the example
didn't work, because the heuristic is so good.)

Program-Processing Tools
------------------------

Program-processing tools, such as Errortrace, pull apart macro
expansions and assemble them in a different way. These
program-processing programs break in v299.13, because certificates get
lost by re-assembly.

In v299.13, the only reason that Errortrace should be able to
re-arrange your programs is that you trust it, and you've expressed
that trust by setting up Errortrace with a powerful inspector.

MzScheme, meanwhile, doesn't trust Errortrace, and so Errrotrace should
not be able to rearrange the expansion of, say, `with-handlers'.
Errortrace should be able to rearrange subexpressions from your code
that were originally inside a `with-handlers' use, though. Similarly,
if `(lib "class.ss")' is loaded with an inspector that is weaker than
Errrotrace's, then Errortrace should have no control over the code
generated by `class', but it might control sub-expressions within
methods.

This part is not complete in v299.13. For now, Errortrace (or anyone
else) can use `syntax-recertify' to transfer an arbitrary certificate
from an old expression to one that it constructs.

In v299.14, `syntax-recertify' will require more proof that the
re-certification is ok. The new procedures
`syntax-extend-certify-context' and `syntax-recerity-constrained?' are
part of the plan, but they don't yet work. I'm pretty sure that they
can be made to work, but I can't be completely sure until I see the
full implementation --- more details then.

Matthew
Posted on the users mailing list.
Previous message: [plt-scheme] 299.13
Next message: [plt-scheme] module security, part 2
Messages sorted by: [date] [thread] [subject] [author]