[racket] get-uncovered-expressions

From: Eli Barzilay (eli at barzilay.org)
Date: Tue Oct 5 22:23:08 EDT 2010

On July 22nd, Nadeem Abdul Hamid wrote:
> Apparently the coverage thing only works if the evaluator is given a
> (byte) string (as opposed to an input port or anything else), so using
> port->bytes when reading from a file produces the expected result:
> 
>  (define Ev
>   (call-with-input-file* "test.rkt"
>     (λ(inp)
>       (parameterize ([sandbox-coverage-enabled #t])
>         (make-evaluator 'racket
>                         (port->bytes inp)
>                         )))))
>  (get-uncovered-expressions Ev)

Apologies for the late reply.  I originally thought that there was a
bug there, but now that I've gotten to it I see that there isn't a
real bug.  There is, however, a change that will make things more
convenient and less confusing.  If you're using the sandbox, please
read on to make sure that this won't break your code (it should make
things easier).

So here's the short version of the issue with coverage information
from the sandbox.  `get-uncovered-expressions' filters its result
according to syntax source information.  The syntax source (the result
of `syntax-source') is an arbitrary value that indicates where the
syntax came from; it's usually a string or a path for the source code.
When the sandbox is given a path, it uses `read-syntax' on that file,
which produces a similar syntax source.  But if you give it a quoted
sexpr, a string, or a byte string, then there is no way for it to know
the source, so it makes one up: 'program.

Now, back to `get-uncovered-expressions': it takes three arguments.
The first is the evaluator; the second is a boolean flag indicating
whether you want the uncovered expressions after the module was first
evaluated, or including any later interactions; and the third is a
value used to filter the results, so that only syntaxes with that
source are kept.
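To make the effect of that third argument concrete, here is a small
standalone model of the filtering.  It is only a sketch, not the
sandbox's actual code: `stx' and `filter-uncovered' are made-up
helpers, and comparing sources with `equal?' is an assumption.  The
#f case reflects the "unfiltered" behavior mentioned in option A below.

```racket
#lang racket
;; A hypothetical model of the source filter applied by the third
;; argument of `get-uncovered-expressions'.  This is NOT the sandbox's
;; code; `stx' and `filter-uncovered' are made-up names.

;; Build a syntax object whose `syntax-source' is `src'.  The srcloc list
;; is (source line column position span); position and span are left #f.
(define (stx datum src line col)
  (datum->syntax #f datum (list src line col #f #f)))

;; Keep only syntaxes whose source is `equal?' to `src' (an assumed
;; comparison); a `src' of #f means "don't filter at all".
(define (filter-uncovered uncovered src)
  (if src
      (filter (λ (s) (equal? (syntax-source s) src)) uncovered)
      uncovered))

;; Code given as a string gets the made-up source 'program; the second
;; expression came in via a macro defined in another file.
(define uncovered
  (list (stx '(#%app bar) 'program 3 8)
        (stx 'bar (string->path "/tmp/some-library.rkt") 3 27)))

(map syntax->datum (filter-uncovered uncovered 'program)) ; => '((#%app bar))
(length (filter-uncovered uncovered #f))                  ; => 2

;; Code read from a file has the path as its source, so filtering on
;; 'program leaves nothing at all.
(define from-file
  (list (stx '(#%app bar) (string->path "/tmp/test.rkt") 3 8)))
(filter-uncovered from-file 'program)                      ; => '()
```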

This last argument is the problematic one in this case.  The default
value for it is 'program -- which makes it work well *if* you
originally used some source-less data for the code (a string, in your
case).  But if you give it a path, then that path will get used as the
source, and if you filter out expressions that don't have 'program as
their source you're left with ... nothing (which is often surprising).

So the change that I'm thinking of is this:

* Each sandbox will have its own default source to filter on, which
  will be used as the default third argument to
  `get-uncovered-expressions'.

* When a sandbox is constructed, it will *try* to decide what this
  default value should be.  If it's given a string, a byte string, or
  a sexpr, it will use 'program; if it's given a path, it will use
  that path.

* This will work fine for strings and for paths, and will *usually*
  work for sexprs too.  The problem with sexprs is that they might
  contain syntax values, and those keep their own source instead of
  getting 'program; as a result, they will be filtered out.

  This is probably a much less common case, so the question is how
  confusing it would be for people who do run into it.  I think that
  the overall benefit outweighs this, and that rare case is already
  confusing in the same way.
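The proposed rule, and the sexpr caveat, can be modeled in a few lines.
This is a hypothetical sketch of the behavior described above, not
code from racket/sandbox; `default-coverage-source' is a made-up name.

```racket
#lang racket
;; A hypothetical sketch of the proposed rule for a sandbox's default
;; filter source.  `default-coverage-source' is a made-up name, not part
;; of racket/sandbox.
(define (default-coverage-source input)
  (if (path? input)
      input       ; program read from a file: filter on that path
      'program))  ; string, byte string, or sexpr: the made-up source

;; The caveat: a syntax value embedded in an sexpr keeps its own
;; source, so a 'program filter would drop it.
(define embedded
  (datum->syntax #f '(foo) (list (string->path "/tmp/other.rkt") 1 0 #f #f)))
(define program (list 'begin embedded))

(default-coverage-source program)                                   ; => 'program
(equal? (syntax-source embedded) (default-coverage-source program)) ; => #f
```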

There are two other ways to improve this (but note that so far the
above seems to me like the best):

A. A much simpler change: just use #f as the default value.  This
   means that you get the unfiltered results by default.  The problem
   with that is expressions from other files that get in via macros.
   These expressions get annotated too, but they're almost always
   useless to you, since you usually don't care about how a macro was
   implemented.  I dislike this option because I think it will lead to
   more confusion when you keep getting uncovered expressions from
   random Racket libraries.  (E.g., `match' can be very amusing, since
   it creates a *lot* of hairy expressions, and you'll see all of them
   in the output.)

B. Another option is to scan the input syntax (that is, after the
   string is read, or the sexpr is converted to syntax), and filter
   out anything that is not in that syntax.  This is very precise in
   the sense that you never get expressions that were not in the
   original code, but the problem is that it's too conservative.
   Specifically, some (foo) macro can expand to a different (bar)
   expression with the same source, and whether or not the (bar) is
   executed should usually determine whether the "(foo)" in the source
   is marked as covered.  Filtering this way makes the results very
   broken, as you'll see below.
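Here is a rough model of option B, assuming "in that syntax" means
identity (`eq?') with a syntax object from the original input.  It is a
sketch for illustration only; `all-subsyntaxes' and `precise-filter'
are made-up names, and the expander is simulated by building a new
syntax object that reuses the location of `(foo)'.

```racket
#lang racket
;; A hypothetical model of option B: keep an uncovered expression only if
;; it is literally (eq?) one of the syntax objects in the original input.
;; `all-subsyntaxes' and `precise-filter' are made-up names.

;; Collect every syntax object in `stx', walking pairs and vectors.
(define (all-subsyntaxes stx)
  (let walk ([v stx] [acc '()])
    (cond [(syntax? v) (walk (syntax-e v) (cons v acc))]
          [(pair? v) (walk (cdr v) (walk (car v) acc))]
          [(vector? v) (for/fold ([acc acc]) ([x (in-vector v)]) (walk x acc))]
          [else acc])))

(define (precise-filter uncovered input)
  (define originals (all-subsyntaxes input))
  (filter (λ (s) (memq s originals)) uncovered))

;; The sandboxed code from the example, as its input syntax.
(define input (datum->syntax #f '(and #f (foo)) (list 'program 3 0 #f #f)))
(define foo-stx (caddr (syntax-e input)))  ; the original `(foo)'

;; Simulate the expander: it builds a *new* `(#%app bar)' syntax object
;; that reuses the source location of `(foo)'.
(define expanded (datum->syntax #f '(#%app bar) foo-stx))

(precise-filter (list expanded) input)         ; => '()
(length (precise-filter (list foo-stx) input)) ; => 1
```

The uncovered `(#%app bar)' is dropped even though it stands for the
original `(foo)', which is exactly the over-conservative behavior
described above.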

*** Here's a concrete example:

    ----< some-library.rkt >----
    #lang racket/base
    (provide foo)
    (define-syntax-rule (foo) (bar))
    (define (bar) (+ 1 2))

    ----< sandboxed-code.rkt >-----
    #lang racket
    (require "some-library.rkt")
    (and #f (foo))

Running the second file through the sandbox and getting the
unfiltered list of uncovered expressions shows:

    '(#<syntax:/tmp/sandboxed-code.rkt:3:8 (#%app bar)>
      #<syntax:/tmp/some-library.rkt:3:27 bar>
      #<syntax:/tmp/some-library.rkt:4:14 (#%app + (quote 1) (quote 2))>
      #<syntax:/tmp/some-library.rkt:4:15 +>
      #<syntax:/tmp/some-library.rkt:4:17 (quote 1)>
      #<syntax:/tmp/some-library.rkt:4:19 (quote 2)>)

You can see that the unfiltered results are not great: they contain
pieces from the macro, which the sandboxed code should not care about.
You can also see that the (foo) expression is not in there: the
collected expressions come from the expanded code, so (foo)
disappeared, and doing the precise filtering would leave nothing
visible.  (And it should be clear now why this would be completely
broken: every function application in Racket goes through a macro
(`#%app'), which means that no application in the original code would
ever be seen as uncovered.)

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the users mailing list.