[plt-scheme] redex: leveraging generated checks and coverage

From: Casey Klein (clklein at eecs.northwestern.edu)
Date: Thu Sep 17 19:23:46 EDT 2009

On Thu, Sep 17, 2009 at 4:09 PM, Eric Tanter <etanter at dcc.uchile.cl> wrote:
> Thanks a lot Casey, this is all very useful. Some more questions/comments:
>>> * What kind of interesting properties can be checked with
>>> check-reduction-relation?
>> check-reduction-relation is most useful for testing that reduction
>> preserves some property. For example, you might use it to test the
>> following: if an expression e has no free variables, and e reduces to
>> an expression e', then e' has no free variables.
>> check-reduction-relation often works better than redex-check when some
>> of your relation's cases are unlikely to apply to a random term. For
>> example, (redex-check L e (stays-closed? (term e))) probably won't
>> generate many tests that exercise a call-by-value beta rule, because
>> matching its left-hand side E[((λ (x ..._1) e) v ..._1)] requires an
>> expression in which
>> 1. there is an application in the evaluation context's hole,
>> 2. the application's function position is a lambda,
>> 3. the application supplies the correct number of arguments, and
>> 4. all the arguments are already values.
>> (check-reduction-relation R stays-closed?) will get better coverage
>> because it extracts the beta rule's left-hand side from the definition
>> of stays-closed? and does something like this
>> (redex-check L (name e (in-hole E ((λ (x ..._1) e) v ..._1)))
> [not sure what that 'name' here is]

`name' is just a way to bind a name to a piece of a term so that you
can refer to it in your test expression.


Written without a `name' pattern, this example becomes

 L (in-hole E ((λ (x ..._1) e) v ..._1))
 (stays-closed? (term (in-hole E ((λ (x ..._1) e) v ..._1)))))

`name' can also be used in defining metafunctions and reduction
relationsto give a name to a piece of the left-hand side for use in
the right-hand side.

(define-language L
  (n number))

(define R
   (--> ((name q (n_0 n_1)) n_2)

> (apply-reduction-relation R (term ((4 5) 6)))
((4 5))

>> (stays-closed? (term e)))
> ok, so is it right then that stays-closed? must internally call
> apply-reduction-relation,


> or does check-reduction-relation work as follows:
> 1. generate a term (matching the left handside of a rule)
> 2. take a reduction step
> 3. apply the property test
> ?

The primary difference between check-reduction-relation and
redex-check (apart from the interface) is that
check-reduction-relation supplies Redex with the hint that the tests
cases it generates should match the relation's left-hand sides.

>>> for instance, I would imagine useful to check that all well-formed
>>> program
>>> reduces to either a value, or a well-defined error. Ie. something like:
>>> (check-reduction-relation red
>>> (λ (t)
>>>  (redex-match lang ((store (l_1 v_1) ...) v_2)
>>>              (apply-reduction-relation* red t)))
>> For this property you're probably better off with redex-check.
>> check-reduction-relation generates tests from the relation's left-hand
>> sides. This guarantees that every program it tries takes at least one
>> step, which probably isn't what you want here, because it guarantees
>> you won't detect programs that are stuck from the get-go.
>>> Except that means I have to write rules each and every strange program
>>> that
>>> can be produced.
>>> Still, is that good practice? or is that over-specification?
>> If you want to test this property, I see two options.
>> 1. Reduce these strange programs to an explicit state that represents
>> unspecified behavior. This is what the R6RS formal semantics does.
>> 2. Write your predicate so that it detects strange programs and lets
>> them off the hook without testing that they produce values.
> That's exactly what I did in the meantime, and it seems to work.
> I have a reduction relation that reduces mal-formed terms (eg. term where
> the locations in the store are not unique) to bad terms, and I have a
> bad-term predicate.
> Then I check as follows:
> ;; deterministic
> (check-reduction-relation
>  red
>  (λ (t) (let ((res (apply-reduction-relation* red t)))
>          (or (eq? 1 (length res))
>              (bad-term res))))
> (I do have to limit the number of attempts otherwise I blow memory...)

The space is probably going to the caches Redex uses to speed up
matching and metafunction application. Addressing this problem is on
my TODO list.

>>> * I was surprised that using a property that does not call reduction,
>>> check-reduction-relation was able to determine some ambiguity in a
>>> metafunction (same clause gave possibly different results given the same
>>> term)
>>> a) how did it get to the metafunction? (my property function was just a
>>> print function)
>> My guess is that one of the relation's cases is guarded by an
>> ambiguous metafunction.
>> [...]
>>> b) what kind of testing does it do exactly? (to determine the above, it
>>> must
>>> have run it several times with the same term, at least)
>> This test doesn't actually have anything to do with what
>> check-reduction-relation tests. Redex always checks that your
>> metafunctions are actually (partial) functions.
> Well, prior to playing with check-reduction-relation, I built my test suites
> and so on, and redex never complained about these ambiguities. So what do
> you mean by "Redex always checks..."? when is that check actually done?

Every time you apply a metafunction, Redex checks that the application
does not produce multiple results.

(define-language M
 (n number))

(define-metafunction M
 [(not-a-func n_0 ... n_i n_i+1 ...) n_i])

> (term (not-a-func 0 1 2))
not-a-func: clause #0 (counting from 0) matched (not-a-func 0 1 2) 3
different ways and returned different results

The term (not-a-func 0 1 2) matches the pattern (not-a-func n_0 ...
n_i n_i+1 ...) three different ways (one for each choice of n_i), and
not-a-func produces three different results.

Your hand-crafted test suite must not have tried a case where the
metafunction in question produced multiple results.

>>> * At times, I get:
>>> check-reduction-relation: unable to generate pattern <pattern-here> in
>>> <x>
>>> attempts
>>> what am I supposed to learn from that? (from a language design point of
>>> view)
>> My guess is that you have a side-condition somewhere that Redex is
>> having trouble satisfying randomly.
>> [..]
> right, that may definitely be the case
>> (define-language M
>>  (q (side-condition number_1 (= (term number_1) 12345431))))
>>> (redex-check M q #t)
>> redex-check: unable to generate pattern number_1 in 100 attempts
>> This error message should really include the side-condition part of
>> the pattern since it's that part, not number_1, that's actually
>> causing the difficulty. I'll fix that.
> That'd be better indeed.
>> I'm looking into some approaches that will make Redex's test case
>> generation better at satisfying arbitrary side-conditions, but that
>> will take some work.
>> For the time being, maybe this message should be a warning instead of
>> an error. What do you think, Robby?
> maybe useful to have a silent/warning mode.
>>> * Finally, I really enjoy the coverage report for a reduction relation.
>>> There does not seem to be a similar functionality for metafunctions,
>>> though.
>>> It would be nice if I could get confidence that my test cases, in
>>> addition
>>> to testing all reduction rules, actually cover each possible path in the
>>> metafunctions (ie. telling me, for each metafunction, how many times each
>>> clause has matched).
>> I'll add a metafunction-coverage parameter that works like
>> relation-coverage.
> Cool! I look forward to that!
> Cheers,
> -- Éric

Posted on the users mailing list.