[plt-scheme] redex: leveraging generated checks and coverage
On Thu, Sep 17, 2009 at 4:09 PM, Eric Tanter <etanter at dcc.uchile.cl> wrote:
> Thanks a lot Casey, this is all very useful. Some more questions/comments:
>
>>> * What kind of interesting properties can be checked with
>>> check-reduction-relation?
>>
>> check-reduction-relation is most useful for testing that reduction
>> preserves some property. For example, you might use it to test the
>> following: if an expression e has no free variables, and e reduces to
>> an expression e', then e' has no free variables.
>>
>> check-reduction-relation often works better than redex-check when some
>> of your relation's cases are unlikely to apply to a random term. For
>> example, (redex-check L e (stays-closed? (term e))) probably won't
>> generate many tests that exercise a call-by-value beta rule, because
>> matching its left-hand side E[((λ (x ..._1) e) v ..._1)] requires an
>> expression in which
>>
>> 1. there is an application in the evaluation context's hole,
>> 2. the application's function position is a lambda,
>> 3. the application supplies the correct number of arguments, and
>> 4. all the arguments are already values.
>>
>> (check-reduction-relation R stays-closed?) will get better coverage
>> because it extracts the beta rule's left-hand side from the definition
>> of stays-closed? and does something like this
>>
>> (redex-check L (name e (in-hole E ((λ (x ..._1) e) v ..._1)))
>
> [not sure what that 'name' here is]
`name' is just a way to bind a name to a piece of a term so that you
can refer to it in your test expression.
http://docs.plt-scheme.org/redex/Patterns.html#%28tech._name%29
Written without a `name' pattern, this example becomes
(redex-check
L (in-hole E ((λ (x ..._1) e) v ..._1))
(stays-closed? (term (in-hole E ((λ (x ..._1) e) v ..._1)))))
`name' can also be used in defining metafunctions and reduction
relationsto give a name to a piece of the left-hand side for use in
the right-hand side.
(define-language L
(n number))
(define R
(reduction-relation
L
(--> ((name q (n_0 n_1)) n_2)
q)))
> (apply-reduction-relation R (term ((4 5) 6)))
((4 5))
>> (stays-closed? (term e)))
>
> ok, so is it right then that stays-closed? must internally call
> apply-reduction-relation,
Yes.
> or does check-reduction-relation work as follows:
> 1. generate a term (matching the left handside of a rule)
> 2. take a reduction step
> 3. apply the property test
>
> ?
The primary difference between check-reduction-relation and
redex-check (apart from the interface) is that
check-reduction-relation supplies Redex with the hint that the tests
cases it generates should match the relation's left-hand sides.
>>> for instance, I would imagine useful to check that all well-formed
>>> program
>>> reduces to either a value, or a well-defined error. Ie. something like:
>>>
>>> (check-reduction-relation red
>>> (λ (t)
>>> (redex-match lang ((store (l_1 v_1) ...) v_2)
>>> (apply-reduction-relation* red t)))
>>>
>>
>> For this property you're probably better off with redex-check.
>> check-reduction-relation generates tests from the relation's left-hand
>> sides. This guarantees that every program it tries takes at least one
>> step, which probably isn't what you want here, because it guarantees
>> you won't detect programs that are stuck from the get-go.
>>
>>> Except that means I have to write rules each and every strange program
>>> that
>>> can be produced.
>>> Still, is that good practice? or is that over-specification?
>>
>> If you want to test this property, I see two options.
>>
>> 1. Reduce these strange programs to an explicit state that represents
>> unspecified behavior. This is what the R6RS formal semantics does.
>>
>> 2. Write your predicate so that it detects strange programs and lets
>> them off the hook without testing that they produce values.
>
> That's exactly what I did in the meantime, and it seems to work.
>
> I have a reduction relation that reduces mal-formed terms (eg. term where
> the locations in the store are not unique) to bad terms, and I have a
> bad-term predicate.
>
> Then I check as follows:
>
> ;; deterministic
> (check-reduction-relation
> red
> (λ (t) (let ((res (apply-reduction-relation* red t)))
> (or (eq? 1 (length res))
> (bad-term res))))
>
> (I do have to limit the number of attempts otherwise I blow memory...)
>
The space is probably going to the caches Redex uses to speed up
matching and metafunction application. Addressing this problem is on
my TODO list.
>>> * I was surprised that using a property that does not call reduction,
>>> check-reduction-relation was able to determine some ambiguity in a
>>> metafunction (same clause gave possibly different results given the same
>>> term)
>>> a) how did it get to the metafunction? (my property function was just a
>>> print function)
>>
>> My guess is that one of the relation's cases is guarded by an
>> ambiguous metafunction.
>>
>> [...]
>>>
>>> b) what kind of testing does it do exactly? (to determine the above, it
>>> must
>>> have run it several times with the same term, at least)
>>
>> This test doesn't actually have anything to do with what
>> check-reduction-relation tests. Redex always checks that your
>> metafunctions are actually (partial) functions.
>
> Well, prior to playing with check-reduction-relation, I built my test suites
> and so on, and redex never complained about these ambiguities. So what do
> you mean by "Redex always checks..."? when is that check actually done?
Every time you apply a metafunction, Redex checks that the application
does not produce multiple results.
(define-language M
(n number))
(define-metafunction M
[(not-a-func n_0 ... n_i n_i+1 ...) n_i])
> (term (not-a-func 0 1 2))
not-a-func: clause #0 (counting from 0) matched (not-a-func 0 1 2) 3
different ways and returned different results
The term (not-a-func 0 1 2) matches the pattern (not-a-func n_0 ...
n_i n_i+1 ...) three different ways (one for each choice of n_i), and
not-a-func produces three different results.
Your hand-crafted test suite must not have tried a case where the
metafunction in question produced multiple results.
>>> * At times, I get:
>>> check-reduction-relation: unable to generate pattern <pattern-here> in
>>> <x>
>>> attempts
>>> what am I supposed to learn from that? (from a language design point of
>>> view)
>>
>> My guess is that you have a side-condition somewhere that Redex is
>> having trouble satisfying randomly.
>> [..]
>
> right, that may definitely be the case
>
>> (define-language M
>> (q (side-condition number_1 (= (term number_1) 12345431))))
>>
>>> (redex-check M q #t)
>>
>> redex-check: unable to generate pattern number_1 in 100 attempts
>>
>> This error message should really include the side-condition part of
>> the pattern since it's that part, not number_1, that's actually
>> causing the difficulty. I'll fix that.
>
> That'd be better indeed.
>
>> I'm looking into some approaches that will make Redex's test case
>> generation better at satisfying arbitrary side-conditions, but that
>> will take some work.
>>
>> For the time being, maybe this message should be a warning instead of
>> an error. What do you think, Robby?
>
> maybe useful to have a silent/warning mode.
>
>>> * Finally, I really enjoy the coverage report for a reduction relation.
>>> There does not seem to be a similar functionality for metafunctions,
>>> though.
>>> It would be nice if I could get confidence that my test cases, in
>>> addition
>>> to testing all reduction rules, actually cover each possible path in the
>>> metafunctions (ie. telling me, for each metafunction, how many times each
>>> clause has matched).
>>
>> I'll add a metafunction-coverage parameter that works like
>> relation-coverage.
>
> Cool! I look forward to that!
>
>
> Cheers,
>
> -- Éric
>