[racket-dev] [racket] tests/eli-tester feedback (Was: Racket unit testing)

From: Ryan Culpepper (ryanc at ccs.neu.edu)
Date: Fri Feb 18 15:02:02 EST 2011

On 02/18/2011 07:30 AM, Eli Barzilay wrote:
> 50 minutes ago, Ryan Culpepper wrote:
>> On 02/15/2011 07:28 AM, Eli Barzilay wrote:
>>> And finally, there's the litmus test for existing code:
>>>
>>> * Ryan: is something like this enough to implement the GUI layer?
>>
>> Not well, I think. The Test-Result type in Noel's racktest code is
>> too simple and inflexible. It represents the minimal essence of
>> testing, but it would be awkward to extend to richer testing
>> systems. Here's my counter-proposal for representing the results of
>> tests:
>> [...]
>
> I can't make sense of it, besides a vague "waaaay too heavy" feeling
> for something that should be core-ishly minimalistic.

Simplicity is no good if it gets in the way of representing information 
that needs to be represented.

> In an attempt to follow it, I did this:
>
>    TestResult = header
>                 execution
>                 status
>
> but your TestHeader is used only there,

Not necessarily. A testing framework that distinguishes test 
construction from test execution might create the header when the test is 
constructed. SchemeUnit used to work that way, and RackUnit is able to, 
although less gracefully than before.

(See also my final remark, about "test started" notifications.)

> so it could be folded in:
>
>    TestResult = name      (U String #f)
>                 suite     (Listof String)
>                 info      Dictionary
>                 execution
>                 status
>
> TestExecution is also used only once so it can also be folded in --
> but since it's just a generic dictionary, it can be dropped.

I think it's a bad idea to collapse the two dictionaries, because they 
represent different information. Especially since the set of keys is 
open-ended, it is helpful to separate information about the test from 
information about its execution.
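A rough Racket sketch of the two-dictionary layout, for concreteness. The struct names and the example keys ('timeout-ms, 'elapsed-ms) are hypothetical; only the field names come from the discussion above.

```racket
#lang racket
;; Hypothetical struct names; the fields follow the ones quoted in this thread.
;; Header: what is known about the test itself, possibly before it runs.
(struct test-header (name    ; (U String #f)
                     suite   ; (Listof String), outermost suite first
                     info)   ; dictionary: facts about the test
  #:transparent)

;; Result: the header plus what happened on this particular run.
(struct test-result (header
                     execution ; dictionary: facts about this run only
                     status)   ; e.g. 'success, or a failure/error value
  #:transparent)

;; Illustrative keys: a configured limit lives with the test,
;; a measured time lives with the run.
(define hdr (test-header "apple vs orange" '("fruit" "equality")
                         (hash 'timeout-ms 1000)))
(define res (test-result hdr (hash 'elapsed-ms 3) 'success))
```

Because the dictionaries stay separate, a displayer can show "about the test" and "about this run" as two groups without knowing what any particular key means.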

>    TestResult = name      (U String #f)
>                 suite     (Listof String)
>                 info      Dictionary
>                 status
>
> Now, status is one of three options, the failure one has a dictionary
> so it can be removed (folded into the above).

I object again to the conflation of unrelated dictionaries.

> So overall, it looks like a simple struct with a name, a "suite" (kind
> of, defined indirectly by a hierarchy of string lists), a generic
> dictionary for "stuff", and a status.  This is all modulo some
> questions/issues that are unclear to me:
>
> * It looks like it tries to break away too many pieces into a formal
>    description.  For example, it looks like an overkill to have fields
>    with the actual value and expected value and worse -- the
>    comparison.  For example, what happens when the comparison is
>    parameterized (eg, "close within dx to some number")?

Not every testing framework might use 'actual and 'expected, and even 
frameworks that do might not use them all the time. For example, in 
rackunit:

   (check-equal? 'apple 'orange)

would result in the following failure attributes:

   'actual => 'apple
   'expected => 'orange
   'comparison => 'equal?

(I think rackunit currently reports the check name, 'check-equal?, 
rather than the comparison name. It could work either way, or maybe it 
could include both.)

On the other hand, something like check-not-false might only report an 
actual value. And a check parameterized over a tolerance could just 
include the tolerance as an extra attribute.
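A sketch of how individual checks might fill in the failure dictionary. These helpers are hypothetical (they are not rackunit's checks); each returns #f on success or a hash of attributes on failure, and includes only the keys that make sense for that check.

```racket
#lang racket
;; Hypothetical helpers: #f on success, otherwise a hash of failure attributes.

;; Equality check: reports both the comparison and the check name.
(define (check-equal/attrs actual expected)
  (and (not (equal? actual expected))
       (hash 'actual actual
             'expected expected
             'comparison 'equal?
             'check 'check-equal?)))

;; No meaningful expected value, so only 'actual is reported.
(define (check-not-false/attrs actual)
  (and (not actual)
       (hash 'actual actual)))

;; Parameterized check: the tolerance is just one more attribute.
(define (check-within/attrs actual expected tolerance)
  (and (> (abs (- actual expected)) tolerance)
       (hash 'actual actual
             'expected expected
             'tolerance tolerance
             'comparison 'within)))
```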

> * What happens when there's no specific expected value to compare?
>    For example, run some two pieces of code 10 times each and check
>    that the average runtime of the first is below the runtime of the
>    second.  This could be phrased in terms of an expected value, but in
>    a superficial way, and will prevent useful information from being
>    expressed (since the information would have to be reduced to two
>    numbers).

You can include whatever information you want. That's why it's a 
dictionary, rather than a fixed set of fields. The real question is how 
a test result displayer will know how to interpret the fields correctly. 
I think a useful default is to show all attributes with keys that are 
interned symbols or strings. Custom attributes would only work for test 
result displayers that know about them.
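A minimal sketch of that default, assuming attributes are stored as an association list (any racket/dict would work). The function name and the keys in the timing example are hypothetical; the uninterned key stands in for an attribute that only a specialized displayer would understand.

```racket
#lang racket
;; Show only attributes whose keys are strings or interned symbols.
;; default-displayable is a hypothetical name.
(define (default-displayable attrs)
  (for/list ([(k v) (in-dict attrs)]
             #:when (or (string? k)
                        (and (symbol? k) (symbol-interned? k))))
    (cons k v)))

;; The timing example from above, keeping the raw measurements
;; instead of reducing them to two numbers. Keys are illustrative.
(define private-key (string->uninterned-symbol "gui-state"))
(define timing-failure
  (list (cons 'first-runtimes '(12 15 11))
        (cons 'second-runtimes '(9 10 8))
        (cons "note" "average runtime of first is not below second")
        (cons private-key 'opaque)))   ; hidden by the default displayer
```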

> * What if you don't want to hold on to the value?  (For example, free
>    some related resource.)

Then convert it to a string and keep the string. (This is what the 
rackunit gui does to report custodian-managed leftovers.)
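One way to do that conversion, assuming attributes are an association list; detach-attributes is a hypothetical helper, not part of rackunit. Atomic values are kept, and everything else (ports, threads, lists, and so on) is replaced by its printed form via format's ~v, so the stored result no longer references the original object.

```racket
#lang racket
;; Atomic values are safe to keep as they are.
(define (plain-datum? v)
  (or (string? v) (number? v) (symbol? v) (boolean? v)))

;; Hypothetical helper: replace non-atomic attribute values with their
;; printed form so the result does not hold on to them.
(define (detach-attributes attrs)
  (for/list ([(k v) (in-dict attrs)])
    (cons k (if (plain-datum? v) v (format "~v" v)))))
```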

> * This solidifies the list-of-strings as a representation of the test
>    hierarchy.  But perhaps there is no way to avoid this -- if it's
>    made into a proper hierarchy of objects it will probably complicate
>    things in a way that requires the listener to get "update" events
>    that tell it how the structure changed.

I was actually going to propose something more complicated for the 
hierarchy, but I figured it was better to leave that for later. I'm 
certainly open to changing this part.
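For reference, a sketch of what a displayer has to do with the list-of-strings representation: rebuild the tree itself from each result's suite path. The node layout (a mutable hash with a 'tests key) and add-result! are assumptions made for this illustration.

```racket
#lang racket
;; A node is a mutable hash: suite name (a string) -> child node,
;; plus the symbol key 'tests -> test names in arrival order.
;; add-result! is a hypothetical helper.
(define (add-result! root suite-path test-name)
  (define node
    (for/fold ([node root]) ([s (in-list suite-path)])
      (hash-ref! node s make-hash)))   ; create intermediate suites on demand
  (hash-update! node 'tests (λ (ts) (append ts (list test-name))) '()))

(define root (make-hash))
(add-result! root '("fruit" "equality") "apple vs orange")
(add-result! root '("fruit" "equality") "pear vs pear")
(add-result! root '("fruit") "banana is yellow")
```

The tree only grows as results arrive, so an empty suite never appears and the displayer cannot tell when a suite has finished; structural "update" events would close that gap.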

> * I'm not sure about the error result.  It seems to me that this is a
>    meta issue that you're dealing with when you develop the test suite,
>    and as such it should be something that you'd deal with in the usual
>    ways => throw an exception.  It's the tools that should be in charge
>    of catching such an exception and dealing with it -- which means that
>    - in my tester's case, it'll defer to racket as usual, meaning that
>      you'd just get an error.
>    - in rackunit's case you'd probably get some report listing the
>      erroneous tests, instead of propagating the error.
>    - and in your gui case you'd catch exceptions and show them as error
>      results.

Are you saying you think a status should only be success or failure? If 
so, I disagree. I can see roughly how that would work, but I think it's 
useful to distinguish between failure and error at the reporting level.

But if you mean that a testing framework should be allowed to halt on 
errors instead of reporting them as error statuses, then I agree.
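A sketch of keeping both options open: failures and errors get distinct statuses, and a parameter lets a framework halt on errors instead. The status structs, exn:check-failure, and halt-on-error? are all invented for this example.

```racket
#lang racket
;; Hypothetical statuses: a failure carries its attribute dictionary,
;; an error carries the exception that escaped the test.
(struct status-success () #:transparent)
(struct status-failure (attributes) #:transparent)
(struct status-error (exception) #:transparent)

;; Hypothetical exception a check raises when it fails.
(struct exn:check-failure exn:fail (attributes))

;; When true, unexpected exceptions propagate instead of becoming error statuses.
(define halt-on-error? (make-parameter #f))

(define (run-test thunk)
  (with-handlers ([exn:check-failure?          ; checked first: a failed check
                   (λ (e) (status-failure (exn:check-failure-attributes e)))]
                  [(λ (e) (and (exn:fail? e) (not (halt-on-error?))))
                   (λ (e) (status-error e))])  ; any other failure is an error
    (thunk)
    (status-success)))
```

Check failures are always reported as failures; the parameter only decides what happens to unexpected exceptions.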

> Also:
>
>> And that's not quite the end of it. The rackunit gui creates an
>> entry for a test case as soon as it starts running, so the user can
>> see what test case is hanging and interrupt it if they choose. That
>> requires additional communication between test execution and test
>> display.
>
> Yes, that would be part of the protocol for the listener -- and it
> makes sense to allow tests to invoke it to let it know that a test has
> started.

Like maybe sending it just the test-header struct? The part that 
represents the information known about the test before it executes, 
packaged up as one value?

Although, if we're going to standardize this part it would also be nice 
to have a way of indicating that a suite has started, too.
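A sketch of such a listener protocol, with the listener as a plain procedure receiving event structs. All names here are hypothetical; the point is only that test-started carries the header before the body runs, and that a suite-started event fits the same scheme.

```racket
#lang racket
;; Hypothetical header and listener events; not an existing API.
(struct test-header (name suite info) #:transparent)
(struct suite-started (path) #:transparent)
(struct test-started (header) #:transparent)        ; sent before the body runs
(struct test-finished (header status) #:transparent)

;; tests: (listof (cons name thunk)); a thunk returns true on success.
(define (run-suite listener path tests)
  (listener (suite-started path))
  (for ([t (in-list tests)])
    (define hdr (test-header (car t) path (hash)))
    (listener (test-started hdr))   ; a GUI can show a row for a hanging test now
    (define status
      (with-handlers ([exn:fail? (λ (e) 'error)])
        (if ((cdr t)) 'success 'failure)))
    (listener (test-finished hdr status))))
```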

Ryan


Posted on the dev mailing list.