[racket-dev] [racket] tests/eli-tester feedback (Was: Racket unit testing)
On 02/18/2011 07:30 AM, Eli Barzilay wrote:
> 50 minutes ago, Ryan Culpepper wrote:
>> On 02/15/2011 07:28 AM, Eli Barzilay wrote:
>>> And finally, there's the litmus test for existing code:
>>>
>>> * Ryan: is something like this enough to implement the GUI layer?
>>
>> Not well, I think. The Test-Result type in Noel's racktest code is
>> too simple and inflexible. It represents the minimal essence of
>> testing, but it would be awkward to extend to richer testing
>> systems. Here's my counter-proposal for representing the results of
>> tests:
>> [...]
>
> I can't make sense of it, besides a vague "waaaay too heavy" feeling
> for something that should be core-ishly minimalistic.
Simplicity is no good if it gets in the way of representing information
that needs to be represented.
> In an attempt to follow it, I did this:
>
> TestResult = header
>              execution
>              status
>
> but your TestHeader is used only there,
Not necessarily. A testing framework that distinguishes test
construction from test execution might create the header when the test is
constructed. SchemeUnit used to work that way, and RackUnit is able to,
although less gracefully than before.
(See also my final remark, about "test started" notifications.)
> so it could be folded in:
>
> TestResult = name       (U String #f)
>              suite      (Listof String)
>              info       Dictionary
>              execution
>              status
>
> TestExecution is also used only once so it can also be folded in --
> but since it's just a generic dictionary, it can be dropped.
I think it's a bad idea to collapse the two dictionaries, because they
represent different information. Especially since the set of keys is
open-ended, it is helpful to separate information about the test from
information about its execution.
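As a rough sketch of the separation I mean (the struct and field names
below are made up for illustration; they are not taken from racktest or
rackunit, and the keys in the dictionaries are arbitrary):

```racket
#lang racket/base

;; Hypothetical names throughout; a sketch, not an existing API.

;; Information known about a test before it runs.
;; name : (U String #f), suite : (Listof String), info : dictionary
(struct test-header (name suite info) #:transparent)

;; Information about one particular run of the test.
(struct test-execution (info) #:transparent)

;; The result ties the two together with a status.
(struct test-result (header execution status) #:transparent)

(define r
  (test-result
   (test-header "addition" '("arith" "basic") (hash 'file "arith.rkt"))
   (test-execution (hash 'cpu-time 12))
   'success))

;; A fact about the test itself:
(hash-ref (test-header-info (test-result-header r)) 'file)
;; A fact about this execution of it:
(hash-ref (test-execution-info (test-result-execution r)) 'cpu-time)
```

With the dictionaries kept apart, a key such as 'file can never collide
with an execution-level key of the same name.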
> TestResult = name       (U String #f)
>              suite      (Listof String)
>              info       Dictionary
>              status
>
> Now, status is one of three options, the failure one has a dictionary
> so it can be removed (folded into the above).
I object again to the conflation of unrelated dictionaries.
> So overall, it looks like a simple struct with a name, a "suite" (kind
> of, defined indirectly by a hierarchy of string lists), a generic
> dictionary for "stuff", and a status. This is all modulo some
> questions/issues that are unclear to me:
>
> * It looks like it tries to break away too many pieces into a formal
> description. For example, it looks like an overkill to have fields
> with the actual value and expected value and worse -- the
> comparison. For example, what happens when the comparison is
> parameterized (eg, "close within dx to some number")?
Not every testing framework might use 'actual and 'expected, and even
frameworks that do might not use them all the time. For example, in
rackunit:
  (check-equal? 'apple 'orange)
would result in the following failure attributes:
  'actual => 'apple
  'expected => 'orange
  'comparison => 'equal?
(I think rackunit currently reports the check name, 'check-equal?,
rather than the comparison name. It could work either way, or maybe it
could include both.)
On the other hand, something like check-not-false might only report an
actual value. And a check parameterized over a tolerance could just
include the tolerance as an extra attribute.
> * What happens when there's no specific expected value to compare?
> For example, run two pieces of code 10 times each and check
> that the average runtime of the first is below the runtime of the
> second. This could be phrased in terms of an expected value, but in
> a superficial way, and will prevent useful information from being
> expressed (since the information would have to be reduced to two
> numbers).
You can include whatever information you want. That's why it's a
dictionary, rather than a fixed set of fields. The real question is how
a test result displayer will know how to interpret the fields correctly.
I think a useful default is to show all attributes with keys that are
interned symbols or strings. Custom attributes would only work for test
result displayers that know about them.
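A minimal sketch of that default, assuming the attributes arrive as a
hash (the function names and the 'tolerance and timing keys are
hypothetical):

```racket
#lang racket/base

;; A key is shown by default if it is a string or an interned symbol.
(define (default-visible-key? k)
  (or (string? k)
      (and (symbol? k) (symbol-interned? k))))

;; Returns the displayable attributes as a sorted list of "key: value" strings.
(define (displayable-attributes attrs)
  (sort (for/list ([(k v) (in-hash attrs)]
                   #:when (default-visible-key? k))
          (format "~a: ~s" k v))
        string<?))

;; A custom attribute keyed by an uninterned symbol: only a displayer
;; that holds this exact key can look it up.
(define private-key (string->uninterned-symbol "timing-samples"))

(define attrs
  (hash 'actual 1.5
        'expected 1.0
        'tolerance 0.1
        private-key '(10 12 11)))

(for-each displayln (displayable-attributes attrs))
```

A generic displayer skips the private key, while a displayer that knows
about it can still fetch the samples with (hash-ref attrs private-key).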
> * What if you don't want to hold on to the value? (For example, free
> some related resource.)
Then convert it to a string and keep the string. (This is what the
rackunit gui does to report custodian-managed leftovers.)
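Something along these lines would do (value->attribute is a hypothetical
helper, not part of rackunit):

```racket
#lang racket/base

;; Replace a value by its printed form so the result does not retain it.
(define (value->attribute v)
  (format "~s" v))

(define in (open-input-string "some data"))

;; Only the string is stored in the attributes, not the port itself.
(define attrs (hash 'leftover (value->attribute in)))

;; The resource can now be released without affecting the report.
(close-input-port in)

(string? (hash-ref attrs 'leftover))
```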
> * This solidifies the list-of-strings as a representation of the test
> hierarchy. But perhaps there is no way to avoid this -- if it's
> made into a proper hierarchy of objects it will probably complicate
> things in a way that requires the listener to get "update" events
> that tell it how the structure changed.
I was actually going to propose something more complicated for the
hierarchy, but I figured it was better to leave that for later. I'm
certainly open to changing this part.
> * I'm not sure about the error result. It seems to me that this is a
> meta issue that you're dealing with when you develop the test suite,
> and as such it should be something that you'd deal with in the usual
> ways => throw an exception. It's the tools that should be in charge
> of catching such an exception and dealing with it -- which means that
> - in my tester's case, it'll defer to racket as usual, meaning that
> you'd just get an error.
> - in rackunit's case you'd probably get some report listing the
> erroneous tests, instead of propagating the error.
> - and in your gui case you'd catch exceptions and show them as error
> results.
Are you saying you think a status should only be success or failure? If
so, I disagree. I can see roughly how that would work, but I think it's
useful to distinguish between failure and error at the reporting level.
But if you mean that a testing framework should be allowed to halt on
errors instead of reporting them as error statuses, then I agree.
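To make the distinction concrete, here is a sketch of a runner that
reports three statuses. Everything in it is hypothetical: it assumes
that a failed check raises a dedicated exception type, and that any
other exception counts as an error.

```racket
#lang racket/base

;; Hypothetical exception raised by a check that fails.
(struct exn:test-failure exn:fail (attributes))

(define (my-check-equal actual expected)
  (unless (equal? actual expected)
    (raise (exn:test-failure "check failed"
                             (current-continuation-marks)
                             (hash 'actual actual 'expected expected)))))

;; Runs a test thunk and returns 'success, (list 'failure attrs),
;; or (list 'error message).
(define (run-test thunk)
  (with-handlers ([exn:test-failure?
                   (lambda (e) (list 'failure (exn:test-failure-attributes e)))]
                  [exn:fail?
                   (lambda (e) (list 'error (exn-message e)))])
    (thunk)
    'success))

(run-test (lambda () (my-check-equal 1 1)))            ; a success
(run-test (lambda () (my-check-equal 'apple 'orange))) ; a failure
(run-test (lambda () (car '())))                       ; an error
```

A framework that prefers to halt on errors would simply omit the second
handler and let the exception propagate.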
> Also:
>
>> And that's not quite the end of it. The rackunit gui creates an
>> entry for a test case as soon as it starts running, so the user can
>> see what test case is hanging and interrupt it if they choose. That
>> requires additional communication between test execution and test
>> display.
>
> Yes, that would be part of the protocol for the listener -- and it
> makes sense to allow tests to invoke it to let it know that a test has
> started.
Like maybe sending it just the test-header struct? The part that
represents the information known about the test before it executes,
packaged up as one value?
Although, if we're going to standardize this part it would also be nice
to have a way of indicating that a suite has started, too.
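A sketch of what such a listener protocol might look like, with a
listener modeled as a struct of callbacks (all names here are
hypothetical, and the header is cut down to two fields for brevity):

```racket
#lang racket/base

;; Hypothetical listener: one callback per kind of notification.
(struct listener (suite-started test-started test-finished))

;; What is known about a test before it executes.
(struct test-header (name suite) #:transparent)

;; Notifies the listener before and after running the test thunk.
(define (run-test/notify l header thunk)
  ((listener-test-started l) header)
  (define status
    (with-handlers ([exn:fail? (lambda (e) 'error)])
      (thunk)
      'success))
  ((listener-test-finished l) header status)
  status)

;; A listener that just records the events it receives.
(define events '())
(define (record! . ev) (set! events (cons ev events)))

(define recording-listener
  (listener
   (lambda (suite-path) (record! 'suite-started suite-path))
   (lambda (h) (record! 'test-started (test-header-name h)))
   (lambda (h status) (record! 'test-finished (test-header-name h) status))))

((listener-suite-started recording-listener) '("arith"))
(run-test/notify recording-listener
                 (test-header "addition" '("arith"))
                 (lambda () (+ 1 2)))
(reverse events)
```

A GUI listener would use the test-started callback to create the entry
for a test before its result is known.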
Ryan