[racket-dev] [racket] tests/eli-tester feedback (Was: Racket unit testing)

From: Ryan Culpepper (ryanc at ccs.neu.edu)
Date: Sun Feb 20 19:08:26 EST 2011

On 02/18/2011 02:12 PM, Eli Barzilay wrote:
> 25 minutes ago, Ryan Culpepper wrote:
>> On 02/18/2011 07:30 AM, Eli Barzilay wrote:
>>> 50 minutes ago, Ryan Culpepper wrote:
>>>> On 02/15/2011 07:28 AM, Eli Barzilay wrote:
>>>>> And finally, there's the litmus test for existing code:
>>>>>
>>>>> * Ryan: is something like this enough to implement the GUI layer?
>>>>
>>>> Not well, I think. The Test-Result type in Noel's racktest code is
>>>> too simple and inflexible. It represents the minimal essence of
>>>> testing, but it would be awkward to extend to richer testing
>>>> systems. Here's my counter-proposal for representing the results of
>>>> tests:
>>>> [...]
>>>
>>> I can't make sense of it, besides a vague "waaaay too heavy" feeling
>>> for something that should be core-ishly minimalistic.
>>
>> Simplicity is no good if it gets in the way of representing information
>> that needs to be represented.
>
> [But the flip side is that complexity is no good if you end up with
> something that doesn't fit any system, where each one is filling in
> fields that it doesn't "want to".]

The representation I outlined is based on the needs of the rackunit gui, 
where test execution and display are currently tightly coupled. By that 
I mean that plain rackunit's notion of test results is insufficient for 
the gui; I had to create my own. I also generalized the idea of test 
headers based on a long-standing feature request (the ability to 
designate tests as expected to fail).

>>> In an attempt to follow it, I did this:
>>>
>>>     TestResult = header
>>>                  execution
>>>                  status
>>>
>>> but your TestHeader is used only there,
>>
>> Not necessarily. A testing framework that distinguishes test
>> construction from test creation might create the header when the
>> test is constructed. SchemeUnit used to work that way, and RackUnit
>> is able to, although less gracefully than before.
>
> I don't follow this -- what's the difference between "construction"
> and "creation"?

Sorry, I meant "distinguishes test construction from test execution".

>> (See also my final remark, about "test started" notifications.)
>
> Yes, I know that this might imply some division for a sub-struct, I'm
> focusing on just the kind of information that is required.
>
>
>>> so it could be folded in:
>>>
>>>     TestResult = name      (U String #f)
>>>                  suite     (Listof String)
>>>                  info      Dictionary
>>>                  execution
>>>                  status
>>>
>>> TestExecution is also used only once so it can also be folded in --
>>> but since it's just a generic dictionary, it can be dropped.
>>
>> I think it's a bad idea to collapse the two dictionaries, because
>> they represent different information. Especially since the set of
>> keys is open-ended, it is helpful to separate information about the
>> test from information about its execution.
>
> (Same here -- I did the collapse to synthesize what it is that you're
> actually requiring, so I treated all dictionaries as "other stuff",
> which makes them trivially collapsible...)

I don't understand this response.

>>> * What happens when there's no specific expected value to compare?
>>>    For example, run two pieces of code 10 times each and check
>>>    that the average runtime of the first is below the runtime of
>>>    the second.  This could be phrased in terms of an expected
>>>    value, but in a superficial way, and will prevent useful
>>>    information from being expressed (since the information would
>>>    have to be reduced to two numbers).
>>
>> You can include whatever information you want. That's why it's a
>> dictionary, rather than a fixed set of fields. The real question is
>> how a test result displayer will know how to interpret the fields
>> correctly.  I think a useful default is to show all attributes with
>> keys that are interned symbols or strings. Custom attributes would
>> only work for test result displayers that know about them.
>
> The question is if some attributes are known enough to get a special
> treatment, and then the whole dictionary thing becomes a burden of
> html-like specification rather than an "everything works" advantage.
> What I'd like to see, is something along the lines of:

I think HTTP is a closer analogue than HTML. HTTP has a well-defined 
request line followed by just a bunch of headers (essentially, a 
dictionary mapping strings to strings). The HTTP spec specifies the 
meaning of some headers; other RFCs (cookies, caches/proxies) specify 
the meaning of others; and web browsers and servers are free to use 
others to include information that the other party may or may not find 
interesting.
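
Carried over to test results, the HTTP-style arrangement could look roughly like the sketch below: attributes live in an open-ended dictionary, and a generic displayer shows only those whose keys are interned symbols or strings. This is a minimal sketch, not rackunit code; `displayable-attributes` and the example keys are hypothetical names chosen for illustration.

```racket
#lang racket/base

;; Hypothetical helper: select the attributes that a generic result
;; displayer would show, i.e. those keyed by a string or by an
;; interned symbol.
(define (displayable-attributes info)
  (for/list ([(k v) (in-hash info)]
             #:when (or (string? k)
                        (and (symbol? k) (symbol-interned? k))))
    (cons k v)))

;; A tool-private attribute keyed by an uninterned symbol. Only code
;; that holds this exact key can look the attribute up.
(define private-key (string->uninterned-symbol "timing-samples"))

(define info
  (hash 'expected 5
        "actual" 6
        private-key '(12 15 11)))

;; Yields the 'expected and "actual" pairs (hash order is unspecified);
;; the private attribute is skipped.
(displayable-attributes info)
```

A displayer that knows about `private-key` can still read the attribute with `hash-ref`; every other displayer ignores it.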

>    Either
>      String x String dictionary of field-name and field contents
>      or a single string for the result
>
> This avoids such mess as specifying when I use a string for the
> printed form of some value (as you suggested in "Then convert it to a
> string and keep the string") vs when it's a proper value.  It also
> avoids making semi-formal fields that become de facto requirements.
>
>
>>> * This solidifies the list-of-strings as a representation of the
>>>    test hierarchy.  But perhaps there is no way to avoid this -- if
>>>    it's made into a proper hierarchy of objects it will probably
>>>    complicate things in a way that requires the listener to get
>>>    "update" events that tells it how the structure changed.
>>
>> I was actually going to propose something more complicated for the
>> hierarchy, but I figured it was better to leave that for later. I'm
>> certainly open to changing this part.
>
> The dynamic aspect makes it look fine as is, I think.  It just seems
> redundant to start describing tests accurately to have sections that
> have the same name but are really separate.
>
>
>>> * I'm not sure about the error result.  It seems to me that this is a
>>>     meta issue that you're dealing with when you develop the test suite,
>>>     and as such it should be something that you'd deal with in the usual
>>>     ways =>   throw an exception.  It's the tools that should be in charge
>>>     of catching such an exception and dealing with it -- which means that
>>>     - in my tester's case, it'll defer to racket as usual, meaning that
>>>       you'd just get an error.
>>>     - in rackunit's case you'd probably get some report listing the
>>>       erroneous tests, instead of propagating the error.
>>>     - and in your gui case you'd catch exceptions and show them as error
>>>       results.
>>
>> Are you saying you think a status should only be success or failure?
>> If so, I disagree. I can see roughly how that would work, but I
>> think it's useful to distinguish between failure and error at the
>> reporting level.
>
> It is -- but the question is whether *that* kind of reporting belongs
> in the core specification of these values or not.  Making it be there
> seems wrong to me in the same way that exceptions are never really
> used for anything other than throwing them.  (Except perhaps a few
> weird cases that I'm sure will lead to flames, say add "almost"s or
> whatever.)

I'm not convinced, but I could accept having only two variants, success 
and failure, and considering errors a kind of failure.
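
A two-variant status that still records whether a failure came from a failed check or from an unexpected error might be sketched as follows. All names here (`success`, `failure`, `exn:check-failed`, `run-test-body`) are hypothetical and are not part of rackunit or tests/eli-tester; the sketch assumes the framework's checks signal failure by raising a dedicated exception type.

```racket
#lang racket/base

;; Hypothetical status representation with only two variants.
(struct success () #:transparent)
(struct failure (kind      ; 'check-failed or 'error
                 message)  ; string taken from the exception
  #:transparent)

;; Hypothetical exception that a framework's checks would raise
;; when a check does not hold.
(struct exn:check-failed exn:fail ())

;; Run a test body (a thunk) and classify the outcome. The more
;; specific handler is listed first, so a failed check is not
;; reported as a plain error.
(define (run-test-body thunk)
  (with-handlers ([exn:check-failed?
                   (lambda (e) (failure 'check-failed (exn-message e)))]
                  [exn:fail?
                   (lambda (e) (failure 'error (exn-message e)))])
    (thunk)
    (success)))

;; An unexpected error becomes a failure of kind 'error.
(define errored (run-test-body (lambda () (car '()))))

;; A failed check becomes a failure of kind 'check-failed.
(define failed
  (run-test-body
   (lambda ()
     (raise (exn:check-failed "expected 2, got 3"
                              (current-continuation-marks))))))
```

A reporter that cares about the distinction can inspect `failure-kind`; one that does not can treat every `failure` alike.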

>>>> And that's not quite the end of it. The rackunit gui creates an
>>>> entry for a test case as soon as it starts running, so the user
>>>> can see what test case is hanging and interrupt it if they
>>>> choose. That requires additional communication between test
>>>> execution and test display.
>>>
>>> Yes, that would be part of the protocol for the listener -- and it
>>> makes sense to allow tests to invoke it to let it know that a test
>>> has started.
>>
>> Like maybe sending it just the test-header struct? The part that
>> represents the information known about the test before it executes,
>> packaged up as one value?
>>
>> Although, if we're going to standardize this part it would also be
>> nice to have a way of indicating that a suite has started, too.
>
> Yeah -- and that's something that I liked in Noel's list of strings,
> it means that you treat test suites in the same way as tests, which
> IMO means that it will lead to nice uniformities in other places (like
> a gui interface).

I don't think the gui ever displays a test case's name in the same line 
as its enclosing test suite. So no nice uniformities for me.

I'm also concerned about ambiguity. Would '("snark") indicate a test 
named "snark" outside of any test suite or an anonymous test within a 
test suite named "snark"? We could either disallow anonymous test cases, 
or we could say a test case name is either a string or #f. But now we're 
really abusing cons. And since I want test-headers to accommodate other 
information too, it seems a lot cleaner to me to make it a struct and 
keep name and suite separate.
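
To illustrate, here is a minimal sketch of such a struct. The field names and the `'expected-failure` key are hypothetical, chosen only for this example, and are not an existing rackunit definition.

```racket
#lang racket/base

;; Hypothetical test-header struct; field names are illustrative.
(struct test-header (name    ; string, or #f for an anonymous test
                     suite   ; list of enclosing suite names, outermost first
                     info)   ; hash of any other attributes
  #:transparent)

;; The two readings of '("snark") become distinct values:
(define top-level-snark    (test-header "snark" '() (hash)))
(define anonymous-in-snark (test-header #f '("snark") (hash)))

;; Other information, such as an expected-failure marker, goes in info.
(define known-bug
  (test-header "handles unicode" '("parser") (hash 'expected-failure #t)))

;; Is the test designated as expected to fail?
(define (expected-to-fail? h)
  (hash-ref (test-header-info h) 'expected-failure #f))
```

With name and suite in separate fields, an anonymous test is simply one whose name is `#f`, and the suite list never has to double as a name.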

Ryan


Posted on the dev mailing list.