[racket-dev] [racket] tests/eli-tester feedback (Was: Racket unit testing)
On 02/18/2011 02:12 PM, Eli Barzilay wrote:
> 25 minutes ago, Ryan Culpepper wrote:
>> On 02/18/2011 07:30 AM, Eli Barzilay wrote:
>>> 50 minutes ago, Ryan Culpepper wrote:
>>>> On 02/15/2011 07:28 AM, Eli Barzilay wrote:
>>>>> And finally, there's the litmus test for existing code:
>>>>>
>>>>> * Ryan: is something like this enough to implement the GUI layer?
>>>>
>>>> Not well, I think. The Test-Result type in Noel's racktest code is
>>>> too simple and inflexible. It represents the minimal essence of
>>>> testing, but it would be awkward to extend to richer testing
>>>> systems. Here's my counter-proposal for representing the results of
>>>> tests:
>>>> [...]
>>>
>>> I can't make sense of it, besides a vague "waaaay too heavy" feeling
>>> for something that should be core-ishly minimalistic.
>>
>> Simplicity is no good if it gets in the way of representing information
>> that needs to be represented.
>
> [But the flip side is that complexity is no good if you end up with
> something that doesn't fit any system, where each one is filling in
> fields that it doesn't "want to".]
The representation I outlined is based on the needs of the rackunit gui,
where test execution and display are currently tightly coupled. By that
I mean that plain rackunit's notion of test results is insufficient for
the gui; I had to create my own. I also generalized the idea of test
headers based on a long-standing feature request (the ability to
designate tests as expected to fail).
>>> In an attempt to follow it, I did this:
>>>
>>> TestResult = header
>>> execution
>>> status
>>>
>>> but your TestHeader is used only there,
>>
>> Not necessarily. A testing framework that distinguishes test
>> construction from test creation might create the header when the
>> test is constructed. SchemeUnit used to work that way, and RackUnit
>> is able to, although less gracefully than before.
>
> I don't follow this -- what's the difference between "construction"
> and "creation"?
Sorry, I meant "distinguishes test construction from test execution".
>> (See also my final remark, about "test started" notifications.)
>
> Yes, I know that this might imply some division for a sub-struct, I'm
> focusing on just the kind of information that is required.
>
>
>>> so it could be folded in:
>>>
>>> TestResult = name (U String #f)
>>> suite (Listof String)
>>> info Dictionary
>>> execution
>>> status
>>>
>>> TestExecution is also used only once so it can also be folded in --
>>> but since it's just a generic dictionary, it can be dropped.
>>
>> I think it's a bad idea to collapse the two dictionaries, because
>> they represent different information. Especially since the set of
>> keys is open-ended, it is helpful to separate information about the
>> test from information about its execution.
>
> (Same here -- I did the collapse to synthesize what it is that you're
> actually requiring, so I treated all dictionaries as "other stuff",
> which makes them trivially collapsible...)
I don't understand this response.
>>> * What happens when there's no specific expected value to compare?
>>> For example, run two pieces of code 10 times each and check
>>> that the average runtime of the first is below the runtime of
>>> the second. This could be phrased in terms of an expected
>>> value, but in a superficial way, and will prevent useful
>>> information from being expressed (since the information would
>>> have to be reduced to two numbers).
>>
>> You can include whatever information you want. That's why it's a
>> dictionary, rather than a fixed set of fields. The real question is
>> how a test result displayer will know how to interpret the fields
>> correctly. I think a useful default is to show all attributes with
>> keys that are interned symbols or strings. Custom attributes would
>> only work for test result displayers that know about them.
>
> The question is if some attributes are known enough to get a special
> treatment, and then the whole dictionary thing becomes a burden of
> html-like specification rather than an "everything works" advantage.
> What I'd like to see, is something along the lines of:
I think HTTP is a closer analogue than HTML. HTTP has a well-defined
request line followed by just a bunch of headers (essentially, a
dictionary mapping strings to strings). The HTTP spec specifies the
meaning of some headers; other RFCs (cookies, caches/proxies) specify
the meaning of others; and web browsers and servers are free to use
others to include information that the other party may or may not find
interesting.
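To make the analogy concrete, here is a minimal sketch of the default I
suggested above (show every attribute whose key is a string or an
interned symbol, and skip the rest). The function name
displayable-attributes and the sample keys are made up for illustration;
nothing here is existing rackunit API.

```racket
#lang racket

;; Hypothetical helper, not part of rackunit.
;; attrs is any dictionary of test attributes. A generic displayer keeps
;; only the entries whose keys are strings or interned symbols; entries
;; keyed by uninterned symbols (or anything else) are treated as private
;; to whichever tool put them there.
(define (displayable-attributes attrs)
  (for/list ([(k v) (in-dict attrs)]
             #:when (or (string? k)
                        (and (symbol? k) (symbol-interned? k))))
    (cons k v)))

;; Sample attributes (assumed keys, for illustration only).
(define private-key (string->uninterned-symbol "gui-node"))
(define sample-attrs
  (hash 'expected 42
        "note" "ran on a slow machine"
        private-key 'opaque-gui-state))
```

A displayer that knows about private-key can still look it up directly;
everyone else just never sees it, much like an HTTP header the other
party does not recognize.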
> Either
> String x String dictionary of field-name and field contents
> or a single string for the result
>
> This avoids such mess as specifying when I use a string for the
> printed form of some value (as you suggested in "Then convert it to a
> string and keep the string") vs when it's a proper value. It also
> avoids making semi-formal fields that become de facto requirements.
>
>
>>> * This solidifies the list-of-strings as a representation of the
>>> test hierarchy. But perhaps there is no way to avoid this -- if
>>> it's made into a proper hierarchy of objects it will probably
>>> complicate things in a way that requires the listener to get
>>> "update" events that tell it how the structure changed.
>>
>> I was actually going to propose something more complicated for the
>> hierarchy, but I figured it was better to leave that for later. I'm
>> certainly open to changing this part.
>
> The dynamic aspect makes it look fine as is, I think. It just seems
> redundant to start describing tests accurately to have sections that
> have the same name but are really separate.
>
>
>>> * I'm not sure about the error result. It seems to me that this is a
>>> meta issue that you're dealing with when you develop the test suite,
>>> and as such it should be something that you'd deal with in the usual
>>> ways => throw an exception. It's the tools that should be in charge
>>> of catching such an exception and dealing with it -- which means that
>>> - in my tester's case, it'll defer to racket as usual, meaning that
>>> you'd just get an error.
>>> - in rackunit's case you'd probably get some report listing the
>>> erroneous tests, instead of propagating the error.
>>> - and in your gui case you'd catch exceptions and show them as error
>>> results.
>>
>> Are you saying you think a status should only be success or failure?
>> If so, I disagree. I can see roughly how that would work, but I
>> think it's useful to distinguish between failure and error at the
>> reporting level.
>
> It is -- but the question is whether *that* kind of reporting belongs
> in the core specification of these values or not. Making it be there
> seems wrong to me in the same way that exceptions are never really
> used for anything other than throwing them. (Except perhaps a few
> weird cases that I'm sure will lead to flames, say add "almost"s or
> whatever.)
I'm not convinced, but I could accept having only two variants, success
and failure, and considering errors a kind of failure.
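One way that compromise could look, as a rough sketch only: two status
variants, with an error represented as a sub-struct of failure so that
tools which care can still tell the two apart. All of the names below
(status, success, failure, error-failure, run-check) are hypothetical
and are not rackunit's.

```racket
#lang racket

;; Hypothetical status representation, not rackunit's.
(struct status () #:transparent)
(struct success status () #:transparent)
(struct failure status (message) #:transparent)
;; An error is a failure that also carries the exception that caused it.
(struct error-failure failure (exn) #:transparent)

;; Run a check given as a thunk that returns a boolean.
;; A raised exn:fail becomes an error-failure; a #f result becomes a
;; plain failure; anything else counts as success.
(define (run-check thunk)
  (with-handlers ([exn:fail?
                   (lambda (e) (error-failure (exn-message e) e))])
    (if (thunk)
        (success)
        (failure "check returned #f"))))
```

A reporter that only knows about success and failure can use failure?
alone, since an error-failure also satisfies it; the gui can test
error-failure? first when it wants to show the two differently.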
>>>> And that's not quite the end of it. The rackunit gui creates an
>>>> entry for a test case as soon as it starts running, so the user
>>>> can see what test case is hanging and interrupt it if they
>>>> choose. That requires additional communication between test
>>>> execution and test display.
>>>
>>> Yes, that would be part of the protocol for the listener -- and it
>>> makes sense to allow tests to invoke it to let it know that a test
>>> has started.
>>
>> Like maybe sending it just the test-header struct? The part that
>> represents the information known about the test before it executes,
>> packaged up as one value?
>>
>> Although, if we're going to standardize this part it would also be
>> nice to have a way of indicating that a suite has started, too.
>
> Yeah -- and that's something that I liked in Noel's list of strings,
> it means that you treat test suites in the same way as tests, which
> IMO means that it will lead to nice uniformities in other places (like
> a gui interface).
I don't think the gui ever displays a test case's name in the same line
as its enclosing test suite. So no nice uniformities for me.
I'm also concerned about ambiguity. Would '("snark") indicate a test
named "snark" outside of any test suite or an anonymous test within a
test suite named "snark"? We could either disallow anonymous test cases,
or we could say a test case name is either a string or #f. But now we're
really abusing cons. And since I want test-headers to accommodate other
information too, it seems a lot cleaner to me to make it a struct and
keep name and suite separate.
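A small sketch of the difference, with made-up names (flat->name+suite,
test-header and its fields are illustrative, not existing rackunit
definitions). With the flat list, a decoder has to commit to one
reading of '("snark"); with a struct, both readings have their own
representation.

```racket
#lang racket

;; Hypothetical names throughout; not rackunit's actual API.

;; Flat encoding: a test's position is just a list of strings.
;; Convention assumed here: the last element is the test's name and
;; everything before it is the chain of enclosing suites. Under this
;; convention an anonymous test inside a suite cannot be written down.
(define (flat->name+suite path)
  (if (null? path)
      (values #f '())
      (values (last path) (drop-right path 1))))

;; Struct encoding: name and suite are separate fields.
;;   name  : (U String #f)    #f means an anonymous test case
;;   suite : (Listof String)  enclosing suites, outermost first
;;   info  : dictionary of other attributes (e.g. expected-to-fail)
(struct test-header (name suite info) #:transparent)

;; The two readings of '("snark") are now distinct values.
(define named-at-top  (test-header "snark" '() (hash)))
(define anon-in-suite (test-header #f '("snark") (hash)))
```

The info field is also where something like an expected-to-fail flag
could live without touching name or suite.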
Ryan