[plt-dev] lessons learned

From: Jacob Matthews (jacobm at cs.uchicago.edu)
Date: Fri May 22 13:39:10 EDT 2009

I don't mean to argue with you about this, but I don't really find
these points convincing. After all, we manage to make it work even
though I'm pretty sure we've got way more code than PLT Scheme. (To
quote Kevin Bourrillion, Google's Java library curator: "If code were
ice cream, we would have ... a whole lot of ice cream.")

I think what you're really saying is that it's impractical to test all
of PLT Scheme with one central testing service that tries to do the
whole thing at once. I can definitely believe that. I think the key
might be in making the process more modular --- a continuous build
with "all of PLT Scheme" as its target is orders of magnitude too big
to be very useful IMO. One way to chop things up would be to have
separate builds for different components: a mzscheme build, a mred
build, a drscheme build (or maybe break DrScheme down further into
smaller pieces), and potentially several other components for
different collects. Each sub-build runs the test suites appropriate
to it, and
just measures code coverage in the files it cares about. (In practice,
even though a test suite written to cover, say, a mred component might
incidentally cover some mzlib stuff, that coverage isn't very
high-quality and probably shouldn't count as mzlib tests anyway.)
Sub-builds can run independently on different machines, set up by
individual groups. When I was working on planet, for instance, I
could've set up my own sub-project just for it, and had the system run
on my local machine. (I do effectively the same thing on my current
project.)
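
Just to make the sub-build idea concrete, here's roughly what I have
in mind. This is only a sketch; none of these names (sub-build,
run-sub-build, collect-coverage, and so on) exist anywhere in PLT,
and the coverage hook is left as a stub for whatever mechanism
(errortrace or otherwise) actually does the collecting:

  #lang scheme
  ;; A sketch of a "sub-build" description: each sub-build names the
  ;; test modules it runs and the source directories whose coverage it
  ;; owns. (All names here are made up for illustration; this is not
  ;; an existing PLT API.)

  (define-struct sub-build (name test-modules owned-dirs))

  (define sub-builds
    (list (make-sub-build 'planet
                          '("tests/planet/run-all.ss")
                          '("collects/planet/"))
          (make-sub-build 'mred
                          '("tests/mred/run-all.ss")
                          '("collects/mred/"))))

  ;; Stand-in for whatever coverage hook is actually available
  ;; (errortrace, say); here it just returns no records. Assume each
  ;; record is a pair of a source-path string and a covered? boolean.
  (define (collect-coverage) '())

  (define (owned? build path)
    (ormap (lambda (dir) (regexp-match? (regexp-quote dir) path))
           (sub-build-owned-dirs build)))

  ;; Run one sub-build: load its test modules, then keep only the
  ;; coverage records for files in directories this sub-build owns.
  (define (run-sub-build build)
    (for-each (lambda (m) (dynamic-require m #f))
              (sub-build-test-modules build))
    (filter (lambda (rec) (owned? build (car rec)))
            (collect-coverage)))

The point is just that each sub-build owns a small, explicit set of
directories, so a mred suite that incidentally touches mzlib doesn't
get credited as mzlib coverage.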

One way to think about it is: suppose you're monitoring the output
of tests, and you get a message saying some tests have failed. Do you
care? If you don't, you need to think about making better targets, and
only monitoring the targets for which you can unhesitatingly say yes.
This will incidentally make it a lot easier for smaller projects to
get up and running.

Coverage is actually a really important metric for test suites;
without it, you get the warm fuzzies of seeing that all tests passed,
but you don't get any sense whatsoever of how much assurance you
can derive from them. It is worth investing effort into measuring it.
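
And measuring it doesn't have to be elaborate. Assuming coverage
records really do come back as (path . covered?) pairs like in the
sketch above, boiling them down to a per-file percentage is just a
few lines, and that percentage is the number that tells you what a
green run is actually worth:

  ;; (same #lang scheme module as the sketch above)
  ;; Per-file coverage ratios from (path . covered?) records.
  (define (coverage-summary records)
    (let ((by-file (make-hash)))
      (for-each
       (lambda (rec)
         (let ((counts (hash-ref by-file (car rec) (cons 0 0))))
           (hash-set! by-file (car rec)
                      (cons (+ (car counts) (if (cdr rec) 1 0))
                            (+ (cdr counts) 1)))))
       records)
      (hash-map by-file
                (lambda (path counts)
                  (list path
                        (exact->inexact (/ (car counts)
                                           (cdr counts))))))))

  ;; > (coverage-summary '(("collects/planet/util.ss" . #t)
  ;;                       ("collects/planet/util.ss" . #f)))
  ;; (("collects/planet/util.ss" 0.5))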

-jacob

(Part of my current job is to advocate good testing practices and help
other teams set up good testing infrastructure for their projects, so
if I come off like an evangelist, that's why. :) )



On Fri, May 22, 2009 at 10:08 AM, Eli Barzilay <eli at barzilay.org> wrote:
> On May 22, Jacob Matthews wrote:
>> On Fri, May 22, 2009 at 10:00 AM, Eli Barzilay <eli at barzilay.org> wrote:
>> > On May 22, John Clements wrote:
>> >>
>> >> Well, if you're volunteering, what I'd really like is a way to do
>> >> coverage testing across multiple files; the current green/red
>> >> mechanism doesn't scale well.
>> >
>> > In any case, measuring coverage for the tests is not practical ATM.
>>
>> Out of curiosity: Why not?
>
> (a) errortrace is adding a good runtime factor -- and the tests take a
>    considerable time (compiling in errortrace mode can work too, but
>    even that is horribly expensive)
>
> (b) There's a *lot* of code, so keeping track of all expressions will
>    be a problem
>
> (c) code is executed one test suite at a time, so it will require
>    running it, collecting the results, etc, then combining them all.
>
> --
>          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
>                  http://www.barzilay.org/                 Maze is Life!
>
>

