[racket-dev] [DrDr] R25887 (timeout 1) (unclean 1) (stderr 1) (changes 64)

From: Sam Tobin-Hochstadt (samth at ccs.neu.edu)
Date: Wed Dec 12 13:51:14 EST 2012

On Wed, Dec 12, 2012 at 1:44 PM, Jay McCarthy <jay at cs.byu.edu> wrote:
> On Tue, Dec 11, 2012 at 1:01 PM, Sam Tobin-Hochstadt <samth at ccs.neu.edu> wrote:
>> The Typed Racket optimizer tests continue to fail on an intermittent
>> basis in DrDr, as shown below.  I'd really like to fix this,
>> especially since we're otherwise very close to zero failures on DrDr, but
>> I don't know what's going wrong.
>>
>> The error is:
>>
>> force: promise's thread terminated without result or exception
>>   promise: #<promise:!running!...et/optimizer/run.rkt:47:28>
>>   context...:
>>    /opt/plt/builds/<current-rev>/trunk/collects/racket/promise.rkt:98:2
>>    /opt/plt/builds/<current-rev>/trunk/collects/tests/typed-racket/optimizer/run.rkt:50:3: for-loop
>>    /opt/plt/builds/<current-rev>/trunk/collects/tests/typed-racket/optimizer/run.rkt:44:0: mk-suite
>>
>> which I think indicates that a thread is being killed somewhere, but I
>> don't know why that would be happening, and I haven't seen this
>> happening on other machines.
>>
>> Are there any techniques recommended for finding what's killing this
>> thread?  Does DrDr do anything special that would affect this?  The
>> relevant code is here:
>> https://github.com/plt/racket/blob/master/collects/tests/typed-racket/optimizer/run.rkt#L44-53
>
> There are two cases where DrDr kills stuff. Case 1 is when there's a
> timeout, but you aren't close to that. Case 2 is when DrDr itself
> crashes and needs to get a clean slate of system resources. I think
> this is unlikely to be the problem, because the DrDr instance that
> crashed would be the one listening to the test's output not the one
> that did the killing. But I can look into it.

Given that this happens quite regularly (maybe 1 in 5 runs), I'd be
surprised if DrDr itself crashed that often.  Is that likely?
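
For reference, here is a minimal sketch of how an error of this form can
arise. It assumes the promise at run.rkt:47 is created with delay/thread,
which the error message suggests but which I have not confirmed here. The
self-kill is only a hypothetical stand-in for whatever is killing the
thread on DrDr:

```racket
#lang racket/base
;; Sketch only: assumes the failing promise comes from delay/thread.
(require racket/promise)

;; The promise's worker thread kills itself before producing a value.
;; This is a stand-in for an external kill (for example, a custodian
;; shutdown); it is not what run.rkt actually does.
(define p (delay/thread (kill-thread (current-thread))))

;; force waits for the worker thread. Because the thread ended with
;; neither a result nor an exception, force raises an exn:fail, which
;; we catch here and print instead of letting it propagate.
(with-handlers ([exn:fail? (lambda (e) (displayln (exn-message e)))])
  (force p))
```

If this sketch matches the real situation, then the thing to look for is
anything that can terminate the worker thread from outside, since an
ordinary exception inside the promise body would be re-raised by force
rather than reported this way.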

Sam

Posted on the dev mailing list.