[racket] debugging core dump - comments appreciated

From: Neil Van Dyke (neil at neilvandyke.org)
Date: Thu May 26 18:46:19 EDT 2011

Just to update the email list for posterity on the rare process crash 
that I saw in production of a server app that launched many thousands of 
short-lived processes...

Matthew Flatt kindly did some debugging, and, if I understand correctly, 
the cause he found was a combination of the app's Linux servers using 
address space randomization, and the app's code having thunder thighs 
(very large stack frames).  On very rare occasions, the planets would 
align just wrong, and randomization would mean that the address range of 
the app's stack would be pushed outside the range that the GC expected.  
Or something like that.

The app developers are currently stress-testing the code with address 
space randomization disabled on a test server.  So far they haven't been 
able to elicit another crash.
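
For anyone wanting to check whether a given Linux box has address space 
randomization turned on, here is a minimal sketch in Python.  It assumes 
the standard /proc/sys/kernel/randomize_va_space sysctl file; the helper 
names are made up for illustration, and I don't know that this is how the 
app developers configured their test server.

```python
from pathlib import Path

# Standard Linux sysctl file for address space randomization.
# Assumption: the machine being checked exposes /proc in the usual way.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Meanings of the documented kernel values.
MODES = {
    0: "disabled",
    1: "partial (stack, mmap base, vDSO)",
    2: "full (also randomizes the brk heap)",
}


def describe_aslr(value):
    """Return a human-readable description of a randomize_va_space value."""
    return MODES.get(value, "unknown")


def read_aslr_setting(path=ASLR_SYSCTL):
    """Read the current setting, or return None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    setting = read_aslr_setting()
    if setting is None:
        print("ASLR setting not readable (not Linux, or /proc unavailable)")
    else:
        print(f"randomize_va_space = {setting}: {describe_aslr(setting)}")
```

Writing 0 to that file as root turns randomization off system-wide, which 
is probably only appropriate on a test machine.  util-linux's "setarch -R" 
can instead disable it for a single command.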

We also will be making this app more svelte on the stack, now that the 
Dr. has pointed out the weight problem.

I wanted to mention that no fault of Racket is implicated here, and that 
Racket has been nicely reliable for this app...

I believe that any huge stack frames of this app are due to historical 
peculiarities of the app's code, not a fault of Racket.  The code has 
been in production for years, and (p)reinvented a few wheels that it 
would not need to with contemporary Racket.

Some Googling suggests that people have encountered a similar problem 
with address space randomization messing things up for the official 
JVM.  You can also find occasional mentions of this if you Google for it 
with the names of some other language implementations.  I think it's not 
a well-known problem: an app developer needs to have enough volume to 
encounter it, and then the will to investigate a crash rather than 
write off the occasional freak crash as acceptable.


Posted on the users mailing list.