[racket] debugging core dump - comments appreciated
Just to update the email list for posterity on the rare process crash
that I saw in production of a server app that launched many thousands of
short-lived processes...
Matthew Flatt kindly did some debugging, and, if I understand correctly,
the cause he found was a combination of the app's Linux servers using
address space randomization, and the app's code having thunder thighs.
On very rare occasions, the planets would align just wrong, and
randomization would mean that the address range of the app's stack would
be pushed outside the range that the GC expected. Or something like that.
The app developers are currently stress-testing the code with address
space randomization disabled on a test server. So far they haven't been
able to elicit another crash.
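
For anyone who wants to confirm what a given Linux test server is actually set to, here is a minimal Python sketch. It is not what the app developers ran; the helper names (parse_aslr_value, read_aslr_mode) are made up for illustration. The only thing it relies on is the standard kernel.randomize_va_space sysctl, exposed under /proc.

```python
from pathlib import Path
from typing import Optional

# Standard Linux location of the address space randomization setting.
ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Meanings of the kernel.randomize_va_space values.
ASLR_MODES = {
    0: "disabled",
    1: "enabled (stack, mmap base, and VDSO randomized)",
    2: "enabled (as 1, plus heap randomized)",
}


def parse_aslr_value(text: str) -> int:
    """Parse the contents of randomize_va_space into an integer mode."""
    value = int(text.strip())
    if value not in ASLR_MODES:
        raise ValueError(f"unexpected randomize_va_space value: {value}")
    return value


def read_aslr_mode(path: Path = ASLR_SYSCTL) -> Optional[int]:
    """Return the current mode, or None if the file cannot be read
    (for example, on a non-Linux system)."""
    try:
        return parse_aslr_value(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    mode = read_aslr_mode()
    if mode is None:
        print("ASLR setting not readable on this system")
    else:
        print(f"randomize_va_space = {mode}: {ASLR_MODES[mode]}")
```

A value of 0 means randomization is off system-wide. On most Linux systems that is set as root with `sysctl -w kernel.randomize_va_space=0`, or for a single process with `setarch -R`; which of these the test server uses is not something I know.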
We will also be making this app more svelte on the stack, now that the
Dr. has pointed out the weight problem.
I wanted to mention that no fault of Racket is implicated here, and that
Racket has been nicely reliable for this app...
I believe that any huge stack frames of this app are due to historical
peculiarities of the app's code, not a fault of Racket. The code has
been in production for years, and (p)reinvented a few wheels that it
would not need to with contemporary Racket.
Some Googling suggests that people have encountered a similar problem
with address space randomization messing things up for the official
JVM. You can also find occasional mentions of this if you Google for it
with the names of some other language implementations. I think it's not
a well-known problem, and an app developer needs to have enough volume
to encounter the problem, followed by the will to investigate a crash
rather than consider the occasional freak crash to be acceptable.
--
http://www.neilvandyke.org/