[racket] SIGSEGV MAPERR si_code 1 fault on addr 0x7...; can't isolate or consistently reproduce in source code; stack trace points to scheme_uncopy_stack
After some hard, clever work by my colleague, we've managed to narrow
this one down a bit further.
First, we have compiled racket with optimization disabled, and we do
still see an all-zeroed Scheme_Jumpup_Buf. Please see our gdb session at
http://pastebin.com/aBx2FTcK
Second, we've managed to consistently reproduce the segfault in a
single line of code (a core dump of the racket session looks a lot like
the one running with our code). The offending line is
>>(let loop () (thread (const '())) (loop))
Obviously, we don't have that exact line in our production code :) but
it produces the same error more quickly and more consistently.
Interestingly,
>>(let loop () (thread (thunk '())) (loop))
does not produce a segfault.
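
For anyone who wants to compare the two variants side by side, here is a
rough sketch. The helper name spawn-many and the iteration count are made
up for illustration and are not from our production code; we have only
observed the crash with the unbounded loop above, so a bounded run may
well finish without faulting. The one difference we are sure of is that
(const '()) builds a procedure accepting any number of arguments, while
(thunk '()) builds one accepting exactly zero.

```racket
#lang racket/base
;; Sketch only: spawn-many and the count of 1000 are hypothetical,
;; chosen for illustration; they do not appear in the original report.
(require racket/function)

;; Start n threads that each run proc, mirroring the loop in the
;; report: no reference to the thread is kept and nothing waits on it.
(define (spawn-many proc n)
  (for ([i (in-range n)])
    (thread proc)))

;; const returns a procedure that ignores whatever arguments it gets;
;; thunk returns a procedure of exactly zero arguments.
(define via-const (const '()))
(define via-thunk (thunk '()))

;; Print the arity of each procedure to show how they differ.
(printf "const arity: ~a\n" (procedure-arity via-const))
(printf "thunk arity: ~a\n" (procedure-arity via-thunk))

;; Run the variant that did not crash for us first, then the one that did.
(spawn-many via-thunk 1000)
(spawn-many via-const 1000)
```

Both procedures are valid arguments to thread, since each can be called
with zero arguments; whether the arity difference has anything to do
with the crash is only a guess on our part.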
We've caught this segfault with racket compiled on an AWS machine and
with the Mac OS X binaries distributed by you guys.
On 2013-05-03 02:04, Matthew Flatt wrote:
> A stack overflow in scheme_uncopy_stack() sounds like a thread that is
> trying to jump to a continuation whose representation is corrupted.
> (An all-zeroed Scheme_Jumpup_Buf could have that effect, but I don't
> particularly trust gdb to tell us the actual content, unless you
> disabled optimization when compiling `racket'.)
>
> Assuming that the latest in the Racket git repo doesn't work any better
> for you --- and I don't expect that it does in this case --- if you can
> send me something to run that provokes the crash, I can investigate
> more.
>
> At Thu, 02 May 2013 18:42:56 +0100, Matthew Eric Bassett wrote:
>> Hi all,
>>
>> It might be better to send this to dev at racket-lang. Then again, it
>> might be completely useless to them.
>>
>> So we have a job scheduler program written in racket that handles
>> various places and tcp clients. This program sporadically and
>> inconsistently terminates with the following error message:
>>
>> SIGSEGV MAPERR si_code 1 fault on addr 0x7fffb044ef48
>>
>> We've caught the error at various points of execution, but can't
>> consistently reproduce it (yet). We do have a core dump of the
>> program running and terminating from the racket repl (loaded with
>> "enter!") v 5.3.3. I've made it available via dropbox at
>> https://www.dropbox.com/s/rkd6pl511acll2r/core.12346.gz.
>>
>> I don't have much experience reading core dumps, but it looks to me
>> like racket is hitting a stack overflow in scheme_uncopy_stack in
>> setjmpup.c.
>>
>> In particular, the first time scheme_uncopy_stack appears in the
>> stack, with args scheme_uncopy_stack (ok=0, b=0x7f857b2b3510,
>> prev=0x7fffb044f5e0), we have:
>>
>> >>(gdb) p prev
>> $1 = (intptr_t *) 0x7fffb044f5e0
>> >>(gdb) p *prev
>> $2 = 0
>> >>(gdb) p b
>> $3 = (Scheme_Jumpup_Buf *) 0x7f857b2b3510
>> >>(gdb) p *b
>> $4 = {stack_from = 0x0, stack_copy = 0x0, stack_size = 0,
>>   stack_max_size = 0, cont = 0x0, buf = {jb = {{jb = {{
>>             __jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0},
>>             __mask_was_saved = 0, __saved_mask = {__val = {
>>                 0 <repeats 16 times>}}}}, stack_frame = 0}},
>>     gcvs = 0, gcvs_cnt = 0}, gc_var_stack = 0x0,
>>   external_stack = 0x0}
>>
>> scheme_uncopy_stack remains in the stack for several thousand frames.
>>
>> The racket interpreter was compiled from source (so I don't know if
>> others can even read that coredump!) on a linux kernel
>> 3.4.37-40.44.amzn1.x86_64 #1 SMP Thu Mar 21 01:17:08 UTC 2013 x86_64
>> x86_64 x86_64 GNU/Linux with glibc/-devel
>> glibc-2.12-1.107.43.amzn1.x86_64.
>>
>> I was able to capture the same error running the Mac OS X 5.3.3
>> binaries from racket-lang.org. That core dump is available at
>> https://www.dropbox.com/s/jfneqr4zlkmkjhh/core.41166.gz.
>>
>>
>> Is this an error in racket? If not, do you have any suggestions on
>> how I can proceed in debugging this (I'm at a loss) or even how to
>> figure out which bits of my racket code to look at? (I've tried
>> doing "info locals" from gdb at various points in the stack, but
>> I've not reached enlightenment. Again, I have little experience
>> with reading core dumps.)
>>
>> I've not included any of our racket code, as we don't know which
>> part is causing the problem.
>>
>> Thanks for reading,
>>
>> --
>> Matthew Eric Bassett | http://mebassett.info
>> ____________________
>> Racket Users list:
>> http://lists.racket-lang.org/users
--
Matthew Eric Bassett | http://mebassett.info