[racket] SIGSEGV MAPERR si_code 1 fault on addr 0x7...; can't isolate or consistently reproduce in source code; stack trace points to scheme_uncopy_stack

From: Matthew Eric Bassett (mebassett at gegn.net)
Date: Thu May 9 06:12:48 EDT 2013

After some hard, clever work by my colleague, we've managed to narrow 
this one down a bit further.

First, we have compiled racket with optimization disabled, and we do 
have an all-zeroed Scheme_Jmpup_Buf.  Please see our gdb session at 
http://pastebin.com/aBx2FTcK

Second, we've managed to consistently reproduce the segfault with a 
single line of code (a core dump of that racket session looks a lot like 
the one from a run of our own code).  The offending line is

>>(let loop () (thread (const '())) (loop))

Obviously, we don't have that exact line in our production code :)  but 
it produces the same error more quickly and more consistently.  
Interestingly,

>>(let loop () (thread (thunk '())) (loop))

does not produce a segfault.
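
For anyone who wants to try the two variants from a file rather than the 
REPL, here is a minimal sketch.  It assumes `const' and `thunk' come from 
racket/function.  The names spawn-n, const-body and thunk-body are 
hypothetical helpers, not anything from our production code.  It is also 
bounded, unlike the loops above, so it may well not trigger the fault.

```racket
#lang racket/base
(require racket/function) ; provides `const` and `thunk`

;; Hypothetical helper (not from the original report): start `n` threads,
;; building each thread body by calling `make-body`, then wait for all of
;; them to finish.  Returns the number of threads started.
(define (spawn-n n make-body)
  (define threads
    (for/list ([i (in-range n)])
      (thread (make-body))))
  (for-each thread-wait threads)
  (length threads))

;; Body built with `const`: the variant reported above to segfault
;; when spawned in an unbounded loop.
(define (const-body) (const '()))

;; Body built with `thunk`: the variant reported above not to segfault.
(define (thunk-body) (thunk '()))

;; Bounded run of the variant that did not crash.
(spawn-n 1000 thunk-body)
```

Swapping thunk-body for const-body in the last line exercises the other 
variant.  Raising the count, or dropping the thread-wait step, brings the 
sketch closer to the original unbounded loop.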

We've caught this segfault both with racket compiled on an AWS machine 
and with the Mac OS X binaries distributed by you guys.




On 2013-05-03 02:04, Matthew Flatt wrote:
> A stack overflow in scheme_uncopy_stack() sounds like a thread that is
> trying to jump to a continuation whose representation is corrupted. (An
> all-zeroed Scheme_Jmpup_Buf could have that effect, but I don't
> particularly trust gdb to tell us the actual content, unless you
> disabled optimization when compiling `racket'.)
>
> Assuming that the latest in the Racket git repo doesn't work any better
> for you --- and I don't expect that it does in this case --- if you can
> send me something to run that provokes the crash, I can investigate
> more.
>
> At Thu, 02 May 2013 18:42:56 +0100, Matthew Eric Bassett wrote:
>> Hi all,
>>
>> It might be better to send this to dev at racket-lang.  Then again, it
>> might be completely useless to them.
>>
>> So we have a job scheduler program written in racket that handles
>> various places and tcp clients.  This program sporadically and
>> inconsistently terminates with the following error message:
>>
>> SIGSEGV MAPERR si_code 1 fault on addr 0x7fffb044ef48
>>
>> We've caught the error at various points of execution, but
>> can't consistently reproduce it (yet).  We do have a core dump of the
>> program running and terminating from the racket repl (loaded with
>> "enter!") v 5.3.3.  I've made it available via dropbox at
>> https://www.dropbox.com/s/rkd6pl511acll2r/core.12346.gz.
>>
>> I don't have much experience reading core dumps, but it looks to me
>> like racket is hitting a stack overflow in scheme_uncopy_stack in
>> setjmpup.c.
>>
>> In particular, at the first time scheme_uncopy_stack appears in the
>> stack with args scheme_uncopy_stack (ok=0, b=0x7f857b2b3510,
>> prev=0x7fffb044f5e0) we have:
>>
>> >>(gdb) p prev
>> $1 = (intptr_t *) 0x7fffb044f5e0
>> >>(gdb) p *prev
>> $2 = 0
>> >>(gdb) p b
>> $3 = (Scheme_Jumpup_Buf *) 0x7f857b2b3510
>> >>(gdb) p *b
>> $4 = {stack_from = 0x0, stack_copy = 0x0, stack_size = 0,
>> stack_max_size = 0, cont = 0x0, buf = {jb = {{jb = {{
>>              __jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0}, __mask_was_saved = 0,
>> __saved_mask = {__val = {
>>                  0 <repeats 16 times>}}}}, stack_frame = 0}}, gcvs = 0,
>> gcvs_cnt = 0}, gc_var_stack = 0x0,
>>    external_stack = 0x0}
>>
>> scheme_uncopy_stack remains in the stack for several thousand frames.
>>
>> The racket interpreter was compiled from source (so I don't know if
>> others can even read that coredump!) on a linux kernel
>> 3.4.37-40.44.amzn1.x86_64 #1 SMP Thu Mar 21 01:17:08 UTC 2013 x86_64
>> x86_64 x86_64 GNU/Linux with glibc/-devel
>> glibc-2.12-1.107.43.amzn1.x86_64.
>>
>> I was able to capture the same error running from the MacOSX binaries
>> 5.3.3 from racket-lang.org.  That core dump is available at
>> https://www.dropbox.com/s/jfneqr4zlkmkjhh/core.41166.gz.
>>
>>
>> Is this an error in racket?  If not, do you have any suggestions on how
>> I can proceed in debugging this (I'm at a loss), or even on how to
>> figure out which bits of my racket code to look at?  (I've tried doing
>> "info locals" from gdb at various points in the stack, but I've not
>> reached enlightenment.  Again, I have little experience with reading
>> core dumps.)
>>
>> I've not included any of our racket code, as we don't know which part
>> is causing the problem.
>>
>> Thanks for reading,
>>
>> --
>> Matthew Eric Bassett | http://mebassett.info
>> ____________________
>>   Racket Users list:
>>   http://lists.racket-lang.org/users

-- 
Matthew Eric Bassett | http://mebassett.info

Posted on the users mailing list.