[racket] SIGSEGV MAPERR si_code 1 fault on addr 0x7...; can't isolate or consistently reproduce in source code; stack trace points to scheme_uncopy_stack

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Thu May 2 21:04:12 EDT 2013

A stack overflow in scheme_uncopy_stack() sounds like a thread that is
trying to jump to a continuation whose representation is corrupted. (An
all-zeroed Scheme_Jumpup_Buf could have that effect, but I don't
particularly trust gdb to tell us the actual content, unless you
disabled optimization when compiling `racket'.)
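
To illustrate why a zeroed buffer could produce runaway recursion, here is
a toy Python model. It is not Racket's implementation; it only assumes a
restore routine that recurses, consuming stack on each call, until the
current stack position lies below the deepest address of the saved stack
(stack growing downward). The name frames_until_restore, the per-call cost
FRAME_BYTES, and the addresses are all hypothetical.

```python
# Toy model only: NOT Racket's code. It mimics a restore routine that keeps
# recursing (burning stack) until the current stack position is below the
# saved stack's deepest address, on a stack that grows downward.

FRAME_BYTES = 1600  # hypothetical stack cost of one recursive call


def frames_until_restore(sp, saved_deepest, stack_limit):
    """Count the recursive calls made before the copy-back could run.

    Returns the number of frames, or None if the simulated stack limit
    is crossed first (which models the segmentation fault).
    """
    frames = 0
    while not sp < saved_deepest:  # the "ok" test fails, so recurse again
        sp -= FRAME_BYTES
        frames += 1
        if sp < stack_limit:
            return None
    return frames


if __name__ == "__main__":
    start = 0x7FFFFFFF0000            # hypothetical starting stack position
    limit = start - 8 * 1024 * 1024   # hypothetical 8 MiB stack limit

    # Healthy buffer: the saved stack ends a little below the current position.
    print(frames_until_restore(start, start - 15000, limit))

    # Zeroed buffer: the target address is 0, so the test never succeeds.
    print(frames_until_restore(start, 0, limit))
```

In this model a plausible buffer needs only a handful of frames, while a
zeroed one recurses until the stack limit is hit.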

Assuming that the latest in the Racket git repo doesn't work any better
for you --- and I don't expect that it does in this case --- if you can
send me something to run that provokes the crash, I can investigate
more.
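
When a backtrace contains thousands of identical frames, as in the report
quoted below, collapsing the repeated runs makes it easier to see which
frame first entered the recursion. The following is a minimal sketch,
assuming the backtrace was saved as text (one way is
"gdb -batch -ex bt <racket-binary> <core-file>"). The helper name
collapse_backtrace is hypothetical, and the addresses and line numbers in
the comments are made up for illustration.

```python
import re
from itertools import groupby

# Matches gdb backtrace frame lines in either of these shapes
# (addresses and line numbers here are invented examples):
#   #0  scheme_uncopy_stack (ok=0, ...) at setjmpup.c:1
#   #12 0x00000000004a1b2c in scheme_uncopy_stack (ok=0, ...) at setjmpup.c:1
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def collapse_backtrace(text):
    """Return [(function, first_frame_number, run_length), ...].

    Consecutive frames that name the same function are merged into one
    entry, so a long recursion shows up as a single line with a count.
    """
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1)), m.group(2)))

    runs = []
    for func, group in groupby(frames, key=lambda f: f[1]):
        group = list(group)
        runs.append((func, group[0][0], len(group)))
    return runs


if __name__ == "__main__":
    import sys

    # Usage: python collapse_bt.py < bt.txt
    for func, first, count in collapse_backtrace(sys.stdin.read()):
        print(f"#{first:<6} {func}  x{count}")
```

The entry that follows the long run is the caller that started the jump,
which may hint at which part of the program captured the continuation.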

At Thu, 02 May 2013 18:42:56 +0100, Matthew Eric Bassett wrote:
> Hi all,
> 
> It might be better to send this to dev at racket-lang.  Then again, it 
> might be completely useless to them.
> 
> So we have a job scheduler program written in racket that handles 
> various places and tcp clients.  This program sporadically and 
> inconsistently terminates with the following error message:
> 
> SIGSEGV MAPERR si_code 1 fault on addr 0x7fffb044ef48
> 
> We've caught the error at various different points of execution, but 
> can't consistently reproduce it (yet).  We do have a core dump of the 
> program running and terminating from the racket repl (loaded with 
> "enter!") v 5.3.3.  I've made it available via dropbox at 
> https://www.dropbox.com/s/rkd6pl511acll2r/core.12346.gz.
> 
> I don't have much experience reading core dumps, but it looks to me 
> like racket is hitting a stack overflow in scheme_uncopy_stack in 
> setjmpup.c.
> 
> In particular, at the first time scheme_uncopy_stack appears in the 
> stack with args scheme_uncopy_stack (ok=0, b=0x7f857b2b3510, 
> prev=0x7fffb044f5e0) we have:
> 
> (gdb) p prev
> $1 = (intptr_t *) 0x7fffb044f5e0
> (gdb) p *prev
> $2 = 0
> (gdb) p b
> $3 = (Scheme_Jumpup_Buf *) 0x7f857b2b3510
> (gdb) p *b
> $4 = {stack_from = 0x0, stack_copy = 0x0, stack_size = 0,
>   stack_max_size = 0, cont = 0x0, buf = {jb = {{jb = {{
>             __jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0}, __mask_was_saved = 0,
>             __saved_mask = {__val = {0 <repeats 16 times>}}}},
>         stack_frame = 0}}, gcvs = 0, gcvs_cnt = 0},
>   gc_var_stack = 0x0, external_stack = 0x0}
> 
> scheme_uncopy_stack remains in the stack for several thousand frames.
> 
> The racket interpreter was compiled from source (so I don't know if 
> others can even read that coredump!) on a linux kernel 
> 3.4.37-40.44.amzn1.x86_64 #1 SMP Thu Mar 21 01:17:08 UTC 2013 x86_64 
> x86_64 x86_64 GNU/Linux with glibc/-devel 
> glibc-2.12-1.107.43.amzn1.x86_64.
> 
> I was able to capture the same error running from the MacOSX binaries 
> 5.3.3 from racket-lang.org.  That core dump is available at 
> https://www.dropbox.com/s/jfneqr4zlkmkjhh/core.41166.gz.
> 
> 
> Is this an error in racket?  If not, do you have any suggestions on how 
> I can proceed in debugging this (I'm at a loss) or even to figure out 
> which bits of my racket code to look at?  (I've tried doing "info 
> locals" from gdb at various points in the stack, but I've not reached 
> enlightenment.  Again, I have little experience with reading core 
> dumps).
> 
> I've not included any of our racket code, as we don't know which part 
> is causing the problem.
> 
> Thanks for reading,
> 
> --
> Matthew Eric Bassett | http://mebassett.info
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users

Posted on the users mailing list.