[racket] SIGSEGV MAPERR si_code 1 fault on addr 0x7...; can't isolate or consistently reproduce in source code; stack trace points to scheme_uncopy_stack

From: John Gateley (racket at jfoo.org)
Date: Thu May 2 19:27:58 EDT 2013

Hi, I'm not a Racket expert, but I have a little experience with
core dumps.

If you compiled the interpreter yourself, no one else will be able to
do much with the core, since reading it needs the exact binary that
produced it. But that's not the end of the world.

Do you have the source/debug info option turned on when compiling?
If not, recompile with that option turned on.

Next, load the core in gdb, run the "bt" (backtrace) command, and
send the result here.

If the stack is huge, as you suggest below, it could be a standard
stack overflow. A core won't be much help there, except to hint
at which arguments to the infinitely recursing function might be
producing it.
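
With several thousand frames, the backtrace is easier to read once it is
collapsed into a per-function frame count. Here is a rough sketch in
Python, assuming the "bt" output has been saved as text and uses gdb's
usual "#N  0xADDR in function (args) at file:line" layout. The helper
names count_frames and likely_runaway and the threshold of 1000 frames
are made up for illustration.

```python
import re
from collections import Counter

# Matches gdb backtrace frame lines such as
#   #0  scheme_uncopy_stack (ok=0, ...) at setjmpup.c:NNN
#   #1  0x00007f... in scheme_uncopy_stack (ok=0, ...) at setjmpup.c:NNN
# Assumption: frames follow gdb's default "bt" layout.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([^\s(]+) \(")


def count_frames(bt_text):
    """Return a Counter mapping function name -> number of frames."""
    counts = Counter()
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def likely_runaway(bt_text, threshold=1000):
    """List (function, frame count) pairs at or above `threshold`.

    The threshold is an arbitrary illustrative cutoff, not a rule.
    """
    return [(name, n) for name, n in count_frames(bt_text).most_common()
            if n >= threshold]
```

One way to get the text would be something like
"gdb -batch -ex bt <racket-binary> <core> > bt.txt", then pass the
contents of bt.txt to likely_runaway. A function that dominates the
count is the one whose arguments are worth inspecting frame by frame.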

If this is random, also as you suggest, this is ugly. It is usually
reading uninitialized memory somehow, possibly via a double free.
The best approach is using a tool like BoundsChecker or Purify
or similar - they'll identify issues like this. I don't know if
Racket is set up to use any of these, but it would be a worthwhile
investment.

Finally, I would suggest resending to dev - they'll have more
familiarity with the underlying code.

John


On 5/2/2013 12:42 PM, Matthew Eric Bassett wrote:
> Hi all,
>
> It might be better to send this to dev at racket-lang.  Then again, it
> might be completely useless to them.
>
> So we have a job scheduler program written in racket that handles
> various places and tcp clients.  This program sporadically and
> inconsistently terminates with the following error message:
>
> SIGSEGV MAPERR si_code 1 fault on addr 0x7fffb044ef48
>
> We've caught the error at various points of execution, but
> can't consistently reproduce it (yet).  We do have a core dump of the
> program running and terminating from the racket repl (loaded with
> "enter!") v 5.3.3.  I've made it available via dropbox at
> https://www.dropbox.com/s/rkd6pl511acll2r/core.12346.gz.
>
> I don't have much experience reading core dumps, but it looks to me like
> racket is hitting a stack overflow in scheme_uncopy_stack in setjmpup.c.
>
> In particular, at the first time scheme_uncopy_stack appears in the
> stack with args scheme_uncopy_stack (ok=0, b=0x7f857b2b3510,
> prev=0x7fffb044f5e0) we have:
>
> (gdb) p prev
> $1 = (intptr_t *) 0x7fffb044f5e0
> (gdb) p *prev
> $2 = 0
> (gdb) p b
> $3 = (Scheme_Jumpup_Buf *) 0x7f857b2b3510
> (gdb) p *b
> $4 = {stack_from = 0x0, stack_copy = 0x0, stack_size = 0, stack_max_size = 0, cont = 0x0, buf = {jb = {{jb = {{
>              __jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0}, __mask_was_saved = 0, __saved_mask = {__val = {
>                  0 <repeats 16 times>}}}}, stack_frame = 0}}, gcvs = 0, gcvs_cnt = 0}, gc_var_stack = 0x0,
>    external_stack = 0x0}
>
> scheme_uncopy_stack remains in the stack for several thousand frames.
>
> The racket interpreter was compiled from source (so I don't know if
> others can even read that coredump!) on a linux kernel
> 3.4.37-40.44.amzn1.x86_64 #1 SMP Thu Mar 21 01:17:08 UTC 2013 x86_64
> x86_64 x86_64 GNU/Linux with glibc/-devel glibc-2.12-1.107.43.amzn1.x86_64.
>
> I was able to capture the same error running from the MacOSX binaries
> 5.3.3 from racket-lang.org.  That core dump is available at
> https://www.dropbox.com/s/jfneqr4zlkmkjhh/core.41166.gz.
>
>
> Is this an error in racket?  If not, do you have any suggestions on how
> I can proceed in debugging this (I'm at a loss), or even to figure out
> which bits of my racket code to look at?  (I've tried doing "info
> locals" from gdb at various points in the stack, but I've not reached
> enlightenment.  Again, I have little experience with reading core dumps).
>
> I've not included any of our racket code, as we don't know which part is
> causing the problem.
>
> Thanks for reading,
>
> --
> Matthew Eric Bassett | http://mebassett.info
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users

Posted on the users mailing list.