From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Thu May 2 21:04:12 EDT 2013

A stack overflow in scheme_uncopy_stack() sounds like a thread that is
trying to jump to a continuation whose representation is corrupted. (An
all-zeroed Scheme_Jumpup_Buf could have that effect, but I don't
particularly trust gdb to tell us the actual content, unless you
disabled optimization when compiling `racket'.)

Assuming that the latest in the Racket git repo doesn't work any better
for you (and I don't expect that it does in this case), if you can
send me something to run that provokes the crash, I can investigate
more.

At Thu, 02 May 2013 18:42:56 +0100, Matthew Eric Bassett wrote:
> Hi all,
>
> It might be better to send this to dev at racket-lang. Then again, it
> might be completely useless to them.
>
> So we have a job scheduler program written in racket that handles
> various places and tcp clients. This program sporadically and
> inconsistently terminates with the following error message:
>
> SIGSEGV MAPERR si_code 1 fault on addr 0x7fffb044ef48
>
> We've caught the error at various points of execution, but
> can't consistently reproduce it (yet). We do have a core dump of the
> program running and terminating from the racket repl (loaded with
> "enter!") v 5.3.3. I've made it available via dropbox at
> https://www.dropbox.com/s/rkd6pl511acll2r/core.12346.gz.
>
> I don't have much experience reading core dumps, but it looks to me
> like racket is hitting a stack overflow in scheme_uncopy_stack in
> setjmpup.c.
>
> In particular, the first time scheme_uncopy_stack appears in the
> stack, with args scheme_uncopy_stack (ok=0, b=0x7f857b2b3510,
> prev=0x7fffb044f5e0), we have:
>
> (gdb) p prev
> $1 = (intptr_t *) 0x7fffb044f5e0
> (gdb) p *prev
> $2 = 0
> (gdb) p b
> $3 = (Scheme_Jumpup_Buf *) 0x7f857b2b3510
> (gdb) p *b
> $4 = {stack_from = 0x0, stack_copy = 0x0, stack_size = 0,
>   stack_max_size = 0, cont = 0x0, buf = {jb = {{jb = {{
>     __jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0}, __mask_was_saved = 0,
>     __saved_mask = {__val = {
>       0 <repeats 16 times>}}}}, stack_frame = 0}}, gcvs = 0,
>   gcvs_cnt = 0}, gc_var_stack = 0x0,
>   external_stack = 0x0}
>
> scheme_uncopy_stack remains in the stack for several thousand frames.
>
> The racket interpreter was compiled from source (so I don't know if
> others can even read that coredump!) on a linux kernel
> 3.4.37-40.44.amzn1.x86_64 #1 SMP Thu Mar 21 01:17:08 UTC 2013 x86_64
> x86_64 x86_64 GNU/Linux with glibc/-devel
> glibc-2.12-1.107.43.amzn1.x86_64.
>
> I was able to capture the same error running from the MacOSX binaries
> 5.3.3 from racket-lang.org. That core dump is available at
> https://www.dropbox.com/s/jfneqr4zlkmkjhh/core.41166.gz.
>
>
> Is this an error in racket? If not, do you have any suggestions on how
> I can proceed in debugging this (I'm at a loss) or even to figure out
> which bits of my racket code to look at? (I've tried doing "info
> locals" from gdb at various points in the stack, but I've not reached
> enlightenment. Again, I have little experience with reading core
> dumps).
>
> I've not included any of our racket code, as we don't know which part
> is causing the problem.
>
> Thanks for reading,
>
> --
> Matthew Eric Bassett | http://mebassett.info
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users
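
The remark that an all-zeroed Scheme_Jumpup_Buf could lead to a stack
overflow can be illustrated with a toy model. The sketch below is not
Racket's scheme_uncopy_stack(); it assumes a downward-growing stack and
a restore routine that keeps recursing until the current stack position
is deeper than the saved region, which is a common design for
stack-copying continuations but is only an assumption here. The names
ToyJumpBuf and frames_until_deep_enough are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: NOT Racket's scheme_uncopy_stack(). Assumes the stack
   grows toward lower addresses and that a restore must first recurse
   until the current position lies below the saved region. */
typedef struct {
    uintptr_t stack_from; /* hypothetical: lowest address of the saved region */
} ToyJumpBuf;

/* Returns how many simulated frames are pushed before the stack is deep
   enough to restore, or -1 if the stack is exhausted first (standing in
   for a real stack overflow). */
long frames_until_deep_enough(const ToyJumpBuf *b, uintptr_t sp,
                              uintptr_t frame_bytes, long max_frames)
{
    long frames = 0;
    assert(frame_bytes > 0);
    while (!(sp < b->stack_from)) {      /* not yet below the saved region */
        if (frames == max_frames || sp < frame_bytes)
            return -1;                   /* ran out of stack first */
        sp -= frame_bytes;               /* simulate one more recursive call */
        frames++;
    }
    return frames;
}
```

In this model, a buffer with a plausible stack_from terminates after a
few frames, while a zeroed stack_from can never satisfy the "deep
enough" test, so the recursion only stops when the stack runs out. That
would be consistent with the several thousand scheme_uncopy_stack frames
in the report, but it does not show how the buffer came to be zeroed,
or whether gdb's view of it is accurate.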