Possibly unrelated seg faults (was: Re: [plt-scheme] 370/1 3m segfaults under linux)

From: John Clements (clements at brinckerhoff.org)
Date: Tue Oct 9 23:18:08 EDT 2007

On Oct 9, 2007, at 5:02 PM, Matthew Flatt wrote:

> At Wed, 10 Oct 2007 00:43:13 +0100, John Kozak wrote:
>> Running built-from-source versions of 370 and 371 under i386 and
>> X86_64 strains of recentish debian linux, I'm getting segfaults: after
>> doing an amount of I/O (reading 1000 binary files of about 2MB each),
>> I get a segfault, I think in GC.  I can't confirm this, because when I
>> try to run mzscheme under gdb, it segfaults in startup!  Any thoughts?
>
> The seg fault you see in gdb is the write barrier. Use
>
>  handle SIGSEGV nostop noprint
>
> and continue. When you get to the real problem (i.e., a seg fault that
> isn't a write barrier), the SIGSEGV signal handler will call abort().
>
> Any information you can extract by running in gdb will be much
> appreciated.
>
> I've fixed at least one GC bug since v371, but it's related to using
> structs that can be applied as procedures. So, you might try the latest
> from SVN (or using a source archive from the nightly-build page), if
> you haven't already. My guess is that you're running into something
> new.

My students are seeing frequent seg faults, often on startup.
This is svn-oct-5, built from source on Fedora 8 (not 64-bit).  One
of my students managed to provide a core file; unfortunately, I was
unable to glean anything useful by loading it into gdb.  Here's what I
got:

clements@victoria:~ $ gdb ~/102-plt/svn-plt/plt/bin/mred /tmp/core.23818
GNU gdb Red Hat Linux (6.6-15.fc7rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `/home/clements/102-plt/svn-plt/plt/bin/mred -N /home/clements/drscheme -ZmvqL-'.
Program terminated with signal 6, Aborted.
#0  0x0012d402 in _start () from /lib/ld-linux.so.2
(gdb) bt
#0  0x0012d402 in _start () from /lib/ld-linux.so.2
Cannot access memory at address 0xbf823b70
(gdb)
clements@victoria:~/102-plt/svn-plt/plt/bin $ uname -a
Linux victoria.csc.calpoly.edu 2.6.22.4-65.fc7 #1 SMP Tue Aug 21 22:36:56 EDT 2007 i686 i686 i386 GNU/Linux



If I'm reading this message correctly, it means that mred called
abort().  However, my gdb-fu has clearly decayed, because I'm still
puzzled as to why I can't get a stack trace (note the failed attempt
to "bt").

Is there any way to extract useful information from this 50 MB core
file?


John


Posted on the users mailing list.