[racket-dev] posting to semaphore from C causes seg fault

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Sat Sep 17 10:32:54 EDT 2011

It looks like the call in C might have been made in a thread other than
the thread where Racket was started. In that case, when
scheme_post_sema() tries to cooperate with the GC, it ends up with a
NULL pointer for the current thread's Racket GC information.
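
To illustrate the mechanism, here is a minimal sketch of how a
per-thread record registered in one OS thread reads back as NULL in
another. It is not Racket's code: it uses a plain pthread key, and
thread_info, info_key, probe and foreign_thread_sees_null are
hypothetical names chosen for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime's per-thread GC record. */
typedef struct {
  void *gc_stack_top;
} thread_info;

static pthread_key_t info_key;

/* Thread body: stores 1 in *out if this thread has a record, else 0. */
static void *probe(void *out)
{
  assert(out != NULL);
  *(int *)out = (pthread_getspecific(info_key) != NULL);
  return NULL;
}

/* Returns 1 if the registering thread sees its record while a newly
   created thread sees NULL, 0 otherwise, and -1 on a setup failure. */
int foreign_thread_sees_null(void)
{
  static thread_info main_info;
  pthread_t th;
  int other_has_info = -1;
  int here_has_info;

  if (pthread_key_create(&info_key, NULL) != 0) return -1;
  /* Only the calling thread gets a record; other threads start at NULL. */
  if (pthread_setspecific(info_key, &main_info) != 0) return -1;
  if (pthread_create(&th, NULL, probe, &other_has_info) != 0) return -1;
  if (pthread_join(th, NULL) != 0) return -1;

  here_has_info = (pthread_getspecific(info_key) != NULL);
  pthread_key_delete(info_key);
  return here_has_info && !other_has_info;
}
```

Code that dereferences such a record at a small offset without
checking it would fault at a small address in the second thread.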

In particular, since you're asking about semaphores, I wonder whether
you were trying to use Racket semaphores to synchronize OS-level
threads? If so, it won't work; Racket semaphores only work among Racket
threads, and you'd have to use OS-level semaphores to synchronize
OS-level threads.
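
As a rough sketch of the OS-level alternative, a counting semaphore
can be built from a pthread mutex and condition variable. The names
os_sema, os_sema_post, os_sema_wait and os_sema_demo are made up for
this example and are not part of Racket's C API.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical counting semaphore for OS-level threads. */
typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t nonzero;
  int count;
} os_sema;

void os_sema_init(os_sema *s)
{
  pthread_mutex_init(&s->lock, NULL);
  pthread_cond_init(&s->nonzero, NULL);
  s->count = 0;
}

/* Increment the count and wake one waiter, if any. */
void os_sema_post(os_sema *s)
{
  pthread_mutex_lock(&s->lock);
  s->count++;
  pthread_cond_signal(&s->nonzero);
  pthread_mutex_unlock(&s->lock);
}

/* Block until the count is positive, then decrement it. */
void os_sema_wait(os_sema *s)
{
  pthread_mutex_lock(&s->lock);
  while (s->count == 0)
    pthread_cond_wait(&s->nonzero, &s->lock);
  assert(s->count > 0);
  s->count--;
  pthread_mutex_unlock(&s->lock);
}

static void *worker(void *arg)
{
  os_sema_post((os_sema *)arg);
  return NULL;
}

/* One worker posts once and the caller waits once; returns the
   remaining count, or -1 if the worker could not be started. */
int os_sema_demo(void)
{
  os_sema s;
  pthread_t th;
  int remaining;

  os_sema_init(&s);
  if (pthread_create(&th, NULL, worker, &s) != 0) return -1;
  os_sema_wait(&s);
  pthread_join(th, NULL);
  remaining = s.count;
  pthread_cond_destroy(&s.nonzero);
  pthread_mutex_destroy(&s.lock);
  return remaining;
}
```

Nothing here touches the Racket runtime, so it is safe to call from
any OS thread.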

If you were calling scheme_post_sema() from the OS thread where Racket
was started, though, then we need to investigate further.
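
One way to check which case applies is to record the starting thread's
identity and compare it at the call site. This is only a sketch with
hypothetical helper names (remember_start_thread, on_start_thread); it
assumes remember_start_thread() is called from the thread that starts
Racket, before any other thread runs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_t start_thread;

/* Call once from the thread that starts the runtime. */
void remember_start_thread(void)
{
  start_thread = pthread_self();
}

/* Returns 1 when called on the recorded thread, 0 otherwise. */
int on_start_thread(void)
{
  return pthread_equal(pthread_self(), start_thread) != 0;
}

/* Thread body: stores the result of the check made on this thread. */
static void *check(void *out)
{
  assert(out != NULL);
  *(int *)out = on_start_thread();
  return NULL;
}

/* Returns 1 if the check succeeds on the recording thread and fails on
   a second thread, 0 otherwise, and -1 if the thread cannot be created. */
int start_thread_check_demo(void)
{
  pthread_t th;
  int here;
  int there = -1;

  remember_start_thread();
  here = on_start_thread();
  if (pthread_create(&th, NULL, check, &there) != 0) return -1;
  pthread_join(th, NULL);
  return here && !there;
}
```

Logging the result of on_start_thread() just before the call to
scheme_post_sema() would show which thread is making it.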

At Wed, 14 Sep 2011 00:14:33 -0700, John Clements wrote:
> I'm unable to pass a semaphore to C and post to it from there. In particular, 
> it causes a seg fault. I'm testing the Scheme_Object * with SCHEME_SEMAP, so 
> I'm pretty sure it's a semaphore. Also, I can see this happen in gdb, but the 
> code is optimized, so it's hard to see exactly where it's failing. The 
> semaphore object looks like this in gdb:
> 
> Program received signal EXC_BAD_ACCESS, Could not access memory.
> Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000008
> [Switching to process 1825]
> scheme_post_sema (o=0x104a14668) at sema.c:284
> 284	
> (gdb) l    
> 279	
> 280	void scheme_post_sema(Scheme_Object *o)
> 281	{
> 282	  Scheme_Sema *t = (Scheme_Sema *)o;
> 283	  int v, consumed;
> 284	
> 285	  if (t->value < 0) return;
> 286	
> 287	  v = t->value + 1;
> 288	  if (v > t->value) {
> (gdb) p t
> $1 = (Scheme_Sema *) 0x104a14668
> (gdb) p t->value
> $2 = 0
> (gdb) p v
> Unable to access variable "v"
> $5 = <variable optimized away by compiler>
> (gdb) p *t
> $6 = {
>   so = {
>     type = 78, 
>     keyex = 0
>   }, 
>   first = 0x0, 
>   last = 0x0, 
>   value = 0
> }
> 
> The strange thing here is that the C code for scheme_post_sema suggests that 
> when t->first is 0x0, it should just silently return. Okay, so I dug into the 
> assembly a bit more, and it turns out that the compiled version of this code 
> looks like this:
> 
> Dump of assembler code for function scheme_post_sema:
> 0x000000010020e0d0 <scheme_post_sema+0>:	push   %rbp
> 0x000000010020e0d1 <scheme_post_sema+1>:	mov    %rsp,%rbp
> 0x000000010020e0d4 <scheme_post_sema+4>:	push   %r14
> 0x000000010020e0d6 <scheme_post_sema+6>:	push   %r13
> 0x000000010020e0d8 <scheme_post_sema+8>:	push   %r12
> 0x000000010020e0da <scheme_post_sema+10>:	push   %rbx
> 0x000000010020e0db <scheme_post_sema+11>:	sub    $0x30,%rsp
> 0x000000010020e0df <scheme_post_sema+15>:	mov    %rdi,-0x28(%rbp)
> 0x000000010020e0e3 <scheme_get_thread_local_variables+0>:	lea    0x104cce(%rip),%r13        # 0x100312db8 <scheme_thread_local_offset>
> 0x000000010020e0ea <scheme_get_thread_local_variables+7>:	mov    0x0(%r13),%edx
> 0x000000010020e0ee <scheme_get_thread_local_variables+11>:	lea    0x12434b(%rip),%r14        # 0x100332440 <scheme_thread_local_key>
> 0x000000010020e0f5 <scheme_get_thread_local_variables+18>:	mov    (%r14),%eax
> 0x000000010020e0f8 <scheme_get_thread_local_variables+21>:	addr32 mov %gs:(%edx,%eax,8),%rdx
> -- IT CRASHES ON THIS NEXT INSTRUCTION: --
> 0x000000010020e0fe <scheme_post_sema+46>:	mov    0x8(%rdx),%rax
> 0x000000010020e102 <scheme_post_sema+50>:	mov    %rax,-0x50(%rbp)
> 0x000000010020e106 <scheme_post_sema+54>:	lea    -0x50(%rbp),%rax
> 0x000000010020e10a <scheme_post_sema+58>:	mov    %rax,0x8(%rdx)
> 0x000000010020e10e <scheme_post_sema+62>:	lea    -0x28(%rbp),%rax
> 0x000000010020e112 <scheme_post_sema+66>:	mov    %rax,-0x40(%rbp)
> 0x000000010020e116 <scheme_post_sema+70>:	mov    0x18(%rdi),%rdx
> 0x000000010020e11a <scheme_post_sema+74>:	test   %rdx,%rdx
> 
> The problem at that instruction is that %rdx is 0, so loading from an 
> offset of 8 from 0x0 seg faults.
> 
> The gdb info makes it look as though this is an inlining of a function called 
> scheme_get_thread_local_variables, though I can't see why it would be called 
> here; the C code looks like it should just increment the counter and return.
> 
> As I said, this is completely and totally reproducible, so I'm happy to carry 
> out any experiments; at this point, I'm at the throwing up my hands and saying 
> "compiler bug?" stage.
> 
> Many thanks for any suggestions,
> 
> John
> 
> 


Posted on the dev mailing list.