[plt-scheme] intermittent "Connection reset by peer" with web server on mac

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Wed Jul 9 19:53:13 EDT 2008

At Wed, 09 Jul 2008 10:19:11 -0700, Simon Michael wrote:
> I worked through the systems programming tutorial again last night, with 
> mzscheme 4.0.2 on mac osx leopard, and observed intermittent connection 
> failures when making concurrent connections. eg this fails pretty often:
> 
>   ab -n 1000 -c 100 http://localhost:8080/

I can reproduce this on Leopard: I fairly regularly see

  apr_socket_recv: Connection reset by peer (54)

from `ab'.

Occasionally, I see the error that you reported on the server side:

>   === context ===
>   /Users/simon/src/serve.ss:36:0: handle
>   /Users/simon/src/serve.ss:23:12
> 
>   regexp-match: expects type <string, byte string, or input port> as 2nd 
>   argument, given: #<eof>; other arguments were: #rx"^GET (.+) 
>   HTTP/[0-9]+\\.[0-9]+"

but that seems to be a result of `ab' terminating (due to the other error).
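
For illustration, the server-side failure above is what happens when the
request line is matched without first checking for end-of-file. Below is a
minimal Python sketch of that guard. It is not the tutorial's Scheme code:
the function name `parse_request_path` is hypothetical, and only the
regular expression is taken from the quoted error message.

```python
import io
import re

# Same pattern as in the quoted error message, applied to bytes.
REQUEST_LINE = re.compile(rb"^GET (.+) HTTP/[0-9]+\.[0-9]+")

def parse_request_path(stream):
    """Return the requested path, or None if the peer sent nothing usable.

    `stream` is any binary file-like object, such as the result of
    socket.makefile("rb"). Hypothetical helper, for illustration only.
    """
    line = stream.readline()
    if not line:
        # End-of-file: the client went away before sending a request line.
        return None
    match = REQUEST_LINE.match(line)
    return match.group(1) if match else None

# Usage: an empty stream stands in for a client that disconnected early.
print(parse_request_path(io.BytesIO(b"")))
print(parse_request_path(io.BytesIO(b"GET /index.html HTTP/1.1\r\n")))
```

A guard like this only quiets the server-side error; it does not address
the "Connection reset by peer" that `ab' reports.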


As far as I can tell, the source of the "Connection reset by peer"
problem is actually in the OS:

I reduced the server to just `tcp-accept' (don't read, don't close,
etc.), and I even hacked `tcp-accept' to immediately return after the
accept() call. With those changes, I could get the "Connection reset by
peer" error with `ab -n 40 -c 20'.
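
A rough Python analogue of that reduced server is sketched below, assuming
a listener that accepts connections and then neither reads from nor closes
them. It is not the mzscheme code that was actually tested, the helper
names `make_listener` and `accept_only` are hypothetical, and whether it
reproduces the error on a given system is untested.

```python
import socket
import threading

def make_listener(backlog, host="127.0.0.1", port=0):
    """Open a TCP listener; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock

def accept_only(listener, count, held):
    """Accept `count` connections and keep them open: no read, no close."""
    for _ in range(count):
        conn, _addr = listener.accept()
        held.append(conn)

if __name__ == "__main__":
    listener = make_listener(backlog=5)
    print("listening on port", listener.getsockname()[1])
    held = []
    worker = threading.Thread(target=accept_only, args=(listener, 40, held))
    worker.start()
    worker.join()
```

With the printed port substituted into the URL, `ab -n 40 -c 20' can be
pointed at this listener in the same way as at the original server.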

But if I replace the 5 passed to `tcp-listen' with 100 or more, so that
the TCP listener "backlog" is at least as large as the number of attempted
concurrent connections, then I'm unable to trigger the "Connection
reset by peer" error in the original server and `ab' configuration. If
I then raise the `ab' concurrency to 200, the errors come back.
(According to system headers, the maximum backlog value in Leopard is
128, so it doesn't help to pass a larger value to `tcp-listen'.)
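
The same workaround can be sketched at the socket level. The Python
example below is illustrative only: `open_listener` is a hypothetical
helper, and the backlog it passes to listen() plays the role of the value
passed to `tcp-listen'. Python exposes the system header's limit as
`socket.SOMAXCONN`, so that is the natural default here.

```python
import socket

def open_listener(port, backlog=socket.SOMAXCONN, host="127.0.0.1"):
    """Open a TCP listener with an explicit backlog.

    `backlog` is the number of pending connections the OS is asked to
    queue before accept() is called. Hypothetical helper, for illustration.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock

if __name__ == "__main__":
    # A backlog of 5 mirrors the original server; SOMAXCONN is the workaround.
    small = open_listener(0, backlog=5)
    large = open_listener(0)
    print("system backlog limit reported by headers:", socket.SOMAXCONN)
    small.close()
    large.close()
```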

I rarely hit OS bugs at this level, and I haven't yet tried to create
small C programs to demonstrate the problem. Still, as far as I can
tell, listen()/accept() is not working right in Mac OS X (I can't
reproduce the problem in Linux), and a workaround is to raise the
backlog value.


Matthew



Posted on the users mailing list.