[racket] Running into severe scaling issues with plt-web-server
Follow-up: no, raising the listen backlog unfortunately didn't help either.
But the good news is that I've finally found the true cause of the
problem. It's not Racket's fault that wescheme.org's compiler servers
"fail" on load spikes: rather, it's Amazon EC2. Specifically,
Amazon's Elastic Load Balancer will return 503 errors even if the
servers behind it aren't at capacity. Amazon documents that its load
balancers will return 503s on traffic spikes while the load balancers
"warm up".
Here's what they say:
---
Elastic Load Balancing Capacity Limits Reached
Elastic Load Balancing will likely never reach true capacity limits,
but until it scales based on the metrics, there can be periods in
which your load balancer will return an HTTP 503 error when it cannot
handle any more requests. The load balancers do not try to queue all
requests, so if they are at capacity, additional requests will fail.
If traffic grows over time, then this behavior works well, but in the
case of significant spikes in traffic or in certain load testing
scenarios, the traffic may be sent to your load balancer at a rate
that increases faster than Elastic Load Balancing can scale to meet
it.
---
Reference: http://aws.amazon.com/articles/1636185810492479
This is precisely what I've been seeing. I'm mortified; that doesn't
sound like "load balancing" to me, but I have to work with what I've
got. So thanks Jay, sorry about the false alarm. I'm working around
the problem now by modifying the client code to expect 503s.
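
For anyone hitting the same thing, here is a rough sketch of what
"expecting 503s" could look like on the client side. This is not the
actual WeScheme client code: it is written in Python for brevity, and
the names fetch_with_retry and send_request, the retry count, and the
backoff delays are all assumptions made up for illustration. The idea
is simply to treat a 503 as "try again shortly" rather than as a hard
failure, backing off so that the retries don't add to the spike.

```python
import random
import time


def fetch_with_retry(send_request, max_attempts=5, base_delay=0.5,
                     sleep=time.sleep):
    """Call send_request() until it returns a non-503 status.

    send_request is a hypothetical zero-argument callable that performs
    one HTTP request and returns a (status, body) tuple. The last
    response is returned as-is if every attempt yields a 503.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 503:
            # Anything other than 503 (success or a different error)
            # is handed back to the caller unchanged.
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ... plus some
            # random jitter so many clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))

    # Still 503 after the final attempt; let the caller decide.
    return status, body
```

The sleep parameter exists only so the waiting can be stubbed out when
trying the function without a network; a real client would wrap its own
request call in send_request and decide what to show the user if the
final answer is still a 503.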