[racket] handin-server hanging
Random points:
* The "cleaning up" log message doesn't mean much -- it happens every
  few minutes if there were any submissions. Adding log lines in
  various places (including at the end of a cleanup) is probably a
  good way to find where the problem is. (And if the cleanup turns out
  to be the problem, it's possible to disable it, but the directory
  structure will become more complex -- I can give you the details if
  needed.)
* If the process gets stuck like this, then it's in some kind of
  deadlock, probably related to the submission process. To abort it
  more conveniently, you can hit Ctrl+\ in a Linux terminal (it sends
  SIGQUIT rather than the SIGINT that Ctrl+C sends).
* JFYI, the highest number of students that we've had here was about
  120 -- so you're definitely pushing the limits further. The server
  should generally be fine, though: it uses `run-server', which looks
  like it's hard-wired to 5 simultaneous connections. Still, it's
  probably a good idea to keep an eye on the process and make sure
  it's not growing to the point of thrashing.
* It's especially useful to look at the process when it gets stuck. A
  very convenient Linux utility for this is `htop': it shows the
  memory use and state of the process, and if you hit "s" it will
  attach strace to the process and show you the system calls it's
  making (if any).
* Also, running it on NFS is probably not a great idea -- some NFS
  implementations can behave in subtle ways when there are many
  operations. I think it's better to run it on a local filesystem,
  with a cron job that copies the files to the NFS for backup.
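If htop isn't available, a quick snapshot of the stuck process can also be read straight from /proc. This is only a minimal sketch for Linux: `proc_snapshot` is a hypothetical helper name, not something that comes with the handin-server. A state of `D` (uninterruptible sleep) usually means the process is blocked in the kernel waiting on I/O, which on an NFS mount would point at the filesystem rather than at a deadlock inside Racket.

```shell
#!/bin/sh
# proc_snapshot: hypothetical helper (not part of handin-server) that
# prints a quick snapshot of a possibly hung process on Linux, using
# only files under /proc.
proc_snapshot() {
  pid="$1"
  if [ -z "$pid" ] || [ ! -d "/proc/$pid" ]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  # Scheduler state: R running, S sleeping, D uninterruptible (often I/O)
  grep '^State:' "/proc/$pid/status"
  # Resident memory, to spot the process growing toward thrashing
  grep '^VmRSS:' "/proc/$pid/status"
  # Kernel function the process is blocked in (may read "0" if hidden)
  printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
}
```

Run it as `proc_snapshot <pid>` while the server is hung. From another terminal, `strace -p <pid>` gives the same system-call view as htop's "s" key, and `kill -QUIT <pid>` sends the same signal as Ctrl+\.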
10 minutes ago, Ryan Golbeck wrote:
> Has anyone had any problems with their handin-server hanging?
>
> Twice recently we've had our handin-server seemingly hang. The last
> log message on both hangs was that it was "Cleaning up all
> submission directories". I've added new log messages to determine
> if it ever finishes cleaning the submission directories should this
> happen again, but I haven't been able to reproduce the hang easily.
> We have around 400 students in the class using the handin-server, so
> the number of submissions adds up quickly. The server is running on
> a host where the submission directories are served over NFS, which
> does seem to be slow in some cases and could be what makes Racket
> hang.
>
> The process wouldn't respond to a Ctrl+C interrupt in the terminal it
> was running in while it was hung; I had to send it a signal using kill
> to get it to stop.
>
> Anyone have any suggestions on where I should look for this? I'll try
> to post more information as soon as I get it.
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!