[racket] handin-server hanging

From: Eli Barzilay (eli at barzilay.org)
Date: Tue Sep 28 19:40:34 EDT 2010

Random points:

* The "cleaning up" log message doesn't mean much -- it happens every
  few minutes if there were any submissions.  Adding log lines in
  various places (including at the end of a cleanup) is probably a
  good idea to see where the problem is.  (And if the cleanup turns
  out to be the problem, it's possible to disable it, but the
  directory structure will become more complex -- I can give you the
  information if needed.)

* If the process gets stuck this way, then it's in some deadlock,
  probably related to the submission process.  To abort it more
  conveniently, you can use Ctrl+\ (which sends SIGQUIT) on a Linux
  terminal.

* JFYI, the highest number of students that we've had here was about
  120 -- so you're definitely pushing the limits further.  The server
  should generally be fine though -- it uses `run-server' which looks
  like it's hard-wired to 5 simultaneous connections.  Still, it's
  probably a good idea to keep an eye on the process and see that it's
  not growing to the point of thrashing.

* It's especially relevant to look at the process when it gets stuck.
  A very convenient Linux utility for this is `htop' -- check the
  memory and the state of the process, then hit "s" and it will
  attach to the process with strace and show you the system calls
  it's making (if any).

* Also, running it on NFS is probably not a great idea -- some NFS
  setups can behave in subtle ways under many file operations.  I
  think that it's better to run it on a local filesystem, with a cron
  job that copies the data to the NFS for backup.
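
As a rough illustration of the last point, here is a minimal Python
sketch of a backup step that a cron job could run.  It is not part of
the handin-server: the `backup_tree` name is hypothetical, the
directories passed to it are whatever your setup uses, and it assumes
Python 3.8 or later (for `dirs_exist_ok`).

```python
import shutil
from pathlib import Path


def backup_tree(local_dir, backup_dir):
    """Copy everything under local_dir into backup_dir.

    Files already present in backup_dir are overwritten by newer copies;
    files that were deleted locally are NOT removed from the backup.
    Returns the number of regular files under backup_dir afterwards.
    """
    src = Path(local_dir)
    dst = Path(backup_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"not a directory: {src}")
    # copytree uses shutil.copy2 by default, which also tries to
    # preserve file timestamps and other metadata.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in dst.rglob("*") if p.is_file())
```

A small script calling `backup_tree` with the local server directory
and the NFS backup directory could then be scheduled from cron.  The
copy only ever adds or overwrites files, so a slow or flaky NFS mount
delays the backup rather than the server itself.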


10 minutes ago, Ryan Golbeck wrote:
> Has anyone had any problems with their handin-server hanging?
> 
> Twice recently we've had our handin-server seemingly hang.  The last
> log message on both hangs was that it was "Cleaning up all
> submission directories".  I've added new log messages to determine
> if it ever finishes cleaning the submission directories should this
> happen again, but I haven't been able to reproduce the hang easily.
> We have around 400 students in the class using the handin-server, so
> the number of submissions is adding up quickly.  The server is
> running on a host where the submission directories are served over
> NFS, which does seem to be slow in some cases, and could be a source
> of Racket hanging for some reason.
> 
> The process wouldn't respond to a Ctrl+C interrupt on the terminal it
> was running when it was hung; I had to send it a signal using kill to
> get it to stop.
> 
> Anyone have any suggestions on where I should look for this?  I'll try
> to post more information as soon as I get it.

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the users mailing list.