[racket-dev] I/O scheduler change --- epoll(), kqueue(), etc.

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Thu Nov 10 14:38:47 EST 2011

I've pushed a change in the way that the Racket scheduler handles I/O.
The new implementation should scale to a large number of network
connections, at least on Linux and BSDs (including Mac OS X).

Formerly, if a program had many threads each blocked on on I/O, then
the scheduler would poll each thread separately to find one to run. At
the OS level, that polling amounted to one select() per blocked thread.

Now, Racket internally suspends a thread that is blocked on I/O and
registers a callback via epoll()/kqueue() (where available). The
callback internally resumes specific threads when I/O becomes
available; meanwhile, the thread scheduler doesn't poll or otherwise
consider threads that are blocked on I/O.

Similar benefits apply when multiple I/O events are grouped together in
a `sync'. The old implementation polled each item in the `sync'
individually, and the new one uses epoll()/kqueue().

I'm interested to hear whether this change has any effect on real
uses of Racket.

One simple benchmark is below: it creates 100 blocked UDP sockets, each
with its own blocked thread, and then it trades 1 byte 10k times
between two threads via TCP (which forces constant thread swaps). Even
with only 100 blocked sockets, the new Racket is 2-3 times as fast on
my Mac OS X and Linux machines (the latter on VirtualBox). With 1000
blocked sockets, the new Racket takes about the same time as with 100,
while the old Racket takes longer than I'm willing to wait.

I've tried more realistic benchmarks, such as creating N TCP
connections, each with a blocked thread on the server side, and then
looping to accept M new connections (with a read followed by a close
for each new connection). The new Racket seems to scale for large N
while the old Racket doesn't.


#lang racket

(define N 100)
(define M 10000)

(define l (tcp-listen 40000 5 #t))

 (lambda ()
   ;; Generate stuck sockets:
   (for ([i (in-range N)])
     (define u (udp-open-socket))
     (udp-bind! u #f 0)
     (thread (lambda () (udp-receive! u (make-bytes 10)))))
   ;; Time communication (server side):
   (define-values (ci co) (tcp-accept l))
   (for ([i (in-range M)])
     (write-byte (modulo i 127) co)
     (flush-output co)
     (read-byte ci))))

;; Time communication (client side):
(define-values (ci co) (tcp-connect "localhost" 40000))
 (for ([i (in-range M)])
   (read-byte ci)
   (write-byte 7 co)
   (flush-output co)))

Posted on the dev mailing list.