[plt-scheme] 299.8

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Tue May 18 17:02:05 EDT 2004

The v299-tagged code for MzScheme and MrEd in CVS is now version 299.8.
(The exp-tagged code remains version 207.)


Concurrent I/O (mostly I)
--------------

This version makes port I/O cooperate properly with MzScheme's version
of Concurrent ML primitives.

For example, if you want to read ten characters from either port a or
port b (whichever is ready first, pseudo-random choice if both are
ready), you can write

  (require (lib "port.ss"))

  (sync
   (read-string-evt 10 a)
   (read-string-evt 10 b))

Probably you want to do something different, depending on which port
produces the bytes, and then loop:

   (let loop ()
     (sync
      (finish-evt (read-string-evt 10 a)
                  (lambda (v)
                    (printf "From a: ~a~n" v)
                    (loop)))
      (finish-evt (read-string-evt 10 b)
                  (lambda (v)
                    (printf "From b: ~a~n" v)
                    (loop)))))

Or, suppose that you want to poll `p' to check whether it's ready to
deliver a "Hello":

 (sync/timeout 0 (regexp-match-evt #rx"^Hello" p))
  ; => #f or '(#"Hello")

Or, suppose that you want to read one line of input, but only if it's
available within 3 seconds:

 (sync/timeout 3 (read-line-evt p))


All of these events come with the guarantee that bytes are read from
the port if and only if the event is chosen in a sync. So, if at first
you don't succeed...

 ;; This loop never accidentally drops an input line:
 (let loop ()
   (or (sync/timeout 3 (finish-evt
                        (read-line-evt p)
                        (lambda (s)
                          ... do something with s ...)))
       (begin
         (printf "Anyone out there?~n")
         (loop))))

Furthermore, when multiple bytes are read, they correspond to
consecutive bytes in the stream. So, for example

 (let* ([p (open-input-string "x=1 y=2")]
        [show-one (lambda ()
                    (let ([m (sync (regexp-match-evt #rx"(.)=(.)" p))])
                      (printf "~a equals ~a~n"
                              (cadr m) (caddr m))))])
   (thread show-one)
   (thread show-one))

may print "x equals 1" first or "y equals 2" first, but it won't print
"x equals 2".

A more practical example: Suppose that multiple threads are reading
from a port with lots of multi-byte encodings. With `read-char' or
`read-string', there's no guarantee that each thread gets a
consecutive sequence of bytes, so the byte-to-char decoding can get
mangled as the threads receive interleaved bytes. In contrast, using
`read-string-evt' guarantees that the decoding corresponds to a
consecutive sequence of bytes in the stream. (Possibly `read-string'
should be re-implemented in terms of `read-string-evt', but the current
`read-string' is a lot faster in the single-threaded case.)
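
The decoding hazard can be seen in a small standalone C model. This is
only a sketch, not MzScheme code: `utf8_well_formed' and `deal' are
hypothetical helpers invented for the illustration, and `deal' simply
hands alternating chunks of one byte stream to two pretend readers. The
check is structural only (lead bytes and continuation bytes); it does
not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Structural UTF-8 check: every lead byte must be followed by the
   right number of continuation bytes (10xxxxxx). */
static int utf8_well_formed(const unsigned char *s, size_t n) {
  size_t i = 0;
  while (i < n) {
    size_t extra, k;
    if (s[i] < 0x80) extra = 0;
    else if ((s[i] & 0xE0) == 0xC0) extra = 1;
    else if ((s[i] & 0xF0) == 0xE0) extra = 2;
    else if ((s[i] & 0xF8) == 0xF0) extra = 3;
    else return 0;                    /* stray continuation byte */
    if (extra > n - i - 1) return 0;  /* truncated encoding */
    for (k = 1; k <= extra; k++)
      if ((s[i + k] & 0xC0) != 0x80) return 0;
    i += extra + 1;
  }
  return 1;
}

/* Hands out `chunk'-sized pieces of src alternately to readers a and
   b. chunk must be at least 1. */
static void deal(const unsigned char *src, size_t n, size_t chunk,
                 unsigned char *a, size_t *na,
                 unsigned char *b, size_t *nb) {
  size_t i;
  int turn = 0;
  *na = *nb = 0;
  for (i = 0; i < n; i += chunk, turn ^= 1) {
    size_t len = (n - i < chunk) ? n - i : chunk;
    if (turn == 0) { memcpy(a + *na, src + i, len); *na += len; }
    else           { memcpy(b + *nb, src + i, len); *nb += len; }
  }
}
```

With the four-byte stream C3 A9 C3 A9 (two two-byte characters),
dealing one byte at a time leaves each reader with a sequence that
fails the check, while dealing two consecutive bytes at a time leaves
each reader with one well-formed character.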


All of this functionality is built (in principle) on a small collection
of input-port primitives:

 (port-progress-evt input-port) -> progress-evt
   - Produces an event that becomes ready with any subsequent non-peek read
     on `input-port'.

 (peek-bytes-avail! mutable-bytes skip-k progress-evt input-port) -> got-k
 (peek-bytes-avail!* mutable-bytes skip-k progress-evt input-port) -> got-k
   - Get as many bytes as are available, where "*" means "zero is ok".
     The `*' could be turned into an argument to make it one primitive. :)

     The `progress-evt' argument is new, and it must be either #f or
     the result of `port-progress-evt' on `input-port'. If
     `progress-evt' becomes ready, then nothing should be peeked
     (because someone else grabbed bytes, probably invalidating a peek
     sequence).

     Note that "do X only while Y isn't ready" is not the sort of thing
     you can normally do in CML, because there would be a race
     condition between checking Y and doing X. The important thing here
     is that the input port itself generated `progress-evt', so the port
     can arrange to do X only if Y isn't ready.

  (port-commit-peeked k progress-evt evt input-port) -> boolean
    - Commits k previously-peeked bytes as read, but only at a
      successful sync on evt, and only if `progress-evt' doesn't become
      ready first (which would indicate that some other process
      grabbed the bytes); the result is #t if the commit succeeds, #f
      otherwise; in either case, `progress-evt' is ready when the
      procedure returns.

      Here, again, it's crucial that `input-port' generated
      `progress-evt'.

      Note that this commit operation doesn't report the bytes that
      were committed; they've been peeked before, and `progress-evt'
      ensures that the commit is consistent with the peeks.

In general, you can use these primitives to implement any sort of
look-ahead parser so that it cooperates with concurrent parsers on the
same stream.
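
The peek-then-commit protocol can be modeled in a few lines of
standalone C. This is a toy, single-threaded sketch under stated
assumptions, not MzScheme's implementation: `Port', `port_progress',
`port_peek', and `port_commit' are hypothetical names, a progress event
is approximated by a counter snapshot (a stale snapshot plays the role
of a ready event), and the extra `evt' argument of `port-commit-peeked'
is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy in-memory input port. */
typedef struct {
  const char *data;
  size_t len;
  size_t pos;         /* next unread byte */
  unsigned progress;  /* bumped by every committed read */
} Port;

/* Snapshot that stands in for (port-progress-evt port). */
static unsigned port_progress(const Port *p) { return p->progress; }

/* Copies up to `want' bytes, starting `skip' bytes past the read
   position, without consuming them. Peeks nothing (returns 0) if the
   port has progressed since the snapshot was taken. */
static size_t port_peek(const Port *p, char *buf, size_t want,
                        size_t skip, unsigned snapshot) {
  size_t avail, n;
  if (snapshot != p->progress) return 0;
  if (skip >= p->len - p->pos) return 0;
  avail = p->len - p->pos - skip;
  n = want < avail ? want : avail;
  memcpy(buf, p->data + p->pos + skip, n);
  return n;
}

/* Consumes k previously peeked bytes, but only if nobody has read
   since the snapshot. Returns 1 on success and 0 otherwise; either
   way the snapshot is stale afterwards. */
static int port_commit(Port *p, size_t k, unsigned snapshot) {
  if (snapshot != p->progress || k > p->len - p->pos) return 0;
  p->pos += k;
  p->progress++;
  return 1;
}

/* Two parsers race for "x=1 y=2": both peek "x=1", only the first
   commit wins, and the loser starts over with a fresh snapshot.
   Returns 1 if the race played out that way. */
static int demo(void) {
  Port p = { "x=1 y=2", 7, 0, 0 };
  char a[8] = { 0 }, b[8] = { 0 };
  unsigned sa = port_progress(&p), sb = port_progress(&p);
  size_t n;
  port_peek(&p, a, 3, 0, sa);
  port_peek(&p, b, 3, 0, sb);
  if (!port_commit(&p, 3, sa)) return 0;  /* first parser wins */
  if (port_commit(&p, 3, sb)) return 0;   /* stale snapshot must fail */
  sb = port_progress(&p);                 /* loser retries */
  n = port_peek(&p, b, 4, 0, sb);         /* sees " y=2" */
  return n == 4 && port_commit(&p, n, sb) && p.pos == p.len;
}
```

The losing parser never consumes bytes that it did not peek under a
still-valid snapshot, which is why the two results cannot be mixed
into "x equals 2".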

Although all other input-port functions can be implemented in terms of
these, MzScheme will retain versions of `read-bytes', etc. that are
optimized for common paths.


Caveat: When a port is based on an OS-level stream, peeking from the
Scheme port requires reading from the OS-level stream. If the stream is
shared with other OS-level processes, the other processes can't get the
peeked bytes. So peeking and CML I/O only work nicely when you stay
inside a single instance of MzScheme.


On the output side, MzScheme provides `write-bytes-avail-evt', where
bytes are written if and only if the event is chosen in a sync. I think
there's not much more that MzScheme can do for output; OS limitations
are more significant in that direction.


Inside MzScheme (changes are mostly from 299.7)
---------------

A structure that represents a Scheme type should now start with a
Scheme_Object, instead of Scheme_Type. A Scheme_Object contains only a
Scheme_Type (except in 3m mode), so it takes the same amount of space
as before. But using Scheme_Object instead of Scheme_Type ensures that
casts to and from Scheme_Object* do not run afoul of C99's aliasing
assumptions.
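
A minimal sketch of the new layout follows. The typedefs here are
stand-ins written for this example, not the real declarations from
scheme.h (which differ in detail, for instance in 3m mode), and
`My_Point', `my_point_type', and the helper functions are hypothetical.
The point is only that the struct's first member is a whole
Scheme_Object, so a pointer to the struct converts to a pointer to a
genuine Scheme_Object subobject.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the MzScheme declarations (assumed shapes). */
typedef short Scheme_Type;
typedef struct Scheme_Object {
  Scheme_Type type;
} Scheme_Object;

enum { my_point_type = 100 };  /* made-up type tag */

/* New style: the header is a Scheme_Object, not a bare Scheme_Type. */
typedef struct My_Point {
  Scheme_Object so;
  double x, y;
} My_Point;

/* A pointer to a struct, suitably converted, points to its first
   member, so this yields a pointer to a real Scheme_Object. */
static Scheme_Object *point_to_object(My_Point *p) {
  return (Scheme_Object *)p;
}

/* Reads the tag through the generic pointer. */
static Scheme_Type object_type(const Scheme_Object *o) {
  return o->type;
}

/* Converts back after checking the tag; NULL on a mismatch. */
static My_Point *object_to_point(Scheme_Object *o) {
  return (o->type == my_point_type) ? (My_Point *)o : NULL;
}
```

Had `My_Point' started with a bare Scheme_Type, reading `o->type'
through a Scheme_Object* would access memory that was never declared
as a Scheme_Object, which is the situation the aliasing rules let a
C99 compiler assume does not happen.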

The error_buf field of Scheme_Thread is now a pointer to a mz_jmp_buf,
instead of an inlined mz_jmp_buf. The protocol for temporarily
catching an exception is now as follows:

  mz_jmp_buf *save, fresh;
  save = scheme_current_thread->error_buf;
  scheme_current_thread->error_buf = &fresh;
  if (scheme_setjmp(scheme_error_buf)) {
    /* There was an error or continuation invocation */
    if (scheme_jumping_to_continuation) {
      /* It was a continuation jump */
      scheme_longjmp(*save, 1);
      /* To block the jump, instead: scheme_clear_escape(); */
    } else {
      /* It was a primitive error escape */
    }
  } else {
    /* Whatever might escape. */
    ....
  }
  scheme_current_thread->error_buf = save;
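
The same save/install/restore discipline can be tried without the
MzScheme headers. The following is a standalone model that uses the
standard <setjmp.h> in place of scheme_setjmp and scheme_longjmp;
`current_error_buf', `raise_error', `call_catching', and the three
body functions are hypothetical names, and continuation jumps are not
modeled.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Stand-in for scheme_current_thread->error_buf: a pointer to
   whichever jump buffer is currently installed. */
static jmp_buf *current_error_buf = NULL;

/* Stand-in for a primitive error escape. */
static void raise_error(void) {
  assert(current_error_buf != NULL);
  longjmp(*current_error_buf, 1);
}

/* Runs body() with a fresh buffer installed. Returns 1 if body
   escaped and 0 if it returned normally; the previous buffer is
   restored either way. */
static int call_catching(void (*body)(void)) {
  jmp_buf *volatile save = current_error_buf;
  jmp_buf fresh;
  int escaped;

  current_error_buf = &fresh;
  if (setjmp(fresh)) {
    escaped = 1;
  } else {
    body();
    escaped = 0;
  }
  current_error_buf = save;
  return escaped;
}

static int inner_result = -1;

static void quiet_body(void) { }

static void failing_body(void) { raise_error(); }

/* Catches an inner escape, then escapes again. Because the inner
   call restored the outer buffer, the second escape reaches the
   outer handler. */
static void nested_body(void) {
  inner_result = call_catching(failing_body);
  raise_error();
}
```

Because only a pointer is saved and restored, nesting handlers costs
one pointer assignment on the way in and one on the way out, rather
than a copy of the whole jump buffer.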

The input and output port driver interfaces have changed to accommodate
progress events and commits (for input ports) and write events (for
output ports). For most port types, the new features can be
implemented automatically by MzScheme with a small amount of extra
work in the driver.


Etc.
----

 * Added an `err-char' argument to `bytes-utf-8-length',
   `bytes-utf-8-index', and `bytes-utf-8-ref'. The new optional
   argument is before the optional `start-k' argument.

 * Changed `make-input-port' again, replacing peek and read events
   with progress events and commits.

 * Removed `read-bytes-avail!-evt' and `peek-bytes-avail!-evt'.

 * Added `port-progress-evt', `port-commit-peeked', and
   `port-provides-progress-evts?'.

 * Added an optional progress-event argument to `peek-bytes-avail!',
   etc., before the port argument.

 * Added an optional progress-event argument to `regexp-match-peek' and
   `regexp-match-peek-positions'. Also added non-blocking
   `regexp-match-peek-immediate' and
   `regexp-match-peek-positions-immediate'.

 * Peeked bytes in a limited pipe do not count against the pipe's
   limit, which makes pipes work naturally with `regexp-match-evt',
   etc. while preserving the pipe's rate-limiting behavior.


Updated temporary docs:

  http://www.cs.utah.edu/~mflatt/tmp/mzscheme-doc.plt
  http://www.cs.utah.edu/~mflatt/tmp/mzlib-doc.plt
  http://www.cs.utah.edu/~mflatt/tmp/insidemz-doc.plt


Matthew



Posted on the users mailing list.