[racket-dev] regexp.c and lookahead

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Sun Jun 15 03:40:05 EDT 2014

At Sat, 14 Jun 2014 18:18:05 -0400, Tony Garnock-Jones wrote:
> At the moment, when regexp.c runs out of buffered lookahead during a
> regexp-try-match, it peeks a few bytes. However, it looks like it will
> never peek *fewer* than 16 bytes (unless eof occurs before then).

I don't think that's right:

 (define-values (i o) (make-pipe))
 (write-bytes #"abcd" o) ; note: `o` is not closed
 (regexp-try-match #rx"^a" i)
 ; => '(#"a")

Internally the regexp-matching functions call
scheme_get_byte_string_unless() with a 6th argument of 1, which
corresponds to `peek-bytes-avail!`.

The call will request at least 16 bytes on each peek, but the matcher
will accept a single byte to try to make progress.
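The gap between requesting 16 bytes and requiring 16 bytes can be seen with the Racket-level functions. Here is a small sketch that uses only `racket/base`. It illustrates `peek-bytes-avail!` itself, not the internal call in regexp.c. Asking for 16 bytes from a pipe like the one above returns the 4 bytes that are ready. The `*` variant returns 0 instead of waiting when nothing is ready.

```racket
#lang racket/base

;; Four bytes in a pipe whose output end is left open.
(define-values (i o) (make-pipe))
(write-bytes #"abcd" o)

;; Room for 16 bytes, but peek-bytes-avail! waits only until at least
;; one byte is ready; it returns the count actually peeked (4 here)
;; and leaves those bytes in the port.
(define buf (make-bytes 16))
(define n (peek-bytes-avail! buf 0 #f i))
(define peeked (subbytes buf 0 n))

;; On an empty pipe, peek-bytes-avail!* returns 0 instead of blocking.
(define-values (empty-in empty-out) (make-pipe))
(define n-empty (peek-bytes-avail!* (make-bytes 16) 0 #f empty-in))
```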


> I have written the package "incremental-input" which lets a blocking
> read (e.g. read-json) be fed input as it becomes available, event-style.
> 
> When testing using read-json from the "json" collect, I find that it
> blocks unnecessarily even though a complete input is available.

I think the problem is in your port implementation. Your
`incremental-read-bytes!` tries to block (and emit a message) instead
of returning a result to indicate that no more input is ready, and that
doesn't work in larger combinations. Since you don't supply a "peek"
function for the port, the most immediate such combination is that
your port's "read" function is used to implement peeks. A port's read
function really needs to be non-blocking.

You can make your tests pass most of the time(!) by changing

       [(queue-empty? ports)
        (suspend)
        (retry)]

to

       [(queue-empty? ports)
        (cond
         [(zero? (random 100))
          (suspend)
          (retry)]
         [else 0])]

and that's obviously a hack, but it should illustrate that regexp
matching is happy to work with the bytes that it has been given, as
long as the port properly reports that no more bytes are available.
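A non-hack version gives the port a read procedure that always returns right away. Here is a minimal sketch of that idea. It assumes a plain pipe (`src`/`feed`, hypothetical names) stands in for whatever buffer an event-style feeder fills. When nothing is ready, the read procedure returns an event instead of blocking. The `make-input-port` documentation prefers that over returning 0, because it avoids busy-waiting. Because no peek procedure is supplied, peeking is derived from this read procedure, just as in the port discussed above.

```racket
#lang racket/base

;; Hypothetical stand-in for the feeder's buffer: bytes written to
;; `feed` become available for reading from `src`.
(define-values (src feed) (make-pipe))

;; The port's read procedure: take whatever is ready right now.
(define (nonblocking-read! bstr)
  (define n (read-bytes-avail!* bstr src))
  (if (eqv? n 0)
      ;; Nothing is ready: return an event that becomes ready when
      ;; `src` has input, rather than blocking inside this procedure.
      (wrap-evt src (lambda (_) 0))
      n)) ; a byte count, or eof once `feed` is closed

;; Peek procedure is #f, so peeks are implemented in terms of reads.
(define in (make-input-port 'nonblocking nonblocking-read! #f void))

(write-bytes #"abcd" feed) ; `feed` stays open, as in the pipe example
(define result (regexp-try-match #rx"^a" in))
```

The match succeeds with the four bytes on hand, and the unmatched bytes stay in the port.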


I'm not sure I understand your overall goal, but it seems like you're
trying to implement `read-json-evt` in terms of `read-json`, or more
generally implement `R-evt` in terms of `R`. Is there a reason you
can't just call `R` in a separate thread and wrap that attempt up as
an event?
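The thread-based alternative can be sketched in a few lines. `blocking-read->evt` is a hypothetical helper, not a library function. It runs a blocking reader such as `read-json` in its own thread. It then exposes the outcome as an event that becomes ready when that thread finishes. An exception raised by the reader is re-raised to whoever syncs on the event.

```racket
#lang racket/base
(require json)

;; Hypothetical helper: run `read-proc` on `in` in a separate thread
;; and expose its eventual result as a synchronizable event.
(define (blocking-read->evt read-proc in)
  (define outcome #f) ; set to a thunk that returns the value or re-raises
  (define reader
    (thread
     (lambda ()
       (set! outcome
             (with-handlers ([exn:fail? (lambda (e) (lambda () (raise e)))])
               (let ([v (read-proc in)])
                 (lambda () v)))))))
  ;; Ready once the reader thread is done; syncing yields the value read.
  (wrap-evt (thread-dead-evt reader) (lambda (_) (outcome))))

;; Usage: before any input arrives, a poll finds the event not ready.
(define-values (i o) (make-pipe))
(define json-evt (blocking-read->evt read-json i))
(define early (sync/timeout 0 json-evt))

;; Once a complete value is written, a blocking sync returns it.
(write-string "{\"a\": 1} " o)
(define value (sync json-evt))
```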

I also suspect that you want a poll operation that reliably fails if
progress is not possible until something more is done externally.
Racket's concurrency system supports that concept; see
`poll-guard-evt`.
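Here is a minimal sketch of that primitive on its own. The "not ready during a poll" policy below is a hypothetical choice made for illustration. The maker procedure given to `poll-guard-evt` receives #t when the event is only being polled, as with `sync/timeout` and a 0 timeout. It receives #f for a blocking `sync`. An event can therefore decline to start work that cannot finish without outside help.

```racket
#lang racket/base

(define e
  (poll-guard-evt
   (lambda (polling?)
     (if polling?
         ;; Hypothetical policy: a poll must not wait on outside input,
         ;; so report "not ready" rather than starting blocking work.
         never-evt
         ;; A blocking sync may take as long as it needs.
         (handle-evt always-evt (lambda (_) 'done))))))

(define polled (sync/timeout 0 e)) ; maker is called with #t
(define synced (sync e))           ; maker is called with #f
```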


Posted on the dev mailing list.