[racket] Regex for blank line?
Hi everyone,
I'm sure this is a really trivial question, but I've been trying on my
own for some time now, and I can't quite figure it out. I am trying to
define a pair of functions, skip-whitespace and skip-blank-line, that do
the following:
- skip-whitespace should consume any whitespace characters from an input
port, possibly up to and including a single newline, but it should not
consume any more whitespace after a newline--i.e., it should not skip a
blank line in the input
e.g.,
(define ip (open-input-string " ABC"))
(define ip2 (open-input-string " \n\t\nABC"))
(define ip3 (open-input-string "ABC"))
(skip-whitespace ip) (skip-whitespace ip2) (skip-whitespace ip3)
(peek-char ip) ; should be #\A
(peek-char ip2) ; should be #\tab
(peek-char ip3) ; should be #\A
- skip-blank-line should consume a run of whitespace characters from an
input port, through the first newline, only if that run ends in a
newline; otherwise it should not consume any input
e.g.,
(define ip (open-input-string " ABC"))
(define ip2 (open-input-string " \n\t\nABC"))
(define ip3 (open-input-string "ABC"))
(skip-blank-line ip) (skip-blank-line ip2) (skip-blank-line ip3)
(peek-char ip) ; should be #\space
(peek-char ip2) ; should be #\tab
(peek-char ip3) ; should be #\A
Both functions should return a boolean value indicating whether any
input was consumed.
Here's what I've got for skip-whitespace:
#lang typed/racket
(: skip-whitespace (Input-Port -> Boolean))
(define (skip-whitespace in)
  ; matches whitespace up to and including a newline, but
  ; doesn't skip blank lines
  (if (try-read #px"^[[:blank:]]*[[:space:]]?" in) #t #f))
; NOTE: try-read is a simple wrapper for regexp-try-match with type:
; (U String Regexp PRegexp) Input-Port -> (U String False)
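For anyone who wants to run the examples, here is a minimal sketch of
what try-read might look like. It is reconstructed only from the NOTE
above and is an assumption, not the actual definition; it is written in
untyped Racket for brevity, with the intended type kept as a comment.

```racket
#lang racket/base
;; Hypothetical reconstruction of try-read, based only on the NOTE above.
;; Intended type: (U String Regexp PRegexp) Input-Port -> (U String False)
(define (try-read pattern in)
  ;; regexp-try-match consumes input only when the match succeeds;
  ;; on an input port it returns a list of byte strings, or #f.
  (define m (regexp-try-match pattern in))
  ;; Convert the whole-match bytes to a string; pass #f through.
  (and m (bytes->string/utf-8 (car m))))
```

Under this sketch, a pattern that can match the empty string returns ""
rather than #f, so a caller that needs to know whether any input was
consumed may want to test for a non-empty result.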
This works fine. But I can't figure out how to write the parallel regexp
for skip-blank-line. All the regexps I can come up with either read too
much whitespace or too little.
#lang typed/racket
(: skip-blank-line (Input-Port -> Boolean))
(define (skip-blank-line in)
  (if (try-read #px"^[[:blank:]]*$" in) #t #f))
This consumes too little in the second case: it doesn't consume the
initial spaces and newline of ip2; the next char is #\space rather than
#\tab. (The same is true if I change the character class :blank: to
:space:.)
If I change the regexp to #px"^[[:blank:]]*[[:space:]]", it consumes too
much in the first case: the next char of ip is #\A rather than #\space.
(I think this second regexp is closer to what I need, but what I could
really use is a character class that just matches line-terminators,
instead of :space:. That seems to be the job of "\\p{Zl}", but I guess
there's something I don't understand about that, because (regexp-match
#px"\\p{Zl}" "\n") doesn't match anything.)
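A small check of the \p{Zl} behaviour, plus one possible direction, as a
sketch only. In Unicode, newline falls in the control category (Cc),
while Zl holds only U+2028 LINE SEPARATOR. The pattern below matches a
literal "\n" instead; it assumes "\n" is the only line terminator that
matters (so "\r\n" is not handled) and calls regexp-try-match directly
rather than going through try-read.

```racket
#lang racket/base
;; #\newline is a control character, not a Unicode "line separator".
(char-general-category #\newline)       ; 'cc
(regexp-match? #px"\\p{Zl}" "\n")       ; #f
(regexp-match? #px"\\p{Zl}" "\u2028")   ; #t

;; Sketch (assumption: only "\n" ends a line): consume blanks plus one
;; newline, or nothing at all if the blanks are not followed by a newline.
(define (skip-blank-line in)
  (and (regexp-try-match #px"^[[:blank:]]*\n" in) #t))

(define a (open-input-string " ABC"))
(define b (open-input-string " \n\t\nABC"))
(define c (open-input-string "ABC"))
(list (skip-blank-line a) (skip-blank-line b) (skip-blank-line c))
(list (peek-char a) (peek-char b) (peek-char c))
```

Because regexp-try-match reads nothing when the match fails, the ports
in the first and third cases are left untouched.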
I feel pretty lost here. Any help would be very much appreciated.
Thanks!
Richard