[racket] Regex for blank line?

From: Richard Lawrence (richard.lawrence at berkeley.edu)
Date: Wed Jun 8 14:13:51 EDT 2011

Hi everyone,

I'm sure this is a really trivial question, but I've been trying on my
own for some time now, and I can't quite figure it out.  I am trying to
define a pair of functions, skip-whitespace and skip-blank-line, that do
the following:

- skip-whitespace should consume any whitespace characters from an input
  port, possibly up to and including a single newline, but it should not
  consume any more whitespace after a newline--i.e., it should not skip a
  blank line in the input

e.g., 
(define ip (open-input-string "  ABC")) 
(define ip2 (open-input-string "  \n\t\nABC"))
(define ip3 (open-input-string "ABC"))
(skip-whitespace ip) (skip-whitespace ip2) (skip-whitespace ip3)
(peek-char ip) ; should be #\A
(peek-char ip2) ; should be #\tab
(peek-char ip3) ; should be #\A

- skip-blank-line should consume whitespace characters from an input
  port if and only if that run of whitespace characters ends in a
  newline; otherwise it should not consume any input

e.g.,
(define ip (open-input-string "  ABC")) 
(define ip2 (open-input-string "  \n\t\nABC"))
(define ip3 (open-input-string "ABC"))
(skip-blank-line ip) (skip-blank-line ip2) (skip-blank-line ip3)
(peek-char ip) ; should be #\space
(peek-char ip2) ; should be #\tab
(peek-char ip3) ; should be #\A

Both functions should return a boolean value indicating whether any
input was consumed.

Here's what I've got for skip-whitespace: 

#lang typed/racket
(: skip-whitespace (Input-Port -> Boolean))
(define (skip-whitespace in)
  ; matches whitespace up to and including a newline, but 
  ; doesn't skip blank lines
  (if (try-read #px"^[[:blank:]]*[[:space:]]?" in) #t #f))

; NOTE: try-read is a simple wrapper for regexp-try-match with type:
; (U String Regexp PRegexp) Input-Port -> (U String False)
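
A caveat on the return value, sketched under assumptions: the pattern
above can match the empty string, so if try-read returns "" for an empty
match, the `if` yields #t even when nothing was consumed (as with "ABC").
The sketch below uses untyped Racket for brevity, and its try-read is only
a hypothetical stand-in for the wrapper described in the NOTE, not the
actual one.

```racket
#lang racket
;; Hypothetical stand-in for the try-read wrapper (an assumption):
;; returns the matched text as a string, or #f when nothing matches.
(define (try-read rx in)
  (define m (regexp-try-match rx in))
  (and m (bytes->string/utf-8 (car m))))

;; Same pattern as above, but report #t only for a non-empty match.
(define (skip-whitespace in)
  (define s (try-read #px"^[[:blank:]]*[[:space:]]?" in))
  (and s (not (string=? s "")) #t))

;; Usage: nothing is consumed from "ABC", so the result is #f.
(define demo-in (open-input-string "ABC"))
(skip-whitespace demo-in)
(peek-char demo-in)
```

Checking for a non-empty match keeps the regexp unchanged and only
adjusts how the result is turned into a boolean.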

This works fine. But I can't figure out how to write the parallel regexp
for skip-blank-line.  All the regexps I can come up with either read too
much whitespace or too little.

#lang typed/racket
(: skip-blank-line (Input-Port -> Boolean))
(define (skip-blank-line in)
  (if (try-read #px"^[[:blank:]]*$" in) #t #f))

This consumes too little in the second case: it doesn't consume the
initial spaces and newline of ip2; the next char is #\space rather than
#\tab.  (The same is true if I change the character class :blank: to
:space:.)

If I change the regexp to #px"^[[:blank:]]*[[:space:]]", it consumes too
much in the first case:  the next char of ip is #\A rather than #\space.

(I think this second regexp is closer to what I need, but what I could
really use is a character class that just matches line-terminators,
instead of :space:.  That seems to be the job of "\\p{Zl}", but I guess
there's something I don't understand about that, because (regexp-match
#px"\\p{Zl}" "\n") doesn't match anything.)
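
A note on why "\\p{Zl}" fails here, with a possible workaround offered as
a sketch rather than a definitive fix. In Unicode, newline (U+000A) is a
control character (general category Cc); category Zl contains only U+2028
LINE SEPARATOR. One option is to name the newline literally in the
pattern. The sketch uses untyped Racket and assumes "\n" is the only line
terminator that matters (no "\r\n" handling).

```racket
#lang racket
;; Newline is in category Cc, so \p{Zl} cannot match it.
(char-general-category #\newline)      ; the symbol cc
(regexp-match? #px"\\p{Zl}" "\n")      ; #f
(regexp-match? #px"\\p{Zl}" "\u2028")  ; #t: U+2028 is in Zl

;; Sketch: blanks followed by a literal newline. regexp-try-match
;; consumes input only when the whole pattern matches, so a line
;; with non-blank content is left untouched.
(define (skip-blank-line in)
  (and (regexp-try-match #px"^[[:blank:]]*\n" in) #t))

;; Usage with the second example input from above.
(define demo-in2 (open-input-string "  \n\t\nABC"))
(skip-blank-line demo-in2)
(peek-char demo-in2)
```

Because the trailing newline is required rather than optional, the
pattern fails outright on "  ABC" and the leading spaces stay in the port.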

I feel pretty lost here.  Any help would be very much appreciated.

Thanks!

Richard

Posted on the users mailing list.