[plt-scheme] Checking if strings are only whitespace

From: Matthew Flatt (mflatt at cs.utah.edu)
Date: Tue Mar 21 09:14:32 EST 2006

At 20 Mar 2006 18:31:05 +0000, Alexander Schmolck wrote:
> Robby Findler <robby at cs.uchicago.edu> writes:
> 
> > (BTW, Matthew's use of the regexp library was better -- less allocation
> > and a better regexp.)
> 
> The regexp solutions so far certainly don't handle non-ascii whitespace
> (and what about vertical tabs?).
> 
> Wouldn't #rx"^\s+$" just work?

The #rx notation doesn't currently support things like "\s", though it
would be a sensible addition.


For the record, there are some potentially surprising performance
implications with the current implementation for string regexps. To
include all Unicode whitespace right now, you could write

  (require (lib "14.ss" "srfi"))
  (define rx:ws
    (regexp (format "^[~a]+$" (char-set->string char-set:whitespace))))

  (define (is-space? str)
    (and (regexp-match-positions rx:ws str) #t))

It turns out that this is much slower than just matching ASCII whitespace,
because the regexp is turned into a byte-string-based regexp to match
UTF-8 encodings of whitespace characters.
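
For comparison, an ASCII-only version might look like the sketch below.
The names rx:ascii-ws and is-ascii-space? are made up for illustration,
and the character class is just one reasonable choice of ASCII whitespace.
The last expression uses U+3000 (ideographic space) as an arbitrary
example of a whitespace character that takes several bytes in UTF-8,
which is why the byte-based form of the Unicode regexp is more involved.

  ;; ASCII whitespace only: space, tab, newline, return,
  ;; vertical tab, form feed (written as string escapes)
  (define rx:ascii-ws
    (regexp "^[ \t\n\r\v\f]+$"))

  (define (is-ascii-space? str)
    (and (regexp-match-positions rx:ascii-ws str) #t))

  ;; Number of bytes in the UTF-8 encoding of U+3000
  (bytes-length (string->bytes/utf-8 (string #\u3000)))  ; => 3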

PCRE probably performs better, though I note that it uses the same
UTF-8-based strategy for Unicode.

Matthew



Posted on the users mailing list.