[plt-scheme] Small problem with regex-replace*

From: Danny Yoo (dyoo at hkn.eecs.berkeley.edu)
Date: Fri Jun 16 02:09:23 EDT 2006

> I was happy this was a very small program to write, however I wasn't able to 
> create a regexp that only match words,
> so I ended using *dummy* variable to hold the second match that is passed to 
> the scramble procedure. How can I fix this?


Hi Jaime,

The issue is that the regular expression has a group, which we define by 
putting parens in the regexp pattern:

     (let ([word (regexp "([a-zA-Z])+")]) ...)

Here, any match on our regexp reports two results: one for the whole 
match, and one for the group (in general, one more for each group in the 
pattern).

What you probably want is:

     [a-zA-Z]+

which doesn't define a group, so it'll give us a single reported match.
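
To see how the group changes what a replacement procedure receives, here is 
a minimal sketch. It assumes that regexp-replace* is given a procedure as 
its last argument and calls it with the whole match plus one extra argument 
per group. The names shout and shout/dummy are made up for illustration; 
they only stand in for the scramble procedure.

```scheme
;; Hypothetical stand-ins for the scramble procedure (names are made up).

;; With no group in the pattern, the procedure receives one argument:
;; the whole match.
(define (shout whole)
  (string-upcase whole))

;; With one group in the pattern, the procedure also receives the
;; group's match, so it needs a second (here unused) parameter.
(define (shout/dummy whole dummy)
  (string-upcase whole))

(define without-group
  (regexp-replace* "[a-zA-Z]+" "hello world" shout))

(define with-group
  (regexp-replace* "([a-zA-Z])+" "hello world" shout/dummy))
```

Both definitions end up holding the same string; the only difference is 
the extra parameter that the grouped pattern forces on the procedure.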


Alternatively, we can look at the documentation on regex patterns:

http://pre.plt-scheme.org/docs/html/mzscheme/mzscheme-Z-H-10.html#node_chap_10

and see the following snippet:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Atom     ::= (Regexp)                 Match sub-expression Regexp and
                                       report match
           |  (?:Regexp)               Match sub-expression Regexp 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


So (?:...) gives us a way of grouping a sub-expression without causing the 
match to report it.


Let's play with regexp-match just to make this concrete:

;;;;;;
> (regexp-match "[a-zA-Z]+" "hello world")
("hello")
> (regexp-match "([a-zA-Z]+)" "hello world")
("hello" "hello")
> (regexp-match "([a-zA-Z])+" "hello world")
("hello" "o")
> (regexp-match "(?:[a-zA-Z]+)" "hello world")
("hello")
;;;;;;

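(The result for "([a-zA-Z])+" surprised me at first!  The group sits inside 
the repetition, so it matches one letter at a time, and only its last 
match, "o", gets reported.)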


Finally, the scrambling code might need some clarification.  What happens 
on words with only one letter in them?
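
One way to make the short-word case explicit is sketched below. This is 
only a guess at the shape of the scramble procedure: it assumes the intent 
is to keep the first and last letters and shuffle the ones in between, and 
shuffle-list is a hypothetical helper written out here so the sketch 
doesn't depend on any library.

```scheme
;; Hypothetical helper: Fisher-Yates shuffle of a list, via a vector.
(define (shuffle-list lst)
  (let ([v (list->vector lst)])
    (let loop ([i (- (vector-length v) 1)])
      (if (> i 0)
          (let* ([j (random (+ i 1))]     ; pick j with 0 <= j <= i
                 [tmp (vector-ref v i)])
            (vector-set! v i (vector-ref v j))
            (vector-set! v j tmp)
            (loop (- i 1)))
          (vector->list v)))))

;; Assumed behavior: keep the first and last letters, shuffle the middle.
;; Words shorter than four letters have nothing to shuffle, so they are
;; returned unchanged instead of tripping over substring bounds.
(define (scramble word)
  (let ([n (string-length word)])
    (if (< n 4)
        word
        (string-append
         (substring word 0 1)
         (list->string
          (shuffle-list (string->list (substring word 1 (- n 1)))))
         (substring word (- n 1) n)))))
```

With the group-free pattern, this can be passed straight to the replace 
call, as in (regexp-replace* "[a-zA-Z]+" "hello world" scramble), with no 
dummy parameter needed.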


Best of wishes to you!


Posted on the users mailing list.