[racket] Regexp-match question: ? means what, or .*? means what...

From: JP Verkamp (racket at jverkamp.com)
Date: Wed May 8 22:21:48 EDT 2013

> ;This I understand:
> ;To return pattern and everything to left of first instance of pattern,
> use: [^<pattern>]*<pattern> ;where ^ means 'not'; example:
> (car (regexp-match* #rx"[^/]*/" "12/4/6")) ; => "12/"

A caveat for this. <pattern> in the first instance isn't quite right. [...]
defines a single character. [abc], for example, means a, b, or c. The ^
negates the class, so [^abc] means any character except a, b, or c. Then *
means match the preceding pattern 0 or more times. So [abc]* means any
number of a, b, or c in any order, and [^abc]* means any number of
characters that are none of those (if you want repeating "abc", you want
parentheses: #rx"(abc)*"). The entire point, though, is that if you wanted
to match up to a multi-character delimiter in something like
"12abc34abc6abc7", #rx"[^abc]*abc" wouldn't work in general. Instead of
'not abc, then abc', you're getting 'anything that is not a or b or c, then
abc'. It happens to return "12abc" here, but a stray a, b, or c before the
delimiter breaks it: on "1a2abc3" it returns "2abc", not "1a2abc". You'd
need something a bit more complicated.
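To make the character-class caveat concrete, here is a small sketch you can
paste into DrRacket. The names (class-ok, class-bad, lazy-ok) and the second
test string are mine, chosen only for illustration. The last line borrows
the lazy quantifier *? that is covered later in this message, as one
possible way to match up to a literal multi-character delimiter.

```racket
#lang racket
;; [^abc] excludes the single characters a, b, and c.
;; It does not mean "anything other than the string abc".

;; Works here only because no stray a, b, or c precedes the delimiter.
(define class-ok  (regexp-match #rx"[^abc]*abc" "12abc34abc6abc7"))

;; The lone "a" cannot be consumed by [^abc]*, so the match starts after it.
(define class-bad (regexp-match #rx"[^abc]*abc" "1a2abc3"))

;; A lazy match up to the literal "abc" keeps the stray "a".
(define lazy-ok   (regexp-match #rx".*?abc" "1a2abc3"))

class-ok   ; => '("12abc")
class-bad  ; => '("2abc")
lazy-ok    ; => '("1a2abc")
```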

> ;This I do not understand:
> ;To return pattern and everything left of first instance of pattern, use:
> .*?<pattern> inside #rx""; where ? means _______; or where .*? means
> ______________; example:
> (car (regexp-match* #rx".*?/" "12/4/6")) ; => "12/"

Technically, the interesting part there isn't .*? or just ?, but rather *?
A single dot means to match any character. .* means to match any number of
characters. The problem with that is * by default is greedy. It will match
the longest sequence it can. Since . is anything, the pattern #rx".*/" on
"12/4/6" will match 'the longest string of any character followed by a /'
that it can: "12/4/". If you instead want a non-greedy (lazy) match, use *?
or +? instead of * or +. That says: match the preceding pattern as few
times as possible while still letting the rest of the pattern match. So it
will match 'any character until the first / is seen', giving "12/".
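A quick side-by-side of greedy and lazy matching on the string from the
question. The definitions (s, greedy, lazy, lazy-all, lazy-plus) and the
"/12/4" input are just illustrative names and data of my own.

```racket
#lang racket
(define s "12/4/6")

;; Greedy: .* grabs as much as it can, then backs off to the last "/".
(define greedy    (regexp-match  #rx".*/"  s))

;; Lazy: .*? grabs as little as it can, stopping at the first "/".
(define lazy      (regexp-match  #rx".*?/" s))

;; regexp-match* returns every non-overlapping match, not just the first.
(define lazy-all  (regexp-match* #rx".*?/" s))

;; +? is lazy too, but must consume at least one character first.
(define lazy-plus (regexp-match  #rx".+?/" "/12/4"))

greedy     ; => '("12/4/")
lazy       ; => '("12/")
lazy-all   ; => '("12/" "4/")
lazy-plus  ; => '("/12/")
```

This is also why (car (regexp-match* ...)) in the question returns "12/":
it is simply the first of the lazy matches.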

It doesn't particularly help that ? also has a meaning by itself. It makes
the previous pattern optional. So #rx"x?y" matches "y" or "xy".
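For the standalone ?, a short sketch. I have added ^ and $ anchors (not in
the pattern above) so that the whole string has to match, which makes the
"optional" behavior easier to see. The names opt-none, opt-one, and opt-two
are mine.

```racket
#lang racket
;; x? means zero or one "x". The ^ and $ anchors are an addition for this
;; example so that the pattern must cover the entire string.
(define opt-none (regexp-match #rx"^x?y$" "y"))
(define opt-one  (regexp-match #rx"^x?y$" "xy"))
(define opt-two  (regexp-match #rx"^x?y$" "xxy"))

opt-none  ; => '("y")
opt-one   ; => '("xy")
opt-two   ; => #f, because two x's is more than "optional" allows
```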

Regular expressions are pretty amazingly powerful. This is a site I've
found useful in the past for looking up what things mean:

Specifically for more information on repeating patterns (* + *? +?):

Hope that helps!

Posted on the users mailing list.