[racket] Some design "whys" of regexps in Racket

From: Rodolfo Carvalho (rhcarvalho at gmail.com)
Date: Fri Jun 3 23:39:56 EDT 2011

On Sat, Jun 4, 2011 at 00:24, Matthew Flatt <mflatt at cs.utah.edu> wrote:

> At Sat, 4 Jun 2011 00:09:40 -0300, Rodolfo Carvalho wrote:
> > Eli says that
> >
> > > (BTW, Racket's solution is something that is done in many other
> > > languages too.)
> >
> > I come from Python where I can write
> >
> > >>> re.findall("\d{2}", "06/03/2011")
> > ['06', '03', '20', '11']
> >
> > And printing the string that I used for my regexp gives:
> >
> > >>> print "\d{2}"
> > \d{2}
>
> Isn't that only because "\d" isn't an escape in strings? While Racket
> complains about a "\" that doesn't form an escape sequence, Python
> treats the "\" as a literal (while Ruby effectively ignores the "\").
>
> Compare to the Python example
>
>  >>> re.findall("a\b", "a ")
>  []
>  >>> re.findall("a\\b", "a ")
>  ['a']
>
> Since "\b" is an escape that means ASCII 8, to get a backslash followed
> by a "b" in a regexp (to indicate a word boundary), you need to use
> "\\b".
>
>

Yeah... thinking like this makes Python feel a bit more complex to reason
about.
"\d" doesn't mean anything in the "string-world" while "\b" does, so the
backslash in "\b" has to be escaped.
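
To make the string-level difference concrete, here is a small sketch that
inspects what each literal actually contains. It uses Python 3 syntax (the
session quoted above is Python 2), and the variable names are only
illustrative. I spell the backslash-d case as "\\d" because newer Python
versions warn about unrecognized escapes such as "\d" in ordinary strings.

```python
# "\b" is a recognized string escape: one character, ASCII 8 (backspace).
backspace = "\b"
# "\\b" escapes the backslash itself: two characters, a backslash and "b".
escaped_b = "\\b"
# "\\d" is the unambiguous spelling of a backslash followed by "d".
escaped_d = "\\d"

print(len(backspace), ord(backspace))  # 1 8
print(len(escaped_b), escaped_b)       # 2 \b
print(len(escaped_d), escaped_d)       # 2 \d
```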


Wait, wait!
Python also has a special raw mode for string literals (raw strings), which
is well suited to writing regexps...


So I've been "lying" all along. I don't usually write "\d{2}", but rather
the raw string r"\d{2}".
Using a raw string makes Matthew's examples work as follows:


>>> re.findall(r"a\b", "a ")
['a']

>>> re.findall(r"a\\b", "a ")
[]
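
The same comparison as a small script, in case it helps: a sketch in Python 3
syntax, with variable names that are only illustrative. It shows that a raw
literal is just another way to spell the escaped version, so the regexp
engine sees the same pattern either way.

```python
import re

# A raw literal keeps the backslash, so r"a\b" is the same string as "a\\b".
same_pattern = (r"a\b" == "a\\b")

# Regexp: "a" followed by a word boundary.
word_boundary = re.findall(r"a\b", "a ")
# Regexp: "a", then a literal backslash, then "b".
literal_backslash = re.findall(r"a\\b", "a ")
# Regexp: two digits, written without doubling the backslash.
date_parts = re.findall(r"\d{2}", "06/03/2011")

print(same_pattern)       # True
print(word_boundary)      # ['a']
print(literal_backslash)  # []
print(date_parts)         # ['06', '03', '20', '11']
```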

Posted on the users mailing list.