[racket-dev] Missing pregexp syntax in Racket

From: ozzloy-racket-dev (ozzloy+dev_racket-lang_org at gmail.com)
Date: Mon Nov 28 01:51:17 EST 2011

i'll take a stab at clarifying.
there was some discussion on irc about being able to write regexps
with less escaping.
it was suggested that we could have something similar to perl, ruby,
javascript and other languages for specifying a regexp pattern, something
like #rx/pattern/
so for example you could type
(regexp-replace* #rx/\bregex\b/ text "regexp")
instead of
(regexp-replace* #px"\\bregex\\b" text "regexp")
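
for reference, here is a minimal sketch of the form that works today, with
the doubled backslashes.  the sample strings and the helper name
regex->regexp are made up for illustration; the #rx/pattern/ form above is
only a proposal and is not shown because it does not exist.

```racket
#lang racket/base

;; The pattern as it must be written today: each regexp backslash is
;; doubled so that it survives the string reader.
(define word-regex #px"\\bregex\\b")

;; Hypothetical helper, named only for this illustration.
(define (regex->regexp text)
  (regexp-replace* word-regex text "regexp"))

(regex->regexp "a regex is a regex")   ; "a regexp is a regexp"
(regex->regexp "regexes")              ; "regexes" (no word boundary after "regex")
```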

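it turns out there are some escapes that need to be added to racket's regexp
syntax before we could do something like that.  right now, creating a pattern
that matches a newline involves writing a pattern that has an actual newline
in it, rather than a pattern with a backslash followed by an 'n'.  with
something like #rx/pattern/ the text would go to the regexp compiler as
written, so the regexp syntax itself would have to understand \n.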

(regexp-match? #rx"\n" "\n") => #t
(regexp-match? #px"\n" "\n") => #t
(regexp-match? #rx"\\n" "\n") => #f
(regexp-match? #rx"\\n" "n") => #t
(regexp-match? #px"\\n" "\n") => raises "read: bad pregexp string: illegal
alphabetic escape"
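
the results above come from two separate layers: the string reader and the
regexp compiler.  a rough sketch that pulls them apart, using the regexp and
pregexp constructors on ordinary strings; the names newline-source,
backslash-n-source and pregexp-compiles? are made up for this example.  the
last call is left without an expected result since it depends on whether the
racket version in use accepts the escape (it was rejected at the time of
this thread).

```racket
#lang racket/base

;; Layer 1: the string reader. "\n" is one character (a newline);
;; "\\n" is two characters (a backslash and the letter n).
(define newline-source "\n")
(define backslash-n-source "\\n")

(string-length newline-source)                     ; 1
(string-length backslash-n-source)                 ; 2

;; Layer 2: the regexp compiler. Given an actual newline, both syntaxes
;; match a newline.
(regexp-match? (regexp newline-source) "\n")       ; #t
(regexp-match? (pregexp newline-source) "\n")      ; #t

;; Given backslash + n, regexp syntax treats it as a literal n.
(regexp-match? (regexp backslash-n-source) "\n")   ; #f
(regexp-match? (regexp backslash-n-source) "n")    ; #t

;; Hypothetical helper: #t if the string compiles as a pregexp,
;; #f if compiling it raises an error.
(define (pregexp-compiles? source)
  (with-handlers ([exn:fail? (lambda (e) #f)])
    (pregexp source)
    #t))

(pregexp-compiles? backslash-n-source)
```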

pauan went through and found all such missing escapes and reported them.

On Mon, Nov 28, 2011 at 00:48, David T. Pierson <dtp at mindstory.com> wrote:

> On Sat, Nov 26, 2011 at 10:35:57PM -0800, Pauan wrote:
> > It was brought up that my explanation was confusing, and I agree it is.
> > So I'll try again. The following should return #t:
> ...
> > (regexp-match? "\\n"        "\n")
>
> I am confused about a number of things in your emails, so for simplicity
> I'm focusing on the above expression only.
>
> Your email subject mentions pregexp, but your pattern is a string
> literal, which AFAICT will be compiled into a regular expression using
> regexp not pregexp.  Therefore it isn't clear whether you are suggesting
> a change to regexp syntax or pregexp syntax.
>
> Currently, #rx"\\n" matches like #rx"n" and #px"\\n" is an error:
>
> > (regexp-match? #rx"\\n" "n")
> #t
> > #px"\\n"
> readline-input::569: read: bad pregexp string: illegal alphabetic escape
>
> Both of these behaviors agree with the syntax documented at
>
>
> http://docs.racket-lang.org/reference/regexp.html?q=regexp-match%3F#(part._regexp-syntax)
>
> To add to my confusion, your original email mentioned #px"\\\\n", which
> currently matches a backslash followed by an 'n'.
>
> Perhaps you are suggesting that #px"\\n" should mean the same as
> #px"\n" rather than being an error?  I don't see a need for this but
> perhaps you have a rationale in mind?
>
> David

Posted on the dev mailing list.