[racket-dev] no capturing groups in regexp-split? [was Re: [PATCH] add regexp-split]

From: Eli Barzilay (eli at barzilay.org)
Date: Fri Dec 30 03:32:08 EST 2011

Yesterday, Marijn wrote:
> Hi,
> this just appeared on guile-devel, but it seems to have exposed a bug
> in racket.
> On 29-12-11 10:32, Nala Ginrut wrote:
> > [...]

This doesn't look like an issue that is specific to guile; it's just
that he chose Python's behavior as the goal...  The first other random
example I tried was `split-string' in Emacs, which did the same thing
as Racket.

> Welcome to Racket v5.2.0.7.
> > (regexp-split "([^0-9])"  "123+456*/")
> '("123" "456" "" "")
> should it be considered a bug in racket that it doesn't support
> capturing groups in regexp-split?


> Without the capturing group the results are identical: [...]

Which is expected.

> >>> import re
> >>> re.split("[^0-9]", "123+456*/")
> ['123', '456', '', '']
> > (regexp-split "[^0-9]"  "123+456*/")
> '("123" "456" "" "")

It was tricky to dig out what you wanted here...  Python does
something which is IMO very weird:

  >>> re.split("([^0-9])", "123+456*/")
  ['123', '+', '456', '*', '', '/', '']

It's even more confusing with multiple groups:

  >>> re.split("([^0-9]([0-9]))", "123+456*/")
  ['123', '+4', '4', '56*/']

There are probably uses for that -- at least for the simple version
with a single group around the whole regexp, but that's some hybrid of
`regexp-split' and `regexp-match*': it returns something that
interleaves them, which can be useful, but I'd rather see it with a
different name.
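
To make the "hybrid" concrete, here is a rough Python sketch of the
single-group case.  The helper name `interleave_split` is made up for
illustration, and it assumes the pattern has no capturing groups and
never matches the empty string.  It rebuilds Python's result from a
plain split plus a plain list of matches:

```python
import re

def interleave_split(pattern, text):
    """Interleave plain-split pieces with the separators that were matched.

    Hypothetical helper; assumes `pattern` has no capturing groups and
    never matches the empty string.
    """
    pieces = re.split(pattern, text)      # roughly what regexp-split returns
    matches = re.findall(pattern, text)   # roughly what regexp-match* returns
    result = [pieces[0]]
    # There is always exactly one more piece than there are matches.
    for separator, piece in zip(matches, pieces[1:]):
        result.extend([separator, piece])
    return result

print(interleave_split(r"[^0-9]", "123+456*/"))
```

Under those assumptions this should give the same list as
re.split("([^0-9])", "123+456*/") above.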

We've talked semi-recently about adding an option to `regexp-match*'
so it can return the lists of matches for each pattern, perhaps add
another option for returning the unmatched sequences between them, and
give the whole thing a new name?  (Something that indicates it being
the multitool version of all of these.)
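
As a very rough illustration of what such a multitool could return,
here is a Python sketch.  The name `match_and_gaps` and the shape of
the result are assumptions for the sake of the example, not an
existing or proposed API.  It alternates the unmatched stretches with
one list per match, holding the whole match followed by its groups:

```python
import re

def match_and_gaps(pattern, text):
    """Return unmatched stretches interleaved with per-match group lists.

    Hypothetical helper: even positions hold the text between matches,
    odd positions hold [whole_match, group1, group2, ...].
    """
    result = []
    position = 0
    for match in re.finditer(pattern, text):
        result.append(text[position:match.start()])       # gap before this match
        result.append([match.group(0), *match.groups()])  # match and its groups
        position = match.end()
    result.append(text[position:])                        # trailing gap
    return result

print(match_and_gaps(r"[^0-9]([0-9])", "123+456*/"))
```

Keeping the groups in their own nested list means the gaps and the
matches can't be confused, which is the main thing the flat Python
result makes hard.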

          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.