[racket-dev] no capturing groups in regexp-split? [was Re: [PATCH] add regexp-split]
Yesterday, Marijn wrote:
>
> Hi,
>
> this just appeared on guile-devel, but it seems to have exposed a bug
> in racket.
>
> On 29-12-11 10:32, Nala Ginrut wrote:
> > [...]
This doesn't look like an issue that is related to guile, just that he
chose python as the goal... The first other random example I tried
was `split-string' in Emacs, which did the same thing as Racket.
> Welcome to Racket v5.2.0.7.
> > (regexp-split "([^0-9])" "123+456*/")
> '("123" "456" "" "")
>
> should it be considered a bug in racket that it doesn't support
> capturing groups in regexp-split?
No.
> Without the capturing group the results are identical: [...]
Which is expected.
> >>> import re
> >>> re.split("[^0-9]", "123+456*/")
> ['123', '456', '', '']
>
> > (regexp-split "[^0-9]" "123+456*/")
> '("123" "456" "" "")
It was tricky to dig out what you wanted here... Python does
something which is IMO very weird:
>>> re.split("([^0-9])", "123+456*/")
['123', '+', '456', '*', '', '/', '']
It's even more confusing with multiple groups:
>>> re.split("([^0-9]([0-9]))", "123+456*/")
['123', '+4', '4', '56*/']
There are probably uses for that -- at least for the simple version with
a single group around the whole regexp, but that's some hybrid of
`regexp-split' and `regexp-match*': it returns something that
interleaves them, which can be useful, but I'd rather see it with a
different name.
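
To make the "hybrid" point concrete, here is a rough sketch in Python
(since that's the behavior under discussion). The name
`interleave_split' is made up for illustration; it takes a pattern
with no capturing groups, and only non-empty matches are considered.
It rebuilds the single-group result from a plain split plus the list
of matches:

```python
import re

def interleave_split(pattern, text):
    """Interleave the pieces from a plain split with the separators that matched.

    Hypothetical helper: `pattern` is assumed to have no capturing groups.
    """
    pieces = re.split(pattern, text)                         # like regexp-split
    seps = [m.group(0) for m in re.finditer(pattern, text)]  # like regexp-match*
    out = [pieces[0]]
    for sep, piece in zip(seps, pieces[1:]):
        out.extend([sep, piece])
    return out

result = interleave_split("[^0-9]", "123+456*/")
# Same list that re.split gives with one group around the whole regexp.
same_as_grouped_split = result == re.split("([^0-9])", "123+456*/")
```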
We've talked semi-recently about adding an option to `regexp-match*'
so it can return the list of matches for each group. Perhaps add
another option for returning the unmatched sequences between them, and
give the whole thing a new name? (Something that indicates that it's
the multitool version of all of these.)
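
Roughly what I have in mind, again sketched in Python rather than
Racket. Everything here is hypothetical: the name `match_star' and the
`all_groups' / `gaps' keywords are placeholders for illustration, not
an existing or proposed API.

```python
import re

def match_star(pattern, text, all_groups=False, gaps=False):
    """Hypothetical 'multitool': every match, optionally with its groups,
    optionally interleaved with the unmatched text between matches."""
    out = []
    pos = 0
    for m in re.finditer(pattern, text):
        if gaps:
            out.append(text[pos:m.start()])   # unmatched text before this match
        if all_groups:
            out.append([m.group(0), *m.groups()])  # whole match, then each group
        else:
            out.append(m.group(0))
        pos = m.end()
    if gaps:
        out.append(text[pos:])                # trailing unmatched text
    return out

only_matches = match_star("[^0-9]", "123+456*/")
with_gaps = match_star("[^0-9]", "123+456*/", gaps=True)
grouped = match_star("([^0-9]([0-9]))", "123+456*/", all_groups=True, gaps=True)
```

With neither option it behaves like a plain match-all, with `gaps'
alone it gives the interleaved list, and with both the groups of each
match stay together instead of being flattened into the result.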
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!