[racket-dev] `regexp-explode' etc + poll

From: Eli Barzilay (eli at barzilay.org)
Date: Sun Jan 1 13:07:33 EST 2012

I've implemented a new `regexp-explode' function.  It accepts the same
arguments as `regexp-match*' and `regexp-split', but with two
additional keyword arguments:

  * #:select-match

    If this is #t (the default) then the result includes the lists of
    results from the sub-matches.  It can also be #f to not include
    them, and it can be a "selector function" that chooses a specific
    one (eg, `car' etc) or returns a different list of matches (eg,
    `cdr').

  * #:select-gap

    This is just a boolean flag -- if it's #t (the default), the
    strings between the matches are returned as well -- interleaved
    with the (lists of) matches, otherwise they're omitted.

So by default, you get the information that `regexp-split' returns,
interleaved with the full results of matching.  Examples:

  -> (regexp-explode #rx"[^0-9]([^0-9])?" "0+1.*2")
  '("0" ("+" #f) "1" (".*" "*") "2")
  -> (regexp-explode #rx"[^0-9]([^0-9])?" "0+1.*2"
                     #:select-match car #:select-gap #f)
  '("+" ".*")
  -> (regexp-explode #rx"[^0-9]([^0-9])?" "0+1.*2"
                     #:select-match cadr)
  '("0" #f "1" "*" "2")
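
To make the semantics above concrete, here is a minimal sketch of how
such a function could be written on top of `regexp-match-positions'.
It is not the actual implementation: it handles only string input,
ignores the optional start/end arguments, and assumes the pattern
never matches the empty string (it raises an error if it does).

```racket
#lang racket/base

;; Hypothetical sketch of the proposed `regexp-explode'; strings only,
;; and the pattern is assumed never to match the empty string.
(define (regexp-explode rx str
                        #:select-match [select-match #t]
                        #:select-gap [select-gap #t])
  ;; Convert one positions result (pairs or #f) to substrings.
  (define (->strings poss)
    (for/list ([p (in-list poss)])
      (and p (substring str (car p) (cdr p)))))
  ;; Apply the selector function, if one was given.
  (define (pick strs)
    (if (procedure? select-match) (select-match strs) strs))
  (let loop ([start 0] [acc '()])
    (define m (regexp-match-positions rx str start))
    (cond
      [(not m)
       ;; No more matches: the rest of the string is the last gap.
       (reverse (if select-gap (cons (substring str start) acc) acc))]
      [else
       (define m-start (caar m))
       (define m-end (cdar m))
       (when (= m-start m-end)
         (error 'regexp-explode
                "empty matches are not handled in this sketch"))
       (let* ([acc (if select-gap
                       (cons (substring str start m-start) acc)
                       acc)]
              [acc (if select-match
                       (cons (pick (->strings m)) acc)
                       acc)])
         (loop m-end acc))])))
```

With this definition, the three calls above produce the results
shown.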

*** Minor poll: I'm not too happy with that `select-gap' name.  Any
    suggestions for a better name?

But the obvious next function to implement,
`regexp-explode-positions', complicates things a little.  The thing is
that there's no point in giving it the same interface -- the gaps
are useless there since they're easily inferred from the matches (as
seen by the lack of a `regexp-split-positions' function).  So, a
possible alternative that I thought about is to add a `#:select-match'
keyword to `regexp-match-positions*' instead, so it can return the
list of position matches in a similar way.  However, that would lead
to another problem: it would be bad to have a keyword argument only
for `regexp-match-positions*' which is not accepted by
`regexp-match*'.  So a solution to that is to add it to
`regexp-match*' too, but then there's little point in
`regexp-explode'...
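
As an illustration of why the gaps are redundant in a positions
result, the following sketch recovers them from the output of
`regexp-match-positions*'.  The helper name `gap-positions' is made
up for this example and is not part of any library.

```racket
#lang racket/base

;; Hypothetical helper: given the top-level match ranges (a list of
;; (start . end) pairs, in order) and the input length, return the
;; ranges of the gaps around and between the matches.
(define (gap-positions matches len)
  (let loop ([ms matches] [start 0] [acc '()])
    (if (null? ms)
        (reverse (cons (cons start len) acc))
        (loop (cdr ms)
              (cdar ms)
              (cons (cons start (caar ms)) acc)))))

(define str "0+1.*2")
(gap-positions (regexp-match-positions* #rx"[^0-9]([^0-9])?" str)
               (string-length str))
;; => '((0 . 1) (2 . 3) (5 . 6))
```

The gap ranges correspond to the "0", "1" and "2" pieces that
`regexp-split' would return for the same input.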

So the options that I see are:

1. Drop the new `regexp-explode' name, and instead have this
   functionality folded into `regexp-match*', which will get the two
   new keywords with a default of #f for `#:select-gap', and `car' for
   `#:select-match'.  Similarly, add `#:select-match' to
   `regexp-match-positions*', but not `#:select-gap'.

1a. Minor variation: insist on uniformity, and include a
    `#:select-gap' keyword for `regexp-match-positions*' too.

2. Same as #1, but also have `regexp-explode', which is now the same
   as `regexp-match*' but with different defaults for the two
   keywords.

2a. Same variation as #1a, applied to #2.

3. Do not extend the interface of existing functions -- have only the
   new `regexp-explode' have the added functionality.  For the
   positions version, add a `regexp-explode-positions', without a
   `#:select-gap' keyword.  The possible advantage here is that the
   (already complicated) output type of `regexp-match*' stays the
   same, and `regexp-explode' gets the much more complicated one.

3a. Same as #3, but with `#:select-gap' for
    `regexp-explode-positions'.

I'm now leaning towards #1.  Any votes for other options, or maybe
something different?

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the dev mailing list.