[racket-dev] `regexp-explode' etc + poll

From: Eli Barzilay (eli at barzilay.org)
Date: Tue Mar 13 10:57:51 EDT 2012

This is now pushed: `regexp-match*' has `#:match-select' and
`#:gap-select' keyword arguments; and `regexp-match-positions*' and
`regexp-match-peek-positions*' have `#:match-select'.
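
For reference, here is a minimal sketch of how the new keywords behave,
reusing the pattern and input from the quoted examples below.  It assumes
a Racket release where the gap keyword is spelled `#:gap-select?' (with a
trailing question mark) rather than `#:gap-select' as written above; adjust
the spelling if your build differs.  The names `plain', `exploded',
`groups' and `positions' are only illustrative.

```racket
#lang racket/base

;; Pattern and input borrowed from the quoted examples.
(define rx #rx"[^0-9]([^0-9])?")
(define str "0+1.*2")

;; Defaults: `car' of each match, no gaps (the old behavior).
(define plain (regexp-match* rx str))
;; => '("+" ".*")

;; Full match lists interleaved with the gaps, i.e. what the proposed
;; `regexp-explode' returned by default.
;; ASSUMPTION: keyword spelled `#:gap-select?' in this Racket version.
(define exploded
  (regexp-match* rx str #:match-select values #:gap-select? #t))
;; => '("0" ("+" #f) "1" (".*" "*") "2")

;; Only the first parenthesized group of each match.
(define groups (regexp-match* rx str #:match-select cadr))
;; => '(#f "*")

;; Positions of the whole match and of every group.
(define positions
  (regexp-match-positions* rx str #:match-select values))
;; => '(((1 . 2) #f) ((3 . 5) (4 . 5)))
```

With `#:match-select values' and gaps enabled, `regexp-match*' covers the
default `regexp-explode' case, which is why a separate function adds little.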


On January 1st, Eli Barzilay wrote:
> I've implemented a new `regexp-explode' function.  It accepts the same
> arguments as `regexp-match*' and `regexp-split', but with two
> additional keyword arguments:
> 
>   * #:select-match
> 
>     If this is #t (the default) then the result includes the lists of
>     results from the sub-matches.  It can also be #f to not include
>     them, and it can be a "selector function" that chooses a specific
>     one (eg, `car' etc) or returns a different list of matches (eg,
>     `cdr').
> 
>   * #:select-gap
> 
>     This is just a boolean flag -- if it's #t (the default), the
>     strings between the matches are returned as well -- interleaved
>     with the (lists of) matches, otherwise they're omitted.
> 
> So by default, you get the information that `regexp-split' returns,
> interleaved with the full results of matching.  Examples:
> 
>   -> (regexp-explode #rx"[^0-9]([^0-9])?" "0+1.*2")
>   '("0" ("+" #f) "1" (".*" "*") "2")
>   -> (regexp-explode #rx"[^0-9]([^0-9])?" "0+1.*2"
>                      #:select-match car #:select-gap #f)
>   '("+" ".*")
>   -> (regexp-explode #rx"[^0-9]([^0-9])?" "0+1.*2"
>                      #:select-match cadr)
>   '("0" #f "1" "*" "2")
> 
> *** Minor poll: I'm not too happy with that `select-gap' name.  Any
>     suggestions for a better name?
> 
> But the obvious next function to implement,
> `regexp-explode-positions', complicated things a little.  The thing is
> that there's no point in having it have the same interface -- the gaps
> are useless there since they're easily inferred from the matches (as
> seen by the lack of a `regexp-split-positions' function).  So, a
> possible alternative that I thought about is to add a `#:select-match'
> keyword to `regexp-match-positions*' instead, so it can return the
> list of position matches in a similar way.  However, that would lead
> to another problem: it would be bad to have a keyword argument only
> for `regexp-match-positions*' which is not accepted by
> `regexp-match*'.  So a solution to that is to add it to
> `regexp-match*' too, but then there's little point in
> `regexp-explode'...
> 
> So the options that I see are:
> 
> 1. Drop the new `regexp-explode' name, and instead have this
>    functionality folded into `regexp-match*', which will get the two
>    new keywords with a default of #f for `#:select-gap', and `car' for
>    `#:select-match'.  Similarly, add `#:select-match' to
>    `regexp-match-positions*', but not `#:select-gap'.
> 
> 1a. Minor variation: insist on uniformity, and include a
>     `#:select-gap' keyword for `regexp-match-positions*' too.
> 
> 2. Same as #1, but also have `regexp-explode', which is now the same
>    as `regexp-match*' but with different defaults for the two
>    keywords.
> 
> 2a. Same variation as in #1a.
> 
> 3. Do not extend the interface of existing functions -- have only the
>    new `regexp-explode' have the added functionality.  For the
>    positions version, add a `regexp-explode-positions', without a
>    `#:select-gap' keyword.  The possible advantage here is that the
>    (already complicated) output type of `regexp-match*' stays the
>    same, and `regexp-explode' gets the much more complicated one.
> 
> 3a. Same as #3, but with `#:select-gap' for
>     `regexp-explode-positions'.
> 
> I'm now leaning towards #1.  Any votes for other options, or maybe
> something different?
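
The remark in the quoted text that gaps are easily inferred from the match
positions can be sketched as follows.  `gaps-from-positions' is a
hypothetical helper written for this illustration, not part of Racket; it
assumes the positions are the default (start . end) pairs returned by
`regexp-match-positions*', in order and non-overlapping.

```racket
#lang racket/base

;; Hypothetical helper (not a library function): rebuild the strings
;; between matches from a list of (start . end) match positions.
(define (gaps-from-positions str positions)
  (let loop ([ps positions] [start 0] [acc '()])
    (if (null? ps)
        ;; The last gap runs from the end of the final match to the
        ;; end of the string.
        (reverse (cons (substring str start) acc))
        (loop (cdr ps)
              (cdr (car ps))                 ; next gap starts where this match ends
              (cons (substring str start (car (car ps))) acc)))))

(define rx #rx"[^0-9]([^0-9])?")
(define str "0+1.*2")

(define gaps
  (gaps-from-positions str (regexp-match-positions* rx str)))
;; => '("0" "1" "2"), the same list that (regexp-split rx str) returns here
```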

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.