[racket-dev] `string-split'

From: Eli Barzilay (eli at barzilay.org)
Date: Thu Apr 19 09:00:07 EDT 2012

[Meta-note: I'm not flatly objecting to these, just trying to
clarify the exact behavior and the possible effects on other
functions.]

10 minutes ago, Laurent wrote:
>  
> 
>      (define (string-split str [sep #px"\\s+"])
>        (remove* '("") (regexp-split sep str)))
> 
> Nearly, I meant something more like this:
> 
> (define (string-split str [splitter " "])
>   (regexp-split (regexp-quote splitter) str))
> 
> No regexp from the user POV, and much easier to use with little
> knowledge.

That doesn't seem right -- with this you get

  -> (string-split " st  ring")
  '("" "st" "" "ring")

which is why I think that the first definition above (the one that
drops empty strings) is better in terms of newbie-ness.
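
To make the difference concrete, here is a small sketch that puts the two
quoted definitions side by side.  The names `split-ws' and `split-literal'
are made up for this comparison so that neither one shadows the other;
the bodies are copied from the quoted code.

```racket
#lang racket/base

;; First quoted definition: split on runs of whitespace and
;; drop the empty strings that appear at the edges.
(define (split-ws str [sep #px"\\s+"])
  (remove* '("") (regexp-split sep str)))

;; Second quoted definition: treat the separator as a literal string
;; and keep every piece, including the empty ones.
(define (split-literal str [splitter " "])
  (regexp-split (regexp-quote splitter) str))

(split-ws " st  ring")       ; '("st" "ring")
(split-literal " st  ring")  ; '("" "st" "" "ring")
```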


10 minutes ago, Matthew Flatt wrote:
> I agree with this: we should add `string-split', the one-argument case
> should be as Eli wrote, and the two-argument case should be as Laurent
> wrote. (Probably the optional second argument should be string-or-#f,
> where #f means to use #px"\\s+".)

Continuing with this line, it seems that a better definition is as
follows:

  (define (string-split str [sep " "])
    (remove* '("") (regexp-split (regexp-quote (or sep " ")) str)))

Except that the full definition could be a bit more efficient.
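
As a rough illustration of how that definition behaves, the sketch below
repeats it under the hypothetical name `my-string-split' (chosen only to
avoid any clash with a library binding) and tries a few inputs.  Note
that with a literal " " separator a tab is not treated as whitespace,
which is one place where it differs from the #px"\\s+" version.

```racket
#lang racket/base

;; Same body as the definition above, under a hypothetical name.
;; `sep' is quoted, so it is always matched literally; #f falls
;; back to a single space.
(define (my-string-split str [sep " "])
  (remove* '("") (regexp-split (regexp-quote (or sep " ")) str)))

(my-string-split " st  ring")   ; '("st" "ring")
(my-string-split "a,,b" ",")    ; '("a" "b")
(my-string-split "a.b" ".")     ; '("a" "b"), "." is not a wildcard here
(my-string-split "a b" #f)      ; '("a" "b")
(my-string-split "a\tb c")      ; '("a\tb" "c"), the tab is not a separator
```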

Four questions:

1. Laurent: Does this make more sense?

2. Matthew: Is there any reason to make the #f-as-default part of the
   interface?  (Even with the new reply I don't see a necessity for
   this -- if the target is newbies, then I think that keeping it as a
   string is simpler...)

3. There's also the point of how this optional argument plays with
   other functions in `racket/string'.  If it works as above, then
   `string-trim' and `string-normalize-spaces' should change
   accordingly, so that they take the same kind of simplified
   "regexp" input.

4. Related to Q3: what does "xy" as that argument mean exactly?
   a. #rx"[xy]"
   b. #rx"[xy]+"
   c. #rx"xy"
   d. #rx"(?:xy)+"
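
To see how the four readings in Q4 differ in practice, here is a small
sketch that runs plain `regexp-split' with each of them on two sample
strings.  The helper `split-all-ways' and the sample inputs are made up
for illustration; nothing here is a proposed interface.

```racket
#lang racket/base

;; The four candidate readings of "xy", in the order a-d.
(define readings
  (list #rx"[xy]" #rx"[xy]+" #rx"xy" #rx"(?:xy)+"))

;; Hypothetical helper: split `str' under each reading and
;; return the four results as a list, in the order a-d.
(define (split-all-ways str)
  (for/list ([rx (in-list readings)])
    (regexp-split rx str)))

(split-all-ways "axyxyb")
;; a: '("a" "" "" "" "b")   b: '("a" "b")
;; c: '("a" "" "b")         d: '("a" "b")

(split-all-ways "ayxb")
;; a: '("a" "" "b")         b: '("a" "b")
;; c: '("ayxb")             d: '("ayxb")
```

Readings b and d agree on the first input and only come apart on the
second, where the characters appear in the order "yx".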

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Posted on the dev mailing list.