[racket-dev] `racket/string' extensions

From: Eli Barzilay (eli at barzilay.org)
Date: Thu May 24 16:45:26 EDT 2012

About a month ago, Eli Barzilay wrote:
> [...]

This is now almost completely implemented.  If you have any comments
on the below, now would be a good time.

The two things that I didn't do yet are the following (I'm still not
sure about the names and the functionality):

>   (string-index str sub [start 0] [end (string-length str)])
>     Looks for occurrences of `sub' in `str', returns the index if
>     found, #f otherwise.  [*2*] I'm not sure about the name, maybe
>     `string-index-of' is better?
> 
>   (list-index list elt)
>     Looks for `elt' in `list'.  This is a possible extension for
>     `racket/list' that would be kind of obvious with adding the above.
>     [*3*] I'm not sure if it should be added, but IIRC it was
>     requested a few times.  If it does get added, then there's another
>     question for how far the analogy goes: [*3a*] Should it take a
>     start/end index too?  [*3b*] Should it take a list of elements and
>     look for a matching sublist instead (which is not a function that
>     is common to ask for, AFAICT)?

I might start a separate thread on suggestions for this and more
needed functions in `racket/list'.


To summarize the new things:

* (string-join strs [sep " "])

  The new thing here is that the `sep' argument now defaults to a
  space.  This is something that is often done in such functions
  elsewhere (including in srfi-13, IIRC), and with the below functions
  working with spaces by default it seems like the right thing.
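
  A minimal illustration of the default, assuming the `racket/string'
  signature shown above (the names `joined-default' and
  `joined-explicit' are made up for this example):

  ```racket
  #lang racket/base
  (require racket/string)

  ;; With no `sep' argument, the pieces are joined with a single space.
  (define joined-default (string-join '("one" "two" "three")))

  ;; An explicit separator still works as before.
  (define joined-explicit (string-join '("one" "two" "three") ", "))

  joined-default   ; => "one two three"
  joined-explicit  ; => "one, two, three"
  ```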

* (string-trim str [sep #px"\\s+"]
                   #:left? [l? #t] #:right? [r? #t] #:repeat? [+? #f])

  Trims spaces at the edges of the string.  Two notes:

  1. The default for `#:repeat?' is just #f -- an option that I
     suggested at some point would be to have it be true if `sep' is
     given as a string:

       (string-trim str [sep #px"\\s+"] ...
                        #:repeat? [+? (string? sep)])

     The problem with that is that it makes the interface less uniform,
     and I can see cases where such behavior is undesired (for
     non-whitespace separators and for separators that are longer than
     one character).

  2. Another subtle point is what these should return:

       (string-trim "aaa" "aa")
       (string-trim "ababa" "aba")

     After deliberating on that, I eventually made both return ""
     because that seems to be the more expected result.  (But that's
     really a subtle point, since multi-character string separators
     are very rare anyway.)  I also looked at several similar
     functions, but found no precedent either way.  (In many cases
     they take a regexp or treat the `sep' string as a bag of
     characters.)

     As a corollary of this, I thought that it might also mean that
     this should happen:

       (string-split "x---y---z" "--") => '("x" "y" "z")

     but in this case it looks like this wouldn't be expected.
     Perhaps a hand-wavy proof of this is that coding this behavior
     would take a little effort (need to look for all occurrences of
     the pattern) whereas in the `string-trim' case the above is very
     easy (find the start and end, return "" if start >= end).
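
  The "find the start and end" approach can be sketched roughly as
  follows.  This is not the actual library code: `trim-once-sketch' is
  a hypothetical helper that handles only a literal string `sep',
  trimmed at most once per side, and it assumes a Racket version whose
  `racket/string' provides `string-prefix?' and `string-suffix?'.

  ```racket
  #lang racket/base
  (require racket/string)

  ;; Hypothetical helper: trim one literal `sep' from each edge of `str'.
  (define (trim-once-sketch str sep)
    (define len (string-length str))
    (define n   (string-length sep))
    ;; Index just past a leading `sep', or 0 if there is none.
    (define start (if (string-prefix? str sep) n 0))
    ;; Index where a trailing `sep' begins, or `len' if there is none.
    (define end   (if (string-suffix? str sep) (- len n) len))
    ;; When the two matches meet or overlap, nothing is left to return.
    (if (>= start end) "" (substring str start end)))

  (trim-once-sketch "aaa" "aa")        ; => ""
  (trim-once-sketch "ababa" "aba")     ; => ""
  (trim-once-sketch "aaaxaayaa" "aa")  ; => "axaay"
  ```

  In both subtle cases the left and right matches overlap, so `start'
  ends up past `end' and the result is the empty string.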

* (string-split str [sep #px"\\s+"] #:trim? [trim? #t] #:repeat? [+? #f])

  As discussed.

* (string-normalize-spaces str [sep #px"\\s+"] [space " "]
                           #:trim? [trim? #t] #:repeat? [+? #f])

  I ended up keeping this name.  Also, it's easy to implement
  directly as

    (string-join (string-split str sep ...) space)
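
  Spelled out as a runnable sketch, with the keyword arguments left
  out for brevity (`normalize-spaces-sketch' is a made-up name, not
  the library function):

  ```racket
  #lang racket/base
  (require racket/string)

  ;; Split on `sep' (trimming the edges, which is the `string-split'
  ;; default), then glue the pieces back together with `space'.
  (define (normalize-spaces-sketch str [sep #px"\\s+"] [space " "])
    (string-join (string-split str sep) space))

  (normalize-spaces-sketch "  foo   bar\tbaz \n")  ; => "foo bar baz"
  ```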

* (string-replace str from to #:all? [all? #t])

  As discussed -- note the different argument order (like I said, the
  focus of these things is the string).  I initially had two functions,
  `string-replace' and `string-replace*', but eventually went with a
  single function plus a keyword.  One reason is that the simplified
  interface already operates over the whole string in several cases,
  so replacing everything seems like the better fit for a default,
  and the choice is left to the keyword.
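
  A small illustration of the keyword and of the argument order,
  assuming the signature above.  The comparison with `regexp-replace*'
  is my reading of "different argument order"; the `define'd names are
  made up for the example.

  ```racket
  #lang racket/base
  (require racket/string)

  ;; The string comes first; all occurrences are replaced by default.
  (define replaced-all   (string-replace "foo bar baz" "a" "o"))

  ;; With `#:all? #f' only the first occurrence is replaced.
  (define replaced-first (string-replace "foo bar baz" "a" "o" #:all? #f))

  ;; For comparison, `regexp-replace*' takes the pattern first.
  (define via-regexp     (regexp-replace* "a" "foo bar baz" "o"))

  replaced-all    ; => "foo bor boz"
  replaced-first  ; => "foo bor baz"
  via-regexp      ; => "foo bor boz"
  ```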

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.