[racket-dev] `racket/string' extensions
About a month ago, Eli Barzilay wrote:
> [...]
This is now almost completely implemented. If you have any comments
on the below, now would be a good time.
The two things that I didn't do yet are the following (I'm still not
sure about the names and the functionality):
> (string-index str sub [start 0] [end (string-length str)])
> Looks for occurrences of `sub' in `str', returns the index if
> found, #f otherwise. [*2*] I'm not sure about the name, maybe
> `string-index-of' is better?
>
> (list-index list elt)
> Looks for `elt' in `list'. This is a possible extension for
> `racket/list' that would be kind of obvious with adding the above.
> [*3*] I'm not sure if it should be added, but IIRC it was
> requested a few times. If it does get added, then there's another
> question for how far the analogy goes: [*3a*] Should it take a
> start/end index too? [*3b*] Should it take a list of elements and
> look for a matching sublist instead (which is not a function that
> is common to ask for, AFAICT)?
I might start a separate thread on suggestions for this and more
needed functions in `racket/list'.
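
For concreteness, here is a rough sketch of what the proposed `string-index' could do, going only by the description quoted above. It is an illustration, not the eventual implementation: the name is still undecided, and the sketch assumes valid `start'/`end' indices and does no argument checking.

```racket
#lang racket/base

;; Hypothetical sketch of the proposed function: return the index of the
;; first occurrence of `sub` in `str` between `start` and `end`, or #f.
(define (string-index str sub [start 0] [end (string-length str)])
  (define n (string-length sub))
  (let loop ([i start])
    (cond
      ;; Not enough room left for `sub` before `end`: no match.
      [(> (+ i n) end) #f]
      ;; Found `sub` starting at position i.
      [(string=? (substring str i (+ i n)) sub) i]
      [else (loop (add1 i))])))

(string-index "hello world" "o w")  ; => 4
(string-index "hello" "z")          ; => #f
(string-index "abcabc" "bc" 2)      ; => 4
```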
To summarize the new things:
* (string-join strs [sep " "])
The new thing here is that the `sep' argument now defaults to a
space. This is something that is often done in such functions
elsewhere (including in srfi-13, IIRC), and with the below functions
working with spaces by default it seems like the right thing.
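
A quick illustration of the new default, assuming `string-join' as described here:

```racket
#lang racket/base
(require racket/string)

;; With no `sep` argument, the strings are joined with a single space.
(string-join '("one" "two" "three"))        ; => "one two three"

;; An explicit separator still works as before.
(string-join '("one" "two" "three") ", ")   ; => "one, two, three"
```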
* (string-trim str [sep #px"\\s+"]
#:left? [l? #t] #:right? [r? #t] #:repeat? [+? #f])
Trims spaces at the edges of the string. Two notes:
1. The default for `#:repeat?' is just #f -- an option that I
suggested at some point would be to have it be true if `sep' is
given as a string:
(string-trim str [sep #px"\\s+"] ...
#:repeat? [+? (string? sep)])
The problem with that is that it makes the interface less uniform,
and I can see cases where such behavior would be undesired (for
example, with non-whitespace separators and separators that are
longer than one character).
2. Another subtle point is what these should return:
(string-trim "aaa" "aa")
(string-trim "ababa" "aba")
After deliberating on that, I eventually made both return ""
because that seems to be the more expected result. (But that's
really a subtle point, since multi-character string separators
are very rare anyway.) I also looked at how several similar
functions elsewhere handle this, but I've seen no precedent
either way. (In many cases they use a regexp or treat the `sep'
string as a bag of characters.)
As a corollary of this, I thought that it might also mean that
this should happen:
(string-split "x---y---z" "--") => '("x" "y" "z")
but in this case that result doesn't look like what would be
expected. A hand-wavy argument for this is that coding such
behavior would take some effort (it needs to look for all
occurrences of the pattern), whereas in the `string-trim' case the
above is very easy (find the start and end, return "" if
start >= end).
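
The "find the start and end" logic can be sketched roughly as follows. This is a simplified stand-in, not the actual `racket/string' code: `trim-once' is a made-up helper that handles only a literal string `sep' and strips at most one occurrence from each edge.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical helper, not the library implementation: trim at most one
;; literal occurrence of `sep` from each edge of `str`.
(define (trim-once str sep)
  (define len (string-length str))
  (define n (string-length sep))
  ;; Index just past a match at the left edge, or 0 if there is none.
  (define start (if (string-prefix? str sep) n 0))
  ;; Index where a match at the right edge begins, or len if there is none.
  (define end (if (string-suffix? str sep) (- len n) len))
  ;; When the two edge matches overlap, nothing is left in between.
  (if (>= start end) "" (substring str start end)))

(trim-once "aaa" "aa")     ; => ""  (start = 2, end = 1)
(trim-once "ababa" "aba")  ; => ""  (start = 3, end = 2)
(trim-once "--x--" "--")   ; => "x"
```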
* (string-split str [sep #px"\\s+"] #:trim? [trim? #t] #:repeat? [+? #f])
As discussed.
* (string-normalize-spaces str [sep #px"\\s+"] [space " "]
#:trim? [trim? #t] #:repeat? [+? #f])
I ended up keeping the name of this. Also, it's easy to implement
directly as
(string-join (string-split str sep ...) space)
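
With the default arguments, that equivalence looks like this (a small check, assuming the defaults listed above):

```racket
#lang racket/base
(require racket/string)

(define s "  foo bar  baz \r\n\t")

;; Collapse runs of whitespace to single spaces and trim the edges.
(string-normalize-spaces s)          ; => "foo bar baz"

;; The same result, spelled out via split + join.
(string-join (string-split s) " ")   ; => "foo bar baz"
```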
* (string-replace str from to #:all? [all? #t])
As discussed -- note the different argument order (like I said, the
focus of these things is the string). I initially had two functions,
`string-replace' and `string-replace*' but eventually went with a
single function + keyword. One reason is that the simplified
interface in several cases operates over the whole string, so
replacing all occurrences seems like the better default, and I left
the choice to a keyword.
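
A short example of the keyword and of the argument order, next to `regexp-replace*' for contrast (the sample strings are made up):

```racket
#lang racket/base
(require racket/string)

;; The string being operated on comes first; all occurrences by default.
(string-replace "foo bar baz bar" "bar" "blah")
;; => "foo blah baz blah"

;; Replace only the first occurrence.
(string-replace "foo bar baz bar" "bar" "blah" #:all? #f)
;; => "foo blah baz bar"

;; For comparison, the regexp functions take the pattern first.
(regexp-replace* "bar" "foo bar baz bar" "blah")
;; => "foo blah baz blah"
```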
--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!