[racket-dev] `racket/string' extensions

From: Laurent (laurent.orseau at gmail.com)
Date: Fri May 25 04:22:10 EDT 2012

On Thu, May 24, 2012 at 10:45 PM, Eli Barzilay <eli at barzilay.org> wrote:

>
> >   (string-index str sub [start 0] [end (string-length str)])
> >     Looks for occurrences of `sub' in `str', returns the index if
> >     found, #f otherwise.  [*2*] I'm not sure about the name, maybe
> >     `string-index-of' is better?
>

Maybe `string-find'?
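
To make the proposed semantics concrete, here is a minimal sketch of such a function. The name `string-find' and the implementation are only an illustration (an assumption, not the code being proposed); it leans on `regexp-quote' and `regexp-match-positions', which already exist.

```racket
#lang racket/base

;; Hypothetical sketch of the proposed function: returns the index of the
;; first occurrence of `sub` in `str` between `start` and `end`, or #f.
(define (string-find str sub [start 0] [end (string-length str)])
  ;; regexp-quote makes `sub` match literally; the reported positions are
  ;; indices into `str` itself, independent of `start`.
  (define m (regexp-match-positions (regexp-quote sub) str start end))
  (and m (car (car m))))

(string-find "hello world" "o w") ; => 4
(string-find "abcabc" "bc" 2)     ; => 4
(string-find "hello" "z")         ; => #f
```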


> >   (list-index list elt)
> >     Looks for `elt' in `list'.  This is a possible extension for
> >     `racket/list' that would be kind of obvious with adding the above.
> >     [*3*] I'm not sure if it should be added, but IIRC it was
> >     requested a few times.  If it does get added, then there's another
> >     question for how far the analogy goes: [*3a*] Should it take a
> >     start/end index too?  [*3b*] Should it take a list of elements and
> >     look for a matching sublist instead (which is not a function that
> >     is common to ask for, AFAICT)?
>

3b: I think that's over-complicating things, personally.


>    2. Another subtle point is what should these return:
>
>       (string-trim "aaa" "aa")
>       (string-trim "ababa" "aba")



>     After deliberating on that, I eventually made it return ""
>     because it seems like that would be more expected.


That this could return "" surprised me. The intuitive behavior to me is the
following: If you remove "aa" from "aaa", you get "a", and if you repeat it
(from either side), you don't match and return "a".
This relies on an occidental left->right preference though.

But as you say, these are rare cases, so it could well be left as
"unspecified" or specified in the simplest way to implement, as long as it
is well documented?
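
To spell out the two readings, here is a rough sketch. Both helper names are made up for illustration and neither is the actual `string-trim' code; it also assumes a Racket whose `racket/string' provides `string-prefix?' and `string-suffix?'. The first helper trims the left side and then looks at what remains; the second finds both ends on the original string and returns "" if they cross.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical: strip one `sep` from the left, then try the right side
;; on whatever is left over.
(define (trim-sequential str sep)
  (define n (string-length sep))
  (define left (if (string-prefix? str sep) (substring str n) str))
  (if (string-suffix? left sep)
      (substring left 0 (- (string-length left) n))
      left))

;; Hypothetical: find the start and end independently on the original
;; string, and return "" when the two matches overlap.
(define (trim-independent str sep)
  (define n (string-length sep))
  (define len (string-length str))
  (define start (if (string-prefix? str sep) n 0))
  (define end (if (string-suffix? str sep) (- len n) len))
  (if (>= start end) "" (substring str start end)))

(trim-sequential "aaa" "aa")     ; => "a"
(trim-independent "aaa" "aa")    ; => ""
(trim-sequential "ababa" "aba")  ; => "ba"
(trim-independent "ababa" "aba") ; => ""
```

The two only disagree when the left and right matches overlap, which is exactly the rare case in question.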



> (But that's
>     really a subtle point, since multi-character string separators
>     are very rare anyway.)  I also looked into several of these
>     functions, but there's no precedent that I've seen either way.
>     (In many cases it uses a regexp or uses the `sep' string as a bag
>     of characters.)
>
>     As a corollary of this, I thought that it might also mean that
>     this should happen:
>
>       (string-split "x---y---z" "--") => '("x" "y" "z")
>
>     but in this case it looks like this wouldn't be expected.
>

Indeed.
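
For comparison, a small sketch using plain `regexp-split' rather than the new `string-split' (so it makes no claim about what the final function does). A literal "--" separator is matched left to right without overlap, while getting '("x" "y" "z") needs a different pattern that swallows whole runs of dashes.

```racket
#lang racket/base

;; Literal separator: matches do not overlap, so the leftover "-" stays
;; attached to the following field.
(define literal-split (regexp-split (regexp-quote "--") "x---y---z"))

;; A different technique, shown only for contrast: "two or more dashes"
;; treats each whole run as a single separator.
(define run-split (regexp-split #rx"--+" "x---y---z"))

literal-split ; => '("x" "-y" "-z")
run-split     ; => '("x" "y" "z")
```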


>     Perhaps a hand-wavy proof of this is that coding this behavior
>     would take a little effort (need to look for all occurrences of
>     the pattern) whereas in the `string-trim' case the above is very
>     easy (find the start and end, return "" if start >= end).


> * (string-split str [sep #px"\\s+"] #:trim? [trim? #t] #:repeat? [+? #f])
>
>  As discussed.
>
> * (string-normalize-spaces str [sep #px"\\s+"] [space " "]
>                           #:trim? [trim? #t] #:repeat? [+? #f])
>

Side comment: If it's possible, formatting the signature like this in the
docs could be a good idea: put all the keyword arguments on another line,
so that the most important and often used options are put forward.  I think
that might help newbies to understand the question (keywords can be a bit
frightening maybe? (not sure)).


>   I ended up keeping the name of this.  Also, it's easy to implement
>  directly as
>
>    (string-join (string-split str sep ...) space)
>

That's a good sign that the functions have the correct (default) behavior.
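
As a quick sanity check of that composition, here is a sketch with a made-up name (`normalize-spaces') that leaves out the keyword arguments. It assumes `string-split' accepts a regexp separator and trims by default, as in the signature quoted above.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical stand-in for the proposed function, built only from
;; `string-split` and `string-join` (keyword arguments omitted).
(define (normalize-spaces str [sep #px"\\s+"] [space " "])
  (string-join (string-split str sep) space))

(normalize-spaces "  foo   bar\tbaz  ")            ; => "foo bar baz"
(normalize-spaces "a  b   c" #px"\\s+" "_")        ; => "a_b_c"
```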

Looking forward to using all of these!

Laurent

Posted on the dev mailing list.