[racket-dev] Are There More String Functions?

From: Eli Barzilay (eli at barzilay.org)
Date: Wed Apr 18 15:12:25 EDT 2012

Yesterday, Sam Tobin-Hochstadt wrote:
> I think `racket/string' should provide the useful string functions,
> rather than refer users to srfis.  The only srfi/13 function I ever
> use is `string-trim-both' -- any objection to adding that to
> `racket/string'?

+1 for this in general, and since the `trim' function is the one that
usually leads to this question, it makes sense to add it.  I have most
of that done now.  Some observations:

* Looking around, there are two kinds of "customizations" -- the
  characters that are removed, and which side to remove from (usually
  in the form of three functions).  I'm going with a single
  `string-trim' function with an optional regexp for the first and
  `#:left?' and `#:right?' keywords for the latter.

* It's possible to go with other ways to specify characters, up to
  srfi-13's use of srfi-14, and this is part of why I didn't add a
  trim function yet.  I now think that it's best to have something
  that is usually useful (which is by far "just whitespaces") and be
  done with it.  If you need one of these sophisticated things, the
  `regexp-replace' way is still easy enough.

* Another point is the best way to run it efficiently.  There was a
  largish discussion a while ago about various ways to do that (and I
  happened to have gone through a bunch of options shortly before that
  too).  See also
  for an overview of options in JS.

  So in the same spirit as above, I'm just doing something that works
  reasonably well.  (Again, assuming that if speed is really
  important, then you probably have a good idea of the strings that
  you're trimming and you can just do whatever works for you.)

* Finally, I'm also adding a related function:
  `string-normalize-spaces', which takes a string and a regexp for the
  spaces, and turns all spaces into single ones.  Same principles as
  above.  This one is getting a `#:trim?' keyword that says whether
  spaces at the edges should be dropped (the default) or normalized.

  BTW, I hate that name -- it makes the `string-' prefix look even
  uglier...  Any suggestions for a better name?
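
To make the points above concrete, here is a rough sketch of how the
two functions could behave.  This is not the code going into
`racket/string': the names `sketch-trim' and `sketch-normalize-spaces',
the default whitespace regexp, and the match-scanning approach are all
assumptions made for illustration, with no attention paid to speed.

```racket
#lang racket/base
;; Hypothetical sketches; names, defaults, and strategy are assumptions.

;; Remove a match of `sep` from the start and/or end of `str`.
;; `sep` defaults to a run of whitespace.
(define (sketch-trim str [sep #px"\\s+"]
                     #:left? [left? #t] #:right? [right? #t])
  (define len (string-length str))
  ;; All non-overlapping matches, as (start . end) pairs.
  (define matches (regexp-match-positions* sep str))
  ;; Index just past a match that begins at position 0, else 0.
  (define start
    (if (and left? (pair? matches) (zero? (car (car matches))))
        (cdr (car matches))
        0))
  ;; Index where a match reaching the end of the string begins, else len.
  (define end
    (let ([last-m (and right? (pair? matches) (car (reverse matches)))])
      (if (and last-m (= (cdr last-m) len))
          (car last-m)
          len)))
  ;; `max` covers the case where a single match spans the whole string.
  (substring str start (max start end)))

;; Turn every match of `space` into one space character.  With
;; `#:trim?` true (the default) edge spaces are dropped first;
;; otherwise they are normalized like the rest.
(define (sketch-normalize-spaces str [space #px"\\s+"] #:trim? [trim? #t])
  (regexp-replace* space (if trim? (sketch-trim str space) str) " "))

(sketch-trim "  foo bar  ")                            ; "foo bar"
(sketch-trim "  foo bar  " #:left? #f)                 ; "  foo bar"
(sketch-trim "--x--" #px"-+")                          ; "x"
(sketch-normalize-spaces "  foo \t  bar  ")            ; "foo bar"
(sketch-normalize-spaces "  foo \t  bar  " #:trim? #f) ; " foo bar "
```

For anything fancier than a regexp for the trimmed characters, a direct
call such as `(regexp-replace* #px"^-+|-+$" "--x--" "")' does the same
job without a dedicated function.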

          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the dev mailing list.