[racket-dev] `string-split'

From: Michael W (mwilber at uccs.edu)
Date: Thu Apr 19 10:03:57 EDT 2012

(TL;DR: I'd suggest two functions: one (string-words str)
function that does it Eli's way, and one (string-split str sep)
that does it Laurent's way.)
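
To make the difference between the two proposed behaviors concrete, here
is a minimal sketch in Python. The names string_words and string_split
are hypothetical stand-ins for the proposed Racket functions, not
existing APIs, and both are written out by hand instead of relying on
the built-in str.split:

```python
def string_split(s, sep):
    """Split s at every occurrence of sep, keeping empty strings."""
    if sep == "":
        raise ValueError("empty separator")
    parts = []
    start = 0
    while True:
        i = s.find(sep, start)
        if i == -1:
            # No more separators: the rest of the string is the last piece.
            parts.append(s[start:])
            return parts
        parts.append(s[start:i])
        start = i + len(sep)


def string_words(s):
    """Split s on runs of whitespace, never producing empty strings."""
    words = []
    current = []
    for ch in s:
        if ch.isspace():
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


print(string_split(" st  ring", " "))  # ['', 'st', '', 'ring']
print(string_words(" st  ring"))       # ['st', 'ring']
```

Keeping them as two separate functions means neither one needs a
special case for a single-space separator.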

50 minutes ago, Eli Barzilay wrote:
> That doesn't seem right -- with this you get
>   -> (string-split " st  ring")
>   '("" "st" "" "ring")
> which is why I think that the above is a better definition in terms of
> newbie-ness.

No, every other language I've worked with does that.

$ python
Python 3.2.2 (default, Nov 21 2011, 16:51:01) 
[GCC 4.6.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> " st  ring".split(" ")
['', 'st', '', 'ring']

$ node
> " st  ring".split(" ")
[ '', 'st', '', 'ring' ]

$ php -a
php > var_dump(split(" ", " str  ing"));
array(4) {
  [0]=>
  string(0) ""
  [1]=>
  string(3) "str"
  [2]=>
  string(0) ""
  [3]=>
  string(3) "ing"
}

Haskell uses two functions: one that collapses contiguous runs of
whitespace, and one that doesn't (the latter comes from an entire
external library, sheesh! though it's easy to write your own):
$ ghci
Prelude> words " str  ing"
["str","ing"]
Prelude> Data.List.Split.splitOn " "  " str  ing"
["","str","","ing"]

Ruby has the weirdest behavior, which I consider to be a bug:

$ irb
irb(main):001:0> " st  ring".split(" ")
=> ["st", "ring"]
irb(main):002:0> " st  ring".split(/ /)
=> ["", "st", "", "ring"]

The ruby docs say:
    If pattern is a String, then its contents are used as the
    delimiter when splitting str. If pattern is a single space, str
    is split on whitespace, with leading whitespace and runs of
    contiguous whitespace characters ignored.
    If pattern is a Regexp, str is divided where the pattern matches.
    Whenever the pattern matches a zero-length string, str is split
    into individual characters. If pattern contains groups, the
    respective matches will be returned in the array as well.
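
The special case in those docs can be modeled roughly in Python. This is
only a sketch of the dispatch on the pattern argument: ruby_like_split
is a hypothetical name, and limit arguments, Ruby's handling of trailing
empty strings, and zero-length regexp matches are deliberately not
modeled here.

```python
import re


def ruby_like_split(s, pattern):
    """Rough model of the String#split dispatch quoted above (assumption:
    only the string / single-space / regexp cases are covered)."""
    if isinstance(pattern, str):
        if pattern == " ":
            # Single-space string: split on whitespace runs and ignore
            # leading whitespace, so no empty strings are produced.
            return s.split()
        # Any other string is used as a literal delimiter.
        return s.split(pattern)
    # Otherwise assume a compiled regexp: split wherever it matches.
    return pattern.split(s)


print(ruby_like_split(" st  ring", " "))              # ['st', 'ring']
print(ruby_like_split(" st  ring", re.compile(" ")))  # ['', 'st', '', 'ring']
```

The same one-character separator gives two different results depending
only on whether it is passed as a string or as a regexp, which is the
surprise shown in the irb session.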

While looking for Lua's version (Lua doesn't include one, by the way), I
found http://lua-users.org/wiki/SplitJoin which has a big summary
of the issues.

For the Future!

Posted on the dev mailing list.