[racket] string-trim : an implementation & a question
How about this string?
" "
Robby
On Sat, Apr 2, 2011 at 5:38 PM, Jon Zeppieri <zeppieri at gmail.com> wrote:
> Ha. But I should have tested the first function, right? Yes, that was also
> broken.
> (define (string-trim s)
> (regexp-replace #px"^\\s*(.*?)\\s*$" s "\\1"))
> ... passes the test cases and is a lot faster than the broken version; it's
> now a little less than a 2x difference:
>> (test)
> cpu time: 426 real time: 437 gc time: 22
> cpu time: 231 real time: 230 gc time: 0
>> (test)
> cpu time: 422 real time: 431 gc time: 21
> cpu time: 231 real time: 231 gc time: 0
>> (test)
> cpu time: 450 real time: 456 gc time: 21
> cpu time: 237 real time: 261 gc time: 0
>
>
>
>
> On Sat, Apr 2, 2011 at 6:23 PM, Jon Zeppieri <zeppieri at gmail.com> wrote:
>>
>>
>> On Sat, Apr 2, 2011 at 6:06 PM, Robby Findler
>> <robby at eecs.northwestern.edu> wrote:
>>>
>>> (oh, and I meant to add the usual "where are you test cases?!?!"
>>> comment here, but forgot.)
>>>
>>> Robby
>>
>>
>> Ugh, you're right.
>> My understanding is that the function is supposed to return a string equal
>> to the input string with leading and trailing whitespace removed. But I
>> wasn't the one who originally started the discussion; that was Richard
>> Hixson. I just got curious, because he wanted to avoid using regexps for
>> performance reasons (I think), and that made we wonder what the how large
>> the difference was.
>> But back to the function... Yes, that's broken. (Also, it turns out that
>> replacing #px with #rx may make the former function a lot faster, but it
>> doesn't actually work, at all.)
>> In the second function, the end parameter to the second use of 'scan' is
>> wrong, of course, since first non-whitespace character in the string may
>> also be the last. So, changing the second string-trim function to:
>> (define (string-trim s)
>> (define-syntax scan
>> (syntax-rules ()
>> ((_ s start end step)
>> (for/first ((i (in-range start end step))
>> #:when (not (char-whitespace? (string-ref s i))))
>> i))))
>>
>> (let* ((len (string-length s))
>> (last-index (sub1 len))
>> (start (or (scan s 0 len 1) 0))
>> (end (or (scan s last-index (sub1 start) -1) last-index)))
>> (substring s start (add1 end))))
>> ... works on the the following test cases:
>> > (string-trim "")
>> ""
>> > (string-trim "a")
>> "a"
>> > (string-trim "ab")
>> "ab"
>> > (string-trim " ab")
>> "ab"
>> > (string-trim " ab")
>> "ab"
>> > (string-trim " ab ")
>> "ab"
>> > (string-trim "ab ")
>> "ab"
>> > (string-trim " s sdf d ")
>> "s sdf d"
>> ... and the times aren't much altered.
>> -Jon
>>
>>>
>>> On Sat, Apr 2, 2011 at 5:06 PM, Robby Findler
>>> <robby at eecs.northwestern.edu> wrote:
>>> > I've lost track of what the function is supposed to be doing, but your
>>> > two functions don't agree on the input "a ", I don't think. I get
>>> > this:
>>> >
>>> > (define (string-trim.1 s)
>>> > (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))
>>> >
>>> > (define (string-trim.2 s)
>>> > (define-syntax scan
>>> > (syntax-rules ()
>>> > ((_ s start end step)
>>> > (for/first ((i (in-range start end step))
>>> > #:when (not (char-whitespace? (string-ref s i))))
>>> > i))))
>>> >
>>> > (let* ((len (string-length s))
>>> > (last-index (sub1 len))
>>> > (start (or (scan s 0 len 1) 0))
>>> > (end (or (scan s last-index start -1) last-index)))
>>> > (substring s start (add1 end))))
>>> >
>>> >> (string-trim.2 "a ")
>>> > "a "
>>> >> (string-trim.1 "a ")
>>> > "a"
>>> >
>>> >
>>> > On Sat, Apr 2, 2011 at 5:03 PM, Jon Zeppieri <zeppieri at gmail.com>
>>> > wrote:
>>> >> Actually #rx seems to be much faster than #px (in this case, at any
>>> >> rate),
>>> >> but it's still slower:
>>> >>> (test)
>>> >> cpu time: 1162 real time: 1181 gc time: 40
>>> >> cpu time: 230 real time: 230 gc time: 0
>>> >>> (test)
>>> >> cpu time: 1184 real time: 1198 gc time: 38
>>> >> cpu time: 258 real time: 259 gc time: 21
>>> >>> (test)
>>> >> cpu time: 1220 real time: 1544 gc time: 40
>>> >> cpu time: 233 real time: 233 gc time: 0
>>> >>
>>> >> On Sat, Apr 2, 2011 at 5:56 PM, Jon Zeppieri <zeppieri at gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> I was a bit surprised to find that the scanning-by-hand approach
>>> >>> really is
>>> >>> significantly faster than using regexps.
>>> >>> Between these two functions:
>>> >>> (define (string-trim s)
>>> >>> (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))
>>> >>> ... and ...
>>> >>> (define (string-trim s)
>>> >>> (define-syntax scan
>>> >>> (syntax-rules ()
>>> >>> ((_ s start end step)
>>> >>> (for/first ((i (in-range start end step))
>>> >>> #:when (not (char-whitespace? (string-ref s i))))
>>> >>> i))))
>>> >>>
>>> >>> (let* ((len (string-length s))
>>> >>> (last-index (sub1 len))
>>> >>> (start (or (scan s 0 len 1) 0))
>>> >>> (end (or (scan s last-index start -1) last-index)))
>>> >>> (substring s start (add1 end))))
>>> >>>
>>> >>> ... the latter is much faster. On 100000 iterations, using the test
>>> >>> string:
>>> >>> " \n \t foo
>>> >>> bar
>>> >>> baz\n \r "
>>> >>> as input, I'm getting numbers like these (where the first time is for
>>> >>> the
>>> >>> regexp function and the second is for the hand-scanning function):
>>> >>> > (test)
>>> >>> cpu time: 8003 real time: 8008 gc time: 0
>>> >>> cpu time: 256 real time: 257 gc time: 22
>>> >>> > (test)
>>> >>> cpu time: 8028 real time: 8025 gc time: 0
>>> >>> cpu time: 255 real time: 255 gc time: 22
>>> >>> > (test)
>>> >>> cpu time: 8418 real time: 8424 gc time: 0
>>> >>> cpu time: 260 real time: 260 gc time: 22
>>> >>> > (test)
>>> >>> cpu time: 8390 real time: 8401 gc time: 0
>>> >>> cpu time: 252 real time: 253 gc time: 20
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Sat, Apr 2, 2011 at 5:20 PM, Richard Cleis <rcleis at mac.com> wrote:
>>> >>>>
>>> >>>> You can use an index to the string to find the location of your
>>> >>>> goal,
>>> >>>> then return the substring when you are done.
>>> >>>>
>>> >>>> rac
>>> >>>>
>>> >>>> On Apr 2, 2011, at 3:08 PM, Charles Hixson wrote:
>>> >>>>
>>> >>>> > This seems to be what I want the string-trim to do, but it seems
>>> >>>> > that
>>> >>>> > all the string copying would be expensive. Is there a way to
>>> >>>> > improve it by
>>> >>>> > avoiding the string copying?
>>> >>>> >
>>> >>>> > My original inclination was to use a while loop with a test for
>>> >>>> > non-whitespace, but that appears to not be something scheme
>>> >>>> > supports.
>>> >>>> >
>>> >>>> > (define (string-trim s)
>>> >>>> > (let ( (l (string-length s) ) )
>>> >>>> > (cond
>>> >>>> > [ (= l 0) #f]
>>> >>>> > [ (char-whitespace? (string-ref s (- l 1) ) )
>>> >>>> > (string-trim
>>> >>>> > (substring s 0 (- l 1) ) ) ]
>>> >>>> > [else s]) ) )
>>> >>>> > _________________________________________________
>>> >>>> > For list-related administrative tasks:
>>> >>>> > http://lists.racket-lang.org/listinfo/users
>>> >>>>
>>> >>>> _________________________________________________
>>> >>>> For list-related administrative tasks:
>>> >>>> http://lists.racket-lang.org/listinfo/users
>>> >>>
>>> >>
>>> >>
>>> >> _________________________________________________
>>> >> For list-related administrative tasks:
>>> >> http://lists.racket-lang.org/listinfo/users
>>> >>
>>> >
>>
>
>