[racket] string-trim : an implementation & a question

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Sat Apr 2 18:06:56 EDT 2011

(oh, and I meant to add the usual "where are you test cases?!?!"
comment here, but forgot.)

Robby

On Sat, Apr 2, 2011 at 5:06 PM, Robby Findler
<robby at eecs.northwestern.edu> wrote:
> I've lost track of what the function is supposed to be doing, but your
> two functions don't agree on the input "a ", I don't think. I get
> this:
>
> (define (string-trim.1 s)
>  (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))
>
> (define (string-trim.2 s)
>  (define-syntax scan
>    (syntax-rules ()
>      ((_ s start end step)
>       (for/first ((i (in-range start end step))
>                   #:when (not (char-whitespace? (string-ref s i))))
>         i))))
>
>  (let* ((len (string-length s))
>         (last-index (sub1 len))
>         (start (or (scan s 0 len 1) 0))
>         (end (or (scan s last-index start -1) last-index)))
>    (substring s start (add1 end))))
>
>> (string-trim.2 "a ")
> "a "
>> (string-trim.1 "a ")
> "a"
>
>
> On Sat, Apr 2, 2011 at 5:03 PM, Jon Zeppieri <zeppieri at gmail.com> wrote:
>> Actually #rx seems to be much faster than #px (in this case, at any rate),
>> but it's still slower:
>>> (test)
>> cpu time: 1162 real time: 1181 gc time: 40
>> cpu time: 230 real time: 230 gc time: 0
>>> (test)
>> cpu time: 1184 real time: 1198 gc time: 38
>> cpu time: 258 real time: 259 gc time: 21
>>> (test)
>> cpu time: 1220 real time: 1544 gc time: 40
>> cpu time: 233 real time: 233 gc time: 0
>>
>> On Sat, Apr 2, 2011 at 5:56 PM, Jon Zeppieri <zeppieri at gmail.com> wrote:
>>>
>>> I was a bit surprised to find that the scanning-by-hand approach really is
>>> significantly faster than using regexps.
>>> Between these two functions:
>>> (define (string-trim s)
>>>   (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))
>>> ... and ...
>>> (define (string-trim s)
>>>   (define-syntax scan
>>>     (syntax-rules ()
>>>       ((_ s start end step)
>>>        (for/first ((i (in-range start end step))
>>>                    #:when (not (char-whitespace? (string-ref s i))))
>>>          i))))
>>>
>>>   (let* ((len (string-length s))
>>>          (last-index (sub1 len))
>>>          (start (or (scan s 0 len 1) 0))
>>>          (end (or (scan s last-index start -1) last-index)))
>>>     (substring s start (add1 end))))
>>>
>>> ... the latter is much faster. On 100000 iterations, using the test
>>> string:
>>>  "                                                      \n  \t foo bar
>>> baz\n                                    \r   "
>>> as input, I'm getting numbers like these (where the first time is for the
>>> regexp function and the second is for the hand-scanning function):
>>> > (test)
>>> cpu time: 8003 real time: 8008 gc time: 0
>>> cpu time: 256 real time: 257 gc time: 22
>>> > (test)
>>> cpu time: 8028 real time: 8025 gc time: 0
>>> cpu time: 255 real time: 255 gc time: 22
>>> > (test)
>>> cpu time: 8418 real time: 8424 gc time: 0
>>> cpu time: 260 real time: 260 gc time: 22
>>> > (test)
>>> cpu time: 8390 real time: 8401 gc time: 0
>>> cpu time: 252 real time: 253 gc time: 20
>>>
>>>
>>>
>>> On Sat, Apr 2, 2011 at 5:20 PM, Richard Cleis <rcleis at mac.com> wrote:
>>>>
>>>> You can use an index to the string to find the location of your goal,
>>>> then return the substring when you are done.
>>>>
>>>> rac
>>>>
>>>> On Apr 2, 2011, at 3:08 PM, Charles Hixson wrote:
>>>>
>>>> > This seems to be what I want the string-trim to do, but it seems that
>>>> > all the string copying would be expensive.  Is there a way to improve it by
>>>> > avoiding the string copying?
>>>> >
>>>> > My original inclination was to use a while loop with a test for
>>>> > non-whitespace, but that appears to not be something scheme supports.
>>>> >
>>>> > (define (string-trim s)
>>>> >    (let ( (l (string-length s) ) )
>>>> >      (cond
>>>> >        [ (= l 0) #f]
>>>> >        [ (char-whitespace? (string-ref s (- l 1) ) )    (string-trim
>>>> > (substring s 0 (- l 1) ) ) ]
>>>> >        [else s]) ) )
>>>> > _________________________________________________
>>>> > For list-related administrative tasks:
>>>> > http://lists.racket-lang.org/listinfo/users
>>>>
>>>> _________________________________________________
>>>>  For list-related administrative tasks:
>>>>  http://lists.racket-lang.org/listinfo/users
>>>
>>
>>
>> _________________________________________________
>>  For list-related administrative tasks:
>>  http://lists.racket-lang.org/listinfo/users
>>
>



Posted on the users mailing list.