[racket] string-trim : an implementation & a question

From: Jon Zeppieri (zeppieri at gmail.com)
Date: Sat Apr 2 17:56:17 EDT 2011

I was a bit surprised to find that the scanning-by-hand approach really is
significantly faster than using regexps.

Between these two functions:

(define (string-trim s)
  (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))

... and ...

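(define (string-trim s)
  ;; Walk indices from start toward end (exclusive) by step and return the
  ;; first index holding a non-whitespace character, or #f if there is none.
  (define-syntax scan
    (syntax-rules ()
      ((_ s start end step)
       (for/first ((i (in-range start end step))
                   #:when (not (char-whitespace? (string-ref s i))))
         i))))

  (let* ((len (string-length s))
         (last-index (sub1 len))
         ;; For an empty or all-whitespace string, start is len and the
         ;; result is "".
         (start (or (scan s 0 len 1) len))
         ;; Scan backward down to and including start, so a string whose
         ;; last non-whitespace character sits at start is still trimmed.
         (end (or (scan s last-index (sub1 start) -1) (sub1 start))))
    (substring s start (add1 end))))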


... the latter is much faster. On 100000 iterations, using the test string:
 "                                                      \n  \t foo bar baz\n
                                   \r   "
as input, I'm getting numbers like these (where the first time is for the
regexp function and the second is for the hand-scanning function):

> (test)
cpu time: 8003 real time: 8008 gc time: 0
cpu time: 256 real time: 257 gc time: 22
> (test)
cpu time: 8028 real time: 8025 gc time: 0
cpu time: 255 real time: 255 gc time: 22
> (test)
cpu time: 8418 real time: 8424 gc time: 0
cpu time: 260 real time: 260 gc time: 22
> (test)
cpu time: 8390 real time: 8401 gc time: 0
cpu time: 252 real time: 253 gc time: 20
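
The `test` function itself isn't shown here. A minimal sketch of what such a
harness could look like is below. The names `test-string` and `time-trim` are
made up for illustration, and the run lengths of spaces in the string are a
guess, since the original was wrapped by the mailer. `time` is the standard
Racket form that prints the cpu/real/gc line.

```racket
#lang racket/base

;; Approximation of the test string above; the exact counts of spaces
;; are an assumption.
(define test-string
  (string-append (make-string 54 #\space)
                 "\n  \t foo bar baz\n"
                 (make-string 35 #\space)
                 "\r   "))

;; Hypothetical helper: applies `trim` to test-string `iterations` times
;; and lets `time` print one timing line for the whole loop.
(define (time-trim trim [iterations 100000])
  (time (for ([i (in-range iterations)])
          (trim test-string))))

;; The regexp version from this message, under a distinct (made-up) name.
(define (string-trim/regexp s)
  (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))

;; A short run; raise the count to 100000 to mirror the figures above.
(time-trim string-trim/regexp 1000)
```

Passing the trim function as an argument keeps the loop identical for both
implementations, so only the function under test differs between runs.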


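One thing to watch in the regexp version: `[^\\s]*` cannot cross the spaces
inside "foo bar baz", so the pattern as a whole fails to match the test string
and `regexp-replace` hands the input back unchanged. A sketch of a regexp-based
trim that tolerates interior whitespace follows. The name `string-trim/rx` is
made up for illustration, and this variant has not been timed against the
numbers above.

```racket
#lang racket/base

;; Hypothetical name. Removes a leading whitespace run and a trailing
;; whitespace run, and leaves whitespace between words alone.
(define (string-trim/rx s)
  (regexp-replace* #px"^\\s+|\\s+$" s ""))

(string-trim/rx "  \t foo bar baz\n  \r ")
```

Using `+` rather than `*` in both alternatives avoids empty matches, which
keeps the behavior of `regexp-replace*` easy to reason about.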


On Sat, Apr 2, 2011 at 5:20 PM, Richard Cleis <rcleis at mac.com> wrote:

> You can use an index to the string to find the location of your goal, then
> return the substring when you are done.
>
> rac
>
> On Apr 2, 2011, at 3:08 PM, Charles Hixson wrote:
>
> > This seems to be what I want the string-trim to do, but it seems that all
> > the string copying would be expensive.  Is there a way to improve it by
> > avoiding the string copying?
> >
> > My original inclination was to use a while loop with a test for
> > non-whitespace, but that appears to not be something scheme supports.
> >
> > (define (string-trim s)
> >    (let ( (l (string-length s) ) )
> >      (cond
> >        [ (= l 0) #f]
> >        [ (char-whitespace? (string-ref s (- l 1) ) )
> >          (string-trim (substring s 0 (- l 1) ) ) ]
> >        [else s]) ) )
> > _________________________________________________
> > For list-related administrative tasks:
> > http://lists.racket-lang.org/listinfo/users

Posted on the users mailing list.