I was a bit surprised to find that the scanning-by-hand approach really is significantly faster than using regexps.<div><br></div><div>Between these two functions:</div><div><br></div><div><div>(define (string-trim s)</div>
<div> (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))</div><div><br></div><div>... and ...</div><div><br></div><div><div>(define (string-trim s)</div><div> (define-syntax scan</div><div> (syntax-rules ()</div>
<div> ((_ s start end step)</div><div> (for/first ((i (in-range start end step)) </div><div> #:when (not (char-whitespace? (string-ref s i))))</div><div> i))))</div><div> </div><div> (let* ((len (string-length s))</div>
<div> (last-index (sub1 len))</div><div> (start (or (scan s 0 len 1) len))</div><div> (end (or (scan s last-index (sub1 start) -1) (sub1 start))))</div><div> (substring s start (add1 end))))</div></div><div>
<br></div><div><br></div><div>... the latter is much faster, by a factor of roughly 30. Over 100000 iterations, using the test string:</div><div> " \n \t foo bar baz\n \r "</div>
<div>as input, I'm getting numbers like these (where the first time is for the regexp function and the second is for the hand-scanning function):</div><div><br></div><div><div>> (test)</div><div>cpu time: 8003 real time: 8008 gc time: 0</div>
<div>cpu time: 256 real time: 257 gc time: 22</div><div>> (test)</div><div>cpu time: 8028 real time: 8025 gc time: 0</div><div>cpu time: 255 real time: 255 gc time: 22</div><div>> (test)</div><div>cpu time: 8418 real time: 8424 gc time: 0</div>
<div>cpu time: 260 real time: 260 gc time: 22</div><div>> (test)</div><div>cpu time: 8390 real time: 8401 gc time: 0</div><div>cpu time: 252 real time: 253 gc time: 20</div></div><div><br></div><div><br></div><div><br>
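<div>One caveat on the regexp version: [^\\s]* cannot match across the spaces inside "foo bar baz", so for the test string the whole pattern fails and regexp-replace returns the input unchanged. The sketch below keeps the original pattern next to one possible alternative that deletes only a leading or trailing whitespace run. The names string-trim/orig, string-trim/rx, sample and time-trim are made up for illustration, and time-trim is only a guess at what the (test) procedure might have looked like, since it was not included in the message.</div>

```racket
#lang racket/base

;; The regexp from the post, unchanged (renamed for side-by-side use).
(define (string-trim/orig s)
  (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))

;; Hypothetical alternative: delete a leading or a trailing whitespace run.
(define (string-trim/rx s)
  (regexp-replace* #px"^\\s+|\\s+$" s ""))

;; The test string from the post.
(define sample " \n \t foo bar baz\n \r ")

;; The original pattern cannot span the inner spaces, so the input comes
;; back unchanged; the alternative returns "foo bar baz".
(string-trim/orig sample)
(string-trim/rx sample)

;; Assumed timing harness (not from the post): run one trim procedure
;; 100000 times on the sample and let `time` print the cpu/real/gc line.
(define (time-trim trim)
  (time (for ([i (in-range 100000)])
          (trim sample))))

(time-trim string-trim/rx)
```

<div>If this reading is right, the regexp timings above measure a failed match rather than an actual trim, so the comparison may be worth re-running with a pattern that matches. Newer Racket releases also provide string-trim in racket/string.</div>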
</div><br><div class="gmail_quote">On Sat, Apr 2, 2011 at 5:20 PM, Richard Cleis <span dir="ltr"><<a href="mailto:rcleis@mac.com">rcleis@mac.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
You can use an index to the string to find the location of your goal, then return the substring when you are done.<br>
<br>
rac<br>
<div><div></div><div class="h5"><br>
On Apr 2, 2011, at 3:08 PM, Charles Hixson wrote:<br>
<br>
> This seems to be what I want the string-trim to do, but it seems that all the string copying would be expensive. Is there a way to improve it by avoiding the string copying?<br>
><br>
> My original inclination was to use a while loop with a test for non-whitespace, but that appears to not be something scheme supports.<br>
><br>
> (define (string-trim s)<br>
> (let ( (l (string-length s) ) )<br>
> (cond<br>
> [ (= l 0) #f]<br>
> [ (char-whitespace? (string-ref s (- l 1) ) ) (string-trim (substring s 0 (- l 1) ) ) ]<br>
> [else s]) ) )<br>
> _________________________________________________<br>
> For list-related administrative tasks:<br>
> <a href="http://lists.racket-lang.org/listinfo/users" target="_blank">http://lists.racket-lang.org/listinfo/users</a><br>
<br>
</div></div></blockquote></div><br></div>