Ha. But I should have tested the first function, right? Yes, that was also broken.<div><br></div><div><div>(define (string-trim s)</div><div> (regexp-replace #px"^\\s*(.*?)\\s*$" s "\\1"))</div><div><br> </div><div>... passes the test cases and is a lot faster than the broken version; it's now a little less than a 2x difference:</div><div><div><br></div><div>> (test)</div><div>cpu time: 426 real time: 437 gc time: 22</div> <div>cpu time: 231 real time: 230 gc time: 0</div><div>> (test)</div><div>cpu time: 422 real time: 431 gc time: 21</div><div>cpu time: 231 real time: 231 gc time: 0</div><div>> (test)</div><div>cpu time: 450 real time: 456 gc time: 21</div> <div>cpu time: 237 real time: 261 gc time: 0</div></div><div><br></div><div><br></div><div><br></div><div><br></div><br><div class="gmail_quote">On Sat, Apr 2, 2011 at 6:23 PM, Jon Zeppieri <span dir="ltr"><<a href="mailto:zeppieri@gmail.com">zeppieri@gmail.com</a>></span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><br><br><div class="gmail_quote"><div class="im">On Sat, Apr 2, 2011 at 6:06 PM, Robby Findler <span dir="ltr"><<a href="mailto:robby@eecs.northwestern.edu" target="_blank">robby@eecs.northwestern.edu</a>></span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> (oh, and I meant to add the usual "where are your test cases?!?!"<br> comment here, but forgot.)<br> <font color="#888888"><br> Robby<br></font></blockquote><div><br></div><div><br></div></div><div>Ugh, you're right. </div><div><br></div><div>My understanding is that the function is supposed to return a string equal to the input string with leading and trailing whitespace removed. But I wasn't the one who originally started the discussion; that was Charles Hixson. 
I just got curious, because he wanted to avoid using regexps for performance reasons (I think), and that made me wonder how large the difference was. </div> <div><br></div><div>But back to the function... Yes, that's broken. (Also, it turns out that replacing #px with #rx may make the former function a lot faster, but it doesn't actually work, at all.)</div><div><br> </div><div>In the second function, the end parameter to the second use of 'scan' is wrong, of course, since the first non-whitespace character in the string may also be the last. So, changing the second string-trim function to:</div> <div><br></div><div><div class="im"><div>(define (string-trim s)</div><div> (define-syntax scan</div><div> (syntax-rules ()</div><div> ((_ s start end step)</div><div> (for/first ((i (in-range start end step)) </div> <div> #:when (not (char-whitespace? (string-ref s i))))</div> <div> i))))</div><div> </div><div> (let* ((len (string-length s))</div><div> (last-index (sub1 len))</div><div> (start (or (scan s 0 len 1) 0))</div></div><div> (end (or (scan s last-index (sub1 start) -1) last-index)))</div> <div class="im"> <div> (substring s start (add1 end))))</div></div></div><div><br></div><div>... works on the following test cases:</div><div><br></div><div><div>> (string-trim "")</div><div>""</div><div>> (string-trim "a")</div> <div>"a"</div><div>> (string-trim "ab")</div><div>"ab"</div><div>> (string-trim " ab")</div><div>"ab"</div><div>> (string-trim " ab")</div><div>"ab"</div> <div>> (string-trim " ab ")</div><div>"ab"</div><div>> (string-trim "ab ")</div><div>"ab"</div><div><div>> (string-trim " s sdf d ")</div><div>"s sdf d"</div> </div><div><br></div></div><div>... 
and the times aren't much altered.</div><div><br></div><font color="#888888"><div>-Jon</div></font><div><div></div><div class="h5"><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <font color="#888888"> </font><div><div></div><div><br> On Sat, Apr 2, 2011 at 5:06 PM, Robby Findler<br> <<a href="mailto:robby@eecs.northwestern.edu" target="_blank">robby@eecs.northwestern.edu</a>> wrote:<br> > I've lost track of what the function is supposed to be doing, but your<br> > two functions don't agree on the input "a ", I don't think. I get<br> > this:<br> ><br> > (define (string-trim.1 s)<br> > (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))<br> ><br> > (define (string-trim.2 s)<br> > (define-syntax scan<br> > (syntax-rules ()<br> > ((_ s start end step)<br> > (for/first ((i (in-range start end step))<br> > #:when (not (char-whitespace? (string-ref s i))))<br> > i))))<br> ><br> > (let* ((len (string-length s))<br> > (last-index (sub1 len))<br> > (start (or (scan s 0 len 1) 0))<br> > (end (or (scan s last-index start -1) last-index)))<br> > (substring s start (add1 end))))<br> ><br> >> (string-trim.2 "a ")<br> > "a "<br> >> (string-trim.1 "a ")<br> > "a"<br> ><br> ><br> > On Sat, Apr 2, 2011 at 5:03 PM, Jon Zeppieri <<a href="mailto:zeppieri@gmail.com" target="_blank">zeppieri@gmail.com</a>> wrote:<br> >> Actually #rx seems to be much faster than #px (in this case, at any rate),<br> >> but it's still slower:<br> >>> (test)<br> >> cpu time: 1162 real time: 1181 gc time: 40<br> >> cpu time: 230 real time: 230 gc time: 0<br> >>> (test)<br> >> cpu time: 1184 real time: 1198 gc time: 38<br> >> cpu time: 258 real time: 259 gc time: 21<br> >>> (test)<br> >> cpu time: 1220 real time: 1544 gc time: 40<br> >> cpu time: 233 real time: 233 gc time: 0<br> >><br> >> On Sat, Apr 2, 2011 at 5:56 PM, Jon Zeppieri <<a href="mailto:zeppieri@gmail.com" target="_blank">zeppieri@gmail.com</a>> wrote:<br> 
>>><br> >>> I was a bit surprised to find that the scanning-by-hand approach really is<br> >>> significantly faster than using regexps.<br> >>> Between these two functions:<br> >>> (define (string-trim s)<br> >>> (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))<br> >>> ... and ...<br> >>> (define (string-trim s)<br> >>> (define-syntax scan<br> >>> (syntax-rules ()<br> >>> ((_ s start end step)<br> >>> (for/first ((i (in-range start end step))<br> >>> #:when (not (char-whitespace? (string-ref s i))))<br> >>> i))))<br> >>><br> >>> (let* ((len (string-length s))<br> >>> (last-index (sub1 len))<br> >>> (start (or (scan s 0 len 1) 0))<br> >>> (end (or (scan s last-index start -1) last-index)))<br> >>> (substring s start (add1 end))))<br> >>><br> >>> ... the latter is much faster. On 100000 iterations, using the test<br> >>> string:<br> >>> " \n \t foo bar<br> >>> baz\n \r "<br> >>> as input, I'm getting numbers like these (where the first time is for the<br> >>> regexp function and the second is for the hand-scanning function):<br> >>> > (test)<br> >>> cpu time: 8003 real time: 8008 gc time: 0<br> >>> cpu time: 256 real time: 257 gc time: 22<br> >>> > (test)<br> >>> cpu time: 8028 real time: 8025 gc time: 0<br> >>> cpu time: 255 real time: 255 gc time: 22<br> >>> > (test)<br> >>> cpu time: 8418 real time: 8424 gc time: 0<br> >>> cpu time: 260 real time: 260 gc time: 22<br> >>> > (test)<br> >>> cpu time: 8390 real time: 8401 gc time: 0<br> >>> cpu time: 252 real time: 253 gc time: 20<br> >>><br> >>><br> >>><br> >>> On Sat, Apr 2, 2011 at 5:20 PM, Richard Cleis <<a href="mailto:rcleis@mac.com" target="_blank">rcleis@mac.com</a>> wrote:<br> >>>><br> >>>> You can use an index to the string to find the location of your goal,<br> >>>> then return the substring when you are done.<br> >>>><br> >>>> rac<br> >>>><br> >>>> On Apr 2, 2011, at 3:08 PM, Charles Hixson wrote:<br> >>>><br> >>>> > This seems to be what I want the string-trim to do, but it seems that<br> >>>> > all the 
string copying would be expensive. Is there a way to improve it by<br> >>>> > avoiding the string copying?<br> >>>> ><br> >>>> > My original inclination was to use a while loop with a test for<br> >>>> > non-whitespace, but that appears to not be something scheme supports.<br> >>>> ><br> >>>> > (define (string-trim s)<br> >>>> > (let ( (l (string-length s) ) )<br> >>>> > (cond<br> >>>> > [ (= l 0) #f]<br> >>>> > [ (char-whitespace? (string-ref s (- l 1) ) ) (string-trim<br> >>>> > (substring s 0 (- l 1) ) ) ]<br> >>>> > [else s]) ) )<br> >>>> > _________________________________________________<br> >>>> > For list-related administrative tasks:<br> >>>> > <a href="http://lists.racket-lang.org/listinfo/users" target="_blank">http://lists.racket-lang.org/listinfo/users</a><br> >>>><br> >>><br> >><br> ><br> </div></div></blockquote></div></div></div><br> </blockquote></div><br></div>