Ha. But I should have tested the first function, right? Yes, that was also broken.<div><br></div><div><div>(define (string-trim s)</div><div> (regexp-replace #px"^\\s*(.*?)\\s*$" s "\\1"))</div><div><br>
</div><div>... passes the test cases and is a lot faster than the broken version; it's now a little less than a 2x difference:</div><div><div><br></div><div>> (test)</div><div>cpu time: 426 real time: 437 gc time: 22</div>
<div>cpu time: 231 real time: 230 gc time: 0</div><div>> (test)</div><div>cpu time: 422 real time: 431 gc time: 21</div><div>cpu time: 231 real time: 231 gc time: 0</div><div>> (test)</div><div>cpu time: 450 real time: 456 gc time: 21</div>
<div>cpu time: 237 real time: 261 gc time: 0</div></div><div><br></div><div><br></div><div><br></div><div><br></div><br><div class="gmail_quote">On Sat, Apr 2, 2011 at 6:23 PM, Jon Zeppieri <span dir="ltr"><<a href="mailto:zeppieri@gmail.com">zeppieri@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><br><br><div class="gmail_quote"><div class="im">On Sat, Apr 2, 2011 at 6:06 PM, Robby Findler <span dir="ltr"><<a href="mailto:robby@eecs.northwestern.edu" target="_blank">robby@eecs.northwestern.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
(oh, and I meant to add the usual "where are your test cases?!?!"<br>
comment here, but forgot.)<br>
<font color="#888888"><br>
Robby<br></font></blockquote><div><br></div><div><br></div></div><div>Ugh, you're right. </div><div><br></div><div>My understanding is that the function is supposed to return a string equal to the input string with leading and trailing whitespace removed. But I wasn't the one who originally started the discussion; that was Charles Hixson. I just got curious, because he wanted to avoid using regexps for performance reasons (I think), and that made me wonder how large the difference was. </div>
<div><br></div><div>But back to the function... Yes, that's broken. (Also, it turns out that replacing #px with #rx may make the former function a lot faster, but it doesn't actually work, at all.)</div><div><br>
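<div>As a rough illustration of how the two #px patterns quoted in this thread behave (a sketch only; the names trim-strict and trim-lazy are made up here and do not appear in the thread): the [^\\s]* pattern cannot match once the text has interior whitespace, and regexp-replace hands back its input when nothing matches.</div>

```racket
#lang racket/base

;; Hypothetical names; the two patterns are the ones quoted in this thread.
(define (trim-strict s)
  ;; [^\s]* cannot cross interior whitespace, so " foo bar " never matches.
  (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))

(define (trim-lazy s)
  ;; (.*?) is non-greedy and is allowed to span interior whitespace.
  (regexp-replace #px"^\\s*(.*?)\\s*$" s "\\1"))

(trim-strict "a ")         ; single word: the pattern matches
(trim-strict " foo bar ")  ; no match: regexp-replace returns the input as is
(trim-lazy " foo bar ")    ; leading and trailing whitespace removed
```

<div>Because the strict pattern silently returns its input on a failed match, a timing run on a multi-word test string may be measuring a failed match rather than a trim.</div>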
</div><div>In the second function, the end parameter to the second use of 'scan' is wrong, of course, since the first non-whitespace character in the string may also be the last. So, changing the second string-trim function to:</div>
<div><br></div><div><div class="im"><div>(define (string-trim s)</div><div> (define-syntax scan</div><div> (syntax-rules ()</div><div> ((_ s start end step)</div><div> (for/first ((i (in-range start end step)) </div>
<div> #:when (not (char-whitespace? (string-ref s i))))</div>
<div> i))))</div><div> </div><div> (let* ((len (string-length s))</div><div> (last-index (sub1 len))</div><div> (start (or (scan s 0 len 1) 0))</div></div><div> (end (or (scan s last-index (sub1 start) -1) last-index)))</div>
<div class="im">
<div> (substring s start (add1 end))))</div></div></div><div><br></div><div>... works on the following test cases:</div><div><br></div><div><div>> (string-trim "")</div><div>""</div><div>> (string-trim "a")</div>
<div>"a"</div><div>> (string-trim "ab")</div><div>"ab"</div><div>> (string-trim " ab")</div><div>"ab"</div><div>> (string-trim " ab")</div><div>"ab"</div>
<div>> (string-trim " ab ")</div><div>"ab"</div><div>> (string-trim "ab ")</div><div>"ab"</div><div><div>> (string-trim " s sdf d ")</div><div>"s sdf d"</div>
</div><div><br></div></div><div>... and the times aren't much altered.</div><div><br></div><font color="#888888"><div>-Jon</div></font><div><div></div><div class="h5"><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<font color="#888888">
</font><div><div></div><div><br>
On Sat, Apr 2, 2011 at 5:06 PM, Robby Findler<br>
<<a href="mailto:robby@eecs.northwestern.edu" target="_blank">robby@eecs.northwestern.edu</a>> wrote:<br>
> I've lost track of what the function is supposed to be doing, but your<br>
> two functions don't agree on the input "a ", I don't think. I get<br>
> this:<br>
><br>
> (define (string-trim.1 s)<br>
> (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))<br>
><br>
> (define (string-trim.2 s)<br>
> (define-syntax scan<br>
> (syntax-rules ()<br>
> ((_ s start end step)<br>
> (for/first ((i (in-range start end step))<br>
> #:when (not (char-whitespace? (string-ref s i))))<br>
> i))))<br>
><br>
> (let* ((len (string-length s))<br>
> (last-index (sub1 len))<br>
> (start (or (scan s 0 len 1) 0))<br>
> (end (or (scan s last-index start -1) last-index)))<br>
> (substring s start (add1 end))))<br>
><br>
>> (string-trim.2 "a ")<br>
> "a "<br>
>> (string-trim.1 "a ")<br>
> "a"<br>
><br>
><br>
> On Sat, Apr 2, 2011 at 5:03 PM, Jon Zeppieri <<a href="mailto:zeppieri@gmail.com" target="_blank">zeppieri@gmail.com</a>> wrote:<br>
>> Actually #rx seems to be much faster than #px (in this case, at any rate),<br>
>> but it's still slower:<br>
>>> (test)<br>
>> cpu time: 1162 real time: 1181 gc time: 40<br>
>> cpu time: 230 real time: 230 gc time: 0<br>
>>> (test)<br>
>> cpu time: 1184 real time: 1198 gc time: 38<br>
>> cpu time: 258 real time: 259 gc time: 21<br>
>>> (test)<br>
>> cpu time: 1220 real time: 1544 gc time: 40<br>
>> cpu time: 233 real time: 233 gc time: 0<br>
>><br>
>> On Sat, Apr 2, 2011 at 5:56 PM, Jon Zeppieri <<a href="mailto:zeppieri@gmail.com" target="_blank">zeppieri@gmail.com</a>> wrote:<br>
>>><br>
>>> I was a bit surprised to find that the scanning-by-hand approach really is<br>
>>> significantly faster than using regexps.<br>
>>> Between these two functions:<br>
>>> (define (string-trim s)<br>
>>> (regexp-replace #px"^\\s*([^\\s]*)\\s*$" s "\\1"))<br>
>>> ... and ...<br>
>>> (define (string-trim s)<br>
>>> (define-syntax scan<br>
>>> (syntax-rules ()<br>
>>> ((_ s start end step)<br>
>>> (for/first ((i (in-range start end step))<br>
>>> #:when (not (char-whitespace? (string-ref s i))))<br>
>>> i))))<br>
>>><br>
>>> (let* ((len (string-length s))<br>
>>> (last-index (sub1 len))<br>
>>> (start (or (scan s 0 len 1) 0))<br>
>>> (end (or (scan s last-index start -1) last-index)))<br>
>>> (substring s start (add1 end))))<br>
>>><br>
>>> ... the latter is much faster. On 100000 iterations, using the test<br>
>>> string:<br>
>>> " \n \t foo bar<br>
>>> baz\n \r "<br>
>>> as input, I'm getting numbers like these (where the first time is for the<br>
>>> regexp function and the second is for the hand-scanning function):<br>
>>> > (test)<br>
>>> cpu time: 8003 real time: 8008 gc time: 0<br>
>>> cpu time: 256 real time: 257 gc time: 22<br>
>>> > (test)<br>
>>> cpu time: 8028 real time: 8025 gc time: 0<br>
>>> cpu time: 255 real time: 255 gc time: 22<br>
>>> > (test)<br>
>>> cpu time: 8418 real time: 8424 gc time: 0<br>
>>> cpu time: 260 real time: 260 gc time: 22<br>
>>> > (test)<br>
>>> cpu time: 8390 real time: 8401 gc time: 0<br>
>>> cpu time: 252 real time: 253 gc time: 20<br>
>>><br>
>>><br>
>>><br>
>>> On Sat, Apr 2, 2011 at 5:20 PM, Richard Cleis <<a href="mailto:rcleis@mac.com" target="_blank">rcleis@mac.com</a>> wrote:<br>
>>>><br>
>>>> You can use an index to the string to find the location of your goal,<br>
>>>> then return the substring when you are done.<br>
>>>><br>
>>>> rac<br>
>>>><br>
>>>> On Apr 2, 2011, at 3:08 PM, Charles Hixson wrote:<br>
>>>><br>
>>>> > This seems to be what I want the string-trim to do, but it seems that<br>
>>>> > all the string copying would be expensive. Is there a way to improve it by<br>
>>>> > avoiding the string copying?<br>
>>>> ><br>
>>>> > My original inclination was to use a while loop with a test for<br>
>>>> > non-whitespace, but that appears to not be something scheme supports.<br>
>>>> ><br>
>>>> > (define (string-trim s)<br>
>>>> > (let ( (l (string-length s) ) )<br>
>>>> > (cond<br>
>>>> > [ (= l 0) #f]<br>
>>>> > [ (char-whitespace? (string-ref s (- l 1) ) ) (string-trim<br>
>>>> > (substring s 0 (- l 1) ) ) ]<br>
>>>> > [else s]) ) )<br>
>>>> > _________________________________________________<br>
>>>> > For list-related administrative tasks:<br>
>>>> > <a href="http://lists.racket-lang.org/listinfo/users" target="_blank">http://lists.racket-lang.org/listinfo/users</a><br>
>>>><br>
>>><br>
>><br>
>><br>
>><br>
><br>
</div></div></blockquote></div></div></div><br>
</blockquote></div><br></div>
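<div>One input the test cases in this thread do not cover is a string made up entirely of whitespace. With the final scanning version of string-trim (the one that passes (sub1 start) as the end of the backward scan), the forward scan finds nothing, start falls back to 0, the backward scan also finds nothing, and the whole string comes back untouched. Below is a minimal sketch of one way to handle that case, assuming an empty string is the desired result. The names scan (here a plain function rather than the thread's macro) and string-trim/scan are hypothetical.</div>

```racket
#lang racket/base

;; Hypothetical helper: visits indices from start toward end (exclusive)
;; in increments of step and returns the first index holding a
;; non-whitespace character, or #f if there is none.
(define (scan s start end step)
  (for/first ([i (in-range start end step)]
              #:when (not (char-whitespace? (string-ref s i))))
    i))

(define (string-trim/scan s)
  (define len (string-length s))
  (define start (scan s 0 len 1))
  (cond
    ;; No non-whitespace character at all: assume "" is the wanted result.
    [(not start) ""]
    [else
     ;; Scan backward from the last index down to start (inclusive).
     ;; This always finds an index, since position start is non-whitespace.
     (let ([end (scan s (sub1 len) (sub1 start) -1)])
       (substring s start (add1 end)))]))

(string-trim/scan "   ")          ; all whitespace
(string-trim/scan " s sdf d ")    ; interior whitespace is kept
```

<div>Returning early when the forward scan fails also removes the need for the (or ... 0) and (or ... last-index) fallbacks used in the thread's version.</div>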