[racket] string-strip

From: Neil Van Dyke (neil at neilvandyke.org)
Date: Wed Dec 28 13:27:05 EST 2011

Marijn wrote at 12/28/2011 12:00 PM:
> I don't think my use of this code is very performance-critical, but I
> couldn't help myself, so I looked into making it faster

This is the best spirit. :)

> What I found was that it is much slower to treat a string
> as a port and then read-char from that than it is to directly index
> the string.

String input ports often being noticeably slower than string indexing 
is one of the banes of my existence.  Most reading and parsing 
operations you implement should work on both ports and strings.  But if 
you first write a procedure that works on a port, and then write a 
wrapper procedure that works on a string (by doing an 
"open-input-string" and calling the port procedure), the string one can 
be noticeably slower than a handwritten string version.  Writing two 
separate procedures has a big development cost, though, so I always 
either take the performance hit on strings, or write only a string 
procedure and then lack a port procedure when I need it later.  One 
approach that might help is a macro that lets people define the 
processing once, and expands into two procedure definitions: one that 
works on ports, and one that works on strings and avoids a lot of the 
port-related overhead.
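
A rough sketch of what such a macro might look like, limited to a 
simple left-to-right fold over characters.  The name "define-char-fold" 
and the "/port" and "/string" suffixes are made up for illustration; 
this is not an existing library form, and a real design would need to 
handle peeking, early exit, and positions.

```racket
#lang racket/base
(require (for-syntax racket/base racket/syntax))

;; Hypothetical form: (define-char-fold name init (acc ch) body)
;; Defines name/port and name/string from one description of the
;; per-character step.  `body` computes the next accumulator value.
(define-syntax (define-char-fold stx)
  (syntax-case stx ()
    [(_ name init (acc ch) body)
     (with-syntax ([name/port   (format-id #'name "~a/port" #'name)]
                   [name/string (format-id #'name "~a/string" #'name)])
       #'(begin
           ;; Port version: pulls characters with read-char until EOF.
           (define (name/port in)
             (let loop ([acc init])
               (define ch (read-char in))
               (if (eof-object? ch)
                   acc
                   (loop body))))
           ;; String version: indexes the string directly, so no
           ;; string port is ever created.
           (define (name/string str)
             (define len (string-length str))
             (let loop ([i 0] [acc init])
               (if (= i len)
                   acc
                   (let ([ch (string-ref str i)])
                     (loop (add1 i) body)))))))]))

;; Example use: count the space characters in the input.
(define-char-fold count-spaces 0 (n ch)
  (if (char=? ch #\space) (add1 n) n))
```

With this, (count-spaces/string "a b c") and 
(count-spaces/port (open-input-string "a b c")) share one definition of 
the per-character step, but only the second goes through a port.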

> In the end I was able to construct code that is another factor 5.5
> faster than your version:

Marijn, that's a great implementation.

And it's encouraging to see that forgoing regexps and "string->number", 
and doing a character-by-character DFA instead, is faster in this case.  
In interpreted languages it's pretty common to use regexps for 
performance reasons, so it's nice to see that pure Racket code can be 
faster.
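
To illustrate the general shape of a character-by-character DFA over a 
string (this is not Marijn's code; the procedure name and the exact 
input format are assumptions), here is a small sketch that accepts an 
unsigned decimal integer with optional surrounding whitespace, using 
only "string-ref" and arithmetic:

```racket
#lang racket/base

;; Hypothetical example, not the implementation from this thread.
;; States: 'lead   = skipping leading whitespace
;;         'digits = accumulating digits
;;         'trail  = skipping trailing whitespace
;; Returns the integer, or #f if the string is not of that form.
(define (parse-padded-integer str)
  (define len (string-length str))
  (let loop ([i 0] [state 'lead] [n 0])
    (if (= i len)
        ;; At least one digit must have been seen.
        (and (not (eq? state 'lead)) n)
        (let ([ch (string-ref str i)])
          (cond
            [(char<=? #\0 ch #\9)
             ;; A digit after trailing whitespace is an error.
             (and (not (eq? state 'trail))
                  (loop (add1 i)
                        'digits
                        (+ (* n 10) (- (char->integer ch) 48))))]
            [(char-whitespace? ch)
             (loop (add1 i)
                   (if (eq? state 'lead) 'lead 'trail)
                   n)]
            [else #f])))))
```

For example, (parse-padded-integer " 42 ") gives 42, while 
(parse-padded-integer "4 2") gives #f.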

One tangential comment: I don't think it's significant in this 
situation, but I believe that "case" is currently slower than people 
coming from a C background might expect.  So, if one really wants to 
micro-optimize, one might sometimes be better off using, say, strategic 
"if"s and arithmetic instead of "case".
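
As a made-up illustration of the two styles (the procedure names are 
hypothetical, and which one is faster depends on the Racket version, so 
measure before relying on either), here is a digit-value lookup written 
both ways:

```racket
#lang racket/base

;; Style 1: dispatch on the character with `case`.
(define (digit-value/case ch)
  (case ch
    [(#\0) 0] [(#\1) 1] [(#\2) 2] [(#\3) 3] [(#\4) 4]
    [(#\5) 5] [(#\6) 6] [(#\7) 7] [(#\8) 8] [(#\9) 9]
    [else #f]))

;; Style 2: one subtraction and one range test with `if`.
(define (digit-value/arith ch)
  (let ([d (- (char->integer ch) (char->integer #\0))])
    (if (<= 0 d 9) d #f)))
```

Both return 7 for #\7 and #f for #\a.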


Posted on the users mailing list.