[racket] string-strip

From: Marijn (hkBst at gentoo.org)
Date: Thu Dec 29 04:02:03 EST 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 28-12-11 19:27, Neil Van Dyke wrote:
> Marijn wrote at 12/28/2011 12:00 PM:
> >> I don't think my use of this code is very performance-critical,
> >> but I couldn't help myself, so I looked into making it faster.
> 
> This is the best spirit. :)
> 
> >> What I found was that it is much slower to treat a string as a
> >> port and then read-char from that than it is to directly index
> >> the string.
> 
> That string input ports are often noticeably slower than string
> indexing is one of the banes of my existence.  Most reading and
> parsing operations you implement, you want to work on both ports
> and strings. But, if you first write a procedure that works on a
> port, and then write a wrapper procedure that works on a string (by
> doing an "open-input-string" and calling your procedure that works
> on ports), the string one can be noticeably slower than if you'd
> handwritten the string one.  But having to write two separate
> procedures has big development cost, and I always just take the
> performance hit on strings instead, or write a string procedure and
> then not have a port procedure when I need it later.  One approach
> that might help is to design a macro that lets people define
> processing on strings and ports, and expands to produce two closure
> definitions -- one that works on ports, and one on strings, and
> avoids a lot of port-related overhead in the string one.
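
To make the trade-off described above concrete, here is a minimal
sketch of the three variants: a port procedure, a string wrapper that
goes through "open-input-string", and a handwritten string procedure
that indexes directly.  The digit-summing task and all three procedure
names are hypothetical, chosen only for illustration; they are not the
code discussed in this thread.

```racket
#lang racket/base

;; Hypothetical task: add up the ASCII decimal digits in the input.

;; Port version: pulls characters one at a time with read-char.
(define (digit-sum/port in)
  (let loop ([sum 0])
    (define c (read-char in))
    (cond [(eof-object? c) sum]
          [(char<=? #\0 c #\9)
           (loop (+ sum (- (char->integer c) (char->integer #\0))))]
          [else (loop sum)])))

;; String wrapper: reuses the port version via open-input-string.
(define (digit-sum/string-via-port s)
  (digit-sum/port (open-input-string s)))

;; Handwritten string version: indexes the string directly.
(define (digit-sum/string s)
  (define n (string-length s))
  (let loop ([i 0] [sum 0])
    (if (= i n)
        sum
        (let ([c (string-ref s i)])
          (loop (add1 i)
                (if (char<=? #\0 c #\9)
                    (+ sum (- (char->integer c) (char->integer #\0)))
                    sum))))))
```

Wrapping calls to the last two procedures in "time" on a long string
is one way to see how large the gap is on a given Racket version.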

Matthew, any comments on this? Is there a fundamental reason that
treating a string as a port is so much slower than direct indexing or
is there something that can be done about it? Or should we look into
automatically duplicating code with macros?
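
As a rough sketch of what such macro-based duplication could look
like: the "define-scanner" form below is hypothetical (it is not an
existing Racket form or library), and it only handles the simple case
of folding one expression over every character.  It expands a single
definition into a "/port" procedure and a "/string" procedure, the
latter using direct indexing.

```racket
#lang racket/base
(require (for-syntax racket/base racket/syntax))

;; (define-scanner name (acc init) (c) step)
;; Hypothetical form: defines name/port and name/string.  Each one
;; starts with acc bound to init and, for every character c, rebinds
;; acc to the value of step.  The final acc is returned.
(define-syntax (define-scanner stx)
  (syntax-case stx ()
    [(_ name (acc init) (c) step)
     (with-syntax ([name/port (format-id #'name "~a/port" #'name)]
                   [name/string (format-id #'name "~a/string" #'name)])
       #'(begin
           ;; Port variant: reads characters until eof.
           (define (name/port in)
             (let loop ([acc init])
               (let ([c (read-char in)])
                 (if (eof-object? c)
                     acc
                     (loop step)))))
           ;; String variant: indexes directly, no port involved.
           (define (name/string s)
             (let ([n (string-length s)])
               (let loop ([i 0] [acc init])
                 (if (= i n)
                     acc
                     (let ([c (string-ref s i)])
                       (loop (add1 i) step))))))))]))

;; Example use: sum the ASCII decimal digits in the input.
(define-scanner digit-sum (sum 0) (c)
  (if (char<=? #\0 c #\9)
      (+ sum (- (char->integer c) (char->integer #\0)))
      sum))
```

A real parser needs lookahead, early exit and more state than a
single accumulator, so this shows only the shape of the idea.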

Marijn

>> In the end I was able to construct code that is faster than your
>> version by another factor of 5.5:
>> 
> 
> Marijn, that's a great implementation.
> 
> And it's encouraging to see that forgoing regexps and
> "string->number", and doing a character-by-character DFA, is faster
> in this case.  It's pretty common in interpreted languages to use
> regexps for performance reasons; nice when we see that pure Racket
> code can be faster.
> 
> One tangential comment: I don't think it's significant in this
> situation, but I believe that "case" is currently slower than
> people coming from a C background might expect.  So, if one really
> wants to micro-optimize, one might sometimes be better off using,
> say, strategic "if"s and arithmetic instead of "case".
> 
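
For illustration, here is a small sketch of the kind of rewrite
suggested above: the same character-to-digit mapping written once with
"case" and once with an "if" plus arithmetic.  Both procedure names
are hypothetical, and whether the second is actually faster depends on
the Racket version, so it is worth measuring before committing to it.

```racket
#lang racket/base

;; Dispatch with case: one clause per digit character.
(define (digit-value/case c)
  (case c
    [(#\0) 0] [(#\1) 1] [(#\2) 2] [(#\3) 3] [(#\4) 4]
    [(#\5) 5] [(#\6) 6] [(#\7) 7] [(#\8) 8] [(#\9) 9]
    [else #f]))

;; Same mapping with one subtraction and one range check.
(define (digit-value/arith c)
  (let ([n (- (char->integer c) (char->integer #\0))])
    (if (<= 0 n 9) n #f)))
```

Both return the digit's numeric value, or #f for any other character.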

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk78LIsACgkQp/VmCx0OL2x7hQCfb77YNdgro1gKb3hhUxYQ+za7
hfAAnRwlJ2qdTOCZbNuyZvFZw34oDebI
=AyrQ
-----END PGP SIGNATURE-----


Posted on the users mailing list.