[racket] How can I speed up this code?

From: Ray Racine (ray.racine at gmail.com)
Date: Mon Jan 14 16:33:49 EST 2013

On #1.  The enticement is to give the json parser (or any other) an input
port which is essentially a pipe from the http connection to the content
parser, in this case Json.  If the parser (which in this case it is not)  a
streaming parser then one can save buffering the entire content in mem.
 Hence the avoidance of sucking the port into a string-port.

It _is_ interesting that in the test case under discussion it is a 100Meg
file.  (A single single object as well???)  So the fact that reading the
entire 100 meg, wrap it in a string port is 15% faster than just handing
the file port directly to the parser.   Mean peek/read char on a file port
measurably slower than from a string port.  I'm assuming the Racket runtime
internally buffers text file reading, so why the big difference in
performance?


On Mon, Jan 14, 2013 at 3:59 PM, Danny Yoo <dyoo at hashcollision.org> wrote:

> > From looking at the profile, we can trace that about 60% of the time
> > is being spent in... regexp-try-match!  That sounds really unusual:
> > lexing should not be the expensive part of this process...
> >
> > So perhaps it might be helpful to see if an alternative lexing
> > strategy (perhaps using parser-tools/lex) will perform better.
>
> Ok, I looked at the problem a little more.  It appears that there are
> a few stupid-simple optimizations to the JSON library that we can do.
> I've been able to cut down the time on my machine from an unoptimized
> run of 52 second to parse your file, to about 36 seconds.
>
> Here's the patch:
>
>
> https://github.com/dyoo/racket/commit/e8dc403217574754c57fa4bd95439abfb9b521ec
>
>
> I haven't pushed to master just because I'd like someone else to
> review the changes.  Also, I have not been able to find the unit tests
> for the json library.  Does anyone know where they are?
>
>
> Here's a summary of the changes.
>
> 1.  First, pull all the content of the input port into a string port.
> This cut down the runtime from 52 seconds to 45 seconds.  (15%
> improvement)
>
> 2.  Modified read-list so it avoids using regular expressions when
> simpler peek-char/read-char operations suffice.  Reduced the runtime
> from 45 seconds to 40 seconds.  (12% improvement)
>
> 3.  Looked at the profiler, which pointed out that read-string was
> very expensive.  Looked and found the regular expression:
>
>     rx"^(.*?)(\"|\\\\(.))"
>
> which is performance-hungry.  Replaced with a char-complement version
> to avoid the "?" part of the pattern:
>
>     #rx"^([^\"\\]*)(\"|\\\\(.))"
>
> which cut down the runtime from 40 seconds to 36 seconds.  (11%
> improvement)
>
>
> There still seems to be a lot of low-hanging fruit with regards to the
> use of regexp-try-catch, which is still taking 52% of the runtime,
> according to the profile here:
>
>      https://gist.github.com/4533369
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/users/archive/attachments/20130114/66319e41/attachment.html>

Posted on the users mailing list.