[racket] How can I speed up this code?
On #1. The enticement is to give the json parser (or any other) an input
port which is essentially a pipe from the http connection to the content
parser, in this case Json. If the parser (which in this case it is not) a
streaming parser then one can save buffering the entire content in mem.
Hence the avoidance of sucking the port into a string-port.
It _is_ interesting that in the test case under discussion it is a 100Meg
file. (A single single object as well???) So the fact that reading the
entire 100 meg, wrap it in a string port is 15% faster than just handing
the file port directly to the parser. Mean peek/read char on a file port
measurably slower than from a string port. I'm assuming the Racket runtime
internally buffers text file reading, so why the big difference in
performance?
On Mon, Jan 14, 2013 at 3:59 PM, Danny Yoo <dyoo at hashcollision.org> wrote:
> > From looking at the profile, we can trace that about 60% of the time
> > is being spent in... regexp-try-match! That sounds really unusual:
> > lexing should not be the expensive part of this process...
> >
> > So perhaps it might be helpful to see if an alternative lexing
> > strategy (perhaps using parser-tools/lex) will perform better.
>
> Ok, I looked at the problem a little more. It appears that there are
> a few stupid-simple optimizations to the JSON library that we can do.
> I've been able to cut down the time on my machine from an unoptimized
> run of 52 second to parse your file, to about 36 seconds.
>
> Here's the patch:
>
>
> https://github.com/dyoo/racket/commit/e8dc403217574754c57fa4bd95439abfb9b521ec
>
>
> I haven't pushed to master just because I'd like someone else to
> review the changes. Also, I have not been able to find the unit tests
> for the json library. Does anyone know where they are?
>
>
> Here's a summary of the changes.
>
> 1. First, pull all the content of the input port into a string port.
> This cut down the runtime from 52 seconds to 45 seconds. (15%
> improvement)
>
> 2. Modified read-list so it avoids using regular expressions when
> simpler peek-char/read-char operations suffice. Reduced the runtime
> from 45 seconds to 40 seconds. (12% improvement)
>
> 3. Looked at the profiler, which pointed out that read-string was
> very expensive. Looked and found the regular expression:
>
> rx"^(.*?)(\"|\\\\(.))"
>
> which is performance-hungry. Replaced with a char-complement version
> to avoid the "?" part of the pattern:
>
> #rx"^([^\"\\]*)(\"|\\\\(.))"
>
> which cut down the runtime from 40 seconds to 36 seconds. (11%
> improvement)
>
>
> There still seems to be a lot of low-hanging fruit with regards to the
> use of regexp-try-catch, which is still taking 52% of the runtime,
> according to the profile here:
>
> https://gist.github.com/4533369
> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/users/archive/attachments/20130114/66319e41/attachment.html>