[racket] How can I speed up this code?

From: Danny Yoo (dyoo at hashcollision.org)
Date: Mon Jan 14 15:59:17 EST 2013

> From looking at the profile, we can trace that about 60% of the time
> is being spent in... regexp-try-match!  That sounds really unusual:
> lexing should not be the expensive part of this process...
>
> So perhaps it might be helpful to see if an alternative lexing
> strategy (perhaps using parser-tools/lex) will perform better.

Ok, I looked at the problem a little more.  It appears that there are
a few stupid-simple optimizations we can make to the JSON library.  On
my machine, they cut the time to parse your file from 52 seconds
(unoptimized) to about 36 seconds.

Here's the patch:

https://github.com/dyoo/racket/commit/e8dc403217574754c57fa4bd95439abfb9b521ec


I haven't pushed to master just because I'd like someone else to
review the changes.  Also, I have not been able to find the unit tests
for the json library.  Does anyone know where they are?


Here's a summary of the changes.

1.  First, read all the content of the input port into a string port.
This cut the runtime from 52 seconds to 45 seconds.  (about 15%
faster)

2.  Modified read-list so it avoids regular expressions where simpler
peek-char/read-char operations suffice.  This cut the runtime from 45
seconds to 40 seconds.  (about 12% faster)

3.  Looked at the profiler, which pointed out that read-string was
very expensive.  Inside read-string, found the regular expression:

    #rx"^(.*?)(\"|\\\\(.))"

whose non-greedy "(.*?)" is performance-hungry: after each character
it consumes, the matcher has to retry the rest of the pattern.
Replaced it with a char-complement version so that no "?" is needed:

    #rx"^([^\"\\]*)(\"|\\\\(.))"

which cut the runtime from 40 seconds to 36 seconds.  (about 11% faster)
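Step 1 can be illustrated outside the library with a small wrapper. This is a minimal sketch, not the patch itself (which changes the reader inside the json library): `read-json/slurped` is a hypothetical name, and it assumes the port holds exactly one JSON document, because draining the port to EOF discards anything `read-json` would otherwise leave unread.

```racket
#lang racket
(require json) ; provides read-json

;; Hypothetical wrapper illustrating step 1: drain the port into a string,
;; then parse from an in-memory string port.  Assumes the port holds a
;; single JSON document, since everything up to EOF is consumed.
(define (read-json/slurped [in (current-input-port)])
  (read-json (open-input-string (port->string in))))
```

For example, `(call-with-input-file "data.json" read-json/slurped)`, where "data.json" is a placeholder file name. The assumption behind the speedup is that the reader's many peeks and regexp matches are cheaper on an in-memory string port than on, say, a file port.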
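The idea behind step 2, scanning with `peek-char` and `read-char` instead of regular expressions, might look like the sketch below. It is not the patched read-list: `skip-whitespace!`, `read-list-sketch`, and `read-nat` are hypothetical names, and `read-nat` is a stand-in element reader that only understands non-negative integers.

```racket
#lang racket
;; Hypothetical helpers illustrating step 2: no regular expressions,
;; only peek-char and read-char.

;; Consume JSON whitespace (space, tab, newline, carriage return).
(define (skip-whitespace! in)
  (when (memv (peek-char in) '(#\space #\tab #\newline #\return))
    (read-char in)
    (skip-whitespace! in)))

;; Read "[elem, elem, ...]", using read-elem to read each element.
(define (read-list-sketch in read-elem)
  (skip-whitespace! in)
  (unless (eqv? (read-char in) #\[)
    (error 'read-list-sketch "expected ["))
  (skip-whitespace! in)
  (cond
    [(eqv? (peek-char in) #\])          ; empty list
     (read-char in)
     '()]
    [else
     (let loop ([acc '()])
       (define v (read-elem in))
       (skip-whitespace! in)
       (define c (read-char in))
       (cond
         [(eqv? c #\,) (skip-whitespace! in) (loop (cons v acc))]
         [(eqv? c #\]) (reverse (cons v acc))]
         [else (error 'read-list-sketch "expected , or ]")]))]))

;; Stand-in element reader: a non-negative decimal integer.
(define (read-nat in)
  (let loop ([n 0] [seen-digit? #f])
    (define c (peek-char in))
    (cond
      [(and (char? c) (char<=? #\0 c #\9))
       (read-char in)
       (loop (+ (* n 10) (- (char->integer c) (char->integer #\0))) #t)]
      [seen-digit? n]
      [else (error 'read-nat "expected a digit")])))

(read-list-sketch (open-input-string "[1, 22 ,333]") read-nat) ; => '(1 22 333)
```

Each decision here costs one peek of a single character, which is the kind of work a regexp match on a port would otherwise do with far more machinery.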
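To check that the two patterns from step 3 agree, and to compare their cost, a small harness helps. It applies them to strings rather than to the library's input port (an assumption made to keep the example self-contained), and the `time` lines print whatever your machine measures; no particular numbers are implied.

```racket
#lang racket
;; Step 3's two patterns, applied to the text after a string's opening quote.
(define slow-rx #rx"^(.*?)(\"|\\\\(.))")      ; non-greedy .*? version
(define fast-rx #rx"^([^\"\\]*)(\"|\\\\(.))") ; complemented-class version

;; Both should stop at the first unescaped quote or at the first escape.
(define samples
  (list "abc\" tail"         ; plain text, then the closing quote
        "ab\\ncd\""          ; an escape sequence comes first
        "\\\"q\""            ; starts with an escaped quote
        "no closing quote")) ; neither pattern matches
(for ([s (in-list samples)])
  (unless (equal? (regexp-match slow-rx s) (regexp-match fast-rx s))
    (error 'compare "patterns disagree on ~s" s)))

;; Rough timing: a long run of ordinary characters before the closing quote.
(define long-input (string-append (make-string 10000 #\x) "\""))
(time (for ([i (in-range 100)]) (regexp-match slow-rx long-input)))
(time (for ([i (in-range 100)]) (regexp-match fast-rx long-input)))
```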


There still seems to be a lot of low-hanging fruit in the use of
regexp-try-match, which still accounts for 52% of the runtime,
according to the profile here:

     https://gist.github.com/4533369
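For reference, `regexp-try-match` works like `regexp-match` on a port, except that when the match fails it consumes nothing. The sketch below shows that behavior and one possible regexp-free alternative for fixed literals such as `true` or `null`. `try-read-literal` is a hypothetical helper, and whether the remaining regexp-try-match calls in the profile are literal matches like this is an assumption, not something the profile shows.

```racket
#lang racket
;; regexp-try-match consumes input only when the match succeeds.
(define in (open-input-string "true, null"))
(regexp-try-match #rx"^false" in) ; => #f, nothing consumed
(regexp-try-match #rx"^true" in)  ; => '(#"true"), "true" consumed
(read-char in)                    ; => #\,

;; Hypothetical regexp-free alternative for a fixed literal: peek exactly
;; (string-length lit) characters and consume them only if they match.
;; Returns #t if the literal was read, #f otherwise (port untouched).
(define (try-read-literal lit port)
  (define n (string-length lit))
  (and (equal? (peek-string n 0 port) lit)
       (begin (read-string n port) #t)))
```

Whether this is actually cheaper than the regexp version should be confirmed with the profiler on the same input file.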

Posted on the users mailing list.