[racket] How can I speed up this code?

From: Danny Yoo (dyoo at hashcollision.org)
Date: Mon Jan 14 16:21:52 EST 2013

On Mon, Jan 14, 2013 at 2:07 PM, Eli Barzilay <eli at barzilay.org> wrote:
> Just now, Danny Yoo wrote:
>> Also, I have not been able to find the unit tests for the json
>> library.  Does anyone know where they are?
>
> They're in the "tests" subdirectory.

Uh... can you be more specific?  I searched for 'json', and the only
hits I'm seeing are these:

128-110-69-116:tests dyoo$ grep -r 'json' *
db/db/sql-types.rkt:      (type-test-case '(json)
net/mime.rkt:Content-Type: application/json
net/mime.rkt:        (entity-subtype sub) => 'json
xrepl/xrepl.rkt:  -> «,r (only-in json string->jsexpr)» ⇒ works with an expression
xrepl/xrepl.rkt:  -> «,en json»
xrepl/xrepl.rkt:  json/main> «,sh echo $F»
xrepl/xrepl.rkt:  @|collects|/json/main.rkt
xrepl/xrepl.rkt:  json/main> «,top»


so I truthfully do not know where I should be looking.  Help!  :)



>> 1.  First, pull all the content of the input port into a string
>>     port.  This cut down the runtime from 52 seconds to 45 seconds.
>>     (15% improvement)
>
> I don't think that this is a good idea -- it looks like a dangerous
> assumption for a generic library to make, instead of letting users
> decide for themselves if they want to do so and hand the port to the
> library.

I don't understand this particular objection yet.  Can you explain a
little more?  The side effect of read-json on the input port is that
it completely exhausts it.  What freedom does the patch eliminate
with regard to port usage?
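
For concreteness, the buffering idea from point 1 amounts to
something like this sketch (my wrapper name is made up; read-json and
port->string are the real APIs):

    #lang racket
    (require json)

    ;; Drain the input port into an in-memory string port first, then
    ;; parse from that.  port->string exhausts `in', exactly as
    ;; read-json itself would.
    (define (read-json/buffered in)
      (read-json (open-input-string (port->string in))))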


I understand Sam's objection based on memory, but I suspect the extra
memory usage is roughly proportional to the size of the JSON object
being constructed.  Watching `top` while the original code runs makes
me think this is a red herring: the unoptimized json parser is
already consuming around 500MB of RAM on J G Cho's 92MB file during
the parse.  If we really cared about RAM usage, we should look at
other places first.
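
Here's roughly how I'd double-check that figure (a sketch; "big.json"
stands in for the actual file):

    #lang racket
    (require json)

    (collect-garbage)
    (define before (current-memory-use))
    (define v (call-with-input-file "big.json" read-json))
    (collect-garbage)
    (printf "retained: ~a MB\n"
            (quotient (- (current-memory-use) before) (expt 2 20)))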




>> 2.  Modified read-list so it avoids using regular expressions when
>> simpler peek-char/read-char operations suffice.  Reduced the runtime
>> from 45 seconds to 40 seconds.  (12% improvement)
>
> This is a questionable change, IMO.  The thing is that keeping
> things with regexps makes the code easy to revise and modify in the
> future, but switching to single-character operations makes it
> harder, and in addition requires the code to know when to use
> regexps and when to use a character.  In this case I prefer code
> readability over performance.

Is the JSON standard likely to change, though?  That function is such
a hot spot that it affects everything else.
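
To show the flavor of the change (a sketch, not the actual read-list
diff): where the old code matched a one-character delimiter with
something like (regexp-try-match #px"^," in), it can simply peek at
the next character:

    ;; Consume a leading comma if one is present, without going
    ;; through the regexp machinery for a single character.
    (define (skip-comma! in)
      (and (eqv? (peek-char in) #\,)
           (read-char in)))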



> I was hoping that you'll get a more performant code by using one of
> the parsers instead, which would also be as manageable as regexps (or
> maybe even more).

Yes, but I tried initial experiments using parser-tools/lex, and I
could not measure an improvement.  I may have made a mistake, though.
I can try again when I have more time.
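
For reference, my experiment was along these lines (a toy fragment
with made-up token names, not the code I actually measured):

    #lang racket
    (require parser-tools/lex)

    ;; A tiny lexer for a few JSON tokens; inside an action,
    ;; input-port is bound to the port being lexed.
    (define json-token
      (lexer
       [whitespace (json-token input-port)]  ; skip and retry
       [#\{ 'open-brace]
       [#\} 'close-brace]
       [#\[ 'open-bracket]
       [#\] 'close-bracket]
       ["true" 'true]
       ["false" 'false]
       [(eof) 'eof]))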



>> which is performance-hungry.  Replaced with a char-complement
>> version to avoid the "?" part of the pattern:
>>
[cut]
>
> This looks fine, but I'll feel better if it comes with tests.


Yes, I heartily agree on this.
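
Something along these lines with rackunit would be a start (a sketch
of the kind of tests I mean, not tests that exist today):

    #lang racket
    (require rackunit json)

    ;; Parse and round-trip checks for the json collection.
    (check-equal? (string->jsexpr "{\"a\": [1, 2.5, null]}")
                  (hasheq 'a (list 1 2.5 (json-null))))
    (check-equal? (jsexpr->string (hasheq 'a 1))
                  "{\"a\":1}")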


Posted on the users mailing list.