[racket] How can I speed up this code?
On Mon, Jan 14, 2013 at 2:07 PM, Eli Barzilay <eli at barzilay.org> wrote:
> Just now, Danny Yoo wrote:
>> Also, I have not been able to find the unit tests for the json
>> library. Does anyone know where they are?
>
> They're in the "tests" subdirectory.
Uh... can you be more specific? I searched for 'json', and the only
hits I'm seeing are these:
128-110-69-116:tests dyoo$ grep -r 'json' *
db/db/sql-types.rkt: (type-test-case '(json)
net/mime.rkt:Content-Type: application/json
net/mime.rkt: (entity-subtype sub) => 'json
xrepl/xrepl.rkt: -> «,r (only-in json string->jsexpr)» ⇒ works with an expression
xrepl/xrepl.rkt: -> «,en json»
xrepl/xrepl.rkt: json/main> «,sh echo $F»
xrepl/xrepl.rkt: @|collects|/json/main.rkt
xrepl/xrepl.rkt: json/main> «,top»
so I truthfully do not know where I should be looking. Help! :)
>> 1. First, pull all the content of the input port into a string
>> port. This cut down the runtime from 52 seconds to 45 seconds.
>> (15% improvement)
>
> I don't think that this is a good idea -- it looks like a dangerous
> assumption for a generic library to make, instead of letting users
> decide for themselves whether they want to do so and hand the port
> to the library.
I am confused and don't understand this particular objection yet. Can
you explain a little more? The side effect of read-json on the input
port is that it completely exhausts it. What freedom does the patch
eliminate with regards to port usage?
I understand Sam's objection based on memory, but I suspect the extra
memory usage is roughly proportional to the size of the JSON object
being constructed. Watching `top` while the original code runs, I
think this is a red herring: the unoptimized json parser already
consumes around 500MB of RAM on J G Cho's 92MB file during the parse.
If we really cared about RAM usage, we should look at other places
first.
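For reference, the buffering approach under discussion can be sketched roughly like this (the function name and structure are my illustration, not the actual patch):

```racket
#lang racket
(require json)

;; Hypothetical sketch: slurp the entire input into an in-memory
;; string port before parsing, so subsequent peek/read operations
;; stay cheap instead of going through the original port each time.
(define (read-json/buffered in)
  (read-json (open-input-string (port->string in))))
```

The trade-off Sam raised applies directly here: the whole raw input is held in memory at once, on top of the jsexpr being constructed.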
>> 2. Modified read-list so it avoids using regular expressions when
>> simpler peek-char/read-char operations suffice. Reduced the runtime
>> from 45 seconds to 40 seconds. (12% improvement)
>
> This is a questionable change, IMO. The thing is that keeping things
> with regexps makes it easy to revise and modify in the future, but
> switching to a single character thing makes it hard and in addition
> requires the code to know when to use regexps and when to use a
> character. I prefer in this case the code readability over
> performance.
Is it likely for the JSON standard to change, though? That function
is such a hot spot that it affects everything else.
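To make the trade-off concrete, here is a hedged sketch (my illustration, not the actual patch) of matching a single delimiter with peek-char/read-char instead of a regexp:

```racket
#lang racket

;; Illustration only: skip whitespace and consume one expected
;; delimiter character via peek-char/read-char.  A regexp such as
;; #px"^[ \t\r\n]*," says the same thing more declaratively, but
;; costs a regexp match on a hot path.
(define (skip-whitespace in)
  (let loop ()
    (define c (peek-char in))
    (when (and (char? c) (char-whitespace? c))
      (read-char in)
      (loop))))

(define (expect-char! in expected)
  (skip-whitespace in)
  (define c (read-char in))
  (unless (eqv? c expected)
    (error 'expect-char! "expected ~e, got ~e" expected c)))
```

The readability objection is visible in the sketch: the character-level version is longer and the code has to know which style to use at each point.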
> I was hoping that you'd get more performant code by using one of the
> parsers instead, which would also be as manageable as regexps (or
> maybe even more so).
Yes, but I tried initial experiments using parser-tools/lex, and I
could not measure an improvement. I may have made a mistake, though.
I can try again when I have more time.
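For the record, the kind of parser-tools/lex experiment I mean looks roughly like this (a toy tokenizer for a couple of JSON-ish tokens, not the code I benchmarked):

```racket
#lang racket
(require parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

;; Toy lexer sketch.  `lexeme` and `input-port` are bound
;; automatically inside each action by the lexer form.
(define json-lexer
  (lexer
   [(:+ whitespace) (json-lexer input-port)]  ; skip and re-lex
   [#\{ 'open-brace]
   [#\} 'close-brace]
   [(:+ numeric) (string->number lexeme)]
   [(eof) 'eof]))
```

A real version would need rules for strings, keywords, and signed/fractional numbers, which is where the bookkeeping starts to rival the regexp-based code.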
>> which is performance-hungry. Replaced with a char-complement
>> version to avoid the "?" part of the pattern:
>>
[cut]
>
> This looks fine, but I'll feel better if it comes with tests.
Yes, I heartily agree on this.