[racket] How can I speed up this code?
>> >> 1. First, pull all the content of the input port into a string
>> >> port. This cut the runtime from 52 seconds to 45 seconds
>> >> (roughly a 13% improvement).
>> >
>> > I don't think that this is a good idea -- it looks like a dangerous
>> > assumption for a generic library to do, instead of letting users
>> > decide for themselves if they want to do so and hand the port to
>> > the library.
Wow. Ok, I see what you mean now, and yeah, my optimization here is
unsound. I did not know the JSON library behaved in a streaming
manner. Thanks!
>> When I watch `top` to see how much memory the original code uses, I
>> think this is a red herring: the unoptimized JSON parser already
>> consumes around 500 MB of RAM on J G Cho's 92 MB file during the
>> parse.
>
> Is the *result* 500 MB, or is that the memory used while parsing? If
> it's the former, then that isn't the consumption the buffering
> increases. (BTW, if most of it is made of strings, then we get the 4x
> factor from UCS-4, i.e. 4 bytes per character.) If it's the latter,
> then I'm surprised.
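One rough way to tell the two numbers apart is to measure heap use after a full GC, both before and after parsing. That approximates what the parsed result keeps alive, while `top` reflects the process's peak footprint during the parse. This is a sketch, not a profiler: `retained-bytes` is a hypothetical helper, and the figures it reports are noisy.

```racket
#lang racket/base
(require json)

;; Hypothetical helper: approximate how much heap a computation's result
;; keeps alive, as opposed to peak memory while it runs.
;; Allocator and GC behavior make the number approximate.
(define (retained-bytes thunk)
  (collect-garbage)
  (define before (current-memory-use))
  (define result (thunk))
  (collect-garbage)
  (define after (current-memory-use))
  ;; `result` is returned below, so it stays reachable through the second GC.
  (values result (- after before)))

;; Small in-memory document for illustration.
(define-values (parsed retained)
  (retained-bytes
   (lambda () (read-json (open-input-string "[\"abc\", \"def\"]")))))
```

For the real data, the thunk would be something like `(lambda () (call-with-input-file "big.json" read-json))`, where `big.json` is a placeholder path.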
Yeah, from casual inspection the input JSON file is full of string
literals, so I think you're right about the UCS-4 explanation. It's
too bad; I had assumed that Racket strings were stored as UTF-8, since
I've seen so many instances of bytes->string/utf-8 in Racket code.
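A small illustration of the distinction. Racket strings hold Unicode characters, and UTF-8 only shows up at the boundary with byte strings, which is where `bytes->string/utf-8` comes in. The second half is back-of-the-envelope arithmetic under the loose assumption that the 92 MB file is nearly all ASCII string content, stored at about 4 bytes per character in memory.

```racket
#lang racket/base
;; Strings are sequences of characters (code points); UTF-8 appears
;; only when converting to or from byte strings.
(define s "na\u00EFve")                                     ; "naïve"
(define char-count (string-length s))                       ; 5 characters
(define utf8-count (bytes-length (string->bytes/utf-8 s)))  ; 6 bytes: the ï takes 2

;; Rough estimate, assuming the 92 MB file is almost all ASCII string
;; content and each character costs about 4 bytes once parsed:
(define file-bytes (* 92 1000 1000))
(define approx-string-bytes (* 4 file-bytes))               ; 368000000, about 368 MB
```

That figure covers only raw character storage, before any overhead for the string objects, lists, and hash tables in the result, so it indicates the scale of the blow-up rather than measuring it.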
>> >> 2. Modified read-list so it avoids using regular expressions when
>> >> simpler peek-char/read-char operations suffice. Reduced the runtime
>> >> from 45 seconds to 40 seconds (about an 11% improvement).
>> >
>> > This is a questionable change, IMO. Keeping things as regexps
>> > makes the code easy to revise and modify in the future, while
>> > switching to single-character operations makes that harder and
>> > also requires the code to know when to use a regexp and when to
>> > use a character. In this case I prefer code readability over
>> > performance.
Ok, I'll abandon this specific patch for now.
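To make the trade-off concrete, here are two hypothetical ways to consume a closing delimiter when it is the next character on the port. Neither is taken from the `json` library's source. The names `try-consume/regexp` and `try-consume/char` are made up for illustration.

```racket
#lang racket/base
;; Hypothetical helpers (not from the json library) that consume `ch`
;; from input port `in` only if it is the next character.

;; Regexp style: the pattern is easy to change later (e.g., to skip
;; whitespace first). regexp-try-match consumes input only on a match.
;; (Compiling the pattern on every call is itself slow; real code
;; would build it once.)
(define (try-consume/regexp in ch)
  (and (regexp-try-match (regexp (string-append "^" (regexp-quote (string ch)))) in)
       #t))

;; Character style: peek first, and read only when the character matches.
;; peek-char returns eof at end of input, which is never eqv? to ch.
(define (try-consume/char in ch)
  (and (eqv? (peek-char in) ch)
       (begin (read-char in) #t)))
```

The character version avoids the regexp machinery entirely, but the regexp version is the one that's easier to extend if the grammar being matched changes.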
It sounds, though, like Ray Racine mentioned that his Typed Racket
(TR) version of the code performs faster than the untyped version?
Ray, do you have that version available somewhere to play with?
---
I did push one change to the JSON library to master: replacing the
non-greedy regexp with the char-complement version. I also added
several test cases to make sure I got it right.
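For readers who haven't seen this kind of rewrite, here is a sketch of the idea. The thread doesn't show the actual patterns, so these are illustrative assumptions. Both grab the run of plain characters in a JSON string body up to the next quote or backslash, and they match the same text. The character-complement version can scan forward without re-checking the delimiter alternative at every position, which typically makes it cheaper.

```racket
#lang racket/base
;; Illustrative patterns (not the json library's actual source): capture
;; the plain characters before the next quote or backslash, then the
;; delimiter itself.

;; Non-greedy: .*? tries to stop at each position and re-checks the
;; delimiter class there before extending by one more character.
(define non-greedy #rx#"^(.*?)([\"\\\\])")

;; Character complement: [^\"\\\\]* can only consume non-delimiters,
;; so it runs forward in one pass.
(define char-complement #rx#"^([^\"\\\\]*)([\"\\\\])")

(define sample #"hello world\", 1]")  ; plain run ended by a quote
(define escaped #"abc\\ndef\"")       ; plain run ended by a backslash escape

;; Both patterns also work directly on input ports, as a streaming parser needs.
(define from-port (regexp-match char-complement (open-input-bytes sample)))
```

Checking that the two patterns agree on inputs with and without escapes is the kind of thing the added test cases can pin down.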
Thanks again for the review!