[racket] How can I speed up this code?

From: Eli Barzilay (eli at barzilay.org)
Date: Mon Jan 14 16:29:40 EST 2013

Just now, Danny Yoo wrote:
> On Mon, Jan 14, 2013 at 2:07 PM, Eli Barzilay <eli at barzilay.org> wrote:
> > Just now, Danny Yoo wrote:
> >> Also, I have not been able to find the unit tests for the json
> >> library.  Does anyone know where they are?
> >
> > They're in the "tests" subdirectory.
> 
> Uh... can you be more specific?  I searched for 'json', and the only
> hits I'm seeing are these:

...racket/collects/json/tests


> >> 1.  First, pull all the content of the input port into a string
> >>     port.  This cut down the runtime from 52 seconds to 45
> >>     seconds.  (15% improvement)
> >
> > I don't think that this is a good idea -- it looks like a dangerous
> > assumption for a generic library to make, instead of letting users
> > decide for themselves if they want to do so and hand the port to
> > the library.
> 
> I am confused and don't understand this particular objection yet.
> Can you explain a little more?  The side effect of read-json on the
> input port is that it completely exhausts it.  What freedom does the
> patch eliminate with regards to port usage?
> 
> I understand Sam's objection based on memory,

It's the same one.  Add to that also the possibility of a "streaming"
thing later on.  Then add to that the fact that parsing a json might
not consume the whole port, which can be very useful in some cases
(IIRC, there's also a test for that).
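
A minimal sketch of that last point, assuming the input port holds two
JSON values back to back (the sample input and the variable names are
made up for illustration):

```racket
#lang racket/base
(require json)

;; Two JSON values in one port, separated by whitespace.
(define in (open-input-string "{\"a\": 1} [2, 3]"))

;; Each call reads one value and leaves the rest of the port alone.
(define first-value  (read-json in))
(define second-value (read-json in))

;; Nothing is left after the second value, so this is the eof object.
(define third-value  (read-json in))
```

If the library slurped the whole port into a string up front, the
second read on the original port would find nothing left.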


> but I suspect the extra memory usage is roughly proportional to the
> size of the JSON object being constructed.

Yeah, twice as much, roughly.
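
For comparison, a rough sketch of what the opt-in version could look
like on the user's side.  `read-json/slurped` is a hypothetical helper
name, not something in the json library:

```racket
#lang racket/base
(require json racket/port)

;; Hypothetical caller-side helper: copy everything that is left in
;; the port into a string, then parse from a string port.  The caller
;; accepts the extra copy in memory, and the original port is
;; exhausted afterwards.
(define (read-json/slurped in)
  (read-json (open-input-string (port->string in))))

(define slurped-result
  (read-json/slurped (open-input-string "[1, 2, {\"k\": null}]")))
```

This keeps the memory tradeoff with whoever knows how big the input is.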


> When I watch `top` and see how much memory's being used in the
> original code, I think this is a red herring, for the unoptimized
> json parser is already consuming around 500MB of ram on J G Cho's
> 92MB file during the parse.

Is the *result* 500MB or the memory used while parsing?  If it's the
former, then that's not the consumption that is increased.  (BTW, if
most of it is made of strings, then we get the 4x UCS-4 factor.)  If
it's the latter then I'm surprised.


> >> 2.  Modified read-list so it avoids using regular expressions when
> >> simpler peek-char/read-char operations suffice.  Reduced the runtime
> >> from 45 seconds to 40 seconds.  (12% improvement)
> >
> > This is a questionable change, IMO.  The thing is that keeping
> > things with regexps makes it easy to revise and modify in the
> > future, but switching to a single character thing makes it hard
> > and in addition requires the code to know when to use regexps and
> > when to use a character.  I prefer in this case the code
> > readability over performance.
> 
> Is it likely for the JSON standard to change, though?  That function
> is such a hot-spot that hits everything else.

I don't know how likely it is, but why commit such a new thing (code +
"standard" + use) for something that can change?
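
To make the tradeoff concrete, here is a sketch of the two styles for
one small job, skipping leading whitespace.  Both function names are
made up for illustration and neither is claimed to be the library's
actual code:

```racket
#lang racket/base

;; Regexp style: the pattern is anchored at the current position, and
;; matching against an input port consumes the matched whitespace.
(define (skip-whitespace/rx in)
  (regexp-match? #px"^\\s*" in)
  (void))

;; Character style: peek one character at a time and consume it only
;; if it is whitespace.  Note that char-whitespace? also accepts
;; non-ASCII whitespace, so the two are not exactly equivalent.
(define (skip-whitespace/char in)
  (let loop ()
    (define c (peek-char in))
    (when (and (char? c) (char-whitespace? c))
      (read-char in)
      (loop))))

(define in-rx   (open-input-string "  \n\t x"))
(define in-char (open-input-string "  \n\t x"))
(skip-whitespace/rx in-rx)
(skip-whitespace/char in-char)
```

The regexp version is one line that is easy to change later; the
character version spells out the loop and the definition of whitespace
by hand.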

-- 
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

Posted on the users mailing list.