[racket] How can I speed up this code?

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Tue Jan 15 18:18:26 EST 2013

Symbols are stored internally in utf-8, I believe.
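
As a rough illustration of the 4x factor for strings discussed in the
quoted text below, here is a minimal sketch that compares the UTF-8 size
of some text with an estimate of its in-memory size as a Racket string.
The 4-bytes-per-character figure is an assumption taken from that
discussion, and the helper names `utf-8-bytes` and
`estimated-string-bytes` are made up for this example. Object headers
and allocator overhead are ignored.

```racket
#lang racket/base

;; Size of the text when encoded as UTF-8 (what a file on disk would hold).
(define (utf-8-bytes s)
  (bytes-length (string->bytes/utf-8 s)))

;; Rough estimate only: assumes 4 bytes per character for an in-memory
;; string and ignores per-object overhead.
(define (estimated-string-bytes s)
  (* 4 (string-length s)))

;; Hypothetical sample input, mostly ASCII like a typical JSON file.
(define sample "{\"name\": \"J G Cho\"}")

(printf "UTF-8 bytes:            ~a\n" (utf-8-bytes sample))
(printf "estimated string bytes: ~a\n" (estimated-string-bytes sample))
```

For mostly-ASCII input the two numbers differ by about a factor of
four, which is in line with a 92MB file growing to several hundred
megabytes once its string literals are parsed.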


On Tue, Jan 15, 2013 at 5:14 PM, Danny Yoo <dyoo at hashcollision.org> wrote:

> >> >> 1.  First, pull all the content of the input port into a string
> >> >>     port.  This cut down the runtime from 52 seconds to 45
> >> >>     seconds.  (15% improvement)
> >> >
> >> > I don't think that this is a good idea -- it looks like a dangerous
> >> > assumption for a generic library to make, instead of letting users
> >> > decide for themselves if they want to do so and hand the port to
> >> > the library.
>
> Wow.   Ok, I see what you mean now, and yeah, my optimization here is
> unsound.  I did not know the JSON library behaved in a streaming
> manner.  Thanks!
>
>
>
> >> When I watch `top` and see how much memory's being used in the
> >> original code, I think this is a red herring, for the unoptimized
> >> json parser is already consuming around 500MB of ram on J G Cho's
> >> 92MB file during the parse.
> >
> > Is the *result* 500mb or the memory used while parsing?  If it's the
> > former, then that's not the consumption that is increased.  (BTW, if
> > most of it is made of strings, then we get the 4x UCS-4 factor.)  If
> > it's the latter then I'm surprised.
>
> Yeah, the input JSON file is full of string literals from casual
> inspection, so I think you're right about the UCS-4 explanation.  It's
> too bad; I had assumed that Racket used utf-8, since I've seen so many
> instances of bytes->string/utf-8 in Racket code.
>
>
>
> >> >> 2.  Modified read-list so it avoids using regular expressions when
> >> >> simpler peek-char/read-char operations suffice.  Reduced the runtime
> >> >> from 45 seconds to 40 seconds.  (12% improvement)
> >> >
> >> > This is a questionable change, IMO.  The thing is that keeping
> >> > things with regexps makes it easy to revise and modify in the
> >> > future, but switching to a single character thing makes it hard
> >> > and in addition requires the code to know when to use regexps and
> >> > when to use a character.  I prefer in this case the code
> >> > readability over performance.
>
> Ok, I'll abandon this specific patch for now.
>
> It sounds, though, as if Ray Racine mentioned that his TR-ed version of
> the code performs faster than the non-TRed version?  Ray, do you have
> that version available somewhere to play with?
>
> ---
>
> I did push master with one change to the JSON library: the replacement
> of the non-greedy regexp with the char-complement version.  I also
> added several test cases to make sure I got it right.
>
> Thanks again for the review!
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users
>
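
The change described in the quoted message, replacing a non-greedy
regexp with a char-complement one, can be sketched as follows. The two
patterns and the helper `first-string-literal` are hypothetical and
chosen only for illustration; they are not the patterns used in the
JSON library, and they deliberately ignore backslash escapes.

```racket
#lang racket/base

;; Hypothetical patterns, not taken from the JSON library.
;; Non-greedy: match as few characters as possible up to the next quote.
(define non-greedy-rx #rx"^\"(.*?)\"")
;; Char-complement: match a run of characters that are not a quote.
(define complement-rx #rx"^\"([^\"]*)\"")

;; Returns the contents of a string literal at the start of `text`,
;; or #f if `text` does not start with one.
(define (first-string-literal rx text)
  (define m (regexp-match rx text))
  (and m (cadr m)))

(define input "\"abc\" and \"def\"")
(printf "~s\n" (first-string-literal non-greedy-rx input))
(printf "~s\n" (first-string-literal complement-rx input))
```

On inputs like this one the two patterns find the same match. The
char-complement form states the stopping condition directly, so the
matcher does not need to try successively longer matches.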

Posted on the users mailing list.