[racket] Regex golf: 25x slowdown compared to Python?!

From: Vincent St-Amour (stamourv at ccs.neu.edu)
Date: Sun Jun 29 14:41:49 EDT 2014

FYI, I just updated Optimization Coach to report when you pass
non-regexp-valued patterns to regexp functions.

Thanks for the report!

Vincent


At Thu, 26 Jun 2014 20:03:37 +0200,
Laurent wrote:
> 
> Oh woah! Thank you very much Matthew! What a relief :)
> 
> I thought I had tried that, but looks like I didn't... (my hash table was
> probably misplaced and your solution is neater anyway)
> 
> Laurent
> 
> 
> On Thu, Jun 26, 2014 at 6:35 PM, Matthew Flatt <mflatt at cs.utah.edu> wrote:
> 
> > Instead of passing a string directly as the first argument to
> > `regexp-match`, use `regexp` to convert the string to a regexp value if
> > you want to use it multiple times.
> >
> > Changing
> >
> >  (define (matches rx lstr)
> >    (filter (λ(w)(regexp-match? rx w)) lstr))
> >
> > to
> >
> >  (define (matches rx-str lstr)
> >    (define rx (regexp rx-str))
> >    (filter (λ(w)(regexp-match? rx w)) lstr))
> >
> > makes the "regexp-golf-racket.rkt" example take 3 seconds on my machine
> > instead of 50 seconds. Similarly, mapping `regexp` over the right-hand
> > side of `lrx` in "regex-timing.rkt" makes it take less than 1 second
> > instead of 25 seconds.
> >
> > I think Python similarly converts strings to regexps, but it internally
> > caches the conversion.
> >
> > At Thu, 26 Jun 2014 18:19:51 +0200, Laurent wrote:
> > > Hi,
> > >
> > > While translating (almost literally) Norvig's awesome regex golf from
> > > Python [1] to Racket [2], I'm facing a 25x slowdown (about 2s vs 50s).
> > I've
> > > run the optimization coach and made the obvious changes (mainly adding
> > > `in-list` in `for` loops), tried to optimize `filter` and cache the
> > regexps
> > > in a hash (to avoid their recompilation, since I need to keep the raw
> > > string too), but with no significant gain.
> > >
> > > Reading the profile report, it seems that most of the time (99%) is spent
> > > in `regexp-match`, so I wanted to write a stripped down version showing
> > > that Python indeed is much faster than Racket on this particular function
> > > but this test case shows almost the contrary! (25s vs 23s)
> > >
> > > So I'm stumped and I have no idea how to cut this 25x slowdown. Does
> > anyone
> > > have an idea?
> > >
> > > In [2], you can find:
> > > - the regex golf file in Python (used for timing), which is almost
> > > identical to Norvig's
> > > - the translation in Racket
> > > - the stress test of `re.search` in Python
> > > - the stress test of `regexp-match?` in Racket
> > >
> > > Timings can be found at the bottom of each file.
> > >
> > > Note that the outputs of the programs are not exactly the same, because
> > > sets are used in Python and lists in Racket, but unless there are bugs in
> > > my code (which is not improbable) I don't think it matters a lot.
> > >
> > > Thanks,
> > > Laurent
> > >
> > > [1] http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb
> > > [2] https://gist.github.com/Metaxal/777be153b50a35a2618c
> > > ____________________
> > >   Racket Users list:
> > >   http://lists.racket-lang.org/users
> >
> [1.2  <text/html; UTF-8 (quoted-printable)>]
> 
> [2  <text/plain; us-ascii (7bit)>]
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users


Posted on the users mailing list.