[racket] Regex golf: 25x slowdown compared to Python?!

From: Laurent (laurent.orseau at gmail.com)
Date: Thu Jun 26 12:19:51 EDT 2014

Hi,

While translating (almost literally) Norvig's awesome regex golf from
Python [1] to Racket [2], I'm facing a 25x slowdown (about 2s vs 50s). I've
run the optimization coach and made the obvious changes (mainly adding
`in-list` in `for` loops), tried to optimize `filter` and cache the regexps
in a hash (to avoid their recompilation, since I need to keep the raw
string too), but with no significant gain.

Reading the profile report, it seems that most of the time (99%) is spent
in `regexp-match`, so I wanted to write a stripped down version showing
that Python indeed is much faster than Racket on this particular function
but this test case shows almost the contrary! (25s vs 23s)

So I'm stumped and I have no idea how to cut this 25x slowdown. Does anyone
have an idea?

In [2], you can find:
- the regex golf file in Python (used for timing), which is almost
identical to Norvig's
- the translation in Racket
- the stress test of `re.search` in Python
- the stress test of `regexp-match?` in Racket

Timings can be found at the bottom of each file.

Note that the outputs of the programs are not exactly the same, because
sets are used in Python and lists in Racket, but unless there are bugs in
my code (which is not improbable) I don't think it matters a lot.

Thanks,
Laurent

[1] http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb
[2] https://gist.github.com/Metaxal/777be153b50a35a2618c
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.racket-lang.org/users/archive/attachments/20140626/59fbc3c9/attachment.html>

Posted on the users mailing list.