<div dir="ltr"><div><div>Oh woah! Thank you very much Matthew! What a relief :)<br><br></div>I thought I had tried that, but looks like I didn't... (my hash table was probably misplaced and your solution is neater anyway)<br>
<br></div>Laurent<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jun 26, 2014 at 6:35 PM, Matthew Flatt <span dir="ltr"><<a href="mailto:mflatt@cs.utah.edu" target="_blank">mflatt@cs.utah.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Instead of passing a string directly as the first argument to<br>
`regexp-match`, use `regexp` to convert the string to a regexp value if<br>
you want to use it multiple times.<br>
<br>
Changing<br>
<br>
(define (matches rx lstr)<br>
(filter (λ(w)(regexp-match? rx w)) lstr))<br>
<br>
to<br>
<br>
(define (matches rx-str lstr)<br>
(define rx (regexp rx-str))<br>
(filter (λ(w)(regexp-match? rx w)) lstr))<br>
<br>
makes the "regexp-golf-racket.rkt" example take 3 seconds on my machine<br>
instead of 50 seconds. Similarly, mapping `regexp` over the right-hand<br>
side of `lrx` in "regex-timing.rkt" makes it take less than 1 second<br>
instead of 25 seconds.<br>
<br>
I think Python similarly converts strings to regexps, but it internally<br>
caches the conversion.<br>
<div><div class="h5"><br>
At Thu, 26 Jun 2014 18:19:51 +0200, Laurent wrote:<br>
> Hi,<br>
><br>
> While translating (almost literally) Norvig's awesome regex golf from<br>
> Python [1] to Racket [2], I'm facing a 25x slowdown (about 2s vs 50s). I've<br>
> run the optimization coach and made the obvious changes (mainly adding<br>
> `in-list` in `for` loops), tried to optimize `filter` and cache the regexps<br>
> in a hash (to avoid their recompilation, since I need to keep the raw<br>
> string too), but with no significant gain.<br>
><br>
> Reading the profile report, it seems that most of the time (99%) is spent<br>
> in `regexp-match`, so I wanted to write a stripped down version showing<br>
> that Python indeed is much faster than Racket on this particular function<br>
> but this test case shows almost the contrary! (25s vs 23s)<br>
><br>
> So I'm stumped and I have no idea how to cut this 25x slowdown. Does anyone<br>
> have an idea?<br>
><br>
> In [2], you can find:<br>
> - the regex golf file in Python (used for timing), which is almost<br>
> identical to Norvig's<br>
> - the translation in Racket<br>
> - the stress test of `re.search` in Python<br>
> - the stress test of `regexp-match?` in Racket<br>
><br>
> Timings can be found at the bottom of each file.<br>
><br>
> Note that the outputs of the programs are not exactly the same, because<br>
> sets are used in Python and lists in Racket, but unless there are bugs in<br>
> my code (which is not improbable) I don't think it matters a lot.<br>
><br>
> Thanks,<br>
> Laurent<br>
><br>
> [1] <a href="http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb" target="_blank">http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb</a><br>
> [2] <a href="https://gist.github.com/Metaxal/777be153b50a35a2618c" target="_blank">https://gist.github.com/Metaxal/777be153b50a35a2618c</a><br>
</div></div>> ____________________<br>
> Racket Users list:<br>
> <a href="http://lists.racket-lang.org/users" target="_blank">http://lists.racket-lang.org/users</a><br>
</blockquote></div><br></div>