[racket] Performance help

From: Gustavo Massaccesi (gustavo at oma.org.ar)
Date: Sun Jan 4 07:52:44 EST 2015

I've just submitted a pull request with two changes
https://github.com/jmoy/norvig-spell/pull/3 . In my machine, these
changes reduce the time of the tests from 17.7 seconds, to 16.2
seconds, and finally to 14.5 seconds.

The original code is something like:

  (define ALPHABET (in-list (string->list "abc...xyz")))
  (for/list ([c ALPHABET]))
    (string-append "?" (string c) "?"))

The first problem is that outside a clause of a `for` the `in-list`
creates a sequence. But inside a clause of a `for` the `in-list` is
spliced and generates more efficient code.

  (define ALPHABET (string->list "abc...xyz"))
  (for/list ([c (in-list ALPHABET)]))
    (string-append "?" (string c) "?"))

The second problem is that for each word `(string c)` creates a new
temporary string, so I mapped `string` at the definition site.

  (define ALPHABET (map string (string->list "abc...xyz")))
  (for/list ([c (in-list ALPHABET)]))
    (string-append "?" c "?"))

---

Following the idea of the second change I tried to remove more of the
temporary strings in the code ( for example, `(substring x 1)` looks
like a bad idea). But my tries were unsuccessful to make the code
faster :(.

I also read the code in the other language, and if I understood
correctly they are using strings that only contain "ascii" characters
instead of strings for "unicode" characters. I think the closest
equivalent in Racket is to use `bytes` instead of `strings`. It should
be slightly faster, because they are easier to handle at the low
level. I tried this and a few tricks, but the improvement was minimal
(something like a reduction from 14.5 seconds to ~14 seconds?).

Gustavo


On Fri, Jan 2, 2015 at 11:18 PM, Matthias Felleisen
<matthias at ccs.neu.edu> wrote:
>
> On Jan 2, 2015, at 11:38 AM, Jens Axel Søgaard wrote:
>
>> 2015-01-02 17:10 GMT+01:00 Matthias Felleisen <matthias at ccs.neu.edu>:
>>>
>>> Jens -- thanks for re-directing us to your article. This saved me quite
>>> some time. I noticed the lacking abstractions (slicing, comprehension)
>>> and considered sketching a prototype to use with this program. I decided
>>> to wait till today because it didn't address what bothered me most:
>>>
>>> speed.
>>
>> One trick: change the regular expression in  words  in order to
>> get rid of the call to string-downcase:
>>
>>   (regexp-split #rx"[^a-zA-Z]" buf)
>>
>> Another trick: the regular expression matchers in Racket works both
>> on strings and ports, so  train  can be written
>>
>>    (define (train fname)
>>       (freqs (words (open-input-file fname))))
>>
>> I am not sure whether it will give a speedup though.
>
>
> Both of these caused a serious slowdown. -- Matthias
>
>
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users


Posted on the users mailing list.