[racket] Places performance

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Thu Mar 14 22:25:19 EDT 2013

On Thu, Mar 14, 2013 at 9:19 PM, Harry Spier <vasishtha.spier at gmail.com> wrote:

> Thanks Robby and Tobias.
>
> Robby said:
> --------------------
> You're doing way more work in the timed portion of the place version than
> you are in the non-place one. In the places one you're creating the
> test-vector 3 times, once in the original place (that you don't explicitly
> create) and once in each of the places that you create. Your code is also
> copying the vector from one place to another in the places one (ignoring
> the one that was created when the place was created).
> --------------------
> Isn't there always going to be this kind of overhead involved when using
> places because of the rerequire of the original module in each place
> created?
>
>
Well, the module system is pretty flexible, so you should be able to
arrange your modules not to do that.
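
One way to do that, as a minimal sketch rather than a fix for the code in this thread: keep expensive work out of the module body, so that instantiating the module again in each new place does not rebuild the vector. The names make-test-vector and main are hypothetical, and test-function is rewritten with for/sum only to keep the sketch short.

```racket
#lang racket
;; Sketch: nothing expensive runs at module level, so a new place,
;; which instantiates this module again, does not rebuild the vector.
(provide main make-test-vector test-function)

;; Hypothetical helper: build the data only when it is asked for.
(define (make-test-vector n) (build-vector n values))

;; Sum of square roots, as in the thread's test-function.
(define (test-function vectr)
  (for/sum ([x (in-vector vectr)]) (sqrt x)))

(define (main [n 1000])
  (define p
    (place ch
      (place-channel-put ch (test-function (place-channel-get ch)))))
  ;; The vector is built once, here, and copied into the place.
  (place-channel-put p (make-test-vector n))
  (define result (place-channel-get p))
  (place-wait p)
  result)
```

The vector is still copied across the place channel in this version; having the place call make-test-vector itself would avoid that copy as well.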


> I checked the timing of creating test-vector and it's relatively small, but
> most of the overhead appears to be in communicating test-vector to the
> places.
>
> Also, when I change my code to include the extra place-channel-gets and
> place-channel-puts Tobias suggested, I get almost exactly the same timings
> as without them, i.e.:
> test-vector-size 5000000
> With places cpu time: 13993 real time: 9381 gc time: 763
> Without places cpu time: 7472 real time: 7804 gc time: 4334
>
> But a lot of the overhead appears to be in communicating
> test-vector to the places.
> When instead of:
>   (place-channel-put ch (test-function (place-channel-get ch)))
> I put:
>   (place-channel-put ch (test-function test-vector))
> then I get timings of:
> test-vector-size 5000000
> With places cpu time: 10218 real time: 5623 gc time: 0
> Without places cpu time: 7613 real time: 7820 gc time: 4492
>
>
Yeah, I'm not sure about that. Probably the allocation of big, simple
vectors like that has been optimized, but passing them over place channels
hasn't. I'm sorry I can't help more here.
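
To see how much of the time goes to the channel itself, one option is to time a round trip through a place that does no computation. This is only a sketch under assumptions: the name main and the default size are hypothetical, and the startup handshake follows the one Tobias suggested.

```racket
#lang racket
;; Sketch: time a round trip of a large vector through a place that
;; does no computation, to isolate the cost of place-channel copying.
(provide main)

(define (main [n 1000000])
  (define v (build-vector n values))
  (define echo
    (place ch
      (place-channel-put ch #t)                        ; signal: started
      (place-channel-put ch (place-channel-get ch))))  ; send the vector back
  (place-channel-get echo)  ; wait for startup so that it is not timed
  (define back
    (time (place-channel-put echo v)
          (place-channel-get echo)))
  (place-wait echo)
  (vector-length back))
```

Comparing the printed time against (time (test-function v)) for the same size gives a rough idea of how the copying cost relates to the computation.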


>
> Is this the way "places" work?
> When a Racket program executes a module that contains a place, it:
> 1) Executes the code in the module until it comes to the place form.
> 2) Creates a new Racket instance (a place) containing a new module.
> 3) That new module in the new Racket instance requires the original module
> containing the place.
> 4) The body of the place form is then executed in the new Racket instance
> (the place).
> 5) Simultaneously, the original module in the original Racket instance
> continues executing.
> 6) The original and the new module (the two Racket instances) communicate
> via place-channels.
>
>
Something like that, but I prefer to think of it more like how it is
documented in the explanation of 'place'.

(You may find dynamic-place more useful for larger examples.)
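
A minimal two-file sketch of that, assuming both files sit in the directory the program is run from. The file names worker.rkt and main.rkt and the function names worker-main and main are hypothetical. With dynamic-place, the new place instantiates only the worker module, not the module that creates the place.

```racket
;; ---- worker.rkt (hypothetical file name) ----
#lang racket/base
(require racket/place)
(provide worker-main)

;; Runs in the new place: receive a vector, reply with the sum of
;; the square roots of its elements.
(define (worker-main ch)
  (define vectr (place-channel-get ch))
  (place-channel-put ch (for/sum ([x (in-vector vectr)]) (sqrt x))))

;; ---- main.rkt (hypothetical file name) ----
#lang racket/base
(require racket/place)
(provide main)

(define (main [n 1000])
  ;; Start a place whose entry point is worker-main in worker.rkt.
  (define p (dynamic-place "worker.rkt" 'worker-main))
  (place-channel-put p (build-vector n values))
  (define result (place-channel-get p))
  (place-wait p)
  result)
```

Because main.rkt is never required by the worker, anything defined at module level in main.rkt is created only once.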


> So in effect in my code I've created 3 racket instances (3 places)
> executing on a 2 core machine.
> If I change my code so I'm executing 2 places instead of 3, I would have
> thought that would improve the timings, but that doesn't appear to be the
> case.
>
>
Well, when a place isn't busy, it won't take time. In general, it is
okay to have more places than those actually doing work (though maybe not
1000s more; there is a limit somewhere between 1 extra and 1000 extra :).


> The following code (2 places only, the original Racket instance and one
> place) takes more real time than my original code (3 places): 8607
> instead of 5623.
>
> I'm not clear why that should be the case.
>
> The timings are:
> test-vector-size 5000000
> With places cpu time: 8097 real time: 8607 gc time: 1856
> Without places cpu time: 7238 real time: 7476 gc time: 4024
>
> for the code:
> -------------------------------
> place-timing-test5-execute.rkt
> -------------------------------
> #lang racket
> (require "place-timing-test5.rkt")
>
> (printf "test-vector-size ~a" test-vector-size)
> (newline)
> (display "With places ")
> (time (places-main))
> (display "Without places ")
> (time (noplaces-main))
>
> (system "PAUSE")
>
> ------------------------------------
>  place-timing-test5.rkt
> -------------------------------------
> #lang racket
>
> (provide places-main noplaces-main test-vector-size)
> (define test-vector (build-vector 5000000 +))
> (define test-vector-size (vector-length test-vector))
> (define (test-function vectr) (apply + (vector->list (vector-map sqrt vectr))))
>
>
> (define (places-main)
>     (test-function test-vector)
>
>     (define place1
>      (place ch
>       (begin
>         (place-channel-put ch #t)
>          (place-channel-put ch (test-function test-vector)))))
>
>     (place-channel-get place1)
>     (place-channel-get place1)
>     (place-wait place1)
>
>     (void))
>
> (define (noplaces-main)
>     (test-function test-vector)
>     (test-function test-vector)
>     (void))
> ------------------
>
> VERSUS
>
> test-vector-size 5000000
> With places cpu time: 10218 real time: 5623 gc time: 0
> Without places cpu time: 7613 real time: 7820 gc time: 4492
>
> -------------------------------------
> place-timing-test-execute.rkt
> -------------------------------------
> #lang racket
> (require "place-timing-test.rkt")
> (printf "test-vector-size ~a" test-vector-size)
> (newline)
> (display "With places ")
> (time (places-main))
> (display "Without places ")
> (time (noplaces-main))
>
> (system "PAUSE")
>
>
> ------------------------
> place-timing-test.rkt
> ------------------------
> #lang racket
>
> (provide places-main noplaces-main test-vector-size)
> (define test-vector (build-vector 5000000 +))
> (define test-vector-size (vector-length test-vector))
> (define (test-function vectr) (apply + (vector->list (vector-map sqrt vectr))))
>
>
> (define (places-main)
>     (define place1
>     (place ch
>       (begin
>         (place-channel-put ch #t)
>         (place-channel-put ch (test-function test-vector)))))
>
>
>     (define place2
>       (place ch
>        (begin
>          (place-channel-put ch #t)
>          (place-channel-put ch (test-function test-vector)))))
>
>     (place-channel-get place1)
>     (place-channel-get place2)
>
>
>
>     (place-channel-get place1)
>     (place-channel-get place2)
>
>     (place-wait place1)
>     (place-wait place2)
>     (void))
>
> (define (noplaces-main)
>     (test-function test-vector)
>     (test-function test-vector)
>     (void))
> --------------------
>
>
>
> On Thu, Mar 14, 2013 at 8:28 AM, Tobias Hammer <tobias.hammer at dlr.de> wrote:
>
>> As Robby said, the timed portion should include only the place-channel-get
>> and -put calls to make it comparable.
>> But this is not enough, because (place ...) creation seems to be
>> non-blocking, i.e. the places might not have started up yet when the code
>> reaches the first -put.
>>
>> This can be solved by explicitly synchronizing the places with the main
>> program:
>>
>>
>> (define (places-main)
>>     (define place1
>>        (place ch
>>               (begin
>>                 (place-channel-put ch #t)
>>                 (place-channel-put ch (test-function (place-channel-get ch))))))
>>
>>     (define place2
>>         (place ch
>>                (begin
>>                  (place-channel-put ch #t)
>>                  (place-channel-put ch (test-function (place-channel-get ch))))))
>>
>>     (place-channel-get place1)
>>     (place-channel-get place2)
>>
>>
>>     (display "With places ")
>>     (time
>>      (place-channel-put place1 test-vector)
>>      (place-channel-put place2 test-vector)
>>      (place-channel-get place1)
>>      (place-channel-get place2))
>>
>>     (place-wait place1)
>>     (place-wait place2)
>>     (void))
>>
>>
>> With this I get the following times:
>>
>> With places cpu time: 5600 real time: 3199 gc time: 404
>> Without places cpu time: 3700 real time: 3678 gc time: 2208
>>
>> Now it's at least faster than the sequential version. But the overhead
>> still seems to be a lot more than I had expected.
>>
>> Tobias
>>
>>
>>
>>
>> On Thu, 14 Mar 2013 02:23:22 +0100, Harry Spier <
>> vasishtha.spier at gmail.com> wrote:
>>
>>  Dear members,
>>>
>>> I've run the following Racket program (as an executable) to test the
>>> performance of places on a Windows dual-core Pentium machine under Vista.
>>> I've run this with various sizes for test-vector. Even when the amount of
>>> computation is large (test-vector-size = 5000000), the computation split
>>> over two places takes more than double the time to complete compared
>>> with using no places.
>>>
>>> I'm not clear why, on a dual-core machine, the performance wasn't better
>>> with the computation split over two places than with no places. In fact
>>> the results are the opposite: the version with two places always takes
>>> at least double the time of the version with no places.
>>>
>>> --------------------------------------
>>> place-timing-test-executable.rkt
>>> ---------------------------------------
>>> #lang racket
>>> (require "place-timing-test.rkt")
>>> (printf "test-vector-size ~a" test-vector-size)
>>> (newline)
>>> (display "With places ")
>>> (time (places-main))
>>> (display "Without places ")
>>> (time (noplaces-main))
>>>
>>> (system "PAUSE")
>>>
>>>
>>> -----------------------
>>> place-timing-test.rkt
>>> -----------------------
>>> #lang racket
>>> (provide places-main noplaces-main test-vector-size)
>>> (define test-vector (build-vector 5000000 +))
>>> (define test-vector-size (vector-length test-vector))
>>> (define (test-function vectr) (apply + (vector->list (vector-map sqrt vectr))))
>>>
>>> (define (places-main)
>>>     (define place1
>>>        (place ch (place-channel-put ch (test-function (place-channel-get ch)))))
>>>
>>>     (define place2
>>>         (place ch (place-channel-put ch (test-function (place-channel-get ch)))))
>>>
>>>     (place-channel-put place1 test-vector)
>>>     (place-channel-put place2 test-vector)
>>>     (place-channel-get place1)
>>>     (place-channel-get place2)
>>>     (place-wait place1)
>>>     (place-wait place2)
>>>     (void))
>>>
>>> (define (noplaces-main)
>>>     (test-function test-vector)
>>>     (test-function test-vector)
>>>     (void))
>>> -------------------------------
>>>
>>> These are the results for different sizes of test-vector. The amount of
>>> computation is linear in the size of test-vector.
>>>
>>> test-vector-size 100000
>>> With places cpu time: 1685 real time: 856 gc time: 0
>>> Without places cpu time: 187 real time: 170 gc time: 94
>>> Press any key to continue . . .
>>> -----
>>> test-vector-size 1000000
>>> With places cpu time: 3822 real time: 2637 gc time: 265
>>> Without places cpu time: 1201 real time: 1191 gc time: 452
>>> Press any key to continue . . .
>>> -------
>>> test-vector-size 5000000
>>> With places cpu time: 15787 real time: 23318 gc time: 1373
>>> Without places cpu time: 7769 real time: 9456 gc time: 4461
>>> Press any key to continue . . .
>>> ------
>>>
>>> Thanks,
>>> Harry Spier
>>>
>>
>>
>> --
>> ---------------------------------------------------------
>> Tobias Hammer
>> DLR / Institute of Robotics and Mechatronics
>> Muenchner Str. 20, D-82234 Wessling
>> Tel.: 08153/28-1487
>> Mail: tobias.hammer at dlr.de
>>
>
>
> ____________________
>   Racket Users list:
>   http://lists.racket-lang.org/users
>
>

Posted on the users mailing list.