[racket] Places performance

From: Tobias Hammer (tobias.hammer at dlr.de)
Date: Fri Mar 15 03:31:00 EDT 2013

I see that you left the placement of (time ...) unchanged in your
with-places code. You should really exclude the (place ...) call from your
measurement.
It is not fair to compare a simple loop against the creation of two full
Racket instances plus the loop. See my last mail for a 'fairer' version.

In a real program you should recycle places, creating them only once (e.g.
one per core), and then send them new work items via place channels.
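A minimal sketch of that pattern, assuming each work item is a vector and the answer is the sum of the square roots of its elements (the same work as test-function in this thread). The names start-worker and run-batch and the 'stop message are hypothetical, chosen for illustration; place, place-channel-put, place-channel-get, place-wait and processor-count are the ones provided by #lang racket.

```racket
#lang racket
;; Sketch of a reusable place pool. start-worker, run-batch and the
;; 'stop message are illustrative names, not part of any Racket library.
(provide start-worker run-batch)

;; Per-item work: the same sum as test-function in this thread, but
;; without building an intermediate list.
(define (test-function vectr)
  (for/sum ([x (in-vector vectr)]) (sqrt x)))

;; Start one worker place. It answers every vector it receives with
;; (test-function vector) until it receives 'stop, so it is created
;; once and reused for many work items.
(define (start-worker)
  (place ch
    (let loop ()
      (define msg (place-channel-get ch))
      (unless (eq? msg 'stop)
        (place-channel-put ch (test-function msg))
        (loop)))))

;; Send one job to each worker, then collect the answers in the same order.
(define (run-batch workers jobs)
  (for ([w (in-list workers)] [job (in-list jobs)])
    (place-channel-put w job))
  (for/list ([w (in-list workers)] [job (in-list jobs)])
    (place-channel-get w)))

;; The main submodule runs only in the original place.
(module+ main
  ;; One worker per core, created once, before anything is timed.
  (define workers
    (for/list ([i (in-range (processor-count))]) (start-worker)))
  (for ([i (in-range 3)])
    (define jobs
      (for/list ([w (in-list workers)]) (build-vector 1000000 +)))
    (time (run-batch workers jobs)))
  ;; Shut the pool down: each worker leaves its loop and its place exits.
  (for ([w (in-list workers)]) (place-channel-put w 'stop))
  (for-each place-wait workers))
```

Because the workers exist before the loop starts, each (time ...) covers only message passing and computation, not place startup.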

Tobias



On Fri, 15 Mar 2013 03:19:07 +0100, Harry Spier  
<vasishtha.spier at gmail.com> wrote:

> Thanks Robby and Tobias.
>
> Robby said:
> --------------------
> You're doing way more work in the timed portion of the place version than
> you are in the non-place one. In the places one you're creating the
> test-vector 3 times, once in the original place (that you don't explicitly
> create) and once in each of the places that you create. Your code is also
> copying the vector from one place to another in the places one (ignoring
> the one that was created when the place was created).
> --------------------
> Isn't there always going to be this kind of overhead when using places,
> because the original module is re-required in each place that is
> created?
>
> I checked the timing of creating test-vector and it's relatively small,
> but most of the overhead appears to be in communicating test-vector to
> the places.
>
> Also, when I change my code to include the extra place-channel-gets and
> place-channel-puts Tobias suggested, I get almost exactly the same timings
> as without them, i.e.:
> test-vector-size 5000000
> With places cpu time: 13993 real time: 9381 gc time: 763
> Without places cpu time: 7472 real time: 7804 gc time: 4334
>
> But a lot of the overhead appears to be in communicating test-vector to
> the places. When instead of
>   (place-channel-put ch (test-function (place-channel-get ch)))
> I put
>   (place-channel-put ch (test-function test-vector))
> I get these timings:
> test-vector-size 5000000
> With places cpu time: 10218 real time: 5623 gc time: 0
> Without places cpu time: 7613 real time: 7820 gc time: 4492
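One way to avoid copying the data at all, sketched under the assumption that it can be stored as flonums: make-shared-flvector from racket/flonum allocates an flvector in shared memory, and such a vector is passed by reference rather than copied when sent through a place channel. Each place can then sum its own index range of the same data. The names sum-sqrt and start-summer are hypothetical.

```racket
#lang racket
;; Sketch: share one flvector between places instead of copying a vector.
;; sum-sqrt and start-summer are hypothetical names for this example.
(require racket/flonum)
(provide sum-sqrt start-summer)

;; Sum of the square roots of fv[lo], ..., fv[hi - 1].
(define (sum-sqrt fv lo hi)
  (for/fold ([acc 0.0]) ([i (in-range lo hi)])
    (fl+ acc (flsqrt (flvector-ref fv i)))))

;; A place that receives (list shared-flvector lo hi) and replies with
;; the partial sum for that index range.
(define (start-summer)
  (place ch
    (match (place-channel-get ch)
      [(list fv lo hi) (place-channel-put ch (sum-sqrt fv lo hi))])))

(module+ main
  (define n 5000000)
  ;; Allocated in shared memory, so sending it does not copy the data.
  (define fv (make-shared-flvector n))
  (for ([i (in-range n)])
    (flvector-set! fv i (->fl i)))
  (define half (quotient n 2))
  (define p1 (start-summer))
  (define p2 (start-summer))
  ;; Each place sums one half of the same shared vector.
  (place-channel-put p1 (list fv 0 half))
  (place-channel-put p2 (list fv half n))
  (define total (+ (place-channel-get p1) (place-channel-get p2)))
  (for-each place-wait (list p1 p2))
  (printf "sum of square roots: ~a\n" total))
```

This only helps when the data fits a shared representation (flvectors, fxvectors or bytes); an ordinary vector is still copied on every send.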
>
>
> Is this the way places work? When a Racket program executes a module that
> contains a place form, it:
> 1) executes the code in the module until it comes to the place form;
> 2) creates a new Racket instance (a place) containing a new module;
> 3) that new module in the new Racket instance requires the original module
>    containing the place form;
> 4) the body of the place form is then executed in the new Racket instance
>    (the place);
> 5) simultaneously, the original module in the original Racket instance
>    continues executing;
> 6) the original and the new module (the two Racket instances) communicate
>    via place channels.
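A small sketch of the consequence of steps 3 and 4: module-level definitions such as test-vector are evaluated again in every place created from the module, so each place builds its own copy. The demo module and the start-reporter name are hypothetical; the main-submodule placement is the usual way to keep one-time code out of the places.

```racket
#lang racket
;; Hypothetical demo module (start-reporter is an illustrative name).
(provide start-reporter)

;; Evaluated on every instantiation of this module: once in the original
;; place and once more in each place created from it.
(define test-vector
  (begin (eprintf "building test-vector\n")
         (build-vector 10 +)))

;; The new place reports the length of its own copy of test-vector.
(define (start-reporter)
  (place ch
    (place-channel-put ch (vector-length test-vector))))

;; The main submodule is not instantiated when a place loads this module,
;; so the code here runs only in the original place.
(module+ main
  (define p (start-reporter))
  (printf "the place sees ~a elements\n" (place-channel-get p))
  (void (place-wait p)))
```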
>
> So in effect my code creates 3 Racket instances (3 places) executing on a
> 2-core machine. If I change my code to run 2 places instead of 3, I would
> have thought that would improve the timings, but that doesn't appear to be
> the case.
>
> The following code (2 places only: the original Racket instance and one
> place) takes more real time than my original code with 3 places (8607
> instead of 5623).
>
> I'm not clear why that should be the case.
>
> The timings are:
> test-vector-size 5000000
> With places cpu time: 8097 real time: 8607 gc time: 1856
> Without places cpu time: 7238 real time: 7476 gc time: 4024
>
> for the code:
> -------------------------------
> place-timing-test5-execute.rkt
> -------------------------------
> #lang racket
> (require "place-timing-test5.rkt")
> (printf "test-vector-size ~a" test-vector-size)
> (newline)
> (display "With places ")
> (time (places-main))
> (display "Without places ")
> (time (noplaces-main))
>
> (system "PAUSE")
>
> ------------------------------------
> place-timing-test5.rkt
> -------------------------------------
> #lang racket
>
> (provide places-main noplaces-main test-vector-size)
> (define test-vector (build-vector 5000000 +))
> (define test-vector-size (vector-length test-vector))
> (define (test-function vectr)
>   (apply + (vector->list (vector-map sqrt vectr))))
>
>
> (define (places-main)
>     (test-function test-vector)
>     (define place1
>       (place ch
>         (begin
>           (place-channel-put ch #t)
>           (place-channel-put ch (test-function test-vector)))))
>
>     (place-channel-get place1)
>     (place-channel-get place1)
>     (place-wait place1)
>     (void))
>
> (define (noplaces-main)
>     (test-function test-vector)
>     (test-function test-vector)
>     (void))
> ------------------
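In the version above, the original place finishes its own (test-function test-vector) before place1 is even created, so the two calls cannot overlap, and place1 additionally rebuilds test-vector when it starts. A sketch of place-timing-test5.rkt with only the order changed, so that the original place does its share while place1 is running, might look like this (it still pays for place startup and for the rebuild, so any gain depends on the machine):

```racket
#lang racket
;; Drop-in sketch for place-timing-test5.rkt with the place started first.
(provide places-main noplaces-main test-vector-size)

(define test-vector (build-vector 5000000 +))
(define test-vector-size (vector-length test-vector))
(define (test-function vectr)
  (apply + (vector->list (vector-map sqrt vectr))))

(define (places-main)
  ;; (place ...) returns immediately; place1 rebuilds test-vector in
  ;; its own instance and then starts computing.
  (define place1
    (place ch
      (begin
        (place-channel-put ch #t)
        (place-channel-put ch (test-function test-vector)))))
  ;; This now runs in the original place while place1 is working.
  (test-function test-vector)
  (place-channel-get place1)   ; the #t handshake
  (place-channel-get place1)   ; place1's result
  (place-wait place1)
  (void))

(define (noplaces-main)
  (test-function test-vector)
  (test-function test-vector)
  (void))
```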
>
> VERSUS
>
> test-vector-size 5000000
> With places cpu time: 10218 real time: 5623 gc time: 0
> Without places cpu time: 7613 real time: 7820 gc time: 4492
>
> -------------------------------------
> place-timing-test-execute.rkt
> -------------------------------------
> #lang racket
> (require "place-timing-test.rkt")
> (printf "test-vector-size ~a" test-vector-size)
> (newline)
> (display "With places ")
> (time (places-main))
> (display "Without places ")
> (time (noplaces-main))
>
> (system "PAUSE")
>
>
> ------------------------
> place-timing-test.rkt
> ------------------------
> #lang racket
>
> (provide places-main noplaces-main test-vector-size)
> (define test-vector (build-vector 5000000 +))
> (define test-vector-size (vector-length test-vector))
> (define (test-function vectr)
>   (apply + (vector->list (vector-map sqrt vectr))))
>
>
> (define (places-main)
>     (define place1
>       (place ch
>         (begin
>           (place-channel-put ch #t)
>           (place-channel-put ch (test-function test-vector)))))
>
>     (define place2
>       (place ch
>         (begin
>           (place-channel-put ch #t)
>           (place-channel-put ch (test-function test-vector)))))
>
>     (place-channel-get place1)
>     (place-channel-get place2)
>
>
>     (place-channel-get place1)
>     (place-channel-get place2)
>
>     (place-wait place1)
>     (place-wait place2)
>     (void))
>
> (define (noplaces-main)
>     (test-function test-vector)
>     (test-function test-vector)
>     (void))
> --------------------
>
>
>
> On Thu, Mar 14, 2013 at 8:28 AM, Tobias Hammer <tobias.hammer at dlr.de>  
> wrote:
>
>> As Robby said, the timed region should include only the place-channel-get
>> and -put calls to make it comparable.
>> But this is not enough, because (place ...) creation seems to be
>> non-blocking, i.e. the places might not have started up yet when the code
>> reaches the first -put.
>>
>> This can be solved by explicitly synchronizing the places with the main
>> program:
>>
>>
>> (define (places-main)
>>     (define place1
>>       (place ch
>>         (begin
>>           (place-channel-put ch #t)
>>           (place-channel-put ch (test-function (place-channel-get ch))))))
>>
>>     (define place2
>>       (place ch
>>         (begin
>>           (place-channel-put ch #t)
>>           (place-channel-put ch (test-function (place-channel-get ch))))))
>>
>>     (place-channel-get place1)
>>     (place-channel-get place2)
>>
>>
>>     (display "With places ")
>>     (time
>>      (place-channel-put place1 test-vector)
>>      (place-channel-put place2 test-vector)
>>      (place-channel-get place1)
>>      (place-channel-get place2))
>>
>>     (place-wait place1)
>>     (place-wait place2)
>>     (void))
>>
>>
>> With this I get the following times:
>>
>> With places cpu time: 5600 real time: 3199 gc time: 404
>> Without places cpu time: 3700 real time: 3678 gc time: 2208
>>
>> Now it's at least faster than the sequential version. But the overhead
>> still seems to be a lot more than I had expected.
>>
>> Tobias
>>
>>
>>
>>
>> On Thu, 14 Mar 2013 02:23:22 +0100, Harry Spier  
>> <vasishtha.spier at gmail.com>
>> wrote:
>>
>>> Dear members,
>>>
>>> I've run the following Racket program (as an executable) to test the
>>> performance of places on a Windows dual-core Pentium machine under Vista.
>>> I've run this with various sizes for test-vector. Even when the amount of
>>> computation is large (test-vector-size = 5000000), the computation split
>>> over two places takes more than double the time to complete that it takes
>>> when no places are used.
>>>
>>> I'm not clear why on a dual-core machine the performance wasn't better
>>> with the computation split over two places than with no places. In fact
>>> the results are the opposite: without places the program always runs at
>>> least twice as fast as with two places.
>>>
>>> --------------------------------------
>>> place-timing-test-executable.rkt
>>> ---------------------------------------
>>> #lang racket
>>> (require "place-timing-test.rkt")
>>> (printf "test-vector-size ~a" test-vector-size)
>>> (newline)
>>> (display "With places ")
>>> (time (places-main))
>>> (display "Without places ")
>>> (time (noplaces-main))
>>>
>>> (system "PAUSE")
>>>
>>>
>>> -----------------------
>>> place-timing-test.rkt
>>> -----------------------
>>> #lang racket
>>> (provide places-main noplaces-main test-vector-size)
>>> (define test-vector (build-vector 5000000 +))
>>> (define test-vector-size (vector-length test-vector))
>>> (define (test-function vectr)
>>>   (apply + (vector->list (vector-map sqrt vectr))))
>>>
>>> (define (places-main)
>>>     (define place1
>>>       (place ch (place-channel-put ch (test-function (place-channel-get ch)))))
>>>
>>>     (define place2
>>>       (place ch (place-channel-put ch (test-function (place-channel-get ch)))))
>>>
>>>     (place-channel-put place1 test-vector)
>>>     (place-channel-put place2 test-vector)
>>>     (place-channel-get place1)
>>>     (place-channel-get place2)
>>>     (place-wait place1)
>>>     (place-wait place2)
>>>     (void))
>>>
>>> (define (noplaces-main)
>>>     (test-function test-vector)
>>>     (test-function test-vector)
>>>     (void))
>>> -------------------------------
>>>
>>> These are the results for different sizes of test-vector. The amount of
>>> computation is linear in the size of test-vector.
>>>
>>> test-vector-size 100000
>>> With places cpu time: 1685 real time: 856 gc time: 0
>>> Without places cpu time: 187 real time: 170 gc time: 94
>>> Press any key to continue . . .
>>> -----
>>> test-vector-size 1000000
>>> With places cpu time: 3822 real time: 2637 gc time: 265
>>> Without places cpu time: 1201 real time: 1191 gc time: 452
>>> Press any key to continue . . .
>>> -------
>>> test-vector-size 5000000
>>> With places cpu time: 15787 real time: 23318 gc time: 1373
>>> Without places cpu time: 7769 real time: 9456 gc time: 4461
>>> Press any key to continue . . .
>>> ------
>>>
>>> Thanks,
>>> Harry Spier
>>>
>>
>>
>> --
>> ---------------------------------------------------------
>> Tobias Hammer
>> DLR / Institute of Robotics and Mechatronics
>> Muenchner Str. 20, D-82234 Wessling
>> Tel.: 08153/28-1487
>> Mail: tobias.hammer at dlr.de

Posted on the users mailing list.