[racket] Places performance
On Thu, Mar 14, 2013 at 9:19 PM, Harry Spier <vasishtha.spier at gmail.com> wrote:
> Thanks Robby and Tobias.
>
> Robby said:
> --------------------
> You're doing way more work in the timed portion of the place version than
> you are in the non-place one. In the places one you're creating the
> test-vector 3 times, once in the original place (that you don't explicitly
> create) and once in each of the places that you create. Your code is also
> copying the vector from one place to another in the places one (ignoring
> the one that was created when the place was created).
> --------------------
> Isn't there always going to be this kind of overhead involved when using
> places because of the rerequire of the original module in each place
> created?
>
>
Well, the module system is pretty flexible, so you should be able to
arrange your modules not to do that.
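
One possible arrangement, as a rough sketch rather than a tested fix: build the vector inside a function, so that instantiating the module again in a new place does not allocate it at module level. The names make-test-vector and places-sum are made up for illustration, and for/sum stands in for the apply/vector->list combination of the original test-function.

```racket
#lang racket
(provide places-sum make-test-vector test-function)

;; Hypothetical helper: the vector is built only when this is called,
;; not every time the module is instantiated in a new place.
(define (make-test-vector n) (build-vector n +))

;; Same sum of square roots as the original test-function, written with
;; for/sum to avoid the intermediate vector and list.
(define (test-function vectr)
  (for/sum ([x (in-vector vectr)]) (sqrt x)))

;; Sends only the size to the place; the place builds its own vector.
(define (places-sum n)
  (define p
    (place ch
      (let ([size (place-channel-get ch)])
        (place-channel-put ch (test-function (make-test-vector size))))))
  (place-channel-put p n)
  (begin0 (place-channel-get p)
          (place-wait p)))

;; The main submodule is not instantiated by the place created above,
;; so work that belongs only to the original place can go here.
(module+ main
  (places-sum 1000))
```

Whether this actually improves the timings would have to be measured; it only removes the module-level allocation and the transfer of the vector.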
> I checked the timing of creating test-vector and it's relatively small;
> most of the overhead appears to be in communicating test-vector to the
> places.
>
> Also, when I change my code to include the extra place-channel-gets and
> place-channel-puts Tobias suggested, I get almost exactly the same timings
> as without them.
> I.e.
> test-vector-size 5000000
> With places cpu time: 13993 real time: 9381 gc time: 763
> Without places cpu time: 7472 real time: 7804 gc time: 4334
>
> But a lot of the overhead appears to be in the communicating of
> test-vector to the places.
> When instead of: (place-channel-put ch (test-function (place-channel-get
> ch)))
> I put: (place-channel-put ch (test-function test-vector))
> then I get timings of:
> test-vector-size 5000000
> With places cpu time: 10218 real time: 5623 gc time: 0
> Without places cpu time: 7613 real time: 7820 gc time: 4492
>
>
Yeah, I'm not sure about that. Probably the allocation of big, simple
vectors like that has been optimized, but passing them over place channels
hasn't. I'm sorry I can't help more here.
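
If the data can be represented as flonums, one thing that might be worth trying is a shared flvector from racket/flonum, which, as far as I know, is shared between places instead of being copied when sent over a place channel. This is only a sketch under that assumption: fl-test-function and shared-places-sum are made-up names, and the result is all-flonum arithmetic, so it is not bit-for-bit the same computation as the original test-function.

```racket
#lang racket
(require racket/flonum)
(provide fl-test-function shared-places-sum)

;; Flonum variant of the sum of square roots.
(define (fl-test-function flv)
  (for/sum ([x (in-flvector flv)]) (flsqrt x)))

(define (shared-places-sum n)
  ;; Allocate the vector in shared memory and fill it with 0.0, 1.0, ...
  (define shared (make-shared-flvector n 0.0))
  (for ([i (in-range n)])
    (flvector-set! shared i (exact->inexact i)))
  (define p
    (place ch
      (place-channel-put ch (fl-test-function (place-channel-get ch)))))
  ;; Assumption: sending a shared flvector does not copy its contents.
  (place-channel-put p shared)
  (begin0 (place-channel-get p)
          (place-wait p)))

(module+ main
  (shared-places-sum 1000))
```

I have not timed this, so treat it as something to experiment with, not as a known fix.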
>
> Is this the way "places" work?
> When a Racket program executes a module that contains a place, it:
> 1) Executes the code in the module until it comes to the place form.
> 2) Creates a new Racket instance (a place) containing a new module.
> 3) That new module in the new Racket instance requires the original module
> containing the place.
> 4) The body of the place form is then executed in the new Racket instance
> (the place).
> 5) Simultaneously, the original module in the original Racket instance
> continues executing.
> 6) The original and the new module (the two Racket instances) communicate
> via place-channels.
>
>
Something like that, but I prefer to think of it more like how it is
documented in the explanation of 'place'.
(You may find dynamic-place more useful for larger examples.)
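
A small way to watch the module being instantiated once per place, as a sketch: the printf at module level should run in the original place and again in the place created by the place form. start-worker is a made-up name.

```racket
#lang racket
(provide start-worker)

;; Module-level code: runs whenever this module is instantiated, which
;; happens in the original place and again in each place created below.
(printf "instantiating module\n")

(define (start-worker)
  (define p
    (place ch
      (place-channel-put ch 'hello-from-place)))
  (begin0 (place-channel-get p)
          (place-wait p)))

;; The main submodule runs only in the original place.
(module+ main
  (start-worker))
```

Anything expensive at module level (like building a 5000000-element vector) is repeated in the same way as the printf.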
> So in effect in my code I've created 3 racket instances (3 places)
> executing on a 2 core machine.
> If I change my code so I'm executing 2 places instead of 3, I would have
> thought that would improve the timings, but that doesn't appear to be the
> case.
>
>
Well, when a place isn't busy, then it won't take time. In general, it is
okay to have more places than those actually doing work (maybe not 1000s
more, there is a limit somewhere between 1 extra and 1000 extra :).
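
For sizing the number of worker places, processor-count gives the number of cores Racket sees. A minimal sketch, assuming the work can be split into index ranges so that only two integers are sent to each place; sum-sqrt-range, start-worker and parallel-sum-sqrt are made-up names:

```racket
#lang racket
(provide parallel-sum-sqrt sum-sqrt-range)

;; Sum of (sqrt i) for lo <= i < hi, computed without building a vector.
(define (sum-sqrt-range lo hi)
  (for/sum ([i (in-range lo hi)]) (sqrt i)))

;; Starts one place that waits for a (list lo hi) message and replies
;; with the partial sum for that range.
(define (start-worker)
  (place ch
    (let ([bounds (place-channel-get ch)])
      (place-channel-put ch (sum-sqrt-range (first bounds) (second bounds))))))

;; Splits 0..n-1 into one chunk per worker; defaults to one worker per core.
(define (parallel-sum-sqrt n [workers (processor-count)])
  (define chunk (ceiling (/ n workers)))
  (define places
    (for/list ([w (in-range workers)])
      (define p (start-worker))
      (place-channel-put p (list (min n (* w chunk))
                                 (min n (* (add1 w) chunk))))
      p))
  (for/sum ([p (in-list places)])
    (begin0 (place-channel-get p)
            (place-wait p))))

(module+ main
  (parallel-sum-sqrt 1000000))
```

The partial sums are added in a different order than in a single loop, so the floating-point result can differ slightly from the sequential one.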
> The following code (2 places only, the original Racket instance and one
> place) takes more real-time than my original code (3 places). (8607
> instead of 5623)
>
> I'm not clear why that should be the case.
>
> The timings are:
> test-vector-size 5000000
> With places cpu time: 8097 real time: 8607 gc time: 1856
> Without places cpu time: 7238 real time: 7476 gc time: 4024
>
> for the code:
> -------------------------------
> place-timing-test5-execute.rkt
> -------------------------------
> #lang racket
> (require "place-timing-test5.rkt")
>
> (printf "test-vector-size ~a" test-vector-size)
> (newline)
> (display "With places ")
> (time (places-main))
> (display "Without places ")
> (time (noplaces-main))
>
> (system "PAUSE")
>
> ------------------------------------
> place-timing-test5.rkt
> -------------------------------------
> #lang racket
>
> (provide places-main noplaces-main test-vector-size)
> (define test-vector (build-vector 5000000 +))
> (define test-vector-size (vector-length test-vector))
> (define (test-function vectr)
>   (apply + (vector->list (vector-map sqrt vectr))))
>
> (define (places-main)
>   (test-function test-vector)
>
>   (define place1
>     (place ch
>       (begin
>         (place-channel-put ch #t)
>         (place-channel-put ch (test-function test-vector)))))
>
>   (place-channel-get place1)
>   (place-channel-get place1)
>   (place-wait place1)
>   (void))
>
> (define (noplaces-main)
>   (test-function test-vector)
>   (test-function test-vector)
>   (void))
> ------------------
>
> VERSUS
>
> test-vector-size 5000000
> With places cpu time: 10218 real time: 5623 gc time: 0
> Without places cpu time: 7613 real time: 7820 gc time: 4492
>
> -------------------------------------
> place-timing-test-execute.rkt
> -------------------------------------
> #lang racket
> (require "place-timing-test.rkt")
> (printf "test-vector-size ~a" test-vector-size)
> (newline)
> (display "With places ")
> (time (places-main))
> (display "Without places ")
> (time (noplaces-main))
>
> (system "PAUSE")
>
>
> ------------------------
> place-timing-test.rkt
> ------------------------
> #lang racket
>
> (provide places-main noplaces-main test-vector-size)
> (define test-vector (build-vector 5000000 +))
> (define test-vector-size (vector-length test-vector))
> (define (test-function vectr)
>   (apply + (vector->list (vector-map sqrt vectr))))
>
> (define (places-main)
>   (define place1
>     (place ch
>       (begin
>         (place-channel-put ch #t)
>         (place-channel-put ch (test-function test-vector)))))
>
>   (define place2
>     (place ch
>       (begin
>         (place-channel-put ch #t)
>         (place-channel-put ch (test-function test-vector)))))
>
>   (place-channel-get place1)
>   (place-channel-get place2)
>
>   (place-channel-get place1)
>   (place-channel-get place2)
>
>   (place-wait place1)
>   (place-wait place2)
>   (void))
>
> (define (noplaces-main)
>   (test-function test-vector)
>   (test-function test-vector)
>   (void))
> --------------------
>
>
>
> On Thu, Mar 14, 2013 at 8:28 AM, Tobias Hammer <tobias.hammer at dlr.de> wrote:
>
>> As Robby said, the timed portion should include only the place-channel-get
>> and -put calls to make it comparable.
>> But this is not enough, because the (place ...) creation seems to be
>> non-blocking, i.e., the places might not have started up yet when the code
>> reaches the first -put.
>>
>> This can be solved by explicitly synchronizing the places with the main
>> program:
>>
>>
>> (define (places-main)
>>   (define place1
>>     (place ch
>>       (begin
>>         (place-channel-put ch #t)
>>         (place-channel-put ch (test-function (place-channel-get ch))))))
>>
>>   (define place2
>>     (place ch
>>       (begin
>>         (place-channel-put ch #t)
>>         (place-channel-put ch (test-function (place-channel-get ch))))))
>>
>>   (place-channel-get place1)
>>   (place-channel-get place2)
>>
>>   (display "With places ")
>>   (time
>>     (place-channel-put place1 test-vector)
>>     (place-channel-put place2 test-vector)
>>     (place-channel-get place1)
>>     (place-channel-get place2))
>>
>>   (place-wait place1)
>>   (place-wait place2)
>>   (void))
>>
>>
>> With this I get the following times:
>>
>> With places cpu time: 5600 real time: 3199 gc time: 404
>> Without places cpu time: 3700 real time: 3678 gc time: 2208
>>
>> Now it's at least faster than the sequential version. But the overhead
>> still seems to be a lot more than I had expected.
>>
>> Tobias
>>
>>
>>
>>
>> On Thu, 14 Mar 2013 02:23:22 +0100, Harry Spier <
>> vasishtha.spier at gmail.com> wrote:
>>
>> Dear members,
>>>
>>> I've run the following Racket program (as an executable) to test the
>>> performance of places on a Windows dual-core Pentium machine under Vista.
>>> I've run this with various sizes for test-vector. Even when the amount of
>>> computation is large (test-vector-size = 5000000), the computation split
>>> over two places takes more than double the time to complete compared with
>>> using no places.
>>>
>>> I'm not clear why on a dual-core machine the performance wasn't better
>>> with the computation split over two places than with no places. In fact
>>> the results are the opposite: the version with no places always takes
>>> less than half the time of the version with two places.
>>>
>>> --------------------------------------
>>> place-timing-test-executable.rkt
>>> ---------------------------------------
>>> #lang racket
>>> (require "place-timing-test.rkt")
>>> (printf "test-vector-size ~a" test-vector-size)
>>> (newline)
>>> (display "With places ")
>>> (time (places-main))
>>> (display "Without places ")
>>> (time (noplaces-main))
>>>
>>> (system "PAUSE")
>>>
>>>
>>> -----------------------
>>> place-timing-test.rkt
>>> -----------------------
>>> #lang racket
>>> (provide places-main noplaces-main test-vector-size)
>>> (define test-vector (build-vector 5000000 +))
>>> (define test-vector-size (vector-length test-vector))
>>> (define (test-function vectr)
>>>   (apply + (vector->list (vector-map sqrt vectr))))
>>>
>>> (define (places-main)
>>>   (define place1
>>>     (place ch
>>>       (place-channel-put ch (test-function (place-channel-get ch)))))
>>>
>>>   (define place2
>>>     (place ch
>>>       (place-channel-put ch (test-function (place-channel-get ch)))))
>>>
>>>   (place-channel-put place1 test-vector)
>>>   (place-channel-put place2 test-vector)
>>>   (place-channel-get place1)
>>>   (place-channel-get place2)
>>>   (place-wait place1)
>>>   (place-wait place2)
>>>   (void))
>>>
>>> (define (noplaces-main)
>>>   (test-function test-vector)
>>>   (test-function test-vector)
>>>   (void))
>>> -------------------------------
>>>
>>> These are the results for different sizes of test-vector. The amount of
>>> computation is linear in the size of test-vector.
>>>
>>> test-vector-size 100000
>>> With places cpu time: 1685 real time: 856 gc time: 0
>>> Without places cpu time: 187 real time: 170 gc time: 94
>>> Press any key to continue . . .
>>> -----
>>> test-vector-size 1000000
>>> With places cpu time: 3822 real time: 2637 gc time: 265
>>> Without places cpu time: 1201 real time: 1191 gc time: 452
>>> Press any key to continue . . .
>>> -------
>>> test-vector-size 5000000
>>> With places cpu time: 15787 real time: 23318 gc time: 1373
>>> Without places cpu time: 7769 real time: 9456 gc time: 4461
>>> Press any key to continue . . .
>>> ------
>>>
>>> Thanks,
>>> Harry Spier
>>>
>>
>>
>> --
>> ---------------------------------------------------------
>> Tobias Hammer
>> DLR / Institute of Robotics and Mechatronics
>> Muenchner Str. 20, D-82234 Wessling
>> Tel.: 08153/28-1487
>> Mail: tobias.hammer at dlr.de
>>
>
>