[racket] Places performance
Thanks Robby and Tobias.
Robby said:
--------------------
You're doing way more work in the timed portion of the place version than
you are in the non-place one. In the places one you're creating the
test-vector 3 times, once in the original place (that you don't explicitly
create) and once in each of the places that you create. Your code is also
copying the vector from one place to another in the places one (ignoring
the one that was created when the place was created).
--------------------
Isn't there always going to be this kind of overhead when using places,
because the original module is re-required in each place that is created?
I checked the timing of creating test-vector and it's relatively small, but
most of the overhead appears to be in communicating test-vector to the
places.
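One way to avoid most of that copying is sketched below. It is not a fix for the code in this thread. racket/flonum's make-shared-flvector allocates a vector that is shared across places, so sending it over a place channel passes a reference instead of copying every element. The sketch assumes the data can be stored as flonums. make-test-data, sum-sqrt-slice and run-shared are hypothetical names, and the work is split in half between the original place and one new place.

```racket
#lang racket
;; Sketch (hypothetical names): share the test data across places instead of
;; copying it. Assumes the values can be stored as flonums.
(require racket/flonum)

;; Builds a shared flvector holding 0.0, 1.0, ..., n-1 (like (build-vector n +)).
(define (make-test-data n)
  (define v (make-shared-flvector n 0.0))
  (for ([i (in-range n)])
    (flvector-set! v i (->fl i)))
  v)

;; Sum of the square roots of v[lo], ..., v[hi-1].
(define (sum-sqrt-slice v lo hi)
  (for/fold ([acc 0.0]) ([i (in-range lo hi)])
    (fl+ acc (flsqrt (flvector-ref v i)))))

;; Splits the work in half: a new place sums the upper half while the
;; original place sums the lower half.
(define (run-shared v)
  (define n (flvector-length v))
  (define mid (quotient n 2))
  (define p
    (place ch
      ;; The place body cannot see run-shared's locals, so the vector and the
      ;; bounds arrive as a message. The shared flvector itself is not copied.
      (match (place-channel-get ch)
        [(list w lo hi) (place-channel-put ch (sum-sqrt-slice w lo hi))])))
  (place-channel-put p (list v mid n))
  (define here (sum-sqrt-slice v 0 mid))
  (define there (place-channel-get p))
  (place-wait p)
  (fl+ here there))

;; Runs only when this file is the program, not when a place loads the module.
(module+ main
  (define data (make-test-data 5000000))
  (time (run-shared data)))
```

The data is built inside the main submodule rather than at module level because every place re-instantiates the module that contains the place form. A module-level vector would be rebuilt in each place.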
Also, when I change my code to include the extra place-channel-gets and
place-channel-puts that Tobias suggested, I get almost exactly the same
timings as without them, i.e.:
test-vector-size 5000000
With places cpu time: 13993 real time: 9381 gc time: 763
Without places cpu time: 7472 real time: 7804 gc time: 4334
But a lot of the overhead appears to be in communicating test-vector to
the places.
When, instead of
(place-channel-put ch (test-function (place-channel-get ch)))
I use
(place-channel-put ch (test-function test-vector))
I get these timings:
test-vector-size 5000000
With places cpu time: 10218 real time: 5623 gc time: 0
Without places cpu time: 7613 real time: 7820 gc time: 4492
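As a rough check on the two sets of 5000000-element timings above, here is the speedup S of the places version over the version without places. It uses real time in milliseconds, first for the version that sends test-vector over a place channel and then for the version that reads test-vector directly:

```latex
S_{\text{sent}} = \frac{7804}{9381} \approx 0.83,
\qquad
S_{\text{module}} = \frac{7820}{5623} \approx 1.39,
\qquad
S_{\text{ideal}} = 2 \ \text{(two cores)}
```

So only the second variant beats the sequential code, and it is still well short of the ideal factor of 2.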
Is this the way places work?
When a Racket program executes a module that contains a place, it:
1) Executes the code in the module until it comes to the place form.
2) Creates a new Racket instance (a place) containing a new module.
3) That new module in the new Racket instance requires the original module
containing the place.
4) The body of the place form is then executed in the new Racket instance
(the place).
5) Simultaneously, the original module in the original Racket instance
continues executing.
6) The original and the new module (the two Racket instances) communicate
via place channels.
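Steps 2 and 3 can be observed directly. The sketch below records a timestamp when the module body runs. The new place reports its own copy of that value, and it differs from the original place's copy because the module was instantiated again in the new place. This is the same mechanism that makes each place rebuild test-vector. instance-stamp, start-reporter and compare-stamps are hypothetical names.

```racket
#lang racket
;; Sketch (hypothetical names): every place instantiates the module that
;; contains the place form, so module-level definitions are evaluated again.

;; Evaluated once per instantiation of this module, i.e. once per place.
(define instance-stamp (current-inexact-milliseconds))

(define (start-reporter)
  (place ch
    ;; Inside the new place, instance-stamp is that place's own copy.
    (place-channel-put ch instance-stamp)))

;; Returns the stamp seen by the original place and the one seen by a new place.
(define (compare-stamps)
  (define p (start-reporter))
  (define theirs (place-channel-get p))
  (place-wait p)
  (values instance-stamp theirs))

;; Guarded by a main submodule so that instantiating this module inside the
;; new place does not start yet another place.
(module+ main
  (define-values (mine theirs) (compare-stamps))
  (printf "original place: ~a\nnew place:      ~a\n" mine theirs))
```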
So in effect my code creates 3 Racket instances (3 places) running on a
2-core machine.
If I change my code to run 2 places instead of 3, I would have expected
the timings to improve, but that doesn't appear to be the case.
The following code (2 places only: the original Racket instance and one
place) takes more real time than my original code with 3 places (8607 ms
instead of 5623 ms). I'm not clear why that should be the case.
The timings are:
test-vector-size 5000000
With places cpu time: 8097 real time: 8607 gc time: 1856
Without places cpu time: 7238 real time: 7476 gc time: 4024
for the code:
-------------------------------
place-timing-test5-execute.rkt
-------------------------------
#lang racket
(require "place-timing-test5.rkt")
(printf "test-vector-size ~a" test-vector-size)
(newline)
(display "With places ")
(time (places-main))
(display "Without places ")
(time (noplaces-main))
(system "PAUSE")
------------------------------------
place-timing-test5.rkt
-------------------------------------
#lang racket
(provide places-main noplaces-main test-vector-size)
(define test-vector (build-vector 5000000 +))
(define test-vector-size (vector-length test-vector))
(define (test-function vectr)
  (apply + (vector->list (vector-map sqrt vectr))))
(define (places-main)
  (test-function test-vector)
  (define place1
    (place ch
      (begin
        (place-channel-put ch #t)
        (place-channel-put ch (test-function test-vector)))))
  (place-channel-get place1)
  (place-channel-get place1)
  (place-wait place1)
  (void))
(define (noplaces-main)
  (test-function test-vector)
  (test-function test-vector)
  (void))
------------------
VERSUS
test-vector-size 5000000
With places cpu time: 10218 real time: 5623 gc time: 0
Without places cpu time: 7613 real time: 7820 gc time: 4492
-------------------------------------
place-timing-test-execute.rkt
-------------------------------------
#lang racket
(require "place-timing-test.rkt")
(printf "test-vector-size ~a" test-vector-size)
(newline)
(display "With places ")
(time (places-main))
(display "Without places ")
(time (noplaces-main))
(system "PAUSE")
------------------------
place-timing-test.rkt
------------------------
#lang racket
(provide places-main noplaces-main test-vector-size)
(define test-vector (build-vector 5000000 +))
(define test-vector-size (vector-length test-vector))
(define (test-function vectr)
  (apply + (vector->list (vector-map sqrt vectr))))
(define (places-main)
  (define place1
    (place ch
      (begin
        (place-channel-put ch #t)
        (place-channel-put ch (test-function test-vector)))))
  (define place2
    (place ch
      (begin
        (place-channel-put ch #t)
        (place-channel-put ch (test-function test-vector)))))
  (place-channel-get place1)
  (place-channel-get place2)
  (place-channel-get place1)
  (place-channel-get place2)
  (place-wait place1)
  (place-wait place2)
  (void))
(define (noplaces-main)
  (test-function test-vector)
  (test-function test-vector)
  (void))
--------------------
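One possible contributor to the slower two-place timing is sequencing. This is a reading of the code, not a measured explanation. In place-timing-test5.rkt, places-main runs its own (test-function test-vector) to completion before the (place ...) form is even evaluated. As a result, the main place's work and the new place's work (including rebuilding test-vector when the module is re-instantiated) happen one after the other. The sketch below creates the place first, so the two computations can overlap. places-main/overlap is a hypothetical name, and the other definitions are copied from place-timing-test5.rkt.

```racket
#lang racket
;; Sketch: same definitions as place-timing-test5.rkt, but the place is created
;; before the original place starts its own share of the work.
;; places-main/overlap is a hypothetical name.
(provide places-main/overlap test-vector-size)

(define test-vector (build-vector 5000000 +))
(define test-vector-size (vector-length test-vector))

(define (test-function vectr)
  (apply + (vector->list (vector-map sqrt vectr))))

(define (places-main/overlap)
  ;; (place ...) returns immediately; the new place re-instantiates this module
  ;; (rebuilding test-vector) and then computes its result.
  (define place1
    (place ch
      (place-channel-put ch (test-function test-vector))))
  ;; Meanwhile the original place does its own computation.
  (define mine (test-function test-vector))
  ;; Block until the new place reports its result.
  (define theirs (place-channel-get place1))
  (place-wait place1)
  (list mine theirs))
```

To time it, call (time (places-main/overlap)) from place-timing-test5-execute.rkt in place of (time (places-main)). The new place still pays for rebuilding test-vector, so only part of its work can overlap with the original place's.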
On Thu, Mar 14, 2013 at 8:28 AM, Tobias Hammer <tobias.hammer at dlr.de> wrote:
> As Robby said, the timed region should include only the place-channel-put
> and place-channel-get calls to make it comparable.
> But this is not enough, because (place ...) creation seems to be
> non-blocking, i.e. the places might not have started up yet when the code
> reaches the first -put.
>
> This can be solved by explicitly synchronizing the places with the main
> program:
>
>
> (define (places-main)
>   (define place1
>     (place ch
>       (begin
>         (place-channel-put ch #t)
>         (place-channel-put ch (test-function (place-channel-get ch))))))
>
>   (define place2
>     (place ch
>       (begin
>         (place-channel-put ch #t)
>         (place-channel-put ch (test-function (place-channel-get ch))))))
>
>   (place-channel-get place1)
>   (place-channel-get place2)
>
>
>   (display "With places ")
>   (time
>    (place-channel-put place1 test-vector)
>    (place-channel-put place2 test-vector)
>    (place-channel-get place1)
>    (place-channel-get place2))
>
>   (place-wait place1)
>   (place-wait place2)
>   (void))
>
>
> With this I get the following times:
>
> With places cpu time: 5600 real time: 3199 gc time: 404
> Without places cpu time: 3700 real time: 3678 gc time: 2208
>
> Now it's at least faster than the sequential version. But the overhead
> still seems to be a lot more than I had expected.
>
> Tobias
>
>
>
>
> On Thu, 14 Mar 2013 02:23:22 +0100, Harry Spier <vasishtha.spier at gmail.com> wrote:
>
>> Dear members,
>>
>> I've run the following Racket program (as an executable) to test the
>> performance of places on a Windows dual-core Pentium machine under Vista.
>> I've run it with various sizes for test-vector. Even when the amount of
>> computation is large (test-vector-size = 5000000), the computation split
>> over two places takes more than twice as long to complete as when no
>> places are used.
>>
>> I'm not clear why, on a dual-core machine, performance wasn't better with
>> the computation split over two places than with no places. In fact the
>> results are the opposite: the version without places is always at least
>> twice as fast as the version with two places.
>>
>> --------------------------------------
>> place-timing-test-executable.rkt
>> ---------------------------------------
>> #lang racket
>> (require "place-timing-test.rkt")
>> (printf "test-vector-size ~a" test-vector-size)
>> (newline)
>> (display "With places ")
>> (time (places-main))
>> (display "Without places ")
>> (time (noplaces-main))
>>
>> (system "PAUSE")
>>
>>
>> -----------------------
>> place-timing-test.rkt
>> -----------------------
>> #lang racket
>> (provide places-main noplaces-main test-vector-size)
>> (define test-vector (build-vector 5000000 +))
>> (define test-vector-size (vector-length test-vector))
>> (define (test-function vectr)
>>   (apply + (vector->list (vector-map sqrt vectr))))
>>
>> (define (places-main)
>>   (define place1
>>     (place ch (place-channel-put ch (test-function (place-channel-get ch)))))
>>
>>   (define place2
>>     (place ch (place-channel-put ch (test-function (place-channel-get ch)))))
>>
>>   (place-channel-put place1 test-vector)
>>   (place-channel-put place2 test-vector)
>>   (place-channel-get place1)
>>   (place-channel-get place2)
>>   (place-wait place1)
>>   (place-wait place2)
>>   (void))
>>
>> (define (noplaces-main)
>>   (test-function test-vector)
>>   (test-function test-vector)
>>   (void))
>> -------------------------------
>>
>> These are the results for different sizes of test-vector. The amount of
>> computation is linear in the size of test-vector.
>>
>> test-vector-size 100000
>> With places cpu time: 1685 real time: 856 gc time: 0
>> Without places cpu time: 187 real time: 170 gc time: 94
>> Press any key to continue . . .
>> -----
>> test-vector-size 1000000
>> With places cpu time: 3822 real time: 2637 gc time: 265
>> Without places cpu time: 1201 real time: 1191 gc time: 452
>> Press any key to continue . . .
>> -------
>> test-vector-size 5000000
>> With places cpu time: 15787 real time: 23318 gc time: 1373
>> Without places cpu time: 7769 real time: 9456 gc time: 4461
>> Press any key to continue . . .
>> ------
>>
>> Thanks,
>> Harry Spier
>>
>
>
> --
> ---------------------------------------------------------
> Tobias Hammer
> DLR / Institute of Robotics and Mechatronics
> Muenchner Str. 20, D-82234 Wessling
> Tel.: 08153/28-1487
> Mail: tobias.hammer at dlr.de
>