[racket] place: terrible performance of place-channel-get?
Hi Matthew,
Thanks a lot for the reply!
This works! Thanks a lot.
Regards,
Alexey
On 12 Nov 2014, at 18:52, Matthew Flatt <mflatt at cs.utah.edu> wrote:
> I'll push a repair to the development version.
>
>
> The problem isn't so much that message copying/transfer is slow, but
> that the rule to trigger an all-places GC doesn't accommodate a large,
> not-yet-delivered message. I'll repair that rule.
>
> Most of the process time in your example shows up as GC time, because
> the GC was continuously firing while the message waited for the new
> place to start and receive it (and the constant GCs slowed the place
> start-up).
>
>
> If upgrading is not an option, you can work around the problem by
> waiting for a "ready" message from the new place before sending the
> vector as a message. For example, change `test-places1` to
>
> (define (test-places1)
>   (define p1
>     (place ch1
>       (place-channel-put ch1 'ready)
>       (define v (place-channel-get ch1))
>       (define w (long-computation v))
>       (place-channel-put ch1 w)))
>   (place-channel-get p1) ; => 'ready
>   (place-channel-put p1 v1)
>   (time (place-channel-get p1)))
>
> That way, `v1` doesn't sit in the message channel long enough to cause
> a problem.
>
> At Tue, 11 Nov 2014 17:41:11 -0700, Matthew Flatt wrote:
>> This does seem extremely slow. A place-message send must copy the
>> vector to send it as a message, but the copy shouldn't take so long.
>> I'll investigate further.
>>
>> Meanwhile, an option in this case might be to create a "shared
>> flvector", which can be passed directly (i.e., without copying) to
>> another place. I've enclosed a variant of your example to illustrate.
>>
>> At Mon, 10 Nov 2014 11:58:21 +0200, Alexey Cherkaev wrote:
>>> Hi,
>>>
>>> I am looking at parallelising some numerical computation with Racket. I’ve
>>> tried future/touch first. However, the data for the computation is passed as
>>> vectors, and in my experiments future/touch would always hit a
>>> "synchronisation task", upon which all the multicore threads collapse into
>>> a one-core serialised computation.
>>>
>>> Now, I decided to try place. My idea is to make it similar to Common Lisp’s
>>> LPARALLEL: create workers <= number of cores and distribute tasks into those
>>> workers. The problem I have encountered, however, is that place-channel-get
>>> seems to take forever to compute. Here is an example of some simulated
>>> computation on a vector using two places and trying to run them in parallel:
>>>
>>> #lang racket
>>>
>>> (require racket/place)
>>>
>>> (provide test-places1 test-places2 long-computation v1 v2 random-vector)
>>>
>>> ;;; Utilities:
>>> (define (random-list n)
>>>   (let loop ((i n) (r '()))
>>>     (if (zero? i)
>>>         r
>>>         (loop (sub1 i) (cons (random) r)))))
>>>
>>> (define (random-vector n)
>>>   (let ((l (random-list n)))
>>>     (list->vector l)))
>>>
>>> (define (vector-reduce f init v)
>>>   (let ((n (vector-length v)))
>>>     (let loop ((i 0) (r init))
>>>       (if (= i n)
>>>           r
>>>           (loop (add1 i) (f r (vector-ref v i)))))))
>>>
>>> ;;; This is computation to be run in each place:
>>> (define (long-computation v)
>>>   (let ((n (vector-length v))
>>>         (v1 (vector-copy v))) ; v is immutable; to mutate it, we must copy it
>>>     (let loop ((i 0))
>>>       (if (= i n)
>>>           (begin
>>>             (sleep 2) ; make it work for a bit longer
>>>             (vector-reduce + 0.0 v1)) ; to make result printable
>>>           (begin
>>>             ;; flonum computation
>>>             (vector-set! v1 i (* (exp (- (vector-ref v1 i)))
>>>                                  (sin (* pi (vector-ref v1 i)))))
>>>             (loop (add1 i)))))))
>>>
>>> ;;; two vectors to be sent to long-computation
>>> (define v1 (random-vector 100000))
>>> (define v2 (random-vector 100000))
>>>
>>> ;;; Test using one place:
>>> (define (test-places1)
>>>   (define p1
>>>     (place ch1
>>>       (define v (place-channel-get ch1))
>>>       (define w (long-computation v))
>>>       (place-channel-put ch1 w)))
>>>   (place-channel-put p1 v1)
>>>   (time (place-channel-get p1)))
>>>
>>> ;;; Test using 2 places:
>>> (define (test-places2)
>>>   (define p1
>>>     (place ch1
>>>       (define v (place-channel-get ch1))
>>>       (define w (long-computation v))
>>>       (place-channel-put ch1 w)))
>>>   (define p2
>>>     (place ch2
>>>       (define v (place-channel-get ch2))
>>>       (define w (long-computation v))
>>>       (place-channel-put ch2 w)))
>>>   (place-channel-put p1 v1)
>>>   (place-channel-put p2 v2)
>>>   (sleep 2) ; hypothetically, after this the results should be ready immediately!
>>>   (time (list (place-channel-get p1) (place-channel-get p2))))
>>>
>>> Execution from racket on a MacBook Pro with an Intel Core 2 Duo:
>>>
>>> -> (time (long-computation v1))
>>> cpu time: 42 real time: 2043 gc time: 0
>>> 39523.12275516648
>>> -> (test-places1)
>>> cpu time: 7593 real time: 7475 gc time: 7001
>>> 39523.12275516648
>>> -> (test-places2)
>>> cpu time: 16591 real time: 12492 gc time: 15485
>>> '(39523.12275516648 39505.415738171105)
>>>
>>> So, the time of execution of (long-computation v1) and the time of getting
>>> the result out of the channel in (test-places1) should be more or less the
>>> same, but they are not. Furthermore, (test-places2) takes almost twice as
>>> long as (test-places1) (note, I put (time …) around just getting the value,
>>> so it does not include the time of creating the place).
>>>
>>> Am I doing something wrong?
>>>
>>> Cheers, Alexey
>>>
>>>
>>> ____________________
>>> Racket Users list:
>>> http://lists.racket-lang.org/users
>> ------------------------------------------------------------------------------
>> [application/octet-stream "shared-flvector-example.rkt"]
>> ____________________
>> Racket Users list:
>> http://lists.racket-lang.org/users
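
The shared-flvector variant mentioned in the quoted reply was sent as an attachment and is not reproduced in this message. As a rough sketch of the idea only (this is not the original shared-flvector-example.rkt), the code below assumes the data can be kept in a flvector allocated with make-shared-flvector from racket/flonum, so that sending it to a place shares the memory instead of copying it. The names long-computation!, random-shared-flvector and test-shared are made up for this illustration.

```racket
#lang racket
(require racket/place racket/flonum)

(provide test-shared long-computation! random-shared-flvector)

;; Hypothetical in-place version of long-computation: updates every
;; element of the flvector and returns the sum of the new values.
(define (long-computation! fv)
  (for ([i (in-range (flvector-length fv))])
    (define x (flvector-ref fv i))
    (flvector-set! fv i (* (exp (- x)) (sin (* pi x)))))
  (for/fold ([sum 0.0]) ([x (in-flvector fv)])
    (+ sum x)))

;; Allocate the data in shared memory so that a place message carrying
;; it does not need to copy the contents.
(define (random-shared-flvector n)
  (define fv (make-shared-flvector n 0.0))
  (for ([i (in-range n)])
    (flvector-set! fv i (random)))
  fv)

(define (test-shared)
  (define fv (random-shared-flvector 100000))
  (define p
    (place ch
      (define v (place-channel-get ch))
      (place-channel-put ch (long-computation! v))))
  (place-channel-put p fv) ; shared, not copied
  (time (place-channel-get p)))

;; Run only when this file is the main program, not when the module is
;; instantiated again inside the new place.
(module+ main
  (test-shared))
```

Because the flvector is shared, the worker's updates are visible to the sending place as well, so this only fits cases where the sender does not need the original values afterwards.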