[racket] place: terrible performance of place-channel-get?
Hi,
I am looking at parallelising some numerical computation with Racket. I tried future/touch first. However, the data for the computation is passed as vectors, and in my experiments the futures would always hit a "synchronisation task", at which point the computation collapsed from multiple cores into serialised execution on a single core.
Now I have decided to try places. My idea is to make it similar to Common Lisp's LPARALLEL: create workers (<= number of cores) and distribute tasks among those workers. The problem I have encountered, however, is that place-channel-get seems to take forever to return. Here is an example of some simulated computation on a vector, using two places and trying to run them in parallel:
#lang racket
(require racket/place)
(provide test-places1 test-places2 long-computation v1 v2 random-vector)

;;; Utilities:
(define (random-list n)
  (let loop ((i n) (r '()))
    (if (zero? i)
        r
        (loop (sub1 i) (cons (random) r)))))

(define (random-vector n)
  (let ((l (random-list n)))
    (list->vector l)))

(define (vector-reduce f init v)
  (let ((n (vector-length v)))
    (let loop ((i 0) (r init))
      (if (= i n)
          r
          (loop (add1 i) (f r (vector-ref v i)))))))

;;; This is the computation to be run in each place:
(define (long-computation v)
  (let ((n (vector-length v))
        (v1 (vector-copy v))) ; work on a copy so the input vector v is left untouched
    (let loop ((i 0))
      (if (= i n)
          (begin
            (sleep 2) ; make it work for a bit longer
            (vector-reduce + 0.0 v1)) ; reduce to a single number so the result is printable
          (begin
            (vector-set! v1 i (* (exp (- (vector-ref v1 i)))
                                 (sin (* pi (vector-ref v1 i))))) ; flonum computation
            (loop (add1 i)))))))

;;; Two vectors to be sent to long-computation:
(define v1 (random-vector 100000))
(define v2 (random-vector 100000))

;;; Test using one place:
(define (test-places1)
  (define p1
    (place ch1
      (define v (place-channel-get ch1))
      (define w (long-computation v))
      (place-channel-put ch1 w)))
  (place-channel-put p1 v1)
  (time (place-channel-get p1)))

;;; Test using two places:
(define (test-places2)
  (define p1
    (place ch1
      (define v (place-channel-get ch1))
      (define w (long-computation v))
      (place-channel-put ch1 w)))
  (define p2
    (place ch2
      (define v (place-channel-get ch2))
      (define w (long-computation v))
      (place-channel-put ch2 w)))
  (place-channel-put p1 v1)
  (place-channel-put p2 v2)
  (sleep 2) ; hypothetically, after this the results should be ready immediately!
  (time (list (place-channel-get p1) (place-channel-get p2))))
Execution from the racket REPL on a MacBook Pro with an Intel Core 2 Duo (times in milliseconds):
-> (time (long-computation v1))
cpu time: 42 real time: 2043 gc time: 0
39523.12275516648
-> (test-places1)
cpu time: 7593 real time: 7475 gc time: 7001
39523.12275516648
-> (test-places2)
cpu time: 16591 real time: 12492 gc time: 15485
'(39523.12275516648 39505.415738171105)
So, I would expect the time of executing (long-computation v1) and the time of getting the result out of the channel in (test-places1) to be more or less the same, but they are not: about 2 s of real time versus about 7.5 s, of which about 7 s is reported as GC time. Furthermore, (test-places2) takes almost twice as long as (test-places1), about 12.5 s. (Note that I put (time …) around just getting the value, so it does not include the time of creating the place.)
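One way to narrow this down would be to time the channel round trip on its own, with no numerical work done in the place. The following is only a minimal sketch, not a definitive benchmark: make-echo-place and time-round-trip are hypothetical helper names that do not appear in the code above, and the warm-up message is there on the assumption that a freshly created place may still be starting up when the first message is sent (as far as I understand, place returns its descriptor immediately).

```racket
#lang racket
(require racket/place)

;; Hypothetical helper: a place that does no numerical work. It receives a
;; vector and replies with its length, so timings cover only message transfer.
(define (make-echo-place)
  (place ch
    (let loop ()
      (define v (place-channel-get ch))
      (place-channel-put ch (vector-length v))
      (loop))))

;; Hypothetical helper: send v to place p, wait for the reply, and return
;; the reply together with the elapsed wall-clock time in milliseconds.
(define (time-round-trip p v)
  (define start (current-inexact-milliseconds))
  (place-channel-put p v)
  (define n (place-channel-get p))
  (values n (- (current-inexact-milliseconds) start)))

;; Runs only when this file is the main program (racket file.rkt), not when
;; the module is instantiated again inside the new place.
(module+ main
  (define p (make-echo-place))
  ;; Warm-up round trip, so that place start-up is not counted below.
  (define-values (warm-n warm-ms) (time-round-trip p (vector)))
  (for ([n (in-list '(1000 10000 100000))])
    (define v (build-vector n (lambda (i) (random))))
    (define-values (len ms) (time-round-trip p v))
    (printf "n = ~a: ~a ms\n" len ms))
  (place-kill p))
```

If the round trip for 100000 flonums is already slow in this sketch, that would suggest the cost lies in the message transfer rather than in long-computation; if it is fast, the start-up of the place would be the next thing to measure.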
Am I doing something wrong?
Cheers, Alexey