[plt-dev] Parallel Futures Release

From: Robby Findler (robby at eecs.northwestern.edu)
Date: Sun Dec 13 09:31:24 EST 2009

On Sun, Dec 13, 2009 at 8:18 AM, Sam TH <samth at ccs.neu.edu> wrote:
> On Sun, Dec 13, 2009 at 9:00 AM, Robby Findler
> <robby at eecs.northwestern.edu> wrote:
>> On Sun, Dec 13, 2009 at 7:53 AM, Sam TH <samth at ccs.neu.edu> wrote:
>>> On Mon, Dec 7, 2009 at 12:31 PM, James Swaine <james.swaine at gmail.com> wrote:
>>>> I'm pleased to announce the initial release of parallel futures, a
>>>> construct for fine-grained parallelism in PLT. Roughly speaking, a
>>>> programmer passes a thunk to 'future' and it gets run in parallel.
>>>> That "roughly" holds a few gotchas, partly because we're just getting
>>>> started and partly due to the technique we're using. See the
>>>> documentation for more details:
>>>
>>> Another belated question about this:
>>>
>>> Say I have two big non-primitive-calling computations, which don't
>>> affect each other at all.  I gather that this is the intended use case
>>> for futures.  Each of them produces a number, and the result of my
>>> program should be the sum of the two numbers.  How would I write this?
>>>
>>> I originally though I'd do this:
>>>
>>> (let ([f1 (future big-comp1)] [f2 (future big-comp2)])
>>>  (+ (touch f1) (touch f2)))
>>
>> This is what you want. You will have two parallel computations and
>> you'll use two cpus on your machine.
>>
>>> But this implies lots of potentially-unnecessary communication - I
>>> start each computation on a separate OS thread, and then I join them
>>> before they're done.  I'd rather be able to wait until I know that
>>> `f1' and `f2' are done before using `touch'.  Is there a way to do
>>> that?
>>
>> I don't see what you're getting at here. The only reason to avoid
>> touching is that you have something else to do, but you don't seem to
>> here. (And if you did, you could put that thing in another future and
>> then touch all 3.)
>
> Let's imagine that the first computation takes less time as a the
> second.  Further, imagine that the second computation has a big
> working set during its run.

> Then, when I go to `touch' `f2', all of
> that working set will have to be moved over to the core/processor
> running the main thread.  Even on the laptop I'm using right now,
> that's going to slow things down, because the caches will be wrong,
> the branch predictor will be wrong, etc.  On a bigger machine, where
> some memory locations are closer than others even apart from the
> cache, it will be much worse.

No, I don't think that happens. (At the moment, those decisions are
made by the pthread library, tho.)

> It seems like it would be pretty easy to have a primitive `wait' that
> took any number of futures, and blocked until they all finished or
> suspended due to non-threadsafe primitives.  Then I could just add
> (wait f1 f2) before the call to `+' and avoid this problem.

This is effectively the same as a bunch of touchs, I believe. Perhaps
others can correct me, but my understanding is that the (wait f ...)
you propose is just the same as (begin (touch f) ...).

Robby


Posted on the dev mailing list.