[plt-dev] Parallel Futures Release

From: Sam TH (samth at ccs.neu.edu)
Date: Sun Dec 13 09:18:11 EST 2009

On Sun, Dec 13, 2009 at 9:00 AM, Robby Findler
<robby at eecs.northwestern.edu> wrote:
> On Sun, Dec 13, 2009 at 7:53 AM, Sam TH <samth at ccs.neu.edu> wrote:
>> On Mon, Dec 7, 2009 at 12:31 PM, James Swaine <james.swaine at gmail.com> wrote:
>>> I'm pleased to announce the initial release of parallel futures, a
>>> construct for fine-grained parallelism in PLT. Roughly speaking, a
>>> programmer passes a thunk to 'future' and it gets run in parallel.
>>> That "roughly" holds a few gotchas, partly because we're just getting
>>> started and partly due to the technique we're using. See the
>>> documentation for more details:
>>
>> Another belated question about this:
>>
>> Say I have two big non-primitive-calling computations, which don't
>> affect each other at all.  I gather that this is the intended use case
>> for futures.  Each of them produces a number, and the result of my
>> program should be the sum of the two numbers.  How would I write this?
>>
>> I originally though I'd do this:
>>
>> (let ([f1 (future big-comp1)] [f2 (future big-comp2)])
>>  (+ (touch f1) (touch f2)))
>
> This is what you want. You will have two parallel computations and
> you'll use two cpus on your machine.
>
>> But this implies lots of potentially-unnecessary communication - I
>> start each computation on a separate OS thread, and then I join them
>> before they're done.  I'd rather be able to wait until I know that
>> `f1' and `f2' are done before using `touch'.  Is there a way to do
>> that?
>
> I don't see what you're getting at here. The only reason to avoid
> touching is that you have something else to do, but you don't seem to
> here. (And if you did, you could put that thing in another future and
> then touch all 3.)

Let's imagine that the first computation takes less time as a the
second.  Further, imagine that the second computation has a big
working set during its run.  Then, when I go to `touch' `f2', all of
that working set will have to be moved over to the core/processor
running the main thread.  Even on the laptop I'm using right now,
that's going to slow things down, because the caches will be wrong,
the branch predictor will be wrong, etc.  On a bigger machine, where
some memory locations are closer than others even apart from the
cache, it will be much worse.

It seems like it would be pretty easy to have a primitive `wait' that
took any number of futures, and blocked until they all finished or
suspended due to non-threadsafe primitives.  Then I could just add
(wait f1 f2) before the call to `+' and avoid this problem.
-- 
sam th
samth at ccs.neu.edu


Posted on the dev mailing list.