[racket] Racket and concurrency
>
> 2) Is it better to do the concurrency at the highest level possible or
> at the lowest level possible. I.e. should I be processing pages
> concurrently or should I go to a much lower level and only be
> processing letters concurrently. Does it matter?
>
> 3) How does hyperthreading affect the number of places or futures I
> can run concurrently? For example if I have an i7 with 4 cores and
> hyperthreading, will that run 4 or 8 places concurrently?
>
Actually, a lot of it depends on two things: a) whether your computations
are I/O bound or CPU bound (in the case of OCR, I assume they are CPU bound),
and b) whether you benefit at all from any "short" computation being completed
before its turn would have come in a "linear" (sequential) computation.
If all your threads are CPU bound and your task isn't complete until the
last bit of it is done, all the speedup you can get out of concurrency is
the parallelism across several processors (which of course depends on both
the operating system and the language using those processors efficiently).
In the "worst case," you have only one processor and there is no point in
doing anything before your entire document is processed. If that's the case,
parallelizing your computations is actually your *worst* option, because the
total number of CPU cycles needed to do the whole job doesn't change
regardless of how you split up the computations, but the parallel version
adds context switching overhead! In that scenario, you might as well
(even better) launch one batch file to do all the pieces in turn, go to
sleep, and come back the next morning to see the task done.
If, however, a particular task has a lot of "gaps" - meaning that
an I/O system or a coprocessor periodically relieves the CPU(s) -
concurrent programming can help you keep the CPU(s) 100% utilized by filling
the gaps with other computations, thus "squeezing together" the
computations.
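To make the "gaps" idea a bit more concrete, here is a minimal sketch. It is
not taken from any real OCR code: fetch-and-process is a made-up task, and
the (sleep 0.2) merely stands in for time spent waiting on I/O. Run in turn,
the waits add up; run in separate Racket threads, they should overlap.

```racket
#lang racket

;; Hypothetical task: waits on "I/O" (simulated with sleep), then does a
;; trivial bit of CPU work. Not real OCR code.
(define (fetch-and-process id)
  (sleep 0.2)
  (* id id))

(define ids '(1 2 3 4))

;; Wall-clock milliseconds spent running thunk.
(define (elapsed-ms thunk)
  (define start (current-inexact-milliseconds))
  (thunk)
  (- (current-inexact-milliseconds) start))

;; One task after the other: every wait is paid in full.
(define (run-in-turn)
  (for-each fetch-and-process ids))

;; One Racket thread per task: while one task sleeps, the others can proceed.
(define (run-overlapped)
  (define workers
    (for/list ([id ids])
      (thread (lambda () (fetch-and-process id)))))
  (for-each thread-wait workers))

(define sequential-ms (elapsed-ms run-in-turn))
(define overlapped-ms (elapsed-ms run-overlapped))
(printf "in turn: ~a ms, overlapped: ~a ms\n" sequential-ms overlapped-ms)
```

The exact numbers will differ from machine to machine; the point is only
that the overlapped run does not have to pay for each wait separately.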
Yet another story is when you can benefit from individual computations
finishing as early as possible. Assume, for example, that some GUI frontend
can display an individual page as soon as it (the page) has been
processed completely, for the user to scrutinize. In that case, needless to
say, you don't want a 100M document to clog up the CPU while a dozen very
small documents would be ready in no time flat, so that the person in front
of the GUI could already start working on the small documents. In that case,
parallelizing would make sense even if every single task took up 100% of the
CPU cycles (thus paying context switch overhead), because the small
computations are perceived to be completed faster. So the choice isn't black
and white but instead depends very strongly on the task set.
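A rough sketch of that scenario, under some assumptions of mine: process is
a hypothetical CPU-bound stand-in for handling one document, and the job
names and sizes are invented. Each job gets its own Racket thread, and a
channel reports the order in which the jobs finish. Since Racket threads
are time-sliced, I would expect the small jobs to report in before the big
one even though the big one was started first, but the exact order is up to
the scheduler.

```racket
#lang racket

;; Hypothetical CPU-bound stand-in for processing a document of `size` units.
(define (process size)
  (for/fold ([acc 0]) ([i (in-range size)])
    (+ acc (modulo i 3))))

;; Run every job in its own Racket thread and return the job names
;; in the order in which the jobs completed.
(define (completion-order jobs)
  (define done (make-channel))
  (for ([job jobs])
    (thread (lambda ()
              (process (cdr job))
              (channel-put done (car job)))))
  (for/list ([_job jobs])
    (channel-get done)))

;; Invented job set: one big document queued ahead of three small ones.
(define jobs
  '((big . 20000000) (small-1 . 10000) (small-2 . 10000) (small-3 . 10000)))

(displayln (completion-order jobs))
```

Note that these threads all share one OS-level thread, so this shows
concurrency without parallelism: the total work is not done any sooner, but
the small results become available early.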
I hope I didn't state the obvious; all of the above (aside, of course, from
not being specific to Racket at all) describes fairly elementary concurrent
application design. Apologies if I have pitched this at too elementary a
level.
> 4) Are there any "gotcha's" I need to look out for?
>
I don't know if anyone has measured the costs of concurrency in Racket yet -
I assume that wherever possible (e.g. on Windows, where fibers are available)
the concurrency layers have been tailored to suit whatever the underlying
hardware and software offer, but be aware that concurrency never comes free.
So do run some elementary tests on the cost of concurrency - e.g., (time)
both a single-threaded and a multithreaded version of a similar computation
and see what the overhead is before you dive into concurrent programming.
Also (and this, too, is rather elementary, and again, please don't take it as
an insult if I state what may sound trivial) keep in mind that concurrent
software design is still one of the outer limits of computing - one of the
things that won't work if you don't do it well enough but likewise won't work
if you do it too well. Synchronization problems are among the worst to debug
and trace because they are rarely reproducible and tend to manifest
themselves differently each time they occur.
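As a starting point for such a test, something along these lines might do.
This is only a sketch: busy-work is an invented CPU-bound loop, not real
OCR, and the job sizes are arbitrary. It runs the same jobs once
sequentially and once as futures, wrapping each run in (time) so you can
compare the two.

```racket
#lang racket
(require racket/future)

;; Hypothetical CPU-bound stand-in for real work (e.g. OCR on one page).
(define (busy-work n)
  (for/fold ([acc 0]) ([i (in-range n)])
    (+ acc (modulo (* i i) 7))))

;; Four equally sized jobs; the sizes are arbitrary.
(define jobs (make-list 4 2000000))

;; Baseline: one job after the other.
(define (run-sequential)
  (map busy-work jobs))

;; Same jobs, each wrapped in a future; touch waits for each result.
(define (run-futures)
  (define fs
    (for/list ([n jobs])
      (future (lambda () (busy-work n)))))
  (map touch fs))

(printf "processors reported: ~a\n" (processor-count))
(time (void (run-sequential)))
(time (void (run-futures)))
```

Whether the futures version comes out ahead depends on the machine, the
Racket version, and on whether the work inside the future can actually run
in parallel, which is exactly what the measurement is meant to tell you.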
> Thanks,
> Harry