Some theoretical work has taken a data-parallel-like approach to designing algorithms, allowing the number of processors to increase with the size of the data. For example, consider summing the elements of an array. Sequentially, we would write this as
x = 0
for i = 1 to n
    x = x + a[i]
end
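For concreteness, the same loop might be transcribed into C as below (the array contents and size N are our own illustrative choices; note that the pseudocode is 1-indexed while C is 0-indexed):

#include <stdio.h>

#define N 8                          /* illustrative size, our choice */

int main(void)
{
    double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double x = 0.0;
    for (int i = 0; i < N; i++)      /* n additions, one after another */
        x = x + a[i];
    printf("x = %g\n", x);           /* 1 + 2 + ... + 8 = 36 */
    return 0;
}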
Sequentially, this would run in O(n) time. To run this in parallel, consider n processes, numbered 1 to n, each containing original element a[i]. We might then perform the computation as a data parallel tree sum, with each node at the same level of the tree operating in parallel.
step = 1
forall i = 1 to n do
    xtmp[i] = a[i]
end
while step < n do
    forall i = 1 to n-step by 2*step do
        xtmp[i] = xtmp[i] + xtmp[i+step]
    end
    step = step * 2
end
x = xtmp[1]
This takes O(log n) time, which is optimal in parallel. Many such algorithms and bounds are known, and complexity classes (analogous to P and NP) can be constructed describing “solvable” parallel problems.
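As one rough sketch of how this tree sum might look in real code, here is a C version using OpenMP (compile with -fopenmp); the function name tree_sum, the array size, and the use of a parallel for at each tree level are our own illustrative choices, not part of the module:

#include <stdio.h>

#define N 16                         /* illustrative size, our choice */

/* Combine pairs that are 'step' apart.  All additions within one level
   of the tree are independent, so OpenMP may run them in parallel. */
double tree_sum(double *xtmp, int n)
{
    for (int step = 1; step < n; step *= 2) {
        #pragma omp parallel for
        for (int i = 0; i < n - step; i += 2 * step)
            xtmp[i] = xtmp[i] + xtmp[i + step];
    }
    return xtmp[0];                  /* the total accumulates at index 0 */
}

int main(void)
{
    double a[N];
    for (int i = 0; i < N; i++)
        a[i] = i + 1;                /* 1 + 2 + ... + 16 = 136 */
    printf("x = %g\n", tree_sum(a, N));
    return 0;
}

Keep in mind that the O(log n) bound assumes n processors; on a machine with p cores, OpenMP will split each tree level among the p cores rather than giving every pair its own processor.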
Other algorithmic work has concentrated on mapping algorithms onto a more limited set of parallel processors. For example, the above algorithm might be mapped onto p processors in the following way.
forall j = 1 to p do
    lo[j] = 1 + (j-1)*n/p
    hi[j] = j*n/p
    xtmp[j] = 0
    for i = lo[j] to hi[j] do
        xtmp[j] = xtmp[j] + a[i]
    end
    step[j] = 1
    barrier_wait()
    while step[j] < n do
        if j+step[j] <= p and j mod (step[j]*2) = 1 then
            xtmp[j] = xtmp[j] + xtmp[j+step[j]]
        end
        step[j] = step[j] * 2
        barrier_wait()
    end
end
x = xtmp[1]
Note that the barrier synchronizations are necessary to ensure that no processor j runs ahead of the others, which could cause some updates of xtmp[j] to use the wrong data values. The time for this algorithm is O(n/p + log n): each processor first sums a block of n/p elements, and the combining loop then doubles step until it reaches n, executing one barrier per iteration. This is optimal for operation count, but not necessarily for number of synchronizations.
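A hedged C/OpenMP sketch of this mapping is below. Here #pragma omp barrier plays the role of barrier_wait(), thread numbers are 0-based rather than 1-based, and all names and sizes are our own choices; it also assumes OpenMP actually delivers P threads:

#include <stdio.h>
#include <omp.h>

#define N 1000                        /* illustrative size, our choice */
#define P 4                           /* number of threads, our choice */

double a[N], xtmp[P];                 /* shared between all threads */

int main(void)
{
    for (int i = 0; i < N; i++)
        a[i] = 1.0;                   /* known total: N */

    #pragma omp parallel num_threads(P)
    {
        int j  = omp_get_thread_num();   /* 0-based thread id */
        int lo = j * N / P;              /* this thread's block of a[] */
        int hi = (j + 1) * N / P;

        xtmp[j] = 0.0;                   /* local block sum */
        for (int i = lo; i < hi; i++)
            xtmp[j] += a[i];

        /* Combine the P partial sums as a tree.  Each level begins with
           a barrier so no thread reads a partner's unfinished sum. */
        for (int step = 1; step < P; step *= 2) {
            #pragma omp barrier
            if (j % (2 * step) == 0 && j + step < P)
                xtmp[j] += xtmp[j + step];
        }
    }   /* implicit barrier at the end of the parallel region */

    printf("sum = %g\n", xtmp[0]);       /* prints 1000 */
    return 0;
}

One deliberate difference from the pseudocode: the combining loop here stops at p rather than n, so it performs only about log p barriers, which is exactly the kind of synchronization saving hinted at above.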
Many other aspects of parallel algorithms deserve study. Designing algorithms for other important problems is a vast subject, as is describing general families of parallel algorithms. Analyzing algorithms with respect to time, memory, parallel overhead, locality, and many other measures is important in practice, and implementing algorithms and measuring those implementations is particularly valuable on real machines. Describing the structure of a full parallel application provides an excellent guidepost for future developers. The Open Education Cup invites entries on all these aspects of parallel algorithms and applications, as well as other relevant topics.
Creating a parallel application is complex, but developing a correct and efficient parallel program is even harder. We mention just a few of the difficulties in the following list.