<< Chapter < Page Chapter >> Page >

DO I=1,N IF (X .LT. A(I)) X = X + B(I)*2.ENDDO

You can’t know which way the branch will go for the next iteration until you are done with the current iteration. To recognize the dependency, try to unroll the loop slightly by hand. If you can’t start the second test until the first has finished, you have a dependent loop conditional. You may want to look at these types of loops to see if you can eliminate the iteration-to-iteration value.

Reductions

Keep an eye out for loops in which the if-statement is performing a max or min function on a array. This is a reduction , so called because it reduces a array to a scalar result (the previous example was a reduction too, by the way). Again, we are getting a little bit ahead of ourselves, but since we are talking about if-statements in loops, I want to introduce a trick for restructuring reductions max and min to expose more parallelism. The following loop searches for the maximum value, z , in the array a by going through the elements one at a time:


for (i=0; i<n; i++) z = a[i]>z ? a[i] : z;

As written, it’s recursive like the loop from the previous section. You need the result of a given iteration before you can proceed to the next. However, since we are looking for the greatest element in the whole array, and since that will be the same element (essentially) no matter how we go about looking for it, we can restructure the loop to check several elements at a time (we assume n is evenly divisible by 2 and do not include the preconditioning loop):


z0 = 0.; z1 = 0.;for (i=0; i<n-1; i+=2) { z0 = z0<a[i] ? a[i]: z0; z1 = z1<a[i+1] ? a[i+1]: z1; }z = z0<z1 ? z1 : z0;

Do you see how the new loop calculates two new maximum values each iteration? These maximums are then compared with one another, and the winner becomes the new official max . It’s analogous to a play-off arrangement in a Ping-Pong tournament. Whereas the old loop was more like two players competing at a time while the rest sat around, the new loop runs several matches side by side. In general this particular optimization is not a good one to code by hand. On parallel processors, the compiler performs the reduction in its own way. If you hand-code similar to this example, you may inadvertently limit the compiler’s flexibility on a parallel system.

Conditionals that transfer control

Let’s step back a second. Have you noticed a similarity among all the loops so far? We have looked only at a particular type of conditional, conditional assignments — based on the outcome of the test, a variable gets reassigned. Of course, not every conditional ends up in an assignment. You can have statements that transfer flow of control, such as subroutine calls or goto statements. In the following example, the programmer is carefully checking before dividing by zero.

However, this test has an extremely negative impact on the performance because it forces the iterations to be done precisely in order:


DO I=1,N DO J=1,NIF (B(J,I) .EQ. 0 ) THEN PRINT *,I,JSTOP ENDIFA(J,I) = A(J,I) / B(J,I) ENDDOENDDO

Avoiding these tests is one of the reasons that the designers of the IEEE floating- point standard added the trap feature for operations such as dividing by zero. These traps allow the programmer in a performance-critical section of the code to achieve maximum performance yet still detect when an error occurs.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'High performance computing' conversation and receive update notifications?

Ask