C$OMP PARALLEL DO SHARED(A,B) PRIVATE(I,TMP1,TMP2)
      DO I=1,1000000
        TMP1 = ( A(I) ** 2 ) + ( B(I) ** 2 )
        TMP2 = SQRT(TMP1)
        B(I) = TMP2
      ENDDO
C$OMP END PARALLEL DO
The iteration variable I also must be a thread-private variable. As the different threads increment their way through their particular subset of the arrays, they don’t want to be modifying a global value for I.
There are a number of other options as to how data will be treated across the threads. This summarizes some of the other data semantics available:

Firstprivate: These are thread-private variables that take an initial value from the global variable of the same name immediately before the loop begins executing.

Lastprivate: These are thread-private variables except that the thread that executes the last iteration of the loop copies its value back into the global variable of the same name.

Reduction: This indicates variables that are "summed" across the loop. Each thread keeps a private copy, and the partial results are combined into a single global value when the loop completes.
Each vendor may have different terms to indicate these data semantics, but most support all of these common semantics. The accompanying figure shows how the different types of data semantics operate.
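As one concrete illustration, a sum across an array is a natural fit for the reduction semantic. The following is a minimal sketch in the style of the directives above (the names SUM and A are illustrative, not taken from the text): each thread accumulates its own private partial sum, and the partial sums are combined into the global SUM when the loop finishes.

      SUM = 0.0
C$OMP PARALLEL DO SHARED(A) PRIVATE(I) REDUCTION(+:SUM)
      DO I=1,1000000
        SUM = SUM + A(I)
      ENDDO
C$OMP END PARALLEL DO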
Now that we have the data environment set up for the loop, the only remaining problem that must be solved is which threads will perform which iterations. It turns out that this is not a trivial task, and a wrong choice can have a significant negative impact on our overall performance.
There are two basic techniques (along with a few variations) for dividing the iterations in a loop between threads. We can look at two extreme examples to get an idea of how this works:
C VECTOR ADD
      DO IPROB=1,10000
        A(IPROB) = B(IPROB) + C(IPROB)
      ENDDO

C PARTICLE TRACKING
      DO IPROB=1,10000
        RANVAL = RAND(IPROB)
        CALL ITERATE_ENERGY(RANVAL)
      ENDDO
In both loops, all the computations are independent, so if there were 10,000 processors, each processor could execute a single iteration. In the vector-add example, each iteration would be relatively short, and the execution time would be relatively constant from iteration to iteration. In the particle tracking example, each iteration chooses a random number for an initial particle position and iterates to find the minimum energy. Each iteration takes a relatively long time to complete, and there will be a wide variation of completion times from iteration to iteration.
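In OpenMP-style directives, this choice is typically expressed with a SCHEDULE clause. As a hedged sketch (the SCHEDULE(STATIC) and SCHEDULE(DYNAMIC) spellings are standard OpenMP; applying them to these particular loops is our illustration, not the text's), the vector add suits an even, up-front division of iterations, while the particle tracking loop benefits from handing out iterations as threads become free:

C VECTOR ADD: short, uniform iterations -- divide them evenly up front
C$OMP PARALLEL DO SHARED(A,B,C) PRIVATE(IPROB) SCHEDULE(STATIC)
      DO IPROB=1,10000
        A(IPROB) = B(IPROB) + C(IPROB)
      ENDDO
C$OMP END PARALLEL DO

C PARTICLE TRACKING: long, variable iterations -- hand them out
C dynamically as threads finish their previous work
C$OMP PARALLEL DO PRIVATE(IPROB,RANVAL) SCHEDULE(DYNAMIC)
      DO IPROB=1,10000
        RANVAL = RAND(IPROB)
        CALL ITERATE_ENERGY(RANVAL)
      ENDDO
C$OMP END PARALLEL DO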
These two examples are effectively the ends of a continuous spectrum of the iteration scheduling challenges facing the FORTRAN parallel runtime environment: