Take a static, highly parallel program with a relatively large inner loop. Compile the application for parallel execution, then run it with an increasing number of threads. Examine the behavior when the number of threads exceeds the number of available processors. See whether different iteration scheduling approaches make a difference.
Take the following loop and execute it with several different iteration scheduling choices. For chunk-based scheduling, use a large chunk size, perhaps 100,000. See if any approach performs better than static scheduling:
      DO I=1,4000000
         A(I) = B(I) * 2.34
      ENDDO
Execute the following loop for a range of values for N from 1 to 16 million:
      DO I=1,N
         A(I) = B(I) * 2.34
      ENDDO
Run the loop on a single processor. Then force the loop to run in parallel. At what point do you get better performance on multiple processors? Does the number of threads affect your observations?
Use an explicit parallelization directive to execute the following loop in parallel with a chunk size of 1:
      J = 0
C$OMP PARALLEL DO PRIVATE(I) SHARED(J) SCHEDULE(DYNAMIC)
      DO I=1,1000000
         J = J + 1
      ENDDO
C$OMP END PARALLEL DO
      PRINT *, J
Execute the loop with a varying number of threads, including one. Also compile and execute the code serially. Compare the outputs and execution times. What do the results tell you about cache coherency, about the cost of moving data from one cache to another, and about the cost of critical sections?