Another control structure is the ability to declare a function as "PURE." A PURE function has no side effects other than through its parameters. The programmer is guaranteeing that a PURE function can execute simultaneously on many processors with no ill effects. This allows HPF to assume that it will only operate on local data and does not need any data communication during the duration of the function execution. The programmer can also declare which parameters of the function are input parameters, output parameters, and input-output parameters.
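As a minimal sketch of what such declarations look like (the function name and arguments here are ours, invented for illustration), a PURE function marks each parameter with the INTENT attribute:

      PURE REAL FUNCTION AVG3(A, B, C)
      REAL, INTENT(IN) :: A, B, C
*     No I/O and no global updates, so many processors can
*     safely evaluate this function at the same time
      AVG3 = (A + B + C) / 3.0
      END

Because the function touches nothing but its INTENT(IN) arguments and its result, the compiler is free to invoke it concurrently across all the elements of a distributed array.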
The companies who marketed SIMD computers needed to come up with significant tools to allow efficient collective operations across all the processors. A perfect example of this is the SUM operation. To SUM the values of an array spread across N processors, the simplistic approach takes N steps. However, it is possible to accomplish it in log(N) steps using a technique called parallel-prefix-sum. By the time HPF was in development, a number of these operations had been identified and implemented. HPF took the opportunity to define standardized syntax for these operations.
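To make the log(N) claim concrete, here is a small sequential sketch of the recursive-doubling idea behind a parallel prefix sum (our own illustration, not HPF library code). At each step, every element adds in the value STEP positions behind it; after log2(N) steps, each element holds the sum of everything up to and including itself:

      INTEGER N, STEP, I
      PARAMETER(N=8)
      REAL X(N)
      X = 1.0
      STEP = 1
      DO WHILE (STEP .LT. N)
*        On a parallel machine, all of these updates happen at once
         DO I = N, STEP+1, -1
            X(I) = X(I) + X(I-STEP)
         ENDDO
         STEP = STEP * 2
      ENDDO
*     X now holds the running sums; X(N) is the total,
*     reached in log2(8) = 3 doubling steps instead of 8

On a SIMD machine, each update in the inner loop is performed by a different processor in the same cycle, which is how N values are summed in log(N) steps rather than N.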
A sample of these operations includes:

SUM_PREFIX: Performs various types of parallel-prefix summations.
ALL_SCATTER: Distributes a single value to a set of processors.
GRADE_DOWN: Sorts into decreasing order.
IANY: Computes the logical OR of a set of values.

While there are a large number of these intrinsic functions, most applications use only a few of the operations.
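For example, a running total down an array is a one-line call. The fragment below is a sketch assuming an HPF compiler that supplies the standard HPF_LIBRARY module:

      USE HPF_LIBRARY
      REAL A(5), B(5)
      A = (/ 1.0, 2.0, 3.0, 4.0, 5.0 /)
      B = SUM_PREFIX(A)
*     B is now (/ 1.0, 3.0, 6.0, 10.0, 15.0 /)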
In order to allow the vendors with diverse architectures to provide their particular advantage, HPF included the capability to link "extrinsic" functions. These functions didn't need to be written in FORTRAN 90/HPF and performed a number of vendor-supported capabilities. This capability allowed users to perform such tasks as the creation of hybrid applications with some HPF and some message passing.
High performance computing programmers always like the ability to do things their own way in order to eke out that last drop of performance.
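As a sketch of how this looks in practice (the routine name LOCAL_SOLVE is hypothetical), the HPF caller declares an explicit interface marked with the EXTRINSIC prefix:

      INTERFACE
        EXTRINSIC (HPF_LOCAL) SUBROUTINE LOCAL_SOLVE(X)
          REAL, DIMENSION(:) :: X
        END SUBROUTINE LOCAL_SOLVE
      END INTERFACE

When LOCAL_SOLVE is invoked, each processor sees only its local piece of X and is free to perform whatever vendor-specific communication it likes inside the routine.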
To port our heat flow application to HPF, there is really only a single line of code that needs to be added. In the example below, we've changed to a larger two-dimensional array:
      INTEGER PLATESIZ,MAXTIME
      PARAMETER(PLATESIZ=2000,MAXTIME=200)
!HPF$ DISTRIBUTE PLATE(*,BLOCK)
      REAL*4 PLATE(PLATESIZ,PLATESIZ)
      INTEGER TICK

      PLATE = 0.0

* Add Boundaries
      PLATE(1,:) = 100.0
      PLATE(PLATESIZ,:) = -40.0
      PLATE(:,PLATESIZ) = 35.23
      PLATE(:,1) = 4.5

      DO TICK = 1,MAXTIME
        PLATE(2:PLATESIZ-1,2:PLATESIZ-1) = (
     +    PLATE(1:PLATESIZ-2,2:PLATESIZ-1) +
     +    PLATE(3:PLATESIZ-0,2:PLATESIZ-1) +
     +    PLATE(2:PLATESIZ-1,1:PLATESIZ-2) +
     +    PLATE(2:PLATESIZ-1,3:PLATESIZ-0) ) / 4.0
        PRINT 1000,TICK, PLATE(2,2)
 1000   FORMAT('TICK = ',I5, F13.8)
      ENDDO
*
      END
You will notice that the HPF directive distributes the array columns using the BLOCK approach, keeping all the elements within a column on a single processor. At first glance, it might appear that (BLOCK,BLOCK) is the better distribution. However, there are two advantages to a (*,BLOCK) distribution. First, striding down a column is a unit-stride operation, so you might just as well process an entire column. The more significant aspect of the distribution is that a (BLOCK,BLOCK) distribution forces each processor to communicate with up to eight other processors to get its neighboring values. Using the (*,BLOCK) distribution, each processor will have to exchange data with at most two processors each time step.
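Expressed as directives, the tradeoff discussed above is simply:

!HPF$ DISTRIBUTE PLATE(*,BLOCK)
*     Whole columns stay local; at most two neighbors to exchange with
!HPF$ DISTRIBUTE PLATE(BLOCK,BLOCK)
*     Two-dimensional tiles; up to eight neighbors supply boundary values

Only one of the two directives would appear in a real program, of course.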
When we look at PVM, we will look at this same program implemented in a SPMD-style message-passing fashion. In that example, you will see some of the details that HPF must handle to properly execute this code. After reviewing that code, you will probably choose to implement all of your future heat flow applications in HPF!
In some ways, HPF has been good for FORTRAN 90. Companies such as IBM, with its SP-1, needed to provide a high-level language for those users who didn't want to write message-passing codes. Because of this, IBM has invested a great deal of effort in implementing and optimizing HPF. Interestingly, much of this effort will directly benefit the development of more sophisticated FORTRAN 90 compilers. The extensive data flow analysis required to minimize communications and manage the dynamic data structures will carry over into FORTRAN 90 compilers even without using the HPF directives.
Time will tell whether the HPF data distribution directives will eventually become unnecessary, with compilers capable of performing sufficient analysis of straight FORTRAN 90 code to optimize data placement and movement on their own.
In its current form, HPF is an excellent vehicle for expressing highly data-parallel, grid-based applications. Its weaknesses are irregular communications and dynamic load balancing. A new effort to develop the next version of HPF is underway to address some of these issues. Unfortunately, it is more difficult to solve these runtime problems while maintaining good performance across a wide range of architectures.