* Begin running the time steps
      DO TICK=1,MAXTIME

* Set the heat sources
        BLACK(ROWS/3, COLS/3)= 10.0
        BLACK(2*ROWS/3, COLS/3) = 20.0
        BLACK(ROWS/3, 2*COLS/3) = -20.0
        BLACK(2*ROWS/3, 2*COLS/3) = 20.0
Now we broadcast the entire array from the rank-zero process to all of the other processes in the MPI_COMM_WORLD communicator. Note that this single call performs the send on the rank-zero process and the receive on every other process. The net result is that, after the call, all of the processes hold the values that were previously only in the master process:
* Broadcast the array
        CALL MPI_BCAST(BLACK,(ROWS+2)*(COLS+2),MPI_DOUBLE_PRECISION,
     +          0,MPI_COMM_WORLD,IERR)
Now we perform the subset computation on each process. Note that we are using global coordinates because the array has the same shape on each of the processes. All we need to do is make sure we set up our particular strip of columns according to S and E:
* Perform the flow on our subset
        DO C=S,E
          DO R=1,ROWS
            RED(R,C) = ( BLACK(R,C) +
     +          BLACK(R,C-1) + BLACK(R-1,C) +
     +          BLACK(R+1,C) + BLACK(R,C+1) ) / 5.0
          ENDDO
        ENDDO
Now we need to gather the appropriate strips from the processes into the appropriate strip in the master array for rebroadcast in the next time step. We could also change the loop in the master to receive the messages in any order and check the STATUS variable to see which strip it received; a sketch of that variation appears after the listing below:
* Gather back up into the BLACK array in master (INUM = 0)
        IF ( INUM .EQ. 0 ) THEN
          DO C=S,E
            DO R=1,ROWS
              BLACK(R,C) = RED(R,C)
            ENDDO
          ENDDO
          DO I=1,NPROC-1
            CALL MPE_DECOMP1D(COLS, NPROC, I, LS, LE, IERR)
            MYLEN = ( LE - LS ) + 1
            SRC = I
            TAG = 0
            CALL MPI_RECV(BLACK(0,LS),MYLEN*(ROWS+2),
     +          MPI_DOUBLE_PRECISION, SRC, TAG,
     +          MPI_COMM_WORLD, STATUS, IERR)
* Print *,'Recv',I,MYLEN
          ENDDO
        ELSE
          MYLEN = ( E - S ) + 1
          DEST = 0
          TAG = 0
          CALL MPI_SEND(RED(0,S),MYLEN*(ROWS+2),MPI_DOUBLE_PRECISION,
     +        DEST, TAG, MPI_COMM_WORLD, IERR)
          Print *,'Send',INUM,MYLEN
        ENDIF

      ENDDO
We use MPE_DECOMP1D to determine which strip we're receiving from each process.
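Here is a minimal sketch, not part of the original program, of the any-order variation mentioned above. It reuses the variables from the listing, probes for a message from any source, reads the sender's rank out of the STATUS array, and then receives that rank's strip directly into place:
* Sketch: gather the strips in whatever order they arrive
        TAG = 0
        DO I=1,NPROC-1
* Wait for any incoming message and check STATUS to see who sent it
          CALL MPI_PROBE(MPI_ANY_SOURCE, TAG, MPI_COMM_WORLD,
     +        STATUS, IERR)
          SRC = STATUS(MPI_SOURCE)
* Map that rank back to its strip and receive it into place
          CALL MPE_DECOMP1D(COLS, NPROC, SRC, LS, LE, IERR)
          MYLEN = ( LE - LS ) + 1
          CALL MPI_RECV(BLACK(0,LS),MYLEN*(ROWS+2),
     +        MPI_DOUBLE_PRECISION, SRC, TAG,
     +        MPI_COMM_WORLD, STATUS, IERR)
        ENDDO
Accepting the strips as they arrive can reduce the time the master spends waiting when some processes finish their portion of the work earlier than others.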
In some applications, the value that must be gathered is a sum or another single value. To accomplish this, you can use one of the MPI reduction routines that coalesce a set of distributed values into a single value using a single call.
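For example, here is a minimal sketch, not from the original program, that uses MPI_REDUCE to add up a per-process partial sum. MYSUM and TOTSUM are hypothetical DOUBLE PRECISION variables introduced only for this illustration:
* Sketch: each process sums its own strip of RED
* (MYSUM and TOTSUM are hypothetical DOUBLE PRECISION variables)
        MYSUM = 0.0D0
        DO C=S,E
          DO R=1,ROWS
            MYSUM = MYSUM + RED(R,C)
          ENDDO
        ENDDO
* One call combines the partial sums; TOTSUM is valid on rank 0
        CALL MPI_REDUCE(MYSUM, TOTSUM, 1, MPI_DOUBLE_PRECISION,
     +      MPI_SUM, 0, MPI_COMM_WORLD, IERR)
        IF ( INUM .EQ. 0 ) PRINT *,'Total heat',TOTSUM
If every process needs the combined result, MPI_ALLREDUCE delivers it to all of the ranks in a single call.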
Again at the end, we dump out the data for testing. However, since it has all been gathered back onto the master process, we only need to dump it on one process:
* Dump out data for verification
      IF ( INUM .EQ. 0 .AND. ROWS .LE. 20 ) THEN
        FNAME = '/tmp/mheatout'
        OPEN(UNIT=9,NAME=FNAME,FORM='formatted')
        DO C=1,COLS
          WRITE(9,100)(BLACK(R,C),R=1,ROWS)
  100     FORMAT(20F12.6)
        ENDDO
        CLOSE(UNIT=9)
      ENDIF

      CALL MPI_FINALIZE(IERR)

      END
When this program executes with four processes, it produces the following output:
% mpif77 -c mheat.f
mheat.f:
 MAIN mheat:
% mpif77 -o mheat mheat.o -lmpe
% mheat -np 4
Calling MPI_INIT
My Share 1 4 51 100
My Share 0 4 1 50
My Share 3 4 151 200
My Share 2 4 101 150
%
The ranks of the processes and the subsets of the computations for each process are shown in the output.
So that is a somewhat contrived example of the broadcast/gather approach to parallelizing an application. If the data structures are the right size and the amount of computation relative to communication is appropriate, this can be a very effective approach, and it often requires the fewest code modifications relative to the single-processor version of the code.
Whether you choose PVM or MPI depends on which library the vendor of your system prefers. Sometimes MPI is the better choice because it contains the newest features, such as support for hardware multicast or broadcast, that can significantly improve the overall performance of a scatter-gather application.
A good text on MPI is Using MPI: Portable Parallel Programming with the Message-Passing Interface, by William Gropp, Ewing Lusk, and Anthony Skjellum (MIT Press). You may also want to retrieve and print the MPI specification from (External Link).