Given that we have multithreaded capabilities and multiprocessors, we must still convince the threads to work together to accomplish some overall goal. Often we need some way for the threads to coordinate and cooperate. Several important techniques are used while the program is running with multiple threads, including operating system-supported fork/join of threads, synchronization using critical sections, and barriers.
Each of these techniques has an overhead associated with it. Because these overheads are necessary to go parallel, we must make sure that we have sufficient work to make the benefit of parallel operation worth the cost.
This approach is the simplest method of coordinating your threads. As in the earlier examples in this chapter, a master thread sets up some global data structures that describe the tasks each thread is to perform and then uses the pthread_create( ) function to activate the proper number of threads. Each thread checks the global data structure, using its thread-id as an index to find its task. The thread then performs the task and completes. The master thread waits at a pthread_join( ) point, and when a thread has completed, it updates the global data structure and creates a new thread. These steps are repeated for each major iteration (such as a time-step) for the duration of the program:
for (ts = 0; ts < 10000; ts++) { /* Time Step Loop */
    /* Setup tasks */
    for (ith = 0; ith < NUM_THREADS; ith++)
        pthread_create(.., work_routine, ..);
    for (ith = 0; ith < NUM_THREADS; ith++)
        pthread_join(...);
}

work_routine() {
    /* Perform Task */
    return;
}
The shortcoming of this approach is the overhead cost associated with creating and destroying an operating system thread for a potentially very short task.
The other approach is to have the threads created at the beginning of the program and to have them communicate amongst themselves throughout the duration of the application. To do this, they use such techniques as critical sections or barriers.
Synchronization is needed when there is a particular operation on a shared variable that can be performed by only one processor at a time. For example, in previous SpinFunc( ) examples, consider the line:
globvar++;
In assembly language, this takes at least three instructions:
LOAD  R1,globvar
ADD   R1,1
STORE R1,globvar
What if globvar contained 0, Thread 1 was running, and, at the precise moment it completed the LOAD into Register R1 and before it had completed the ADD or STORE instructions, the operating system interrupted the thread and switched to Thread 2? Thread 2 catches up and executes all three instructions using its registers: loading 0, adding 1, and storing the 1 back into globvar. Now Thread 2 goes to sleep and Thread 1 is restarted at the ADD instruction. Register R1 for Thread 1 contains the previously loaded value of 0; Thread 1 adds 1 and then stores 1 into globvar. What is wrong with this picture? We meant to use this code to count the number of threads that have passed this point. Two threads passed the point, but because of a bad case of bad timing, our variable indicates only that one thread passed. This is because the increment of a variable in memory is not atomic. That is, halfway through the increment, something else can happen.