Another way we can have a problem is on a multiprocessor when two processors execute these instructions simultaneously. They both do the LOAD, getting 0. Then they both add 1 and store 1 back to memory.

Boy, this is getting pretty picky. How often will either of these events really happen? Well, if it crashes your airline reservation system every 100,000 transactions or so, that would be way too often. Which processor actually gets the honor of storing its 1 back to memory is simply a race.
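To make the race concrete, here is a small self-contained sketch (not part of the original example; the thread and iteration counts are arbitrary) in which several threads increment a shared counter with no protection. Run on a multiprocessor, the final value is usually less than the expected total because increments are lost:

    #include <pthread.h>
    #include <stdio.h>

    #define THREAD_COUNT 4
    #define ITERS 1000000

    static int globvar = 0;            /* shared, unprotected counter */

    void *racer(void *parm) {
        for (int i = 0; i < ITERS; i++)
            globvar++;                 /* LOAD / ADD / STORE with no lock */
        return NULL;
    }

    int main() {
        pthread_t tid[THREAD_COUNT];

        for (int i = 0; i < THREAD_COUNT; i++)
            pthread_create(&tid[i], NULL, racer, NULL);
        for (int i = 0; i < THREAD_COUNT; i++)
            pthread_join(tid[i], NULL);

        /* Expected THREAD_COUNT*ITERS, but lost updates make it smaller */
        printf("globvar=%d (expected %d)\n", globvar, THREAD_COUNT * ITERS);
        return 0;
    }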
We must have some way of guaranteeing that only one thread can be in these three instructions at the same time. If one thread has started these instructions, all other threads must wait to enter until the first thread has exited. These areas are called critical sections. On single-CPU systems, there was a simple solution to critical sections: you could turn off interrupts for a few instructions and then turn them back on. This way you could guarantee that you would get all the way through before a timer or other interrupt occurred:
    INTOFF              // Turn off Interrupts
    LOAD   R1,globvar
    ADD    R1,1
    STORE  R1,globvar
    INTON               // Turn on Interrupts
However, this technique does not work for longer critical sections or when there is more than one CPU. In these cases, you need a lock, a semaphore, or a mutex. Most thread libraries provide this type of routine. To use a mutex, we have to make some modifications to our example code:
    ...
    pthread_mutex_t my_mutex; /* MUTEX data structure */
    ...

    main() {
       ...
       pthread_attr_init(&attr); /* Initialize attr with defaults */
       pthread_mutex_init(&my_mutex, NULL);
       ....
       pthread_create( ... )
       ...
    }

    void *SpinFunc(void *parm) {
       ...
       pthread_mutex_lock(&my_mutex);
       globvar++;
       pthread_mutex_unlock(&my_mutex);
       while(globvar < THREAD_COUNT) ;

       printf("SpinFunc me=%d - done globvar=%d...\n", me, globvar);
       ...
    }
The mutex data structure must be declared in the shared area of the program. Before the threads are created, pthread_mutex_init must be called to initialize the mutex. Before globvar is incremented, we must lock the mutex, and after we finish updating globvar (three instructions later), we unlock the mutex. With the code as shown above, there will never be more than one processor executing the globvar++ line of code, and the code will never hang because an increment was missed. Semaphores and locks are used in a similar way.
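For completeness, here is one way the fragment above could be fleshed out into a runnable program. The thread count, the argument passing, and the volatile qualifier on globvar (which keeps the compiler re-reading the variable in the spin loop) are our additions, not part of the original code:

    #include <pthread.h>
    #include <stdio.h>

    #define THREAD_COUNT 4

    pthread_mutex_t my_mutex;        /* MUTEX data structure, shared by all threads */
    static volatile int globvar = 0; /* volatile so the spin loop re-reads memory   */

    void *SpinFunc(void *parm) {
        int me = *(int *)parm;

        pthread_mutex_lock(&my_mutex);   /* enter critical section          */
        globvar++;                       /* safe: only one thread at a time */
        pthread_mutex_unlock(&my_mutex); /* leave critical section          */

        while (globvar < THREAD_COUNT)   /* spin until every thread has     */
            ;                            /* passed the critical section     */
        printf("SpinFunc me=%d - done globvar=%d\n", me, globvar);
        return NULL;
    }

    int main() {
        pthread_t tid[THREAD_COUNT];
        int id[THREAD_COUNT];

        pthread_mutex_init(&my_mutex, NULL); /* initialize before any thread uses it */
        for (int i = 0; i < THREAD_COUNT; i++) {
            id[i] = i;
            pthread_create(&tid[i], NULL, SpinFunc, &id[i]);
        }
        for (int i = 0; i < THREAD_COUNT; i++)
            pthread_join(tid[i], NULL);
        pthread_mutex_destroy(&my_mutex);
        return 0;
    }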
Interestingly, when using user space threads, an attempt to lock an already locked mutex, semaphore, or lock can cause a thread context switch. This gives the thread that “owns” the lock a better chance to make progress toward the point where it will unlock the critical section. Also, the act of unlocking a mutex can cause the thread waiting for the mutex to be dispatched by the thread library.
Barriers are different from critical sections. Sometimes in a multithreaded application, you need to have all threads arrive at a point before allowing any threads to execute beyond that point. An example of this is a time-based simulation. Each task processes its portion of the simulation but must wait until all of the threads have completed the current time step before any thread can begin the next time step. Typically threads are created, and then each thread executes a loop with one or more barriers in the loop. The rough pseudocode for this type of approach is as follows:
    main() {
       for (ith=0; ith<NUM_THREADS; ith++) pthread_create(.., work_routine, ..)
       for (ith=0; ith<NUM_THREADS; ith++) pthread_join(...) /* Wait a long time */
       exit()
    }

    work_routine() {
       for(ts=0; ts<10000; ts++) { /* Time Step Loop */
          /* Compute total forces on particles */
          wait_barrier();
          /* Update particle positions based on the forces */
          wait_barrier();
       }
       return;
    }
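The wait_barrier() routine above is pseudocode. On systems that provide POSIX barriers (an optional part of the standard, but widely available), the same structure can be written with pthread_barrier_t; the sketch below omits the actual force and position computations:

    #include <pthread.h>

    #define NUM_THREADS 4
    #define NUM_STEPS   10000

    pthread_barrier_t step_barrier;

    void *work_routine(void *parm) {
        for (int ts = 0; ts < NUM_STEPS; ts++) {   /* Time Step Loop */
            /* Compute total forces on particles (omitted) */
            pthread_barrier_wait(&step_barrier);   /* all forces done before any update */
            /* Update particle positions based on the forces (omitted) */
            pthread_barrier_wait(&step_barrier);   /* all updates done before next step */
        }
        return NULL;
    }

    int main() {
        pthread_t tid[NUM_THREADS];

        pthread_barrier_init(&step_barrier, NULL, NUM_THREADS);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&tid[i], NULL, work_routine, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(tid[i], NULL);            /* Wait a long time */
        pthread_barrier_destroy(&step_barrier);
        return 0;
    }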
In a sense, our SpinFunc() function implements a barrier. It sets a variable initially to 0. Then as threads arrive, the variable is incremented in a critical section. Immediately after the critical section, the thread spins until the precise moment that all the threads are in the spin loop, at which time all threads exit the spin loop and continue on.
For a critical section, only one processor can be executing in the critical section at a time. For a barrier, all processors must arrive at the barrier before any of the processors can leave.
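To show how such a barrier can be built from the pieces we already have, here is a sketch of a reusable "counting" barrier in the spirit of SpinFunc(): a mutex protects the arrival count, and waiting threads spin until the last arrival flips a generation flag. The names are illustrative, and the sketch ignores the memory-ordering issues a production implementation would have to address:

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t lock;
        int             nthreads;   /* threads that must arrive            */
        int             count;      /* arrivals in the current generation  */
        volatile int    generation; /* bumped by the last thread to arrive */
    } spin_barrier_t;

    void spin_barrier_init(spin_barrier_t *b, int nthreads) {
        pthread_mutex_init(&b->lock, NULL);
        b->nthreads = nthreads;
        b->count = 0;
        b->generation = 0;
    }

    void spin_barrier_wait(spin_barrier_t *b) {
        pthread_mutex_lock(&b->lock);
        int my_gen = b->generation;
        if (++b->count == b->nthreads) {   /* last thread to arrive   */
            b->count = 0;                  /* reset for the next use  */
            b->generation = my_gen + 1;    /* release everyone        */
            pthread_mutex_unlock(&b->lock);
            return;
        }
        pthread_mutex_unlock(&b->lock);
        while (b->generation == my_gen)    /* spin until the last     */
            ;                              /* arrival bumps generation */
    }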