
Another way we can have a problem is on a multiprocessor, when two processors execute these instructions simultaneously. They both do the LOAD, getting 0. Then they both add 1 and store 1 back to memory. Boy, this is getting pretty picky. How often will either of these events really happen? Well, if it crashes your airline reservation system every 100,000 transactions or so, that would be way too often. Which processor actually gets the honor of storing its 1 back to memory is simply a race.

We must have some way of guaranteeing that only one thread can be executing these three instructions at the same time. If one thread has started these instructions, all other threads must wait to enter until the first thread has exited. These areas are called critical sections. On single-CPU systems, there was a simple solution to critical sections: you could turn off interrupts for a few instructions and then turn them back on. This way you could guarantee that you would get all the way through before a timer or other interrupt occurred:


INTOFF          // Turn off Interrupts
LOAD  R1,globvar
ADD   R1,1
STORE R1,globvar
INTON           // Turn on Interrupts

However, this technique does not work for longer critical sections or when there is more than one CPU. In these cases, you need a lock, a semaphore, or a mutex. Most thread libraries provide this type of routine. To use a mutex, we have to make some modifications to our example code:

...
pthread_mutex_t my_mutex;       /* MUTEX data structure */
...

main() {
  ...
  pthread_attr_init(&attr);     /* Initialize attr with defaults */
  pthread_mutex_init(&my_mutex, NULL);
  ...
  pthread_create( ... )
  ...
}

void *SpinFunc(void *parm) {
  ...
  pthread_mutex_lock(&my_mutex);
  globvar++;
  pthread_mutex_unlock(&my_mutex);
  while (globvar < THREAD_COUNT)
    ;
  printf("SpinFunc me=%d -- done globvar=%d...\n", me, globvar);
  ...
}

The mutex data structure must be declared in the shared area of the program. Before the threads are created, pthread_mutex_init must be called to initialize the mutex. Before globvar is incremented, we must lock the mutex, and after we finish updating globvar (three instructions later), we unlock the mutex. With the code as shown above, there will never be more than one processor executing the globvar++ line of code, and the code will never hang because an increment was missed. Semaphores and locks are used in a similar way.

Interestingly, when using user space threads, an attempt to lock an already locked mutex, semaphore, or lock can cause a thread context switch. This gives the thread that “owns” the lock a better chance to make progress toward the point where it will unlock the critical section. Also, the act of unlocking a mutex can cause the thread waiting for the mutex to be dispatched by the thread library.

Barriers

Barriers are different than critical sections. Sometimes in a multithreaded application, you need to have all threads arrive at a point before allowing any threads to execute beyond that point. An example of this is a time-based simulation. Each task processes its portion of the simulation but must wait until all of the threads have completed the current time step before any thread can begin the next time step. Typically threads are created, and then each thread executes a loop with one or more barriers in the loop. The rough pseudocode for this type of approach is as follows:

main() {
  for (ith=0; ith<NUM_THREADS; ith++) pthread_create(.., work_routine, ..)
  for (ith=0; ith<NUM_THREADS; ith++) pthread_join(...)  /* Wait a long time */
  exit()
}

work_routine() {
  for (ts=0; ts<10000; ts++) {  /* Time Step Loop */
    /* Compute total forces on particles */
    wait_barrier();
    /* Update particle positions based on the forces */
    wait_barrier();
  }
  return;
}

In a sense, our SpinFunc() function implements a barrier. It sets a variable initially to 0. Then as threads arrive, the variable is incremented in a critical section. Immediately after the critical section, the thread spins until the precise moment that all the threads are in the spin loop, at which time all threads exit the spin loop and continue on.

For a critical section, only one processor can be executing in the critical section at the same time. For a barrier, all processors must arrive at the barrier before any of the processors can leave.





Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5
