When the operating system supports multiple threads per process, you can begin to use these threads to do simultaneous computational activity. There is still no requirement that these applications be executed on a multiprocessor system. When an application that uses four operating system threads is executed on a single-processor machine, the threads execute in a time-shared fashion. If there is no other load on the system, each thread gets 1/4 of the processor. While there are good reasons to have more threads than processors for applications that are not compute-bound, it’s not a good idea to have more active threads than processors for compute-intensive applications because of thread-switching overhead. (For more detail on the effect of too many threads, see Appendix D, How FORTRAN Manages Threads at Runtime.)
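The advice about not running more active threads than processors can be turned into a small sizing check. The following is a minimal sketch, not part of the original program: the helper name compute_thread_count is hypothetical, and it assumes the system provides sysconf(_SC_NPROCESSORS_ONLN), which is a widely available extension (Linux, Solaris, macOS) rather than a guaranteed POSIX feature.

```c
#include <unistd.h>   /* sysconf */

/* Hypothetical helper: cap a requested thread count at the number of
 * processors currently online, for compute-intensive work.
 * Assumption: _SC_NPROCESSORS_ONLN is supported on this system. */
int compute_thread_count(int requested)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

    if (ncpu < 1)
        ncpu = 1;              /* query failed: fall back to one thread */
    if (requested < 1)
        requested = 1;         /* always run at least one thread */

    return requested < ncpu ? requested : (int) ncpu;
}
```

A compute-bound program might call compute_thread_count(4) and create that many threads, so that on a single-processor machine it runs one thread instead of time-sharing four.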
If you are using the POSIX threads library, it is a simple modification to request that your threads be created as operating-system rather than user threads, as the following code shows:
#define _REENTRANT /* basic 3-lines for threads */
#include <stdio.h>
#include <pthread.h>
#define THREAD_COUNT 2
void *SpinFunc(void *);
int globvar; /* A global variable */
int index[THREAD_COUNT]; /* Local zero-based thread index */
pthread_t thread_id[THREAD_COUNT]; /* POSIX Thread IDs */
pthread_attr_t attr; /* Thread attributes NULL=use default */

int main(void) {
  int i,retval;
  pthread_t tid;
  globvar = 0;
  pthread_attr_init(&attr); /* Initialize attr with defaults */
  pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM); /* Ask for OS-scheduled threads */
  printf("Main - globvar=%d\n",globvar);
  for(i=0;i<THREAD_COUNT;i++) {
    index[i]= i;
    retval = pthread_create(&tid,&attr,SpinFunc,(void *) index[i]);
    printf("Main - creating i=%d tid=%d retval=%d\n",i,tid,retval);
    thread_id[i]= tid;
  }
  printf("Main thread - threads started globvar=%d\n",globvar);
  for(i=0;i<THREAD_COUNT;i++) {
    printf("Main - waiting for join %d\n",thread_id[i]);
    retval = pthread_join( thread_id[i], NULL ) ;
    printf("Main - back from join %d retval=%d\n",i,retval);
  }
  printf("Main thread - threads completed globvar=%d\n",globvar);
  return 0;
}
The code executed by the master thread is modified slightly. We create an “attribute” data structure and set the PTHREAD_SCOPE_SYSTEM attribute to indicate that we would like our new threads to be created and scheduled by the operating system. We use the attribute information on the call to pthread_create( ). None of the other code has been changed. The following is the execution output of this new program:
recs % create3
Main - globvar=0
Main - creating i=0 tid=4 retval=0
SpinFunc me=0 - sleeping 1 seconds ...
Main - creating i=1 tid=5 retval=0
Main thread - threads started globvar=0
Main - waiting for join 4
SpinFunc me=1 - sleeping 2 seconds ...
SpinFunc me=0 - wake globvar=0...
SpinFunc me=0 - spinning globvar=1...
SpinFunc me=1 - wake globvar=1...
SpinFunc me=1 - spinning globvar=2...
SpinFunc me=1 - done globvar=2...
SpinFunc me=0 - done globvar=2...
Main - back from join 0 retval=0
Main - waiting for join 5
Main - back from join 1 retval=0
Main thread - threads completed globvar=2
recs %
Now the program executes properly. When the first thread starts spinning, the operating system context switches among all three threads (the main thread and the two spinning threads). As the threads come out of their sleep( ), they increment their shared variable, and when the final thread increments the shared variable, the other spinning thread notices the new value almost immediately (because of the cache coherency protocol) and finishes its loop. If there are fewer than three CPUs, a thread may have to wait for a time-sharing context switch to occur before it notices the updated global variable.
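The increment-then-spin behavior described above lives in SpinFunc, which is not listed in this section. The following is a minimal sketch of the same idea under stated assumptions: spin_worker and run_spin_demo are hypothetical names, the sleep( ) calls and printf tracing are omitted, and the shared counter uses C11 atomics (which the original program does not use) so that the update and the spinning reads are well defined.

```c
#include <pthread.h>
#include <sched.h>       /* sched_yield */
#include <stdatomic.h>   /* C11 atomics: an assumption, not in the original */

#define THREAD_COUNT 2

static atomic_int globvar;   /* shared counter seen by every thread */

/* Hypothetical stand-in for SpinFunc: bump the counter, then spin
 * until every worker has bumped it. */
static void *spin_worker(void *arg)
{
    (void) arg;
    atomic_fetch_add(&globvar, 1);
    while (atomic_load(&globvar) < THREAD_COUNT)
        sched_yield();   /* give up the CPU if processors are scarce */
    return NULL;
}

/* Returns the final counter value, or -1 if a thread could not be created. */
int run_spin_demo(void)
{
    pthread_t tid[THREAD_COUNT];
    pthread_attr_t attr;
    int i, started = 0;

    atomic_store(&globvar, 0);
    pthread_attr_init(&attr);
    /* Request operating-system scheduling; a failure here is ignored
     * and the attribute keeps the platform's default scope. */
    (void) pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    for (i = 0; i < THREAD_COUNT; i++) {
        if (pthread_create(&tid[i], &attr, spin_worker, NULL) != 0)
            break;
        started++;
    }
    if (started < THREAD_COUNT)
        atomic_store(&globvar, THREAD_COUNT);   /* release any spinners */

    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    pthread_attr_destroy(&attr);

    return started == THREAD_COUNT ? atomic_load(&globvar) : -1;
}
```

Calling run_spin_demo( ) from main should return THREAD_COUNT once both workers have left their loops. The sched_yield( ) call is a design choice for machines with fewer CPUs than threads: a spinning thread hands the processor back instead of burning its whole time slice.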
With operating-system threads and multiple processors, a program can realistically break up a large computation between several independent threads and compute the solution more quickly. Of course this presupposes that the computation could be done in parallel in the first place.
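As an illustration of breaking a large computation into independent pieces, here is a minimal sketch that sums an array by giving each thread its own slice. The names parallel_sum, sum_slice, struct slice, and MAX_THREADS are hypothetical and chosen for this example; the threads request PTHREAD_SCOPE_SYSTEM as in the program above, and each thread writes only to its own partial result, so no locking is needed.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16   /* arbitrary cap chosen for this sketch */

struct slice {
    const double *data;   /* shared, read-only input */
    size_t begin, end;    /* half-open range [begin, end) */
    double partial;       /* written only by the owning thread */
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sum n doubles using up to nthreads operating-system threads. */
double parallel_sum(const double *data, size_t n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct slice part[MAX_THREADS];
    int created[MAX_THREADS];
    pthread_attr_t attr;
    double total = 0.0;

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    pthread_attr_init(&attr);
    (void) pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    for (int t = 0; t < nthreads; t++) {
        part[t].data = data;
        part[t].begin = n * t / nthreads;
        part[t].end = n * (t + 1) / nthreads;
        part[t].partial = 0.0;
        created[t] = (pthread_create(&tid[t], &attr, sum_slice, &part[t]) == 0);
        if (!created[t])
            sum_slice(&part[t]);   /* fall back: do this slice ourselves */
    }
    for (int t = 0; t < nthreads; t++) {
        if (created[t])
            pthread_join(tid[t], NULL);
        total += part[t].partial;   /* combine after the thread is done */
    }
    pthread_attr_destroy(&attr);
    return total;
}
```

For example, parallel_sum(v, 1000, 4) splits a 1000-element array into four slices of 250 elements. Whether this is faster than a single loop depends on having enough processors and enough work per thread to outweigh the cost of creating and joining the threads.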