A thread is different from a process. When you add threads, they are added to the existing process rather than starting in a new process. Processes start with a single thread of execution and can add or remove threads throughout the duration of the program. Unlike processes, which operate in different memory spaces, all threads in a process share the same memory space. Figure 2 shows how the creation of a thread differs from the creation of a process. Not all of the memory space in a process is shared between all threads. In addition to the global area that is shared across all threads, each thread has a thread private area for its own local variables. It’s important for programmers to know when they are working with shared variables and when they are working with local variables.
When attempting to speed up high performance computing applications, threads have the advantage over processes in that multiple threads can cooperate and work on a shared data structure to hasten the computation. By dividing the work into smaller portions and assigning each smaller portion to a separate thread, the total work can be completed more quickly.
Multiple threads are also used in high performance database and Internet servers to improve the overall throughput of the server. With a single thread, the program can either be waiting for the next network request or reading the disk to satisfy the previous request. With multiple threads, one thread can be waiting for the next network transaction while several other threads are waiting for disk I/O to complete.
The following is an example of a simple multithreaded application. It begins with a single master thread that creates three additional threads. Each thread prints some messages, accesses some global and local variables, and then terminates:
#define_REENTRANT /* basic lines for threads */
#include <stdio.h>
#include <pthread.h>
#define THREAD_COUNT 3
void *TestFunc(void *);
int globvar; /* A global variable */
int index[THREAD_COUNT] /* Local zero-based thread index */
pthread_t thread_id[THREAD_COUNT]; /* POSIX Thread IDs */
main() {
int i,retval;
pthread_t tid;
globvar = 0;
printf("Main - globvar=%d\n",globvar);
for(i=0;i<THREAD_COUNT;i++) {
index[i] = i;
retval = pthread_create(&tid,NULL,TestFunc,(void *) index[i]);
printf("Main - creating i=%d tid=%d retval=%d\n",i,tid,retval);
thread_id[i] = tid;
}
printf("Main thread - threads started globvar=%d\n",globvar);
for(i=0;i<THREAD_COUNT;i++) {
printf("Main - waiting for join %d\n",thread_id[i]);
retval = pthread_join( thread_id[i], NULL ) ;
printf("Main - back from join %d retval=%d\n",i,retval);
}
printf("Main thread - threads completed globvar=%d\n",globvar);
}
void *TestFunc(void *parm) {
int me,self;
me = (int) parm; /* My own assigned thread ordinal */
self = pthread_self(); /* The POSIX Thread library thread number */
printf("TestFunc me=%d - self=%d globvar=%d\n",me,self,globvar);
globvar = me + 15;
printf("TestFunc me=%d - sleeping globvar=%d\n",me,globvar);
sleep(2);
printf("TestFunc me=%d - done param=%d globvar=%d\n",me,self,globvar);
}
The global shared areas in this case are those variables declared in the static area outside the main( ) code. The local variables are any variables declared within a routine. When threads are added, each thread gets its own function call stack. In C, the automatic variables that are declared at the beginning of each routine are allocated on the stack. As each thread enters a function, these variables are separately allocated on that particular thread’s stack. So these are the thread-local variables.
Unlike the fork( ) function, the pthread_create( ) function creates a new thread, and then control is returned to the calling thread. One of the parameters of the pthread_create( ) is the name of a function.
New threads begin execution in the function TestFunc( ) and the thread finishes when it returns from this function. When this program is executed, it produces the following output:
recs % cc -o create1 -lpthread -lposix4 create1.c
recs % create1
Main - globvar=0
Main - creating i=0 tid=4 retval=0
Main - creating i=1 tid=5 retval=0
Main - creating i=2 tid=6 retval=0
Main thread - threads started globvar=0
Main - waiting for join 4
TestFunc me=0 - self=4 globvar=0
TestFunc me=0 - sleeping globvar=15
TestFunc me=1 - self=5 globvar=15
TestFunc me=1 - sleeping globvar=16
TestFunc me=2 - self=6 globvar=16
TestFunc me=2 - sleeping globvar=17
TestFunc me=2 - done param=6 globvar=17
TestFunc me=1 - done param=5 globvar=17
TestFunc me=0 - done param=4 globvar=17
Main - back from join 0 retval=0
Main - waiting for join 5
Main - back from join 1 retval=0
Main - waiting for join 6
Main - back from join 2 retval=0
Main thread – threads completed globvar=17
recs %
You can see the threads getting created in the loop. The master thread completes the pthread_create( ) loop, executes the second loop, and calls the pthread_join( ) function. This function suspends the master thread until the specified thread completes. The master thread is waiting for Thread 4 to complete. Once the master thread suspends, one of the new threads is started. Thread 4 starts executing. Initially the variable globvar is set to 0 from the main program. The self, me, and param variables are thread-local variables, so each thread has its own copy. Thread 4 sets globvar to 15 and goes to sleep. Then Thread 5 begins to execute and sees globvar set to 15 from Thread 4; Thread 5 sets globvar to 16, and goes to sleep. This activates Thread 6, which sees the current value for globvar and sets it to 17. Then Threads 6, 5, and 4 wake up from their sleep, all notice the latest value of 17 in globvar, and return from the TestFunc( ) routine, ending the threads.
All this time, the master thread is in the middle of a pthread_join( ) waiting for Thread 4 to complete. As Thread 4 completes, the pthread_join( ) returns. The master thread then calls pthread_join( ) repeatedly to ensure that all three threads have been completed. Finally, the master thread prints out the value for globvar that contains the latest value of 17.
To summarize, when an application is executing with more than one thread, there are shared global areas and thread private areas. Different threads execute at different times, and they can easily work together in shared areas.
"The purpose of Chuck Severence's book, High Performance Computing has always been to teach new programmers and scientists about the basics of High Performance Computing. This book is for learners […]"