Take a static, highly parallel program with a relatively large inner loop. Compile the application for parallel execution. Run it with an increasing number of threads. Examine the behavior when the number of threads exceeds the number of available processors. See if different iteration scheduling approaches make a difference.
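One way to set up this experiment is sketched below. It is only a starting point under stated assumptions: the exercise does not name a program, so a simple array-scaling loop stands in for it; the names thread_scaling, n, and reps are hypothetical; the code is written in free-form Fortran rather than the fixed-form style used in these exercises; and it assumes an OpenMP-capable compiler (for example, gfortran with -fopenmp). The sketch times the same loop with 1 up to twice the number of processors reported by the OpenMP runtime.

```fortran
program thread_scaling
  use omp_lib
  implicit none
  integer, parameter :: n = 4000000   ! assumed problem size
  integer, parameter :: reps = 20     ! repeat so the times are measurable
  real, allocatable :: a(:), b(:)
  double precision :: t0, t1
  integer :: i, r, nthreads, nprocs

  allocate(a(n), b(n))
  a = 0.0
  b = 1.0
  nprocs = omp_get_num_procs()

  ! Ask the runtime not to lower the thread count on its own.
  call omp_set_dynamic(.false.)

  ! Go past the processor count to observe oversubscription.
  do nthreads = 1, 2*nprocs
    call omp_set_num_threads(nthreads)
    t0 = omp_get_wtime()
    do r = 1, reps
      !$omp parallel do private(i) shared(a, b) schedule(static)
      do i = 1, n
        a(i) = b(i) * 2.34
      end do
      !$omp end parallel do
    end do
    t1 = omp_get_wtime()
    print '(a,i4,a,f10.6,a)', 'threads=', nthreads, '  time=', t1 - t0, ' s'
  end do

  ! Sanity check on the computed values.
  if (abs(a(n) - 2.34) > 1.0e-5) error stop 'unexpected result'
end program thread_scaling
```

Changing schedule(static) in the directive and rerunning is one simple way to look at the scheduling question; timings will vary from run to run, so several runs per setting are advisable.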
Take the following loop and execute it with several different iteration scheduling choices. For chunk-based scheduling, use a large chunk size, perhaps 100,000. See if any approach performs better than static scheduling:
      DO I=1,4000000
        A(I) = B(I) * 2.34
      ENDDO
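The comparison can be sketched without recompiling for each schedule by using SCHEDULE(RUNTIME) together with the OpenMP library routine omp_set_schedule. This is an illustrative sketch, not the book's method: the program name and the variables kinds, chunks, and names are hypothetical, and free-form Fortran with an OpenMP 3.0 or later runtime is assumed. A chunk size of 0 asks the runtime for its default chunking, which for static scheduling gives one contiguous block per thread.

```fortran
program schedule_compare
  use omp_lib
  implicit none
  integer, parameter :: n = 4000000
  integer, parameter :: chunk = 100000   ! large chunk size from the exercise
  integer(kind=omp_sched_kind) :: kinds(4)
  integer :: chunks(4)
  character(len=16) :: names(4)
  real, allocatable :: a(:), b(:)
  double precision :: t0, t1
  integer :: i, k

  kinds  = [omp_sched_static, omp_sched_static, &
            omp_sched_dynamic, omp_sched_guided]
  chunks = [0, chunk, chunk, chunk]      ! 0 selects the default chunking
  names  = [character(len=16) :: 'static', 'static,100000', &
            'dynamic,100000', 'guided,100000']

  allocate(a(n), b(n))

  ! Warm-up: starts the thread team and touches the arrays before timing.
  !$omp parallel do private(i) shared(a, b)
  do i = 1, n
    a(i) = 0.0
    b(i) = 1.0
  end do
  !$omp end parallel do

  do k = 1, 4
    call omp_set_schedule(kinds(k), chunks(k))
    t0 = omp_get_wtime()
    !$omp parallel do private(i) shared(a, b) schedule(runtime)
    do i = 1, n
      a(i) = b(i) * 2.34
    end do
    !$omp end parallel do
    t1 = omp_get_wtime()
    print '(a16,a,f10.6,a)', names(k), ' time=', t1 - t0, ' s'
  end do

  if (abs(a(n) - 2.34) > 1.0e-5) error stop 'unexpected result'
end program schedule_compare
```

A single pass over 4,000,000 elements is short, so the measured differences may be close to the timer noise; repeating each measurement several times gives a more trustworthy comparison.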
Execute the following loop for a range of values for N from 1 to 16 million:
      DO I=1,N
        A(I) = B(I) * 2.34
      ENDDO
Run the loop on a single processor. Then force the loop to run in parallel. At what point do you get better performance on multiple processors? Does the number of threads affect your observations?
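A possible harness for this experiment is sketched below. It assumes free-form Fortran and an OpenMP-capable compiler, steps N through powers of 4 from 1 to 16,777,216 (roughly 16 million), and repeats small cases so that the timer has something to measure. The names crossover, nmax, and reps are hypothetical. The thread count is left to the environment (for example, the OMP_NUM_THREADS variable) so that the run can be repeated with different settings.

```fortran
program crossover
  use omp_lib
  implicit none
  integer, parameter :: nmax = 16777216   ! 4**12, about 16 million
  real, allocatable :: a(:), b(:)
  double precision :: t0, ts, tp
  integer :: i, n, r, reps

  allocate(a(nmax), b(nmax))
  a = 0.0
  b = 1.0

  print '(a)', '         n   serial (s)  parallel (s)'
  n = 1
  do while (n <= nmax)
    ! Repeat small cases; report the average time per pass.
    reps = max(1, min(1000, nmax / n))

    ! Serial version. Note: an aggressive optimizer may collapse the
    ! repeated passes, which would make this time look too small.
    t0 = omp_get_wtime()
    do r = 1, reps
      do i = 1, n
        a(i) = b(i) * 2.34
      end do
    end do
    ts = (omp_get_wtime() - t0) / reps

    ! Parallel version, forced with an explicit directive.
    t0 = omp_get_wtime()
    do r = 1, reps
      !$omp parallel do private(i) shared(a, b, n)
      do i = 1, n
        a(i) = b(i) * 2.34
      end do
      !$omp end parallel do
    end do
    tp = (omp_get_wtime() - t0) / reps

    print '(i10,2es13.4)', n, ts, tp
    n = n * 4
  end do

  if (abs(a(1) - 2.34) > 1.0e-5) error stop 'unexpected result'
end program crossover
```

The row where the parallel time first drops below the serial time gives a rough estimate of the crossover point on that machine.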
Use an explicit parallelization directive to execute the following loop in parallel with a chunk size of 1:
      J = 0
C$OMP PARALLEL DO PRIVATE(I) SHARED(J) SCHEDULE(DYNAMIC,1)
      DO I=1,1000000
        J = J + 1
      ENDDO
C$OMP END PARALLEL DO
      PRINT *, J
Execute the loop with a varying number of threads, including one. Also compile and execute the code in serial. Compare the output and execution times. What do the results tell you about cache coherency, about the cost of moving data from one cache to another, and about critical section costs?
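To put numbers on the critical section question, it may help to time the unsynchronized loop next to two synchronized variants. The sketch below is an illustration under assumptions, not part of the exercise itself: the CRITICAL and REDUCTION variants, the program name, and the variables j_racy, j_crit, and j_red are all additions for comparison, and free-form Fortran with an OpenMP-capable compiler is assumed. The first loop keeps the exercise's unsynchronized update of a shared variable, so its printed value may differ from run to run.

```fortran
program shared_counter
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000
  integer :: i, j_racy, j_crit, j_red
  double precision :: t0, t_racy, t_crit, t_red

  ! Variant 1: the loop from the exercise, with no synchronization.
  ! The update of j_racy is a data race, so the result is not reliable.
  j_racy = 0
  t0 = omp_get_wtime()
  !$omp parallel do private(i) shared(j_racy) schedule(dynamic, 1)
  do i = 1, n
    j_racy = j_racy + 1
  end do
  !$omp end parallel do
  t_racy = omp_get_wtime() - t0

  ! Variant 2: the same loop with the update inside a critical section.
  j_crit = 0
  t0 = omp_get_wtime()
  !$omp parallel do private(i) shared(j_crit) schedule(dynamic, 1)
  do i = 1, n
    !$omp critical
    j_crit = j_crit + 1
    !$omp end critical
  end do
  !$omp end parallel do
  t_crit = omp_get_wtime() - t0

  ! Variant 3: each thread sums privately; partial sums are combined
  ! once at the end of the loop.
  j_red = 0
  t0 = omp_get_wtime()
  !$omp parallel do private(i) reduction(+:j_red) schedule(static)
  do i = 1, n
    j_red = j_red + 1
  end do
  !$omp end parallel do
  t_red = omp_get_wtime() - t0

  print '(a,i9,a,f10.6,a)', 'unsynchronized: j=', j_racy, '  time=', t_racy, ' s'
  print '(a,i9,a,f10.6,a)', 'critical:       j=', j_crit, '  time=', t_crit, ' s'
  print '(a,i9,a,f10.6,a)', 'reduction:      j=', j_red,  '  time=', t_red,  ' s'

  ! Only the synchronized variants are required to count every iteration.
  if (j_crit /= n .or. j_red /= n) error stop 'synchronized variant lost updates'
end program shared_counter
```

Running this with different thread counts shows how the three variants respond as more threads contend for the same variable.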
"The purpose of Chuck Severance's book, High Performance Computing, has always been to teach new programmers and scientists about the basics of High Performance Computing. This book is for learners […]"