Perhaps the most important contributions of HPF are its data layout directives. Using these directives, the programmer can control how data is laid out based on the programmer's knowledge of the data interactions. An example directive is as follows:
REAL*4 ROD(10)
!HPF$ DISTRIBUTE ROD(BLOCK)
The !HPF$ prefix would be a comment to a non-HPF compiler and can safely be ignored by a straight FORTRAN 90 compiler. The DISTRIBUTE directive indicates that the ROD array is to be distributed across multiple processors. If this directive is not used, the ROD array is allocated on one processor and communicated to the other processors as necessary. There are several distributions that can be done in each dimension:
REAL*4 BOB(100,100,100),RICH(100,100,100)
!HPF$ DISTRIBUTE BOB(BLOCK,CYCLIC,*)
!HPF$ DISTRIBUTE RICH(CYCLIC(10),*,*)
These distributions operate as follows:
BLOCK: The array is distributed across the processors using contiguous blocks of the index value. The blocks are made as large as possible.
CYCLIC: The array is distributed across the processors, mapping each successive element to the "next" processor, and when the last processor is reached, allocation starts again on the first processor.
CYCLIC(n): The array is distributed the same as CYCLIC except that n successive elements are placed on each processor before moving on to the next processor.
*: All the elements in that dimension are placed on the same processor. This is most useful for multidimensional arrays.
Figure 1 shows how the elements of a simple array would be mapped onto three processors with different directives.
With BLOCK, the compiler must allocate four elements to each of Processors 1 and 2, because if it allocated only three elements to each processor there would be no Processor 4 available for the leftover element. With CYCLIC, the elements are allocated on successive processors, wrapping around to Processor 1 after the last processor. Using a chunk size with CYCLIC is a compromise between pure BLOCK and pure CYCLIC.
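Because these mapping rules are plain index arithmetic, they can be modeled outside HPF. The following is a minimal sketch in Python, not HPF code and not a description of how any particular compiler works. It assumes a 10-element array (like ROD above), three processors numbered from 1, and a chunk size of 2 for the CYCLIC(n) case; the helper names block_owner and cyclic_owner are hypothetical.

```python
import math

def block_owner(i, n, p):
    """Processor (1-based) that owns element i (1-based) of n elements under BLOCK."""
    block = math.ceil(n / p)      # blocks are made as large as possible
    return (i - 1) // block + 1

def cyclic_owner(i, p, chunk=1):
    """Processor (1-based) that owns element i under CYCLIC(chunk); chunk=1 is plain CYCLIC."""
    return ((i - 1) // chunk) % p + 1

N, P = 10, 3                      # assumed array size and processor count

block = [block_owner(i, N, P) for i in range(1, N + 1)]
cyclic = [cyclic_owner(i, P) for i in range(1, N + 1)]
cyclic2 = [cyclic_owner(i, P, chunk=2) for i in range(1, N + 1)]

print("BLOCK    ", block)         # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
print("CYCLIC   ", cyclic)        # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
print("CYCLIC(2)", cyclic2)       # [1, 1, 2, 2, 3, 3, 1, 1, 2, 2]
```

Under these assumptions BLOCK leaves the last processor with only two elements, while CYCLIC(2) keeps short runs of neighboring elements together and still spreads the array across all three processors.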
To explore the use of the *, we can look at a simple two-dimensional array mapped onto four processors. Figure 2 shows the array layout; each cell indicates which processor holds the data for that cell of the two-dimensional array. In Figure 2, the directive decomposes the array in both dimensions simultaneously, which results in roughly square patches. However, this may not be the best approach. Alternatively, we can use the * to indicate that all the elements of a particular column are to be allocated on the same processor. The columns are then distributed equally across the processors, and all the rows in each column follow wherever that column has been placed. This allows unit stride for the on-processor portions of the computation and is beneficial in some applications. The * syntax is also called on-processor distribution.
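To make the two layouts concrete, here is a small Python model (again a sketch, not HPF) that prints which processor owns each cell. It assumes a 4 x 4 array, four processors, a 2 x 2 processor grid numbered row by row for the (BLOCK,BLOCK) case, and hypothetical helper names; a real HPF implementation is free to number and arrange its processors differently.

```python
import math

def block_index(i, n, p):
    """0-based block number holding 1-based index i when n indices are split over p parts."""
    return (i - 1) // math.ceil(n / p)

def owners_block_block(rows, cols, prow, pcol):
    """Owner grid for (BLOCK,BLOCK) on a prow x pcol processor grid (row-by-row numbering assumed)."""
    return [[block_index(r, rows, prow) * pcol + block_index(c, cols, pcol) + 1
             for c in range(1, cols + 1)]
            for r in range(1, rows + 1)]

def owners_star_block(rows, cols, p):
    """Owner grid for (*,BLOCK): the owner depends only on the column index."""
    return [[block_index(c, cols, p) + 1 for c in range(1, cols + 1)]
            for _ in range(rows)]

bb = owners_block_block(4, 4, 2, 2)   # square patches
sb = owners_star_block(4, 4, 4)       # whole columns per processor

for row in bb:
    print(*row)
print()
for row in sb:
    print(*row)
```

In this model the (BLOCK,BLOCK) grid comes out as four 2 x 2 patches, while the (*,BLOCK) grid assigns one whole column to each processor. Since FORTRAN stores arrays column by column, a processor that owns whole columns can walk its local data with unit stride.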
When more than one data structure takes part in a computation, you can either distribute each one separately or use the ALIGN directive to ensure that corresponding elements of the two data structures are allocated together. In the following example, we have a plate array and a scaling factor that must be applied to each column of the plate during the computation:
DIMENSION PLATE(200,200),SCALE(200)
!HPF$ DISTRIBUTE PLATE(*,BLOCK)
!HPF$ ALIGN SCALE(I) WITH PLATE(*,I)
Or:
DIMENSION PLATE(200,200),SCALE(200)
!HPF$ DISTRIBUTE PLATE(*,BLOCK)
!HPF$ ALIGN SCALE(:) WITH PLATE(*,:)
In both examples, the elements of SCALE are allocated to the same processors as the corresponding columns of PLATE. The * and : syntax communicate the same information. When * is used, that dimension is collapsed, and it doesn't participate in the distribution. When : is used, that dimension follows the corresponding dimension in the variable that has already been distributed.
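The effect of the alignment can be checked with a short Python model. This is only a sketch of the ownership rule described above, not HPF code; the processor count of four and the function names are assumptions made for illustration.

```python
import math

N, P = 200, 4   # array extent from the example; the processor count is an assumption

def column_owner(col, ncols=N, nproc=P):
    """Owner (1-based) of a PLATE column under DISTRIBUTE PLATE(*,BLOCK)."""
    return (col - 1) // math.ceil(ncols / nproc) + 1

def plate_owner(row, col):
    # The first dimension is collapsed (*), so the row index plays no part.
    return column_owner(col)

def scale_owner(i):
    # Models ALIGN SCALE(:) WITH PLATE(*,:): SCALE(i) lives wherever column i lives.
    return column_owner(i)

# Count the (row, column) pairs whose PLATE element and scaling factor sit on different processors.
mismatches = sum(1 for i in range(1, N + 1) for j in range(1, N + 1)
                 if plate_owner(j, i) != scale_owner(i))

print(mismatches)                                            # 0
print([scale_owner(i) for i in (1, 50, 51, 200)])            # [1, 1, 2, 4]
```

Because every PLATE(J,I) and its SCALE(I) share an owner in this model, scaling a column needs no interprocessor communication.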
You could also specify the layout of the SCALE variable and have the PLATE variable "follow" the layout of the SCALE variable:
DIMENSION PLATE(200,200),SCALE(200)
!HPF$ DISTRIBUTE SCALE(BLOCK)
!HPF$ ALIGN PLATE(J,I) WITH SCALE(I)
You can put simple arithmetic expressions into the ALIGN directive subject to some limitations. Other directives include:
PROCESSORS: Allows you to create a shape of the processor configuration that can be used to align other data structures.
REDISTRIBUTE and REALIGN: Allow you to dynamically reshape data structures at runtime as the communication patterns change during the course of the run.
TEMPLATE: Allows you to create an array that uses no space. Instead of distributing one data structure and aligning all the other data structures, some users will create and distribute a template and then align all of the real data structures to that template.
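The template idea, together with a simple arithmetic expression in ALIGN, can be modeled the same way. The Python sketch below is illustrative only: the template T of 12 cells distributed BLOCK over three processors, the arrays A (10 elements, aligned with offset I+1) and B (12 elements, aligned with I), and all function names are assumptions, not taken from the text.

```python
import math

T_SIZE, P = 12, 3   # assumed template extent and processor count

def template_owner(t):
    """Owner (1-based) of template cell t when the template is distributed BLOCK."""
    return (t - 1) // math.ceil(T_SIZE / P) + 1

def a_owner(i):
    # Models an alignment of A(I) with template cell I+1.
    return template_owner(i + 1)

def b_owner(i):
    # Models an alignment of B(I) with template cell I.
    return template_owner(i)

a_owners = [a_owner(i) for i in range(1, 11)]   # A has 10 elements
b_owners = [b_owner(i) for i in range(1, 13)]   # B has 12 elements

print(a_owners)   # [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
print(b_owners)   # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]

# A(i) and B(i+1) map to the same template cell, so they share a processor.
print(all(a_owner(i) == b_owner(i + 1) for i in range(1, 11)))   # True
```

The template itself holds no data in this model; it only fixes the layout that A and B follow, which is the role the text describes for TEMPLATE.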
The use of directives can range from very simple to very complex. In some situations, you distribute the one large shared structure, align a few related structures and you are done. In other situations, programmers attempt to optimize communications based on the topology of the interconnection network (hypercube, multi-stage interconnection network, mesh, or toroid) using very detailed directives. They also might carefully redistribute the data at the various phases of the computation.
Hopefully your application will yield good performance without too much effort.
"The purpose of Chuck Severence's book, High Performance Computing has always been to teach new programmers and scientists about the basics of High Performance Computing. This book is for learners […]"