# Connexions

You are here: Home » Content » High Performance Computing » High Performance FORTRAN (HPF)

## Navigation

### Table of Contents

• Introduction to the Connexions Edition
• Introduction to High Performance Computing

### Lenses

What is a lens?

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

#### Endorsed by (What does "Endorsed by" mean?)

This content has been endorsed by the organizations listed. Click each link for a list of all content endorsed by the organization.
• HPC Open Edu Cup

This collection is included inLens: High Performance Computing Open Education Cup 2008-2009
By: Ken Kennedy Institute for Information Technology

Click the "HPC Open Edu Cup" link to see all content they endorse.

#### Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
• NSF Partnership

This collection is included inLens: NSF Partnership in Signal Processing
By: Sidney Burrus

Click the "NSF Partnership" link to see all content affiliated with them.

Click the tag icon to display tags associated with this content.

• Featured Content

This collection is included inLens: Connexions Featured Content
By: Connexions

Comments:

"The purpose of Chuck Severence's book, High Performance Computing has always been to teach new programmers and scientists about the basics of High Performance Computing. This book is for learners […]"

Click the "Featured Content" link to see all content affiliated with them.

#### Also in these lenses

• UniqU content

This collection is included inLens: UniqU's lens
By: UniqU, LLC

Click the "UniqU content" link to see all content selected in this lens.

• Lens for Engineering

This module and collection are included inLens: Lens for Engineering
By: Sidney Burrus

Click the "Lens for Engineering" link to see all content selected in this lens.

• eScience, eResearch and Computational Problem Solving

This collection is included inLens: eScience, eResearch and Computational Problem Solving
By: Jan E. Odegard

Click the "eScience, eResearch and Computational Problem Solving" link to see all content selected in this lens.

### Recently Viewed

This feature requires Javascript to be enabled.

### Tags

(What is a tag?)

These tags come from the endorsement, affiliation, and other lenses that include this content.

Inside Collection (Textbook):

Textbook by: Charles Severance, Kevin Dowd. E-mail the authors

# High Performance FORTRAN (HPF)

Module by: Charles Severance, Kevin Dowd. E-mail the authors

In March 1992, the High Performance Fortran Forum (HPFF) began meeting to discuss and define a set of additions to FORTRAN 90 to make it more practical for use in a scalable computing environment. The plan was to develop a specification within the calendar year so that vendors could quickly begin to implement the standard. The scope of the effort included the following:

• Identify scalars and arrays that will be distributed across a parallel machine.
• Say how they will be distributed. Will they be strips, blocks, or something else?
• Specify how these variables will be aligned with respect to one another.
• Redistribute and realign data structures at runtime.
• Add a FORALL control construct for parallel assignments that are difficult or impossible to construct using FORTRAN 90's array syntax.
• Make improvements to the FORTRAN 90 WHERE control construct.
• Add intrinsic functions for common parallel operations.

There were several sources of inspiration for the HPF effort. Layout directives were already part of the FORTRAN 90 programming environment for some SIMD computers (i.e., the CM-2). Also, PVM, the first portable message-passing environment, had been released a year earlier, and users had a year of experience trying to decompose by hand programs. They had developed some basic usable techniques for data decomposition that worked very well but required far too much bookkeeping.1

The HPF effort brought together a diverse set of interests from all the major high performance computing vendors. Vendors representing all the major architectures were represented. As a result HPF was designed to be implemented on nearly all types of architectures.

There is an effort underway to produce the next FORTRAN standard: FORTRAN 95. FORTRAN 95 is expected to adopt some but not all of the HPF modifications.

## Programming in HPF

At its core, HPF includes FORTRAN 90. If a FORTRAN 90 program were run through an HPF compiler, it must produce the same results as if it were run through a FORTRAN 90 compiler. Assuming an HPF program only uses FORTRAN 90 constructs and HPF directives, a FORTRAN 90 compiler could ignore the directives, and it should produce the same results as an HPF compiler.

As the user adds directives to the program, the semantics of the program are not changed. If the user completely misunderstands the application and inserts extremely ill-conceived directives, the program produces correct results very slowly. An HPF compiler doesn't try to "improve on" the user's directives. It assumes the programmer is omniscient.2

Once the user has determined how the data will be distributed across the processors, the HPF compiler attempts to use the minimum communication necessary and overlaps communication with computation whenever possible. HPF generally uses an "owner computes" rule for the placement of the computations. A particular element in an array is computed on the processor that stores that array element.

All the necessary data to perform the computation is gathered from remote processors, if necessary, to perform the computation. If the programmer is clever in decomposition and alignment, much of the data needed will be from the local memory rather then a remote memory. The HPF compiler is also responsible for allocating any temporary data structures needed to support communications at runtime.

In general, the HPF compiler is not magic - it simply does a very good job with the communication details when the programmer can design a good data decomposition. At the same time, it retains portability with the single CPU and shared uniform memory systems using FORTRAN 90.

## HPF data layout directives

Perhaps the most important contributions of HPF are its data layout directives. Using these directives, the programmer can control how data is laid out based on the programmer's knowledge of the data interactions. An example directive is as follows:


REAL*4 ROD(10)
!HPF$DISTRIBUTE ROD(BLOCK)  The !HPF$ prefix would be a comment to a non-HPF compiler and can safely be ignored by a straight FORTRAN 90 compiler. The DISTRIBUTE directive indicates that the ROD array is to be distributed across multiple processors. If this directive is not used, the ROD array is allocated on one processor and communicated to the other processors as necessary. There are several distributions that can be done in each dimension:


REAL*4 BOB(100,100,100),RICH(100,100,100)
!HPF$DISTRIBUTE BOB(BLOCK,CYCLIC,*) !HPF$ DISTRIBUTE RICH(CYCLIC(10))


These distributions operate as follows:

• BLOCK The array is distributed across the processors using contiguous blocks of the index value. The blocks are made as large as possible.
• CYCLIC The array is distributed across the processors, mapping each successive element to the "next" processor, and when the last processor is reached, allocation starts again on the first processor.
• CYCLIC(n) The array is distributed the same as CYCLIC except that n successive elements are placed on each processor before moving on to the next processor.

### Note:

All the elements in that dimension are placed on the same processor. This is most useful for multidimensional arrays.

Figure 1 shows how the elements of a simple array would be mapped onto three processors with different directives.

It must allocate four elements to Processors 1 and 2 because there is no Processor 4 available for the leftover element if it allocated three elements to Processors 1 and 2. In Figure 1, the elements are allocated on successive processors, wrapping around to Processor 1 after the last processor. In Figure 1, using a chunk size with CYCLIC is a compromise between pure BLOCK and pure CYCLIC.

To explore the use of the *, we can look at a simple two-dimensional array mapped onto four processors. In Figure 2, we show the array layout and each cell indicates which processor will hold the data for that cell in the two-dimensional array. In Figure 2, the directive decomposes in both dimensions simultaneously. This approach results in roughly square patches in the array. However, this may not be the best approach. In the following example, we use the * to indicate that we want all the elements of a particular column to be allocated on the same processor. So, the column values equally distribute the columns across the processors. Then, all the rows in each column follow where the column has been placed. This allows unit stride for the on-processor portions of the computation and is beneficial in some applications. The * syntax is also called on-processor distribution.

When dealing with more than one data structure to perform a computation, you can either separately distribute them or use the ALIGN directive to ensure that corresponding elements of the two data structures are to be allocated together. In the following example, we have a plate array and a scaling factor that must be applied to each column of the plate during the computation:


DIMENSION PLATE(200,200),SCALE(200)
!HPF$DISTRIBUTE PLATE(*,BLOCK) !HPF$ ALIGN SCALE(I) WITH PLATE(J,I)


Or:


DIMENSION PLATE(200,200),SCALE(200)
!HPF$DISTRIBUTE PLATE(*,BLOCK) !HPF$ ALIGN SCALE(:) WITH PLATE(*,:)


In both examples, the PLATE and the SCALE variables are allocated to the same processors as the corresponding columns of PLATE. The * and : syntax communicate the same information. When * is used, that dimension is collapsed, and it doesn't participate in the distribution. When the : is used, it means that dimension follows the corresponding dimension in the variable that has already been distributed.

You could also specify the layout of the SCALE variable and have the PLATE variable "follow" the layout of the SCALE variable:


DIMENSION PLATE(200,200),SCALE(200)
!HPF$DISTRIBUTE SCALE(BLOCK) !HPF$ ALIGN PLATE(J,I) WITH SCALE(I)


You can put simple arithmetic expressions into the ALIGN directive subject to some limitations. Other directives include:

• PROCESSORS Allows you to create a shape of the processor configuration that can be used to align other data structures.
• REDISTRIBUTE and REALIGN Allow you to dynamically reshape data structures at runtime as the communication patterns change during the course of the run.
• TEMPLATE Allows you to create an array that uses no space. Instead of distributing one data structure and aligning all the other data structures, some users will create and distribute a template and then align all of the real data structures to that template.

The use of directives can range from very simple to very complex. In some situations, you distribute the one large shared structure, align a few related structures and you are done. In other situations, programmers attempt to optimize communications based on the topology of the interconnection network (hypercube, multi-stage interconnection network, mesh, or toroid) using very detailed directives. They also might carefully redistribute the data at the various phases of the computation.

Hopefully your application will yield good performance without too much effort.

## HPF control structures

While the HPF designers were in the midst of defining a new language, they set about improving on what they saw as limitations in FORTRAN 90. Interestingly, these modifications are what is being considered as part of the new FORTRAN 95 standard.

The FORALL statement allows the user to express simple iterative operations that apply to the entire array without resorting to a do-loop (remember, do-loops force order). For example:


FORALL (I=1:100, J=1:100) A(I,J) = I + J


This can be expressed in native FORTRAN 90 but it is rather ugly, counterintuitive, and prone to error.

Another control structure is the ability to declare a function as "PURE." A PURE function has no side effects other than through its parameters. The programmer is guaranteeing that a PURE function can execute simultaneously on many processors with no ill effects. This allows HPF to assume that it will only operate on local data and does not need any data communication during the duration of the function execution. The programmer can also declare which parameters of the function are input parameters, output parameters, and input-output parameters.

## HPF intrinsics

The companies who marketed SIMD computers needed to come up with significant tools to allow efficient collective operations across all the processors. A perfect example of this is the SUM operation. To SUM the value of an array spread across N processors, the simplistic approach takes N steps. However, it is possible to accomplish it in log(N) steps using a technique called parallel-prefix-sum. By the time HPF was in development, a number of these operations had been identified and implemented. HPF took the opportunity to define standardized syntax for these operations.

A sample of these operations includes:

• SUM_PREFIX Performs various types of parallel-prefix summations.
• ALL_SCATTER Distributes a single value to a set of processors.
• GRADE_DOWN Sorts into decreasing order.
• IANY Computes the logical OR of a set of values.

While there are a large number of these intrinsic functions, most applications use only a few of the operations.

## HPF extrinsics

In order to allow the vendors with diverse architectures to provide their particular advantage, HPF included the capability to link "extrinsic" functions. These functions didn't need to be written in FORTRAN 90/HPF and performed a number of vendor-supported capabilities. This capability allowed users to perform such tasks as the creation of hybrid applications with some HPF and some message passing.

High performance computing programmers always like the ability to do things their own way in order to eke out that last drop of performance.

## Heat Flow in HPF

To port our heat flow application to HPF, there is really only a single line of code that needs to be added. In the example below, we've changed to a larger two-dimensional array:


INTEGER PLATESIZ,MAXTIME
PARAMETER(PLATESIZ=2000,MAXTIME=200)
!HPF\$   DISTRIBUTE PLATE(*,BLOCK)
REAL*4 PLATE(PLATESIZ,PLATESIZ)
INTEGER TICK
PLATE = 0.0

* Add Boundaries
PLATE(1,:) = 100.0
PLATE(PLATESIZ,:) = -40.0
PLATE(:,PLATESIZ) = 35.23
PLATE(:,1) = 4.5

DO TICK = 1,MAXTIME
PLATE(2:PLATESIZ-1,2:PLATESIZ-1) = (
+       PLATE(1:PLATESIZ-2,2:PLATESIZ-1) +
+       PLATE(3:PLATESIZ-0,2:PLATESIZ-1) +
+       PLATE(2:PLATESIZ-1,1:PLATESIZ-2) +
+       PLATE(2:PLATESIZ-1,3:PLATESIZ-0) ) / 4.0
PRINT 1000,TICK, PLATE(2,2)
1000      FORMAT('TICK = ',I5, F13.8)
ENDDO
*
END


You will notice that the HPF directive distributes the array columns using the BLOCK approach, keeping all the elements within a column on a single processor. At first glance, it might appear that (BLOCK,BLOCK) is the better distribution. However, there are two advantages to a (*,BLOCK) distribution. First, striding down a column is a unit-stride operation and so you might just as well process an entire column. The more significant aspect of the distribution is that a (BLOCK,BLOCK) distribution forces each processor to communicate with up to eight other processors to get its neighboring values. Using the (*,BLOCK) distribution, each processor will have to exchange data with at most two processors each time step.

When we look at PVM, we will look at this same program implemented in a SPMD-style message-passing fashion. In that example, you will see some of the details that HPF must handle to properly execute this code. After reviewing that code, you will probably choose to implement all of your future heat flow applications in HPF!

## HPF Summary

In some ways, HPF has been good for FORTRAN 90. Companies such as IBM with its SP-1 needed to provide some high-level language for those users who didn't want to write message-passing codes. Because of this, IBM has invested a great deal of effort in implementing and optimizing HPF. Interestingly, much of this effort will directly benefit the ability to develop more sophisticated FORTRAN 90 compilers. The extensive data flow analysis required to minimize communications and manage the dynamic data structures will carry over into FORTRAN 90 compilers even without using the HPF directives.

Time will tell if the HPF data distribution directives will no longer be needed and compilers will be capable of performing sufficient analysis of straight FORTRAN 90 code to optimize data placement and movement.

In its current form, HPF is an excellent vehicle for expressing the highly data-parallel, grid-based applications. Its weaknesses are irregular communications and dynamic load balancing. A new effort to develop the next version of HPF is under- way to address some of these issues. Unfortunately, it is more difficult to solve these runtime problems while maintaining good performance across a wide range of architectures.

## Footnotes

1. As we shall soon see.
2. Always a safe assumption.

## Content actions

PDF | EPUB (?)

### What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

### Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

PDF | EPUB (?)

### What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

### Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

### Add:

#### Collection to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks

#### Module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks