# Connexions

You are here: Home » Content » Eliminating Clutter - Branches With Loops

### Lenses

What is a lens?

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

#### Endorsed by (What does "Endorsed by" mean?)

This content has been endorsed by the organizations listed. Click each link for a list of all content endorsed by the organization.
• HPC Open Edu Cup

This module is included inLens: High Performance Computing Open Education Cup 2008-2009
By: Ken Kennedy Institute for Information TechnologyAs a part of collection: "High Performance Computing"

Click the "HPC Open Edu Cup" link to see all content they endorse.

#### Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
• NSF Partnership

This module is included inLens: NSF Partnership in Signal Processing
By: Sidney BurrusAs a part of collection: "High Performance Computing"

Click the "NSF Partnership" link to see all content affiliated with them.

Click the tag icon to display tags associated with this content.

• Featured Content

This module is included inLens: Connexions Featured Content
By: ConnexionsAs a part of collection: "High Performance Computing"

"The purpose of Chuck Severence's book, High Performance Computing has always been to teach new programmers and scientists about the basics of High Performance Computing. This book is for learners […]"

Click the "Featured Content" link to see all content affiliated with them.

#### Also in these lenses

• UniqU content

This module is included inLens: UniqU's lens
By: UniqU, LLCAs a part of collection: "High Performance Computing"

Click the "UniqU content" link to see all content selected in this lens.

• Lens for Engineering

This module is included inLens: Lens for Engineering
By: Sidney Burrus

Click the "Lens for Engineering" link to see all content selected in this lens.

• eScience, eResearch and Computational Problem Solving

This module is included inLens: eScience, eResearch and Computational Problem Solving
By: Jan E. OdegardAs a part of collection: "High Performance Computing"

Click the "eScience, eResearch and Computational Problem Solving" link to see all content selected in this lens.

### Recently Viewed

This feature requires Javascript to be enabled.

### Tags

(What is a tag?)

These tags come from the endorsement, affiliation, and other lenses that include this content.

# Eliminating Clutter - Branches With Loops

Module by: Charles Severance, Kevin Dowd. E-mail the authors

Numerical codes usually spend most of their time in loops, so you don’t want anything inside a loop that doesn’t have to be there, especially an if-statement. Not only do if-statements gum up the works with extra instructions, they can force a strict order on the iterations of a loop. Of course, you can’t always avoid conditionals. Sometimes, though, people place them in loops to process events that could have been handled outside, or even ignored.

To take you back a few years, the following code shows a loop with a test for a value close to zero:


PARAMETER (SMALL = 1.E-20)
DO I=1,N
IF (ABS(A(I)) .GE. SMALL) THEN
B(I) = B(I) + A(I) * C
ENDIF
ENDDO


The idea was that if the multiplier, A(I), were reasonably small, there would be no reason to perform the math in the center of the loop. Because floating-point operations weren’t pipelined on many machines, a comparison and a branch was cheaper; the test would save time. On an older CISC or early RISC processor, a comparison and branch is probably still a savings. But on other architectures, it costs a lot less to just perform the math and skip the test. Eliminating the branch eliminates a control dependency and allows the compiler to pipeline more arithmetic operations. Of course, the answer could change slightly if the test is eliminated. It then becomes a question of whether the difference is significant. Here’s another example where a branch isn’t necessary. The loop finds the absolute value of each element in an array:


DO I=1,N
IF (A(I) .LT. 0.) A(I) = -A(I)
ENDDO


But why perform the test at all? On most machines, it’s quicker to perform the abs() operation on every element of the array.

We do have to give you a warning, though: if you are coding in C, the absolute value, fabs(), is a subroutine call. In this particular case, you are better off leaving the conditional in the loop.1

When you can’t always throw out the conditional, there are things you can do to minimize negative performance. First, we have to learn to recognize which conditionals within loops can be restructured and which cannot. Conditionals in loops fall into several categories:

• Loop invariant conditionals
• Loop index dependent conditionals
• Independent loop conditionals
• Dependent loop conditionals
• Reductions
• Conditionals that transfer control

Let’s look at these types in turn.

## Loop Invariant Conditionals

The following loop contains an invariant test:


DO I=1,K
IF (N .EQ. 0) THEN
A(I) = A(I) + B(I) * C
ELSE
A(I) = 0.
ENDIF
ENDDO


“Invariant” means that the outcome is always the same. Regardless of what happens to the variables A, B, C, and I, the value of N won’t change, so neither will the outcome of the test.

You can recast the loop by making the test outside and replicating the loop body twice — once for when the test is true, and once for when it is false, as in the following example:


IF (N .EQ. 0) THEN
DO I=1,K
A(I) = A(I) + B(I) * C
ENDDO
ELSE
DO I=1,K
A(I) = 0
ENDDO
ENDIF


The effect on the runtime is dramatic. Not only have we eliminated K-1 copies of the test, we have also assured that the computations in the middle of the loop are not control-dependent on the if-statement, and are therefore much easier for the compiler to pipeline.

We remember helping someone optimize a program with loops containing similar conditionals. They were checking to see whether debug output should be printed each iteration inside an otherwise highly optimizable loop. We can’t fault the person for not realizing how much this slowed the program down. Performance wasn’t important at the time. The programmer was just trying to get the code to produce good answers. But later on, when performance mattered, by cleaning up invariant conditionals, we were able to speed up the program by a factor of 100.

## Loop Index Dependent Conditionals

For loop index dependent conditionals, the test is true for certain ranges of the loop index variables. It isn’t always true or always false, like the conditional we just looked at, but it does change with a predictable pattern, and one that we can use to our advantage. The following loop has two index variables, I and J.


DO I=1,N
DO J=1,N
IF (J .LT. I)
A(J,I) = A(J,I) + B(J,I) * C
ELSE
A(J,I) = 0.0
ENDIF
ENDDO
ENDDO


Notice how the if-statement partitions the iterations into distinct sets: those for which it is true and those for which it is false. You can take advantage of the predictability of the test to restructure the loop into several loops — each custom-made for a different partition:


DO I=1,N
DO J=1,I-1
A(J,I) = A(J,I) + B(J,I) * C
ENDDO
DO J=I,N
A(J,I) = 0.0
ENDDO
ENDDO


The new version will almost always be faster. A possible exception is when N is a small value, like 3, in which case we have created more clutter. But then, the loop probably has such a small impact on the total runtime that it won’t matter which way it’s coded.

## Independent Loop Conditionals

It would be nice if you could optimize every loop by partitioning it. But more often than not, the conditional doesn’t directly depend on the value of the index variables. Although an index variable may be involved in addressing an array, it doesn’t create a recognizable pattern in advance — at least not one you can see when you are writing the program. Here’s such a loop:


DO I=1,N
DO J=1,N
IF (B(J,I) .GT. 1.0) A(J,I) = A(J,I) + B(J,I) * C
ENDDO
ENDDO


There is not much you can do about this type of conditional. But because every iteration is independent, the loop can be unrolled or can be performed in parallel.

## Dependent Loop Conditionals

When the conditional is based on a value that changes with each iteration of the loop, the compiler has no choice but to execute the code exactly as written. For instance, the following loop has an if-statement with built-in scalar recursion:


DO I=1,N
IF (X .LT. A(I)) X = X + B(I)*2.
ENDDO


You can’t know which way the branch will go for the next iteration until you are done with the current iteration. To recognize the dependency, try to unroll the loop slightly by hand. If you can’t start the second test until the first has finished, you have a dependent loop conditional. You may want to look at these types of loops to see if you can eliminate the iteration-to-iteration value.

## Reductions

Keep an eye out for loops in which the if-statement is performing a max or min function on a array. This is a reduction, so called because it reduces a array to a scalar result (the previous example was a reduction too, by the way). Again, we are getting a little bit ahead of ourselves, but since we are talking about if-statements in loops, I want to introduce a trick for restructuring reductions max and min to expose more parallelism. The following loop searches for the maximum value, z, in the array a by going through the elements one at a time:


for (i=0; i<n; i++)
z = a[i] > z ? a[i] : z;


As written, it’s recursive like the loop from the previous section. You need the result of a given iteration before you can proceed to the next. However, since we are looking for the greatest element in the whole array, and since that will be the same element (essentially) no matter how we go about looking for it, we can restructure the loop to check several elements at a time (we assume n is evenly divisible by 2 and do not include the preconditioning loop):


z0 = 0.;
z1 = 0.;
for (i=0; i< n-1; i+=2) {
z0 = z0 < a[i] ? a[i] : z0;
z1 = z1 < a[i+1] ? a[i+1] : z1;
}
z = z0 < z1 ? z1 : z0;


Do you see how the new loop calculates two new maximum values each iteration? These maximums are then compared with one another, and the winner becomes the new official max. It’s analogous to a play-off arrangement in a Ping-Pong tournament. Whereas the old loop was more like two players competing at a time while the rest sat around, the new loop runs several matches side by side. In general this particular optimization is not a good one to code by hand. On parallel processors, the compiler performs the reduction in its own way. If you hand-code similar to this example, you may inadvertently limit the compiler’s flexibility on a parallel system.

## Conditionals That Transfer Control

Let’s step back a second. Have you noticed a similarity among all the loops so far? We have looked only at a particular type of conditional, conditional assignments — based on the outcome of the test, a variable gets reassigned. Of course, not every conditional ends up in an assignment. You can have statements that transfer flow of control, such as subroutine calls or goto statements. In the following example, the programmer is carefully checking before dividing by zero.

However, this test has an extremely negative impact on the performance because it forces the iterations to be done precisely in order:


DO I=1,N
DO J=1,N
IF (B(J,I) .EQ. 0 ) THEN
PRINT *,I,J
STOP
ENDIF
A(J,I) = A(J,I) / B(J,I)
ENDDO
ENDDO


Avoiding these tests is one of the reasons that the designers of the IEEE floating- point standard added the trap feature for operations such as dividing by zero. These traps allow the programmer in a performance-critical section of the code to achieve maximum performance yet still detect when an error occurs.

## Footnotes

1. The machine representation of a floating-point number starts with a sign bit. If the bit is 0, the number is positive. If it is 1, the number is negative. The fastest absolute value function is one that merely “ands” out the sign bit. See macros in /usr/include/macros.h and /usr/include/math.h.

## Content actions

PDF | EPUB (?)

### What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

#### Definition of a lens

##### Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

##### What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

##### Who can create a lens?

Any individual member, a community, or a respected organization.

##### What are tags?

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks