Inside Collection (Textbook): *High Performance Computing* by Charles Severance and Kevin Dowd

# Classical Optimizations

Module by: Charles Severance, Kevin Dowd

Once the intermediate language is broken into basic blocks, there are a number of optimizations that can be performed on the code in these blocks. Some optimizations are very simple and affect a few tuples within a basic block. Other optimizations move code from one basic block to another without altering the program results. For example, it is often valuable to move a computation from the body of a loop to the code immediately preceding the loop.

In this section, we are going to list classical optimizations by name and tell you what they are for. We’re not suggesting that you make the changes; most compilers since the mid-1980s automatically perform these optimizations at all but their lowest optimization level. As we said at the start of the chapter, if you understand what the compiler can (and can’t) do, you will become a better programmer because you will be able to play to the compiler’s strengths.

## Copy Propagation

To start, let’s look at a technique for untangling calculations. Take a look at the following segment of code: notice the two computations involving X.


X = Y
Z = 1.0 + X


As written, the second statement requires the results of the first before it can proceed — you need X to calculate Z. Unnecessary dependencies could translate into a delay at runtime.1 With a little bit of rearrangement we can make the second statement independent of the first, by propagating a copy of Y. The new calculation for Z uses the value of Y directly:


X = Y
Z = 1.0 + Y


Notice that we left the first statement, X=Y, intact. You may ask, “Why keep it?” The problem is that we can’t tell whether the value of X is needed elsewhere. That is something for another analysis to decide. If it turns out that no other statement needs the new value of X, the assignment is eliminated later by dead code removal.
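To make the bookkeeping concrete, here is a minimal Python sketch of copy propagation over three-address tuples in a single basic block. The tuple format and names are our own invention, and the sketch ignores the harder case where a copy's source is later redefined:

```python
def propagate_copies(block):
    """Replace uses of copied variables with their original sources."""
    copies = {}                     # maps a variable to the value it copies
    out = []
    for dest, op, args in block:
        # Rewrite operands through any known copies first.
        args = tuple(copies.get(a, a) for a in args)
        if op == "copy":
            copies[dest] = args[0]  # record X = Y; keep the copy for now
        else:
            # dest is redefined, so it no longer holds a plain copy
            copies.pop(dest, None)
        out.append((dest, op, args))
    return out

block = [
    ("X", "copy", ("Y",)),          # X = Y
    ("Z", "add",  ("1.0", "X")),    # Z = 1.0 + X
]
print(propagate_copies(block))
```

The add now reads Y directly, and the copy X = Y is left in place for dead code removal to deal with, exactly as described above.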

## Constant Folding

A clever compiler can find constants throughout your program. Some of these are “obvious” constants like those defined in parameter statements. Others are less obvious, such as local variables that are never redefined. When you combine them in a calculation, you get a constant expression. The little program below has two constants, I and K:


PROGRAM MAIN
INTEGER I,K
PARAMETER (I = 100)
K = 200
J = I + K
END


Because I and K are constant individually, the combination I+K is constant, which means that J is a constant too. The compiler reduces constant expressions like I+K into constants with a technique called constant folding.

How does constant folding work? You can see that it is possible to examine every path along which a given variable could be defined en route to a particular basic block. If you discover that all paths lead back to the same value, that is a constant; you can replace all references to that variable with that constant. This replacement has a ripple-through effect. If the compiler finds itself looking at an expression that is made up solely of constants, it can evaluate the expression at compile time and replace it with a constant. After several iterations, the compiler will have located most of the expressions that are candidates for constant folding.
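Constant folding is easy to observe in CPython itself, whose compiler folds constant expressions at compile time; the folded value shows up in the compiled code object's constants. The ripple-through effect is visible too — the inner sum folds first, then the product:

```python
# Compile a statement whose right-hand side is made up solely of constants.
code = compile("J = (100 + 200) * 2", "<example>", "exec")

# The fully folded constant 600 appears as a single entry.
print(code.co_consts)
```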

You can sometimes improve performance by making the compiler aware of the constant values in your application. For example, in the following code segment:

X = X * Y


the compiler may generate quite different runtime code if it knows that Y is 0, 1, 2, or 175.32. If it does not know the value of Y, it must generate the most conservative (not necessarily the fastest) code sequence. A programmer can communicate these values through the use of the PARAMETER statement in FORTRAN. With a PARAMETER statement, the compiler knows the values of these constants at compile time. Another example we have seen is:


DO I = 1,10000
   DO J = 1,IDIM
      .....
   ENDDO
ENDDO


After looking at the code, it’s clear that IDIM was either 1, 2, or 3, depending on the data set in use. Clearly if the compiler knew that IDIM was 1, it could generate much simpler and faster code.

## Dead Code Removal

Programs often contain sections of dead code that have no effect on the answers and can be removed. Occasionally, dead code is written into the program by the author, but a more common source is the compiler itself; many optimizations produce dead code that needs to be swept up afterwards.

Dead code comes in two types:

• Instructions that are unreachable
• Instructions that produce results that are never used

You can easily write some unreachable code into a program by directing the flow of control around it — permanently. If the compiler can tell it’s unreachable, it will eliminate it. For example, it’s impossible to reach the statement I = 4 in this program:


PROGRAM MAIN
I = 2
WRITE (*,*) I
STOP
I = 4
WRITE (*,*) I
END


The compiler throws out everything after the STOP statement and probably gives you a warning. Unreachable code produced by the compiler during optimization will be quietly whisked away.

Computations with local variables can produce results that are never used. By analyzing a variable’s definitions and uses, the compiler can see whether any other part of the routine references it. Of course the compiler can’t tell the ultimate fate of variables that are passed between routines, external or common, so those computations are always kept (as long as they are reachable).2 In the following program, computations involving k contribute nothing to the final answer and are good candidates for dead code elimination:


#include <stdio.h>

int main ()
{
    int i,k;
    i = k = 1;
    i += 1;
    k += 2;       /* k is never used again: dead */
    printf ("%d\n",i);
    return 0;
}


Dead code elimination has often produced some amazing benchmark results from poorly written benchmarks. See (Reference) for an example of this type of code.
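A backward sweep over the block is enough to find dead stores like the ones involving k: a store is kept only if its result is live later or the operation has a side effect. This is an illustrative Python sketch with a made-up tuple format, not any particular compiler's algorithm:

```python
def remove_dead_stores(block, live_out=()):
    """Walk a basic block backward, dropping stores whose results die."""
    live = set(live_out)      # variables needed after the block ends
    kept = []
    for dest, op, args in reversed(block):
        effectful = op == "print"          # side effects must be kept
        if effectful or dest in live:
            kept.append((dest, op, args))
            live.discard(dest)             # this statement defines dest
            live.update(a for a in args if not a.isdigit())
    kept.reverse()
    return kept

block = [
    ("i", "const", ("1",)),
    ("k", "const", ("1",)),
    ("i", "add",   ("i", "1")),
    ("k", "add",   ("k", "2")),   # k is never used: dead
    (None, "print", ("i",)),
]
print(remove_dead_stores(block))
```

Because the sweep runs backward, dropping `k + 2` also kills the earlier `k = 1` in the same pass — the same ripple-through behavior noted for constant folding.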

## Strength Reduction

Operations or expressions have time costs associated with them. Sometimes it’s possible to replace a more expensive calculation with a cheaper one. We call this strength reduction. The following code fragment contains two expensive operations:


REAL X,Y
Y = X**2
J = K*2


For the exponentiation operation on the first line, the compiler generally makes an embedded mathematical subroutine library call. In the library routine, X is converted to a logarithm, multiplied, then converted back. Overall, raising X to a power is expensive — taking perhaps hundreds of machine cycles. The key is to notice that X is being raised to a small integer power. A much cheaper alternative would be to express it as X*X, and pay only the cost of multiplication. The second statement shows integer multiplication of a variable K by 2. Adding K+K yields the same answer, but takes less time.

There are many opportunities for compiler-generated strength reductions; these are just a couple of them. We will see an important special case when we look at induction variable simplification. Another example of a strength reduction is replacing multiplications by integer powers of two by logical shifts.
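The reductions just described are easy to check directly. Note that for arbitrary floating-point values X**2 and X*X can differ in the last bit; with the exact value chosen here they agree:

```python
x = 3.5
k = 21

# Repeated multiplication instead of the expensive power routine.
assert x ** 2 == x * x

# Addition instead of an integer multiply by 2.
assert k * 2 == k + k

# A logical shift instead of a multiply by a power of two (8 == 2**3).
assert k * 8 == k << 3

print("all strength-reduced forms agree")
```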

## Variable Renaming

In (Reference), we talked about register renaming. Some processors can make runtime decisions to replace all references to register 1 with register 2, for instance, to eliminate bottlenecks. Register renaming keeps instructions that are recycling the same registers for different purposes from having to wait until previous instructions have finished with them.

The same situation can occur in programs — the same variable (i.e., memory location) can be recycled for two unrelated purposes. For example, see the variable x in the following fragment:


x = y * z;
q = r + x + x;
x = a + b;


When the compiler recognizes that a variable is being recycled, and that its current and former uses are independent, it can substitute a new variable to keep the calculations separate:


x0 = y * z;
q = r + x0 + x0;
x = a + b;


Variable renaming is an important technique because it clarifies that calculations are independent of each other, which increases the number of things that can be done in parallel.
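The renaming itself can be sketched as a single forward pass that gives every definition a fresh subscripted name, much like SSA construction; unlike the example above, this sketch also numbers the final definition of x. The statement format is our own:

```python
def rename(stmts):
    """Give each definition a fresh subscript; rewrite uses to match."""
    version = {}                       # latest subscript for each name
    out = []
    for dest, operands in stmts:       # (destination, operand-tuple) pairs
        operands = tuple(f"{a}{version[a]}" if a in version else a
                         for a in operands)
        version[dest] = version.get(dest, -1) + 1
        out.append((f"{dest}{version[dest]}", operands))
    return out

stmts = [
    ("x", ("y", "z")),      # x = y * z
    ("q", ("r", "x", "x")), # q = r + x + x
    ("x", ("a", "b")),      # x = a + b  (unrelated reuse of x)
]
print(rename(stmts))
```

The two uses of x now refer to x0 and x1, so the final assignment is visibly independent of the first two statements.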

## Common Subexpression Elimination

Subexpressions are pieces of expressions. For instance, A+B is a subexpression of C*(A+B). If A+B appears in several places, like it does below, we call it a common subexpression:


D = C * (A + B)
E = (A + B)/2.


Rather than calculate A + B twice, the compiler can generate a temporary variable and use it wherever A + B is required:


temp = A + B
D = C * temp
E = temp/2.


Different compilers go to different lengths to find common subexpressions. Most pairs, such as A+B, are recognized. Some can recognize reuse of intrinsics, such as SIN(X). Don’t expect the compiler to go too far though. Subexpressions like A+B+C are not computationally equivalent to reassociated forms like B+C+A, even though they are algebraically the same. In order to provide predictable results on computations, FORTRAN must either perform operations in the order specified by the user or reorder them in a way to guarantee exactly the same result. Sometimes the user doesn’t care which way A+B+C associates, but the compiler cannot assume the user does not care.

Address calculations provide a particularly rich opportunity for common subexpression elimination. You don’t see the calculations in the source code; they’re generated by the compiler. For instance, a reference to an array element A(I,J) may translate into an intermediate language expression such as:


address = base_address(A) + (I-1)*sizeof_datatype(A)
          + (J-1)*sizeof_datatype(A) * column_dimension(A)


If A(I,J) is used more than once, we have multiple copies of the same address computation. Common subexpression elimination will (hopefully) discover the redundant computations and group them together.
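Within one basic block, the search can be as simple as hashing each (operator, operands) pair the first time it is computed and reusing the earlier result on a repeat. This sketch assumes no operand is redefined inside the block, and it deliberately never reassociates, so A+B+C and B+C+A remain distinct; a later copy-propagation pass would rewrite E to use t1 directly:

```python
def eliminate_cse(stmts):
    """Replace repeated (op, operands) computations with copies."""
    seen = {}                          # (op, operands) -> first result name
    out = []
    for dest, op, operands in stmts:
        key = (op, operands)
        if key in seen:
            out.append((dest, "copy", (seen[key],)))
        else:
            seen[key] = dest
            out.append((dest, op, operands))
    return out

stmts = [
    ("t1", "add", ("A", "B")),     # A + B
    ("D",  "mul", ("C", "t1")),    # D = C * (A + B)
    ("t2", "add", ("A", "B")),     # A + B again: common subexpression
    ("E",  "div", ("t2", "2.")),   # E = (A + B) / 2.
]
print(eliminate_cse(stmts))
```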

## Loop-Invariant Code Motion

Loops are where many high performance computing programs spend a majority of their time. The compiler looks for every opportunity to move calculations out of a loop body and into the surrounding code. Expressions that don’t change after the loop is entered (loop-invariant expressions) are prime targets. The following loop has two loop-invariant expressions:


DO I=1,N
A(I) = B(I) + C * D
E = G(K)
ENDDO


Below, we have modified the expressions to show how they can be moved to the outside:


temp = C * D
DO I=1,N
A(I) = B(I) + temp
ENDDO
E = G(K)


It is possible to move code before or after the loop body. As with common subexpression elimination, address arithmetic is a particularly important target for loop-invariant code motion. Slowly changing portions of index calculations can be pushed into the suburbs, to be executed only when needed.
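Moving E = G(K) after the loop is safe only because G(K) is loop-invariant and E is never read inside the loop. A quick check of the transformation with stand-in values (N, B, C, D, G, and K are our own choices):

```python
N, C, D, K = 5, 2.0, 3.0, 2
B = [float(i) for i in range(N)]
G = [10.0, 20.0, 30.0]

# Original form: C * D recomputed and E reassigned on every iteration.
A1 = [B[i] + C * D for i in range(N)]
E1 = G[K]

# Hoisted form: compute C * D once; assign E once, after the loop.
temp = C * D
A2 = [B[i] + temp for i in range(N)]
E2 = G[K]

assert A1 == A2 and E1 == E2
print("hoisted loop matches the original")
```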

## Induction Variable Simplification

Loops can contain what are called induction variables. Their value changes as a linear function of the loop iteration count. For example, K is an induction variable in the following loop. Its value is tied to the loop index:


DO I=1,N
K = I*4 + M
...
ENDDO


Induction variable simplification replaces calculations for variables like K with simpler ones. Given a starting point and the expression’s first derivative, you can arrive at K’s value for the nth iteration by stepping through the n-1 intervening iterations:


K = M
DO I=1,N
K = K + 4
...
ENDDO


The two forms of the loop aren’t equivalent; the second won’t give you the value of K given any value of I. Because you can’t jump into the middle of the loop on the nth iteration, K always takes on the same values it would have if we had kept the original expression.
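That claim is easy to check directly with stand-in values for M and N: entered from the top, the incremented K visits exactly the values I*4 + M would produce.

```python
M, N = 10, 6

# Original form: K recomputed from I on each iteration.
original = [i * 4 + M for i in range(1, N + 1)]

# Simplified form: K starts at M and steps by 4 each trip.
simplified = []
k = M
for _ in range(1, N + 1):
    k = k + 4
    simplified.append(k)

assert original == simplified
print(original)
```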

Induction variable simplification probably wouldn’t be a very important optimization, except that array address calculations look very much like the calculation for K in the example above. For instance, the address calculation for A(I) within a loop iterating on the variable I looks like this:

address = base_address(A) + (I-1) * sizeof_datatype(A)


Performing all that math is unnecessary. The compiler can create a new induction variable for references to A and simplify the address calculations:


outside the loop...
inside the loop...


Induction variable simplification is especially useful on processors that can automatically increment a register each time it is used as a pointer for a memory reference. While stepping through a loop, the memory reference and the address arithmetic can both be squeezed into a single instruction—a great savings.
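The simplified address stream amounts to initializing a running address once and bumping it by the element size on each trip through the loop. A Python sketch with a hypothetical base address and element size of our own choosing:

```python
base, size, n = 1000, 8, 5      # stand-in base address, element size, count

# Full calculation: base + (I-1)*size recomputed every iteration.
full_calc = [base + (i - 1) * size for i in range(1, n + 1)]

# Induction-variable form: one setup, one add per iteration.
incremental = []
addr = base                     # set up outside the loop
for _ in range(n):
    incremental.append(addr)    # inside the loop: use, then increment
    addr += size

assert full_calc == incremental
print(incremental)
```

On a processor with auto-increment addressing, that single add folds into the memory reference itself, which is the savings the paragraph above describes.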

## Object Code Generation

Precompilation, lexical analysis, parsing, and many optimization techniques are somewhat portable, but code generation is very specific to the target processor. In some ways this phase is where compilers earn their keep on single-processor RISC systems.

Anything that isn’t handled in hardware has to be addressed in software. That means if the processor can’t resolve resource conflicts, such as overuse of a register or pipeline, then the compiler is going to have to take care of it. Allowing the compiler to take care of it isn’t necessarily a bad thing — it’s a design decision. A complicated compiler and simple, fast hardware might be cost effective for certain applications. Two processors at opposite ends of this spectrum are the MIPS R2000 and the HP PA-8000. The first depends heavily on the compiler to schedule instructions and fairly distribute resources. The second manages both things at runtime, though both depend on the compiler to provide a balanced instruction mix.

In all computers, register selection is a challenge because, in spite of their numbers, registers are precious. You want to be sure that the most active variables become register resident at the expense of others. On machines without register renaming (see (Reference)), you have to be sure that the compiler doesn’t try to recycle registers too quickly, otherwise the processor has to delay computations as it waits for one to be freed.

Some instructions in the repertoire also save your compiler from having to issue others. Examples are auto-increment for registers being used as array indices or conditional assignments in lieu of branches. These both save the processor from extra calculations and make the instruction stream more compact.

Lastly, there are opportunities for increased parallelism. Programmers generally think serially, specifying steps in logical succession. Unfortunately, serial source code makes serial object code. A compiler that hopes to efficiently use the parallelism of the processor will have to be able to move instructions around and find operations that can be issued side by side. This is one of the biggest challenges for compiler writers today. As superscalar and very long instruction word (VLIW) designs become capable of executing more instructions per clock cycle, the compiler will have to dig deeper for operations that can execute at the same time.

## Footnotes

1. This code is an example of a flow dependence. I describe dependencies in detail in (Reference).
2. If a compiler does sufficient interprocedural analysis, it can even optimize variables across routine boundaries. Interprocedural analysis can be the bane of benchmark codes trying to time a computation without using the results of the computation.
