Skip to content Skip to navigation

OpenStax_CNX

You are here: Home » Content » Language Support for Performance - Data-Parallel Problem: Heat Flow

Navigation

Lenses

What is a lens?

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

This content is ...

Endorsed by Endorsed (What does "Endorsed by" mean?)

This content has been endorsed by the organizations listed. Click each link for a list of all content endorsed by the organization.
  • HPC Open Edu Cup

    This module is included inLens: High Performance Computing Open Education Cup 2008-2009
    By: Ken Kennedy Institute for Information TechnologyAs a part of collection: "High Performance Computing"

    Click the "HPC Open Edu Cup" link to see all content they endorse.

Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
  • NSF Partnership display tagshide tags

    This module is included inLens: NSF Partnership in Signal Processing
    By: Sidney BurrusAs a part of collection: "High Performance Computing"

    Click the "NSF Partnership" link to see all content affiliated with them.

    Click the tag icon tag icon to display tags associated with this content.

  • Featured Content

    This module is included inLens: Connexions Featured Content
    By: ConnexionsAs a part of collection: "High Performance Computing"

    Comments:

    "The purpose of Chuck Severence's book, High Performance Computing has always been to teach new programmers and scientists about the basics of High Performance Computing. This book is for learners […]"

    Click the "Featured Content" link to see all content affiliated with them.

Also in these lenses

  • UniqU content

    This module is included inLens: UniqU's lens
    By: UniqU, LLCAs a part of collection: "High Performance Computing"

    Click the "UniqU content" link to see all content selected in this lens.

  • Lens for Engineering

    This module is included inLens: Lens for Engineering
    By: Sidney Burrus

    Click the "Lens for Engineering" link to see all content selected in this lens.

  • eScience, eResearch and Computational Problem Solving

    This module is included inLens: eScience, eResearch and Computational Problem Solving
    By: Jan E. OdegardAs a part of collection: "High Performance Computing"

    Click the "eScience, eResearch and Computational Problem Solving" link to see all content selected in this lens.

Recently Viewed

This feature requires Javascript to be enabled.

Tags

(What is a tag?)

These tags come from the endorsement, affiliation, and other lenses that include this content.
 

Language Support for Performance - Data-Parallel Problem: Heat Flow

Module by: Charles Severance, Kevin Dowd. E-mail the authors

A classic problem that explores scalable parallel processing is the heat flow problem. The physics behind this problem lie in partial differential equations.

We will start with a one-dimensional metal plate (also known as a rod), and move to a two-dimensional plate in later examples. We start with a rod that is at zero degrees celsius. Then we place one end in 100 degree steam and the other end in zero degree ice. We want to simulate how the heat flows from one end to another. And the resulting temperatures along points on the metal rod after the temperature has stabilized.

To do this we break the rod into 10 segments and track the temperature over time for each segment. Intuitively, within a time step, the next temperature of a portion of the plate is an average of the surrounding temperatures. Given fixed temperatures at some points in the rod, the temperatures eventually converge to a steady state after sufficient time steps. Figure 1 shows the setup at the beginning of the simulation.

Figure 1
Heat flow in a rod
This figure is a flow chart showing boxes starting with 100 and followed by a string of boxes labeled 0. The boxes are also numbered from 1 to 10. There is an arrow pointing to the right, labeled, heat will flow, and below the fourth, fifth, and sixth boxes is the label, average.

A simplistic implementation of this is as follows:


PROGRAM HEATROD PARAMETER(MAXTIME=200) INTEGER TICKS,I,MAXTIME REAL*4 ROD(10) ROD(1) = 100.0 DO I=2,9 ROD(I) = 0.0 ENDDO ROD(10) = 0.0 DO TICKS=1,MAXTIME IF ( MOD(TICKS,20) .EQ. 1 ) PRINT 100,TICKS,(ROD(I),I=1,10) DO I=2,9 ROD(I) = (ROD(I-1) + ROD(I+1) ) / 2 ENDDO ENDDO 100 FORMAT(I4,10F7.2) END

The output of this program is as follows:


% f77 heatrod.f heatrod.f: MAIN heatrod: % a.out 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 21 100.00 87.04 74.52 62.54 51.15 40.30 29.91 19.83 9.92 0.00 41 100.00 88.74 77.51 66.32 55.19 44.10 33.05 22.02 11.01 0.00 61 100.00 88.88 77.76 66.64 55.53 44.42 33.31 22.21 11.10 0.00 81 100.00 88.89 77.78 66.66 55.55 44.44 33.33 22.22 11.11 0.00 101 100.00 88.89 77.78 66.67 55.56 44.44 33.33 22.22 11.11 0.00 121 100.00 88.89 77.78 66.67 55.56 44.44 33.33 22.22 11.11 0.00 141 100.00 88.89 77.78 66.67 55.56 44.44 33.33 22.22 11.11 0.00 161 100.00 88.89 77.78 66.67 55.56 44.44 33.33 22.22 11.11 0.00 181 100.00 88.89 77.78 66.67 55.56 44.44 33.33 22.22 11.11 0.00 %

Clearly, by Time step 101, the simulation has converged to two decimal places of accuracy as the numbers have stopped changing. This should be the steady-state approximation of the temperature at the center of each segment of the bar.

Now, at this point, astute readers are saying to themselves, "Um, don't look now, but that loop has a flow dependency." You would also claim that this won't even parallelize a little bit. It is so bad you can't even unroll the loop for a little instruction-level parallelism!

A person familiar with the theory of heat flow will also point out that the above loop doesn't exactly implement the heat flow model. The problem is that the values on the right side of the assignment in the ROD loop are supposed to be from the previous time step, and that the value on the left side is the next time step. Because of the way the loop is written, the ROD(I-1) value is from the next time step, as shown in Paragraph 10.

This can be solved using a technique called red-black, where we alternate between two arrays. Figure 3 shows how the red-black version of the computation operates. This kills two birds with one stone! Now the mathematics is precisely correct, and there is no recurrence. Sounds like a real win-win situation.

Figure 2
Computing the new value for a cell
This figure is a flow chart showing boxes starting with 100 and followed by a string of boxes labeled 50, 25, 12.5, then six boxes labeled 0. The boxes are also numbered from 1 to 10. Below the fourth, fifth, and sixth boxes is the label, average.

Figure 3
Using two arrays to eliminate a dependency
This figure is a flow chart showing boxes starting with 100 and followed by a string of boxes labeled  0. The boxes are also numbered from old 1 to old 10. Below the fourth, fifth, and sixth boxes is the label, average. This average is encompassed by a large arrow that points down to another flow chart showing boxes starting with 100 and 50 and followed by a string of boxes labeled  0. The boxes are also numbered from fixed 1 to fixed 10.

The only downside to this approach is that it takes twice the memory storage and twice the memory bandwidth.1 The modified code is as follows:


PROGRAM HEATRED PARAMETER(MAXTIME=200) INTEGER TICKS,I,MAXTIME REAL*4 RED(10),BLACK(10) RED(1) = 100.0 BLACK(1) = 100.0 DO I=2,9 RED(I) = 0.0 ENDDO RED(10) = 0.0 BLACK(10) = 0.0 DO TICKS=1,MAXTIME,2 IF ( MOD(TICKS,20) .EQ. 1 ) PRINT 100,TICKS,(RED(I),I=1,10) DO I=2,9 BLACK(I) = (RED(I-1) + RED(I+1) ) / 2 ENDDO DO I=2,9 RED(I) = (BLACK(I-1) + BLACK(I+1) ) / 2 ENDDO ENDDO 100 FORMAT(I4,10F7.2) END

The output for the modified program is:


% f77 heatred.f heatred.f: MAIN heatred: % a.out 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 21 100.00 82.38 66.34 50.30 38.18 26.06 18.20 10.35 5.18 0.00 41 100.00 87.04 74.52 61.99 50.56 39.13 28.94 18.75 9.38 0.00 61 100.00 88.36 76.84 65.32 54.12 42.91 32.07 21.22 10.61 0.00 81 100.00 88.74 77.51 66.28 55.14 44.00 32.97 21.93 10.97 0.00 101 100.00 88.84 77.70 66.55 55.44 44.32 33.23 22.14 11.07 0.00 121 100.00 88.88 77.76 66.63 55.52 44.41 33.30 22.20 11.10 0.00 141 100.00 88.89 77.77 66.66 55.55 44.43 33.32 22.22 11.11 0.00 161 100.00 88.89 77.78 66.66 55.55 44.44 33.33 22.22 11.11 0.00 181 100.00 88.89 77.78 66.67 55.55 44.44 33.33 22.22 11.11 0.00 %

Interestingly, the modified program takes longer to converge than the first version. It converges at Time step 181 rather than 101. If you look at the first version, because of the recurrence, the heat ended up flowing up faster from left to right because the left element of each average was the next-time-step value. It may seem nifty, but it's wrong.2 Generally, in this problem, either approach converges to the same eventual values within the limits of floating-point representation.

This heat flow problem is extremely simple, and in its red-black form, it's inherently very parallel with very simple data interactions. It's a good model for a wide range of problems where we are discretizing two-dimensional or three-dimensional space and performing some simple simulations in that space.

This problem can usually be scaled up by making a finer grid. Often, the benefit of scalable processors is to allow a finer grid rather than a faster time to solution. For example, you might be able to to a worldwide weather simulation using a 200-mile grid in four hours on one processor. Using 100 processors, you may be able to do the simulation using a 20-mile grid in four hours with much more accurate results. Or, using 400 processors, you can do the finer grid simulation in one hour.

Footnotes

  1. There is another red-black approach that computes first the even elements and then the odd elements of the rod in two passes. This approach has no data dependencies within each pass. The ROD array never has all the values from the same time step. Either the odd or even values are one time step ahead of the other. It ends up with a stride of two and doubles the bandwidth but does not double the memory storage required to solve the problem.
  2. There are other algorithmic approaches to solving partial differential equations, such as the "fast multipole method" that accelerates convergence "legally." Don't assume that the brute force approach used here is the only method to solve this particular problem. Programmers should always look for the best available algorithm (parallel or not) before trying to scale up the "wrong" algorithm. For folks other than computer scientists, time to solution is more important than linear speed-up.

Content actions

Download module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks