Skip to content Skip to navigation Skip to collection information

OpenStax_CNX

You are here: Home » Content » 2008-'09 Open Education Cup: High Performance Computing » Example - Analysis of Shared Memory Multiprocessors

Navigation

Table of Contents

Lenses

What is a lens?

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

This content is ...

Endorsed by Endorsed (What does "Endorsed by" mean?)

This content has been endorsed by the organizations listed. Click each link for a list of all content endorsed by the organization.
  • HPC Open Edu Cup display tagshide tags

    This collection is included inLens: High Performance Computing Open Education Cup 2008-2009
    By: Ken Kennedy Institute for Information Technology

    Click the "HPC Open Edu Cup" link to see all content they endorse.

    Click the tag icon tag icon to display tags associated with this content.

Also in these lenses

  • eScience, eResearch and Computational Problem Solving

    This collection is included inLens: eScience, eResearch and Computational Problem Solving
    By: Jan E. Odegard

    Click the "eScience, eResearch and Computational Problem Solving" link to see all content selected in this lens.

Recently Viewed

This feature requires Javascript to be enabled.

Tags

(What is a tag?)

These tags come from the endorsement, affiliation, and other lenses that include this content.
 

Example - Analysis of Shared Memory Multiprocessors

Module by: Bart Sinclair. E-mail the author

Summary: (Blank Abstract)

A shared-memory multiprocessor consists of a number of processors accessing one or more shared memory modules. The processors can be physically connected to the memory modules in a variety of ways, but logically every processor is connected to every module.

We can use any of several metrics to evaluate the performance of a multiprocessor system. In order to keep the model and its implementation manageable, we will focus on the degree of parallelism among the memory modules; that is, we are interested in determining the average number of memory modules that are being accessed simultaneously. The model is as follows:

  1. The system has pp processors and mm memory modules. Each processor can send requests for access to any memory module.
  2. Memory module accesses are synchronized; two modules servicing access requests at the same time start and complete their accesses together. Memory accesses always take one time unit (one memory cycle).
  3. Processors are infinitely fast. When a processor's access request has been serviced by a memory module, the processor immediately generates a new request.
  4. Processors send requests with equal probability to each module. The module chosen for a new request is independent of the module chosen for any other request.
  5. A processor may have only one outstanding request at a time.
  6. A memory module may service only one request at a time. If more than one request is queued at a memory module at the beginning of a memory cycle, the module selects a request to service at random. All requests not serviced during the cycle remain queued at the module at the beginning of the next cycle.
Let B- B be the average number of memory modules busy servicing an access request during a memory cycle. Regardless of the number of processors and memory modules, we can implement the model as a discrete-time Markov chain in which the information contained in a state includes the number of memory modules that have requests waiting at the beginning of a memory cycle. Hence, a state also describes the number of modules that will be busy during the next memory cycle. We can find B- B by solving for the steady state solution of the Markov chain and computing B-=all states iPrstate inumber of busy modules in state i B all states i state i number of busy modules in state i

We know that the Markov chain has a steady state solution (is ergodic) since the chain will be finite and at least one state will have a self-loop. The chain is finite because the number of ways that processors' access requests can be distributed among the modules is finite. Any state that represents at least one request at each memory module will have a self-loop, since any processor that has its request satisfied during a cycle beginning in that state has a non-zero probability of sending its next access request to the same module.

As usual, we proceed in three steps. We first determine, for a given pp and mm, the set of all states in the Markov chain. Then, we compute the probabilities for all allowed single step state transitions. Finally, we solve the Markov chain for the steady state probabilities.

It is not possible in general to implement the model using states that only tells us how many modules are going to be busy during the next cycle. Consider, for example, a model for a system with 4 processors and 2 memory modules. A state that only tells us that two modules will be busy during the next cycle does not include enough information to tell us the probabilities of being in the various states at the beginning of the next cycle. This is because two modules will be busy during a cycle that starts with requests distributed among the modules in either of two ways:

  1. three requests at one module, one request at the other module
  2. two requests at one module, two requests at the other module

If we start a cycle with the four requests distributed as in (1), at the end of the cycle we have two requests left at one module, no requests at the other module, and two processors ready to make new requests. The next cycle may start with the four requests distributed as in (1) or (2), or with four requests at a single module. If we start a cycle as in (2), both modules have a request waiting at the end of the cycle. The next cycle cannot begin with all four requests at a single module.

We must use a definition of state that allows us to determine how the access requests that must wait during a cycle are distributed among the modules at the end of the cycle, in addition to telling us how many modules will be busy during the cycle. (1) and (2) above illustrate exactly how this is done. Each state specifies how the p requests are distributed among the modules, without regard to which module is which. It is unimportant in (1) to know which module has three requests and which has two, since the modules and processors are identical.

With this general definition of state, computing the state transition probabilities can be divided into two parts. The first is simple: if rp r p requests are serviced during a memory cycle, the rr processors that generate new requests for the next memory cycle choose any particular set of modules with probability 1mr 1 m r . They may all choose to send their new requests to the same module, or each may select a different module from the others, or some may choose the same module while others select different modules. The end result will be some distribution of pp requests among the mm modules at the beginning of the next cycle, with each possible distribution represented by a different state.

The second part of computing state transition probabilities is to count the number of ways in which processors may choose to make new requests that will result in the same distribution of requests at the beginning of the next cycle (the same next state). Consider the following example with 4 processors and 2 memory modules, as above. Suppose the model begins one cycle with two requests at each module. At the end of the cycle, each module has one request remaining. Two processors will make new requests before the start of the next cycle. If they select the same module, the next state has three requests at one module and one request at the other. There are two ways that this can happen, since there are two modules that they can select and each of these modules has one request remaining from the previous cycle.

To put the two parts together, we compute the single step state transition probability for the transition from state A to state B by multiplying the probability of choosing any set of modules following the cycle beginning in state A by the number of ways we can choose a set of modules that takes us to state B. For the example, the probability of making the transition from the state with two requests at each module to the state with three requests at one module and one at the other is 1222=1/2 1 2 2 2 12 .

Generally, the process of computing state transition probabilities is more difficult that this example would indicate, even though the process just described always works. The difficulty will always come in the second part - counting the number of ways of choosing where new requests go that take the model to the same next state. For this reason, we recommend that you follow the above procedure to compute all single step transition probabilities for each state. Because the single step transition probabilities for a given state (including a transition back to the same state) must sum to 1, this will provide a useful check on whether or not you counted all the possibilities.

We illustrate the complete procedure with several examples. Let Cnr C n r be the number of combinations of nn objects taken rr at a time, and Pnr P n r be the number of permutations of nn objects taken rr at a time. In the first two examples, we find an expression for the average memory module concurrency as a function of the number of memory modules for a fixed number (2 or 3) of processors.

Example 1: 2 processors, m≥2 modules

Table 1
State  
1 2 requests at the same module
2 1 request at each of two modules

First, find the single step transition probabilities. At the risk of overkill for this particularly simple case, we explicitly write each probability as the product of the probability that the new requests go to a particular set of modules ( 1m 1 m if the cycle started in state 1 and 1m2 1 m 2 if the cycle started in state 2) and the number of such sets that result in the specified next state. p1,1=1m1=1m p 1 1 1 m 1 1 m p1,2=1m(m1)=m1m p 1 2 1 m m 1 m 1 m p2,1=1m2Cm1=1m p 2 1 1 m 2 C m 1 1 m p2,2=1m2Cm2=m1m p 2 2 1 m 2 C m 2 m 1 m This may actually seem to be double overkill. Since we know that the PP matrix is singular, we only need one of the Chapman-Kolmogorov equations, and hence only two of the single step transition probabilities - either p1,1 p 1 1 and p2,1 p 2 1 or p1,2 p 1 2 and p2,2 p 2 2 . The other independent equation is the normalization equation. However, as mentioned above, if you compute all of the single-step state transition probabilities, you can add the probabilities for all transitions from each state as a check on having gotten them correct.

Continuing with the example, choose the first Chapman-Kolmogorov equation. Then π1=π1p1,1+π2p2,1=π11m+π21m π 1 π 1 p 1 1 π 2 p 2 1 π 1 1 m π 2 1 m π2=(m1)π1 π 2 m 1 π 1 (π1+π2=1=mπ1)(π1=1m)(π2=m1m) π 1 π 2 1 m π 1 π 1 1 m π 2 m 1 m The average memory module concurrency or parallelism is B-=1×1m+2m1m=2m1m B 1 1 m 2 m 1 m 2 m 1 m

Example 2: 3 processors, m≥3 modules

Table 2
State  
1 3 requests at one module
2 2 requests at one module, 1 request at another module
3 1 request at each of three modules

p1,1=1m1=1m p 1 1 1 m 1 1 m p1,2=1mCm11=1m(m1) p 1 2 1 m C m 1 1 1 m m 1 p1,3=0 p 1 3 0 p2,1=1m21=1m2 p 2 1 1 m 2 1 1 m 2 p2,2=1m2(Cm11+Cm11C21)=1m23(m1) p 2 2 1 m 2 C m 1 1 C m 1 1 C 2 1 1 m 2 3 m 1 p2,3=1m2Cm12P22=1m2(m1)(m2) p 2 3 1 m 2 C m 1 2 P 2 2 1 m 2 m 1 m 2 p3,1=1m3Cm1=1m3m p 3 1 1 m 3 C m 1 1 m 3 m p3,2=1m3Cm2C32P22=1m33m(m1) p 3 2 1 m 3 C m 2 C 3 2 P 2 2 1 m 3 3 m m 1 p3,3=1m3Cm3P33=1m3m(m1)(m2) p 3 3 1 m 3 C m 3 P 3 3 1 m 3 m m 1 m 2 p2,2 p 2 2 merits an explanation. Two memory modules are busy during a cycle that starts in state 2. At the end of the cycle, the two processors that had their memory access requests serviced will make new requests. The first term inside the square brackets corresponds to the two new requests going to the same memory module, one of the m1 m 1 modules without a request at the end of the cycle. There are Cm11 C m 1 1 ways to select this module. The second term corresponds to one of the new requests going to the module that still has a request pending, and the other new request going to one of the other m1 m 1 modules. There are Cm11 C m 1 1 ways to choose the module without a pending request, and there are two ways to order the two new requests so that one goes to the module that already has a request and the other goes to the module that doesn't.

We will make use of the Chapman-Kolmogorov equations for π1 π 1 and π3 π 3 , plus the normalization equation. π1=π11m+π21m2+π31m2 π 1 π 1 1 m π 2 1 m 2 π 3 1 m 2 π3=π2(m1)(m2)m2+π3(m1)(m2)m2 π 3 π 2 m 1 m 2 m 2 π 3 m 1 m 2 m 2 1=π1+π2+π3 1 π 1 π 2 π 3 Solving these three equations gives π1=1m2m+1 π 1 1 m 2 m 1 π2=(3m2)(m1)m(m2m+1) π 2 3 m 2 m 1 m m 2 m 1 π3=m12(m2)m(m2m+1) π 3 m 1 2 m 2 m m 2 m 1 B-=1×1m2m+1+2(3m2)(m1)m(m2m+1)+3m12(m2)m(m2m+1)=3m36m2+6m2m(m2m+1) B 1 1 m 2 m 1 2 3 m 2 m 1 m m 2 m 1 3 m 1 2 m 2 m m 2 m 1 3 m 3 6 m 2 6 m 2 m m 2 m 1

Example 3: 4 processors, 4 modules

Table 3
State  
1 all four requests are at one module
2a three requests are at one module and one request is at another module
2b two requests are at one module and two requests are at another module
3 two requests are at one module and one request is at each of two other modules
4 one request is at each of four modules

We've chosen the state names to make explicit the number of memory modules busy in each state. The state transition probabilities are p1,1=14 p 1 1 1 4 p1,2a=14C31=34 p 1 2a 1 4 C 3 1 3 4 p1,2b=p1,3=p1,4=0 p 1 2b p 1 3 p 1 4 0 p2a,1=142 p 2a 1 1 4 2 p2a,2a=142C31P22=616 p 2a 2a 1 4 2 C 3 1 P 2 2 6 16 p2a,2b=142C31=316 p 2a 2b 1 4 2 C 3 1 3 16 p2a,3=142C32P22=616 p 2a 3 1 4 2 C 3 2 P 2 2 6 16 p2a,4=0 p 2a 4 0 p2b,1=0 p 2b 1 0 p2b,2a=142C21=216 p 2b 2a 1 4 2 C 2 1 2 16 p2b,2b=142P22=216 p 2b 2b 1 4 2 P 2 2 2 16 p2b,3=142C21+142C21C21P22=1016 p 2b 3 1 4 2 C 2 1 1 4 2 C 2 1 C 2 1 P 2 2 10 16 p2b,4=142P22=216 p 2b 4 1 4 2 P 2 2 2 16 p3,1=143=164 p 3 1 1 4 3 1 64 p3,2a=143C31+143C31C32=1264 p 3 2a 1 4 3 C 3 1 1 4 3 C 3 1 C 3 2 12 64 p3,2b=143C31C32=964 p 3 2b 1 4 3 C 3 1 C 3 2 9 64 p3,3=143C32P33+143C32C32P22=3664 p 3 3 1 4 3 C 3 2 P 3 3 1 4 3 C 3 2 C 3 2 P 2 2 36 64 p3,4=143P33=664 p 3 4 1 4 3 P 3 3 6 64 p4,1=144C41=4256 p 4 1 1 4 4 C 4 1 4 256 p4,2a=144C42C43P22=48256 p 4 2a 1 4 4 C 4 2 C 4 3 P 2 2 48 256 p4,2b=144C42C42=36256 p 4 2b 1 4 4 C 4 2 C 4 2 36 256 p4,3=144C43C31C42P22=144256 p 4 3 1 4 4 C 4 3 C 3 1 C 4 2 P 2 2 144 256 p4,4=144P44=24256 p 4 4 1 4 4 P 4 4 24 256 Combining all of these probabilities into the single-step transition probability matrix: P=( 1/43/4000 1/166/163/166/160 02/162/1610/162/16 1/6412/649/6436/646/64 4/25648/25636/256144/25624/256 ) P 14 34 0 0 0 116 616 316 616 0 0 216 216 1016 216 164 1264 964 3664 664 4256 48256 36256 144256 24256 Solve πP=π π P π plus the normalization equation.

The solution is π1π2aπ2bπ3π4=0.03230.24190.14520.50810.0726 π 1 π 2a π 2b π 3 π 4 0.0323 0.2419 0.1452 0.5081 0.0726 The average memory module concurrency is B-=π11+(π2a+π2b)2+π33+π44=2.2610 B π 1 1 π 2a π 2b 2 π 3 3 π 4 4 2.2610

The three examples allow us to compare the average memory module concurrency with four modules for two, three, and four processors:

Table 4
p B- B
2 1.750
3 2.269
4 2.261

In going from 2 to 3 processors, we see about a 30% increase in memory module concurrency. When we go from 3 processors to 4, the performance improves only about 15.5%.

Collection Navigation

Content actions

Download:

Collection as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add:

Collection to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks

Module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks