Several factors serve to limit the pipeline performance. If the six stage are not of equal duration, there will be some waiting involved at various pipeline stage. Another difficulty is the condition branch instruction or the unpredictable event is an interrupt. Other problem arise that the memory conflicts could occur. So the system must contain logic to account for the type of conflict.
- Data dependencies also factor into the effective length of pipelines
- Logic to handle memory and register use and to control the overall pipeline increases significantly with increasing pipeline depth
– If the speedup is based on the number of stages, why not build lots of stages?
– Each stage uses latches at its input (output) to buffer the next set of inputs
+ If the stage granularity is reduced too much, the latches and their control become a significant hardware overhead
+ Also suffer a time overhead in the propagation time through the latches
- Limits the rate at which data can be clocked through the pipeline
– Pipelining must insure that computed results are the same as if computation was performed in strict sequential order
– With multiple stages, two instructions “in execution” in the pipeline may have data dependencies. So we must design the pipeline to prevent this.
– Data dependency examples:
A = B + C
D = E + A
C = G x H
A = D / H
Data dependencies limit when an instruction can be input to the pipeline.
One of the major problems in designing an instruction pipeline is assuring a steady flow of instructions to initial stages of the pipeline. However, 15-20% of instructions in an assembly-level stream are (conditional) branches. Of these, 60-70% take the branch to a target address. Until the instruction is actually executed, it is impossible to determin whether the branch will be taken or not.
- Impact of the branch is that pipeline never really operates at its full capacity.
– The average time to complete a pipelined instruction becomes
Tave =(1-pb)1 + pb[pt(1+b) + (1-pt)1]
– A number of techniques can be used to minimize the impact of the branch instruction (the branch penalty).
- A several approaches have been taken for dealing with conditional branches:
+ Multiple streams
+ Prefetch branch target
+ Loop buffer
+ Branch prediction
+Delayed branch.
- Replicate the initial portions of the pipeline and fetch both possible next instructions
- Increases chance of memory contention
- Must support multiple streams for each instruction in the pipeline
- When the branch instruction is decoded, begin to fetch the branch target instruction and place in a second prefetch buffer
- If the branch is not taken, the sequential instructions are already in the pipe, so there is not loss of performance
- If the branch is taken, the next instruction has been prefetched and results in minimal branch penalty (don’t have to incur a memory read operation at the end of the branch to fetch the instruction)
- Loop buffer: Look ahead, look behind buffer
- Many conditional branches operations are used for loop control
- Expand prefetch buffer so as to buffer the last few instructions executed in addition to the ones that are waiting to be executed
- If buffer is big enough, entire loop can be held in it, this can reduce the branch penalty.
- Make a good guess as to which instruction will be executed next and start that one down the pipeline.
- Static guesses: make the guess without considering the runtime history of the program
Branch never taken
Branch always taken
Predict based on the opcode
- Dynamic guesses: track the history of conditional branches in the program.
Taken / not taken switch History table
Figure 8.3. Branch prediction using 2 history bits
- Minimize the branch penalty by finding valid instructions to execute in the pipeline while the branch address is being resolved.
- It is possible to improve performance by automatically rearranging instruction within a program, so that branch instruction occur later than actually desired
- Compiler is tasked with reordering the instruction sequence to find enough independent instructions (wrt to the conditional branch) to feed into the pipeline after the branch that the branch penalty is reduced to zero