Chapter One Introduction to Pipelined Processors

Date post: 24-Feb-2016

Principle of Designing Pipeline Processors

(Design Problems of Pipeline Processors)


Instruction Prefetch and Branch Handling

• The instructions in computer programs can be classified into 4 types:
– Arithmetic/Load Operations (60%)
– Store Type Instructions (15%)
– Branch Type Instructions (5%)
– Conditional Branch Type (Yes – 12% and No – 8%)


• Arithmetic/Load Operations (60%):
– These operations require one or two operand fetches.
– The execution of different operations requires a different number of pipeline cycles.


• Store Type Instructions (15%):
– These require a memory access to store the data.
• Branch Type Instructions (5%):
– These correspond to an unconditional jump.


• Conditional Branch Type (Yes – 12% and No – 8%):
– The Yes path requires calculation of the new branch target address.
– The No path proceeds to the next sequential instruction.


• Arithmetic/load and store instructions do not alter the execution order of the program.
• Branch instructions and interrupts, however, degrade the performance of pipelined computers.

Handling Example – Interrupt System of the Cray-1


Cray-1 System
• The interrupt system is built around an exchange package.
• When an interrupt occurs, the Cray-1 saves 8 scalar registers, 8 address registers, the program counter and the monitor flags.
• These are packed into 16 words and swapped with a block whose address is specified by a hardware exchange address register.


Instruction Prefetch and Branch Handling

• In general, the higher the percentage of branch type instructions in a program, the slower a program will run on a pipeline processor.


Effect of Branching on Pipeline Performance

• Consider a linear pipeline of 5 stages:

Fetch Instruction → Decode → Fetch Operands → Execute → Store Results

Overlapped Execution of Instructions without Branching

[Space-time diagram: instructions I1–I8 enter the five-stage pipeline one clock apart; their executions overlap, so one instruction completes per clock once the pipeline is full.]

I5 is a Branch Instruction

[Space-time diagram: I1–I4 flow normally; when branch I5 resolves, the instructions fetched behind it (I6–I8) are flushed and refetched from the branch target, leaving the pipeline partially idle.]
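The cost of a flush shown in the diagrams can be checked with a small calculation. This is a sketch, not part of the original slides; the function name is invented, and it assumes the simple model above: fully overlapped execution plus an n − 1 clock penalty per taken branch.

```python
def completion_time(m, n, taken_branch_positions=()):
    """Clock cycles to finish m instructions on an n-stage pipeline.

    Without branching, the k-th instruction finishes at clock n + k - 1,
    so m instructions take n + m - 1 clocks. Each taken branch flushes
    the instructions fetched behind it, costing n - 1 extra clocks
    before fetching resumes at the branch target.
    """
    clocks = n + m - 1                                   # fully overlapped execution
    clocks += (n - 1) * len(taken_branch_positions)      # flush penalty per taken branch
    return clocks

# 8 instructions, 5 stages, no branches: 5 + 8 - 1 = 12 clocks
print(completion_time(8, 5))                             # -> 12
# Same stream, but I5 is a taken branch: 12 + 4 = 16 clocks
print(completion_time(8, 5, taken_branch_positions=(5,)))  # -> 16
```

A single taken branch in eight instructions thus stretches execution by a third, which is why the next slides quantify branching statistically.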


Estimation of the effect of branching on an n-segment instruction pipeline


• Consider an instruction cycle consisting of n pipeline clock periods.
• Let
– p = probability that an instruction is a conditional branch (20%)
– q = probability that a conditional branch is successful, i.e. taken (12/20 = 0.6)


• Suppose there are m instructions.
• Then the number of successful branches = m × p × q (m × 0.2 × 0.6).
• A delay of (n − 1)/n of an instruction cycle is required for each successful branch to flush the pipeline.


• Thus, the total time required for m instructions, measured in instruction cycles of n clock periods each, is:

1 + (m − 1)/n + pqm(n − 1)/n

(the first instruction takes one full cycle, each of the remaining m − 1 overlapped instructions adds 1/n of a cycle, and each of the pqm successful branches adds the (n − 1)/n flush delay).


• As m becomes large, the average number of instructions per instruction cycle is given as:

lim (m→∞) m / [1 + (m − 1)/n + pqm(n − 1)/n] = ?


• As m becomes large, the average number of instructions per instruction cycle is given as:

lim (m→∞) m / [1 + (m − 1)/n + pqm(n − 1)/n] = n / [1 + pq(n − 1)]


• When p =0, the above measure reduces to n, which is ideal.

• In reality, it is always less than n.
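Evaluating the limit above with the numbers from the earlier slides (n = 5, p = 0.2, q = 0.6) makes the degradation concrete; this is just the formula typed into Python, with an invented function name:

```python
def avg_instructions_per_cycle(n, p, q):
    """Average instructions completed per instruction cycle,
    n / (1 + p*q*(n - 1)), for an n-stage pipeline where p is the
    conditional-branch probability and q the taken probability."""
    return n / (1 + p * q * (n - 1))

# Ideal case, p = 0: the measure reduces to n
print(avg_instructions_per_cycle(5, 0.0, 0.6))            # -> 5.0
# With the slides' mix (p = 0.2, q = 0.6): 5 / 1.48 ≈ 3.38
print(round(avg_instructions_per_cycle(5, 0.2, 0.6), 2))  # -> 3.38
```

So a modest 12% rate of taken conditional branches already cuts the five-stage pipeline's throughput by roughly a third of its ideal value.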


Solution = ?


Multiple Prefetch Buffers
• Three types of buffers can be used to match the instruction fetch rate to the pipeline consumption rate:
1. Sequential Buffers: hold instructions fetched in sequence (for in-sequence pipelining)
2. Target Buffers: hold instructions fetched from a branch target (for out-of-sequence pipelining)


• A conditional branch causes both the sequential and the target buffers to fill; once the condition is resolved, one buffer is selected and the other is discarded.


3. Loop Buffers: hold the sequential instructions contained within a program loop
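The interplay of the sequential and target buffers on a conditional branch can be sketched as a toy model. This is illustrative only; the class and method names are invented, and real hardware fills both buffers concurrently rather than sequentially.

```python
from collections import deque

class PrefetchUnit:
    """Toy model of dual prefetch buffers: fetch down both the
    fall-through path and the branch-target path, then keep whichever
    path the resolved condition selects and discard the other."""

    def __init__(self):
        self.sequential = deque()   # in-sequence prefetch buffer
        self.target = deque()       # branch-target prefetch buffer

    def fetch(self, fall_through, branch_target):
        # While the branch is unresolved, both buffers fill.
        self.sequential.extend(fall_through)
        self.target.extend(branch_target)

    def resolve(self, taken):
        # Select one buffer, discard the other.
        keep = self.target if taken else self.sequential
        self.sequential, self.target = deque(keep), deque()
        return list(keep)

pf = PrefetchUnit()
pf.fetch(fall_through=["I6", "I7"], branch_target=["T1", "T2"])
print(pf.resolve(taken=True))   # -> ['T1', 'T2']
```

The point of the scheme is that whichever way the branch goes, the pipeline finds its next instructions already buffered instead of stalling for a fresh memory fetch.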


Data Buffering and Busing Structures


Speeding up of Pipeline Segments
• The processing speeds of pipeline segments are usually unequal.
• Consider the example below: three segments S1, S2 and S3 with delays T1, T2 and T3.


• If T1 = T3 = T and T2 = 3T, S2 becomes the bottleneck and we need to remove it.
• How? One method is to subdivide the bottleneck.
• Two subdivisions are possible:

• First Method: subdivide S2 into two sub-segments with delays T and 2T, giving a four-stage pipeline with stage delays T, T, 2T, T.

• Second Method: subdivide S2 into three sub-segments with delay T each, giving a five-stage pipeline with stage delays T, T, T, T, T.


• If the bottleneck is not sub-divisible, we can duplicate S2 in parallel: three copies of S2 (each with delay 3T) sit between S1 (delay T) and S3 (delay T), and successive instructions are distributed among the copies in turn.


• Control and synchronization are more complex with parallel segments.
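The two remedies can be quantified with a one-line observation: a pipeline's clock period must match its slowest segment, while k parallel copies of a segment divide its effective delay by k. A short sketch of the arithmetic (the function name is my own):

```python
def clock_period(stage_delays):
    """The pipeline clock is set by the slowest segment."""
    return max(stage_delays)

T = 1.0
print(clock_period([T, 3*T, T]))        # original: bottleneck S2 -> 3.0
print(clock_period([T, T, 2*T, T]))     # first subdivision     -> 2.0
print(clock_period([T, T, T, T, T]))    # second subdivision    -> 1.0

# Three parallel copies of S2 accept, on average, one instruction
# per T, so the replicated segment's effective delay is 3T / 3 = T.
print(3*T / 3)                          # -> 1.0
```

Only the second subdivision and the replication restore a clock of T; the first subdivision still leaves a 2T stage limiting the whole pipeline.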


Data Buffering
• Instruction and data buffering provides a continuous flow of work to the pipeline units.
• Example: 4X TI ASC


Example: 4X TI ASC
• This system uses a memory buffer unit (MBU), which:
– supplies the arithmetic unit with a continuous stream of operands
– stores results back in memory
• The MBU has three double buffers X, Y and Z (one octet per buffer): X and Y for input, Z for output.

• This provides pipeline processing at a high rate and alleviates the bandwidth mismatch between memory and the arithmetic pipeline.


Busing Structures
• Problem: Ideally, the subfunctions in a pipeline should be independent; otherwise the pipeline must be halted until the dependency is removed.
• Solution: an efficient internal busing structure.
• Example: TI ASC


• In the TI ASC, once an instruction dependency is recognized, an update capability is provided by transferring the contents of the Z (result) buffer directly to the X or Y (operand) buffer.

