Lec Jan15 2009

Page 1: Lec Jan15 2009

Anshul Kumar, CSE IITD

CSL718: Pipelined Processors

Pipeline Timings

15th Jan, 2009

Page 2: Lec Jan15 2009

Pipelined Processors

Parallel architectures
• Data-parallel
• Function-parallel
– Instr level (ILP): pipelined processors, VLIWs, superscalar processors
– Thread level
– Process level

Intel’s terminology:
• intra ILP
• inter ILP

Page 3: Lec Jan15 2009

Ideal Pipelining

[Figure: instruction time Tinst split into S stages]

Page 4: Lec Jan15 2009

Determining Clock Period

[Figure: Reg, Comb, Reg driven by a clock of period Δt; P marks the delay through Comb]

Δt ≥ P, where P = propagation delay

Δt = Pmax, where Pmax = max propagation delay

Page 5: Lec Jan15 2009

Ideal Pipelining

[Figure: instruction time Tinst split into S stages]

Pmax = Tinst / S
Δt = Tinst / S
Effective CPI = 1
Effective time per inst: Teff = CPI * Δt = 1 * Tinst / S

Page 6: Lec Jan15 2009

Pipelining with hazards

[Figure: instruction time Tinst split into S stages]

Frequency of interruptions - b

Δt = Tinst / S
CPI = 1 + (S - 1) * b
Teff = (1 + (S - 1) * b) * Tinst / S
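The hazard model on this slide can be tried out numerically. The following is a minimal Python sketch, not part of the lecture; the function name `teff_hazards` is an assumption made for illustration, and Tinst = 10 with b = 0.2, 0.1, 0.05 are the values used in the Teff vs. S plot.

```python
def teff_hazards(s: int, t_inst: float, b: float) -> float:
    """Effective time per instruction for an S-stage pipeline with hazards.

    Follows the slide's model: clock period t_inst / s and
    CPI = 1 + (s - 1) * b, where b is the frequency of interruptions.
    """
    if s < 1:
        raise ValueError("number of stages must be at least 1")
    cpi = 1 + (s - 1) * b
    return cpi * t_inst / s


if __name__ == "__main__":
    # Tabulate Teff for S = 1..10, one row per value of b.
    for b in (0.2, 0.1, 0.05):
        row = [round(teff_hazards(s, 10.0, b), 2) for s in range(1, 11)]
        print(f"b = {b}: {row}")
```

Expanding the formula gives Teff = (1 - b) * Tinst / S + b * Tinst, so in this model Teff keeps falling as S grows and approaches b * Tinst.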

Page 7: Lec Jan15 2009

[Plot: Teff vs. S (Tinst = 10), S from 1 to 10, Teff axis from 0 to 12, with curves for b = .2, b = .1 and b = .05]

Page 8: Lec Jan15 2009

A more realistic view

[Figure: Reg, Comb, Reg with clock; the timing shows propagation delay P together with register output delay, register setup time and clock skew]

Page 9: Lec Jan15 2009

Clocking Overhead

• Fixed overhead c
– Setup time
– Output delay
• Variable overhead (stretching factor) k
– Clock skew

Δt = Pmax + k * Pmax + c
= (1 + k) * Tinst / S + c

Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]
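To see how the clocking overhead changes the picture, the sketch below evaluates this Teff formula over a range of stage counts and picks the integer S with the smallest value. It is an illustration only: the names `clock_period`, `teff_overhead` and `best_stage_count` are assumptions, and stages are taken to be perfectly balanced (Pmax = Tinst / S) as on the slide.

```python
def clock_period(s: int, t_inst: float, c: float, k: float) -> float:
    """Clock period (1 + k) * t_inst / s + c, assuming balanced stages."""
    return (1 + k) * t_inst / s + c


def teff_overhead(s: int, t_inst: float, b: float, c: float, k: float) -> float:
    """Effective time per instruction with hazards and clocking overhead."""
    cpi = 1 + (s - 1) * b
    return cpi * clock_period(s, t_inst, c, k)


def best_stage_count(t_inst: float, b: float, c: float, k: float,
                     max_stages: int) -> int:
    """Integer S in 1..max_stages that minimises teff_overhead."""
    return min(range(1, max_stages + 1),
               key=lambda s: teff_overhead(s, t_inst, b, c, k))


if __name__ == "__main__":
    # Parameters of the Teff vs. S plot: Tinst = 10, c = 1, k = 0.1.
    for b in (0.2, 0.1, 0.05):
        s_best = best_stage_count(10.0, b, 1.0, 0.1, 15)
        print(f"b = {b}: best S = {s_best}, "
              f"Teff = {teff_overhead(s_best, 10.0, b, 1.0, 0.1):.3f}")
```

Unlike the hazard-only model, Teff now has a minimum at a finite S, because the fixed overhead c is paid once per stage.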

Page 10: Lec Jan15 2009

[Plot: Teff vs. S (Tinst = 10, c = 1, k = .1), S from 1 to 15, Teff axis from 0 to 14, with curves for b = .2, b = .1 and b = .05]

Page 11: Lec Jan15 2009

Pipelining with Clocking Overhead

Teff = [1 + (S - 1) * b] * [(1 + k) * Tinst / S + c]

Sopt = √[(1 - b) * (1 + k) * Tinst / (b * c)]
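The slide states Sopt without the intermediate steps. One way to obtain it, assuming S is treated as a continuous variable, is to expand Teff and set its derivative with respect to S to zero:

```latex
\begin{align*}
T_{\mathrm{eff}}(S) &= \bigl[1 + (S-1)\,b\bigr]\left[\frac{(1+k)\,T_{\mathrm{inst}}}{S} + c\right] \\
 &= \frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{S} + b\,(1+k)\,T_{\mathrm{inst}} + (1-b)\,c + b\,c\,S \\
\frac{dT_{\mathrm{eff}}}{dS} &= -\frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{S^{2}} + b\,c = 0 \\
S_{\mathrm{opt}} &= \sqrt{\frac{(1-b)(1+k)\,T_{\mathrm{inst}}}{b\,c}}
\end{align*}
```

The second derivative is positive for S > 0 and b < 1, so this stationary point is a minimum.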

Page 12: Lec Jan15 2009

Partitioning instruction into cycles with non-uniform stage times

One action - one pipeline stage => large quantization overhead

Multiple actions per stage?
Multiple stages per action?

Page 13: Lec Jan15 2009

Example

Action        Time
PC - MAR      4 ns
Cache Dir     6 ns
Cache Data    10 ns
Data - IR     3 ns
Decode        6+6 ns
Gen Addr      9 ns
Addr - MAR    3 ns
Cache Dir     6 ns
Cache Data    10 ns
Data - ALU    3 ns
Execute       7+7+8 ns
Put Away      2 ns

Page 14: Lec Jan15 2009

Optimal Pipelining

Tinst = 4+6+10+3+12+9+3+6+10+3+22+2 = 90 ns

b = 0.2, c = 4 ns, k = 5%

Sopt = √[(1 - b) * (1 + k) * Tinst / (b * c)] = 9.7 ⇒ 9

Pmax = 10 ns
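The arithmetic on this slide can be checked with a few lines of Python. This is a sketch for illustration; the function name `optimal_stages` and the variable names are assumptions, while the numbers are the ones given on the slide.

```python
import math


def optimal_stages(t_inst: float, b: float, c: float, k: float) -> float:
    """Continuous optimum Sopt = sqrt((1 - b) * (1 + k) * t_inst / (b * c))."""
    return math.sqrt((1 - b) * (1 + k) * t_inst / (b * c))


# Action times in ns, in the order they appear in the Tinst sum on the slide.
action_times = [4, 6, 10, 3, 12, 9, 3, 6, 10, 3, 22, 2]
t_inst = sum(action_times)          # 90 ns

s_opt = optimal_stages(t_inst, b=0.2, c=4.0, k=0.05)
print(f"Tinst = {t_inst} ns, Sopt = {s_opt:.2f}")  # Sopt = sqrt(94.5)
```

Since the stage count must be an integer, the neighbouring values (here 9 and 10) are the natural candidates to examine.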

Page 15: Lec Jan15 2009

Example (S = 10)

[Figure: the action list from the first Example slide, partitioned into 10 stages]

Pmax = 10 ns
Δt = 14.5 ns
S * Δt = 145 ns

Page 16: Lec Jan15 2009

Example (S = 9)

[Figure: the action list from the first Example slide, partitioned into 9 stages]

Pmax = 13 ns
Δt = 17.65 ns
S * Δt = 159 ns

Page 17: Lec Jan15 2009

Example (S = 5)

[Figure: the action list from the first Example slide, partitioned into 5 stages]

Pmax = 20 ns
Δt = 25 ns
S * Δt = 125 ns

Page 18: Lec Jan15 2009

Comparison

S     Pmax (ns)   Δt (ns)   S * Δt (ns)   Teff (ns)
9     13          17.65     159           45.89
10    10          14.50     145           40.60
5     20          25.00     125           45.00
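The rows of this comparison follow from Δt = (1 + k) * Pmax + c and Teff = [1 + (S - 1) * b] * Δt with b = 0.2, c = 4 ns and k = 5%. The sketch below recomputes them; the function names are assumptions chosen for this example.

```python
B, C, K = 0.2, 4.0, 0.05   # hazard frequency, fixed overhead (ns), stretch factor


def clock_period_from_pmax(p_max: float) -> float:
    """Clock period for a partition whose slowest stage takes p_max ns."""
    return (1 + K) * p_max + C


def teff(s: int, p_max: float) -> float:
    """Effective time per instruction for S stages with slowest stage p_max."""
    return (1 + (s - 1) * B) * clock_period_from_pmax(p_max)


if __name__ == "__main__":
    print(" S  Pmax     dt   S*dt   Teff")
    for s, p_max in [(9, 13.0), (10, 10.0), (5, 20.0)]:
        dt = clock_period_from_pmax(p_max)
        print(f"{s:>2} {p_max:>5.0f} {dt:>6.2f} {s * dt:>6.0f} {teff(s, p_max):>6.2f}")
```

Of the three partitions, S = 10 gives the smallest Teff because its stages are the best balanced (Pmax = 10 ns), even though it has the most stages.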

Page 19: Lec Jan15 2009

Cycle Quantization

Delays are not integral multiples of the clock period
Total overhead = clocking overhead + quantization overhead

Δt ≥ Tinst / S + c (ignoring k)
∴ S * Δt ≥ Tinst + S * c

Quantization overhead = S * (Δt - c) - Tinst

This reduces as the clock period becomes small
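As a small worked example of the quantization overhead formula, the sketch below applies it to the three partitions from the Example slides (S = 9, 10, 5 with Pmax = 13, 10, 20 ns). It ignores k as this slide does, so Δt is taken as Pmax + c; that simplification and the function name are assumptions of this sketch.

```python
def quantization_overhead(s: int, dt: float, c: float, t_inst: float) -> float:
    """Quantization overhead S * (dt - c) - t_inst, with k ignored."""
    return s * (dt - c) - t_inst


if __name__ == "__main__":
    t_inst, c = 90.0, 4.0
    for s, p_max in [(9, 13.0), (10, 10.0), (5, 20.0)]:
        dt = p_max + c  # clock period without the stretching factor k
        q = quantization_overhead(s, dt, c, t_inst)
        print(f"S = {s:>2}: clocking overhead = {s * c:.0f} ns, "
              f"quantization overhead = {q:.0f} ns")
```

With k ignored, the quantization overhead is simply S * Pmax - Tinst, the time lost because the actions do not pack evenly into the stages.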

Page 20: Lec Jan15 2009

Other Timing Approaches

• Self Timed Circuits
– No centralized free running clock
– An operation begins as soon as its inputs are available, that is, all its predecessors have completed
– Higher speed, lower power consumption
• Wave Pipelining
– Omit inter-stage registers
– Reduced clocking overhead

Page 21: Lec Jan15 2009

Conventional vs Wave Pipelining

Conventional Pipeline
• Registers separate adjoining stages
• Clock period > max prop delay
• Inter-stage data stored in registers

Wave Pipeline
• No registers between adjoining stages
• Clock period less than max prop delay
• Waves of data propagate through combinational network (effectively, data is stored in the combinational circuit delay!)

Page 22: Lec Jan15 2009

No pipelining

[Figure: timing diagram for a combinational circuit between two registers with no pipelining; signal labels X, X’, Y]

Page 23: Lec Jan15 2009

Conventional pipelining

[Figure: timing diagram for conventional pipelining with registers between stages; signal labels X, X’, Y, Y’, Z, Z’, W]

Page 24: Lec Jan15 2009

Wave pipelining

[Figure: timing diagram for wave pipelining with registers only at input and output; signal labels X, Z’, W]

Page 25: Lec Jan15 2009

Timing

[Figure: input X, Reg, Comb ckt, Reg, output Y, with clock waveform]

p = propagation delay
s = set-up time
T = clock period

T ≥ p + s

Page 26: Lec Jan15 2009

Timing with clock skew

[Figure: input X, Reg, Comb ckt, Reg, output Y, with clock waveform; each clock edge is uncertain by δ on either side]

Clock skew = ±δ

T ≥ p + s + 2δ

Page 27: Lec Jan15 2009

Variation in propagation delay

• Different delays in different paths
• Delay variation due to process / temperature / power variations
• Data-dependent delay variations

Page 28: Lec Jan15 2009

Timing for wave pipelining

[Figure: input X, Reg, Comb ckt, Reg, output Y, with clock period T and skew ±δ; the output changes between pmin and pmax after the input, Δp = pmax - pmin]

T ≥ Δp + s + 4δ

Page 29: Lec Jan15 2009

Timing for wave pipelining (expanded view)

[Figure: signals X and Y over several clock periods T; pmin and pmax lie between (n-1) T and nT, separated by Δp]

pmin ≥ (n-1) T + 2δ
nT ≥ pmax + s + 2δ

⇒ T ≥ Δp + s + 4δ
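The step from the two constraints on this slide to the bound on T is short but easy to miss. Writing the pmin constraint as an upper bound on (n-1) T and subtracting it from the pmax constraint gives:

```latex
\begin{align*}
nT &\ge p_{\max} + s + 2\delta \\
(n-1)\,T &\le p_{\min} - 2\delta \\
T = nT - (n-1)\,T &\ge (p_{\max} + s + 2\delta) - (p_{\min} - 2\delta) \\
 &= \Delta p + s + 4\delta, \qquad \Delta p = p_{\max} - p_{\min}
\end{align*}
```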

Page 30: Lec Jan15 2009

Anshul Kumar, CSE IITD slide 30

ComparisonComparisonComparison

Conventional PipelineT ≥

pmax/n + s + 2δ

(plus cycle quantizationoverhead)

nT ≥

pmax + ns + 2nδ

Wave PipelineT ≥ Δ p + s + 4δ

nT ≥

pmax + s + 2δ
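The bounds above can be compared numerically. The sketch below computes the minimum clock period of a conventional n-stage pipeline and the range of clock periods that satisfies both wave-pipelining constraints (pmin ≥ (n-1) T + 2δ and nT ≥ pmax + s + 2δ) for n waves. The function names and the delay values in the usage part are hypothetical, chosen only to illustrate the formulas.

```python
from typing import Optional, Tuple


def conventional_min_period(p_max: float, n: int, s: float, delta: float) -> float:
    """Lower bound T >= p_max / n + s + 2 * delta (quantization ignored)."""
    return p_max / n + s + 2 * delta


def wave_period_window(p_min: float, p_max: float, n: int,
                       s: float, delta: float) -> Optional[Tuple[float, float]]:
    """Clock periods (T_low, T_high) that allow n waves, or None if none exist.

    T_low comes from n * T >= p_max + s + 2 * delta,
    T_high from p_min >= (n - 1) * T + 2 * delta.
    """
    t_low = (p_max + s + 2 * delta) / n
    if n == 1:
        # With a single wave the p_min constraint no longer involves T.
        return (t_low, float("inf")) if p_min >= 2 * delta else None
    t_high = (p_min - 2 * delta) / (n - 1)
    return (t_low, t_high) if t_low <= t_high else None


if __name__ == "__main__":
    # Hypothetical delays in ns, not taken from the lecture.
    p_min, p_max, s, delta = 16.0, 20.0, 1.0, 0.5
    for n in (2, 3, 4):
        print(f"n = {n}: conventional T >= "
              f"{conventional_min_period(p_max, n, s, delta):.2f}, "
              f"wave window = {wave_period_window(p_min, p_max, n, s, delta)}")
```

With these hypothetical numbers the window for n = 3 is only about 0.17 ns wide and no clock period works for n = 4, which illustrates why wave pipelines tolerate only a narrow range of clock frequencies.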

Page 31: Lec Jan15 2009

Problems with wave pipelining

• Need to balance delays
• Narrow range of clock frequencies
• Control difficult
• Not very suitable for non-linear pipelines

Page 32: Lec Jan15 2009

References

1. M.J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House / Jones and Bartlett, 1996.
2. Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, "Wave-Pipelining: A Tutorial and Research Survey", IEEE Trans. on VLSI Systems, vol. 6, no. 3, September 1998, pp. 464-474.

