1
Recap: Lectures 5 & 6
Classic Pipeline Styles
1. Williams and Horowitz’s PS0 pipeline
2. Sutherland’s micropipelines
2
Different Points in the Design Space
Williams/Horowitz’s PS0:
- Dual-rail
- Data-dependent completion
- Dynamic logic
- No extra latches
- “Zero-overhead” latency
- 4-phase handshakes: resetting overhead
Sutherland’s micropipelines:
- Single-rail
- Worst-case matched delay
- Static logic
- Explicit latches
- Latch latencies = overhead
- Elegant transition signaling
3
PS0 Protocol
PRECHARGE N: when N+1 completes evaluation
- delete data: after next stage has copied it
EVALUATE N: when N+1 completes precharging
- accept new data: after next stage is emptied
[Figure: event sequence across stages N, N+1, N+2. Events 1, 2, 3: evaluates; 4: indicates “done”; 5: precharges; 6: indicates “done”]
Evaluate → Precharge: 3 events
Precharge → Evaluate: another 3 events
Complete cycle: 6 events
4
PS0 Performance
$T_{EVAL}$: Evaluation Time
$T_{PRECH}$: Precharge Time
$T_{DETECT}$: Completion Detection Time
[Figure: PS0 cycle, events 1 to 6]
Cycle Time = $3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
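The cycle time can be read as the sum of the six events on the critical cycle. The sketch below is a minimal illustration under that reading: the event list, the function name `ps0_cycle_time`, and the delay values are assumptions chosen for the example, not measurements from the lecture.

```python
# Six events on the PS0 critical cycle: three evaluations, a completion
# detection, a precharge, and a second completion detection.
PS0_CRITICAL_CYCLE = ["eval", "eval", "eval", "detect", "prech", "detect"]


def ps0_cycle_time(delays: dict) -> float:
    """Sum the delay of each event on the PS0 critical cycle."""
    return sum(delays[event] for event in PS0_CRITICAL_CYCLE)


# Hypothetical delays in nanoseconds, for illustration only.
delays = {"eval": 1.0, "prech": 1.0, "detect": 0.5}
cycle = ps0_cycle_time(delays)
print(cycle)  # 3*1.0 + 1.0 + 2*0.5 = 5.0
```

With these assumed delays, one data token can enter a stage every 5.0 ns at best.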
5
Drawbacks of PS0 Pipelining
1. Poor throughput:
   - long cycle time: 6 events per cycle
   - data “tokens” are forced far apart in time
2. Limited storage capacity:
   - max only 50% of stages can hold distinct tokens
   - data tokens must be separated by at least one spacer
Our Research Goals:
   - address both issues
   - still maintain very low latency
6
Lecture 7: Recent Approaches
7
Recent Approaches
3 novel styles for high-speed async pipelining:
- “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
- “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]
- MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
Goal: significantly improve throughput of PS0
Two Distinct Strategies:
- LP: introduce protocol optimizations
  “shave off” components from critical cycle
- HC: fundamentally new protocol
  greater concurrency: “loosely-coupled” stages
8
Outline
New Asynchronous Pipelines:
- Lookahead Pipelines (LP): dynamic circuit style
- High-Capacity Pipelines (HC): dynamic circuit style
- MOUSETRAP Pipelines: static circuit style
9
Lookahead Pipelines: Strategy #1
Use non-neighbor communication:
- stage receives information from multiple later stages
- allows “early evaluation”
Benefit: stage gets head-start on next cycle
10
Lookahead Pipelines: Strategy #2
Use early completion detection:
- completion detector moved before stage (not after)
- stage indicates “early done” in parallel with computation
Benefit: again, stage gets head-start on next cycle
[Figure: stage with early completion detector]
11
Lookahead Pipelines: Overview
5 New Designs:
“Dual-Rail” Data Signaling:
- LP3/1: “early evaluation”
- LP2/2: “early done”
- LP2/1: “early evaluation” + “early done”
“Single-Rail” Bundled-Data Signaling:
- LP_SR2/2: “early done”
- LP_SR2/1: “early evaluation” + “early done”
12
Dual-Rail Design #1: LP3/1
Optimization = “early evaluation”
- each stage has two control inputs: from stages N+1 and N+2
Idea: shorten precharge phase
- terminate precharge early: when N+2 is done evaluating
[Figure: stages N, N+1, N+2, each a Processing Block with a Completion Detector; Data in, Data out; control inputs PC and Eval, with Eval coming from N+2]
13
LP3/1 Protocol
PRECHARGE N: when N+1 completes evaluation
EVALUATE N: when N+2 completes evaluation (New!)
- Enables “early evaluation!”
[Figure: event sequence across stages N, N+1, N+2. 1: N evaluates; 2: N+1 evaluates; 3: N+2 evaluates and N+1 indicates “done”; 4: N+2 indicates “done”]
14
LP3/1: Comparison with PS0
PS0:
- PRECHARGE N: when N+1 completes evaluation
- EVALUATE N: when N+1 completes precharging
- 6 events in cycle
LP3/1:
- PRECHARGE N: when N+1 completes evaluation
- EVALUATE N: when N+2 completes evaluation
- Enables “early evaluation!”
- Only 4 events in cycle!
[Figure: event sequences for PS0 (events 1 to 6) and LP3/1 (events 1 to 4) across stages N, N+1, N+2]
15
LP3/1 Performance
[Figure: LP3/1 cycle, events 1 to 4, with the saved path marked]
Cycle Time = $3\,T_{EVAL} + T_{DETECT}$
Savings over PS0: 1 Precharge + 1 Completion Detection
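As a check on the stated savings, subtract the LP3/1 cycle time from the PS0 cycle time given in the recap ($3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$). The subscripted names $T_{PS0}$ and $T_{LP3/1}$ are notation introduced here for the two cycle times:

```latex
\begin{align*}
T_{PS0} - T_{LP3/1}
  &= \left(3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}\right)
   - \left(3\,T_{EVAL} + T_{DETECT}\right) \\
  &= T_{PRECH} + T_{DETECT}
\end{align*}
```

The difference is exactly one precharge plus one completion detection.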
16
LP3/1: Inside a Stage
Merging 2 Control Inputs:
A NAND gate merges 2 control inputs:
- PC (From Stage N+1)
- Eval (From Stage N+2)
Precharge when PC=1 (and Eval=0)
Evaluate “early” when Eval=1 (or PC=0)
Problem: “early” Eval=1 is non-persistent!
- may be de-asserted before stage completes evaluation!
[Figure: NAND gate merging PC and Eval; labels “early Eval” and “old Eval”]
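The merged control can be tabulated with a small behavioral model. This is a sketch, not the actual gate-level design: it assumes the NAND sees PC and the complement of Eval and that a low output puts the stage in precharge, which is one way to match the two conditions stated above. The name `stage_control` is hypothetical.

```python
def stage_control(pc: bool, eval_: bool) -> str:
    """Behavioral model of the merged control for one stage."""
    # Assumption: NAND of PC and NOT Eval; output low means precharge.
    nand_out = not (pc and not eval_)
    return "evaluate" if nand_out else "precharge"


# Print the full truth table for the two control inputs.
for pc in (False, True):
    for ev in (False, True):
        print(f"PC={int(pc)} Eval={int(ev)} -> {stage_control(pc, ev)}")
```

In this model the stage precharges only for PC=1, Eval=0. Every other input combination evaluates, including the “early” case PC=1, Eval=1.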
17
LP3/1 Timing Constraints: Example
Problem (cont.): “early” Eval=1 non-persistent
Observation: PC=0 soon after Eval=1, and is persistent
Solution: no change!
- use PC as safe “takeover” for Eval!
Timing Constraint: PC=0 must arrive before Eval de-asserted
- simple one-sided timing requirement
- other constraints as well… all easily satisfied in practice
[Figure: NAND gate with inputs PC (From Stage N+1) and Eval (From Stage N+2)]
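The one-sided constraint can be illustrated with a discrete-time sketch. It is a simplification under stated assumptions: time advances in integer steps, PC falls once and stays low for the whole window, Eval is high only inside a fixed interval, and the stage evaluates whenever PC=0 or Eval=1. The function names and time values are hypothetical.

```python
def control_trace(pc_fall: int, eval_rise: int, eval_fall: int, horizon: int) -> str:
    """Return one letter per time step: 'P' = precharge, 'E' = evaluate."""
    trace = []
    for t in range(horizon):
        pc = t < pc_fall                  # PC=1 until pc_fall, then persistently 0
        ev = eval_rise <= t < eval_fall   # Eval=1 only inside its window
        trace.append("E" if (not pc or ev) else "P")
    return "".join(trace)


def glitch_free(trace: str) -> bool:
    """True if the stage never returns to precharge once evaluation has begun."""
    first_eval = trace.find("E")
    return first_eval == -1 or "P" not in trace[first_eval:]


# PC=0 arrives (t=4) before Eval is de-asserted (t=6): PC takes over safely.
print(control_trace(pc_fall=4, eval_rise=2, eval_fall=6, horizon=8))  # PPEEEEEE
# PC=0 arrives (t=6) after Eval is de-asserted (t=4): spurious precharge.
print(control_trace(pc_fall=6, eval_rise=2, eval_fall=4, horizon=8))  # PPEEPPEE
```

The second trace shows the failure the constraint rules out: the stage drops back to precharge in the gap between Eval falling and PC falling.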
18
Dual-Rail Design #2: LP2/2
Optimization = “early done”
Idea: move completion detector before processing block
- stage indicates when “about to” precharge/evaluate
[Figure: “early” Completion Detector placed ahead of the Processing Block; Data in, Data out, “early done” output]
19
LP2/2 Protocol
Completion Detection:
- performed in parallel with evaluation/precharge of stage
[Figure: event sequence across stages N, N+1, N+2. 1: N evaluates; 2: N+1 evaluates, with “early done” of N+1 eval; 3: “early done” of N+2 eval; 4: “early done” of N+1 prech]
20
LP2/2 Performance
[Figure: LP2/2 cycle, events 1 to 4]
Cycle Time = $2\,T_{EVAL} + 2\,T_{DETECT}$
LP2/2 savings over PS0: 1 Evaluation + 1 Precharge
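To compare the styles side by side, the sketch below evaluates the three cycle-time formulas from this lecture for one set of delays. The delay values are hypothetical and chosen only to make the arithmetic easy to follow; `cycle_times` is an illustrative helper name.

```python
def cycle_times(t_eval: float, t_prech: float, t_detect: float) -> dict:
    """Cycle time of each pipeline style, using the formulas from the slides."""
    return {
        "PS0": 3 * t_eval + t_prech + 2 * t_detect,
        "LP3/1": 3 * t_eval + t_detect,
        "LP2/2": 2 * t_eval + 2 * t_detect,
    }


# Hypothetical delays in nanoseconds, for illustration only.
times = cycle_times(t_eval=1.0, t_prech=1.0, t_detect=0.5)
for style, t in times.items():
    print(f"{style}: {t} ns")
```

With these assumed delays, LP2/2 saves 2.0 ns over PS0, which is one evaluation plus one precharge. Which lookahead style wins in general depends on how $T_{EVAL}$ compares with $T_{DETECT}$.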
21
Dual-Rail Design #3: LP2/1
Hybrid of LP3/1 and LP2/2…
Combines:
- early evaluation of LP3/1
- early done of LP2/2
Cycle time:
Best of our dual-rail lookahead pipelines…