Date post: | 22-Dec-2015 |
NTHU-CS VLSI/CAD LAB
TH EDA
Student: Da-Cheng Juan
Advisor: Shih-Chieh Chang
Fine-Grained Sleep Transistor Sizing Algorithm for Leakage Power Minimization
2
Outline
Sleep Transistor Sizing Problem
Maximum Instantaneous Current Estimation
Time-Frame Partitioning for Sizing
Experimental Results
Conclusions
3
Trend of Low Power Designs
Leakage increases exponentially
– reaches more than 50% of total power in 65nm technology.
Low power design is a must-have, not an option.
Power dissipation
– Active power (active mode)
– Leakage power (sleep mode)
[Figure: transistor with source, gate, and drain, showing sub-threshold leakage.]
4
Power Gating
Power gating
– One of the most effective ways to reduce leakage.
[Figure: low-Vth logic devices between VDD and VGND; a high-Vth sleep transistor, controlled by SL, sits between VGND and GND to reduce the leakage current.]
Mode      SL    Sleep Transistor
Active    0     ON
Sleep     1     OFF
5
Implementation of Power Gating
Distributed Sleep Transistor Network (DSTN)
[Figure: clusters C1, C2, C3 of low-Vth logic devices between VDD and a shared VGND; each cluster has a sleep transistor controlled by SL.]
6
Leakage Saving
In sleep mode:
– Leakage is proportional to the ST’s size.
– Use a small ST to reduce leakage.
[Figure: leakage current Ileakage flowing through each sleep transistor between VGND and ground.]
7
Voltage Drop across Sleep Transistor
In active mode:
– The voltage drop across a ST degrades the performance.
– The voltage drop is inversely proportional to the ST’s size.
– Use a large ST to bound the voltage drop.
[Figure: voltage drop VST across each sleep transistor between VGND and ground.]
8
Sleep Transistor (ST) Sizing
Dilemma:
– Small ST to reduce leakage. (sleep mode)
– Large ST to bound the voltage drop. (active mode)
Objective: minimize the ST size (leakage) under a specified voltage-drop constraint, VST*.
[Figure: voltage drop VST across each sleep transistor, each bounded by the constraint VST*.]
9
Voltage Drop Estimation with MIC
Maximum Instantaneous Current (MIC) through a ST
– determines the worst-case IR drop.
Estimating the upper bound of MIC(ST)
– to size the ST properly to meet the voltage-drop constraint.
MIC(ST): MIC across a ST.
[Figure: clusters C1, C2, C3 between VDD and VGND, with MIC(ST1), MIC(ST2), MIC(ST3) through the sleep transistors.]
10
Voltage Drop Estimation with MIC
MIC(C) (MIC of a cluster) is easy to measure.
Due to the current balancing effect
– MIC(ST) (MIC through a ST) is hard to predict.
Finding the MIC of a cluster is fast.
Finding the MIC across a ST is time-consuming.
[Figure: clusters C1, C2, C3 between VDD and VGND; MIC(C1) is injected by cluster C1 and spreads into MIC(ST1), MIC(ST2), MIC(ST3).]
11
Temporal Perspective of Cluster’s MIC
Conventional way
– ST sizes are determined with the MIC of the entire clock period.
[Figure: MIC(Ci) waveforms of Cluster 1 and Cluster 2 over one clock cycle; x-axis: time (unit: 10 ps), y-axis: current. MIC(C1) occurs at T6; MIC(C2) occurs at T9.]
12
Temporal Perspective of Cluster’s MIC
Smaller time frames lead to:
– a more accurate MIC estimation,
– but higher computational complexity.
[Figure: MIC(Ci) waveforms of Cluster 1 and Cluster 2 over one clock cycle; x-axis: time (unit: 10 ps), y-axis: current (mA).]
13
Difficulties
The current balancing effect complicates the sizing problem.
Time-frame partitioning leads to high computational complexity.
[Figure: MIC values of the clusters within one clock cycle.]
14
Contributions
More accurate MIC prediction from a temporal perspective.
Variable-length partitioning to reduce computational complexity.
Algorithm to minimize the sizes of sleep transistors.
Achieving a 21% area reduction in total sleep transistor size compared with [2].
- [2] Chiou et al. DAC’06
15
Outline
Sleep Transistor Sizing Problem
Maximum Instantaneous Current Estimation
Time-Frame Partitioning for Sizing
Experimental Results
Conclusions
16
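Resistance Network
[Figure: clusters C1, C2, C3 inject currents I(C1), I(C2), I(C3) into VGND; sleep transistors with resistances R(ST1), R(ST2), R(ST3) carry I(ST1), I(ST2), I(ST3); adjacent VGND nodes are connected through resistances RV.]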
17
Discharging Ratio
The discharging ratio can be calculated by
– Kirchhoff’s Current Law
– Ohm’s Law
[Figure: sleep transistor resistances 9Ω, 8Ω, 10Ω under clusters C1, C2, C3, joined along VGND by two 2Ω resistances. The current I(C1) splits into 0.43 * I(C1), 0.34 * I(C1), and 0.23 * I(C1) through ST1, ST2, and ST3.]
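The ratios in the figure can be reproduced with a small nodal analysis. The following is a minimal sketch, assuming the network is a simple chain (each VGND node connects to ground through its sleep transistor and to its neighbor through a wire resistance); the function names `solve` and `discharging_ratios` are illustrative and do not come from the slides.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def discharging_ratios(r_st, r_v, source):
    """Share of the current injected at node `source` that leaves through each ST.

    r_st[i]: resistance of sleep transistor i (VGND node i to ground).
    r_v[i]:  wire resistance between VGND node i and node i + 1.
    """
    n = len(r_st)
    g = [[0.0] * n for _ in range(n)]      # nodal conductance matrix (KCL)
    for i in range(n):
        g[i][i] += 1.0 / r_st[i]
    for i in range(n - 1):
        gv = 1.0 / r_v[i]
        g[i][i] += gv
        g[i + 1][i + 1] += gv
        g[i][i + 1] -= gv
        g[i + 1][i] -= gv
    inject = [0.0] * n
    inject[source] = 1.0                   # unit current from the cluster
    v = solve(g, inject)                   # VGND node voltages
    return [v[i] / r_st[i] for i in range(n)]   # Ohm's law per ST


# Values from the figure: 9, 8, 10 ohm STs joined by two 2 ohm wires.
ratios = discharging_ratios([9.0, 8.0, 10.0], [2.0, 2.0], source=0)
print([round(x, 2) for x in ratios])       # [0.43, 0.34, 0.23]
```

Repeating the call with `source=1` and `source=2` gives the other two sets of ratios for the same network.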
18
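Discharging Matrix Ψ (SAI)
\[
\begin{bmatrix} I(ST_1) \\ I(ST_2) \\ I(ST_3) \end{bmatrix}
= \Psi
\begin{bmatrix} I(C_1) \\ I(C_2) \\ I(C_3) \end{bmatrix},
\quad \text{where} \quad
\Psi =
\begin{bmatrix}
\psi_{11} & \psi_{12} & \psi_{13} \\
\psi_{21} & \psi_{22} & \psi_{23} \\
\psi_{31} & \psi_{32} & \psi_{33}
\end{bmatrix}
\]
[Figure: clusters C1, C2, C3 inject I(C1), I(C2), I(C3) into VGND; the sleep transistors carry I(ST1), I(ST2), I(ST3).]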
19
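MIC(ST) Estimation Mechanism
\[
\begin{bmatrix} MIC(ST_1) \\ MIC(ST_2) \\ MIC(ST_3) \end{bmatrix}
= \Psi
\begin{bmatrix} MIC(C_1) \\ MIC(C_2) \\ MIC(C_3) \end{bmatrix},
\quad \text{where} \quad
\Psi =
\begin{bmatrix}
\psi_{11} & \psi_{12} & \psi_{13} \\
\psi_{21} & \psi_{22} & \psi_{23} \\
\psi_{31} & \psi_{32} & \psi_{33}
\end{bmatrix}
\]
[Figure: clusters C1, C2, C3 with MIC(C1), MIC(C2), MIC(C3) and the resulting MIC(ST1), MIC(ST2), MIC(ST3).]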
20
Outline
Sleep Transistor Sizing Problem
Maximum Instantaneous Current Estimation
Time-Frame Partitioning for Sizing
Experimental Results
Conclusions
21
Temporal Perspective of Cluster’s MIC
Different MIC(Ci) occur at different time points.
[Figure: MIC(Ci) waveforms of Cluster 1 and Cluster 2 over one clock cycle; x-axis: time (unit: 10 ps), y-axis: current. MIC(C1) occurs at T6; MIC(C2) occurs at T9.]
22
Temporal Perspective of Cluster’s MIC
Different MIC(Ci) occur at different time points within a clock period.
The traditional way to estimate MIC(STi) is overly pessimistic:
\[
\begin{bmatrix} MIC(ST_1) \\ MIC(ST_2) \\ MIC(ST_3) \end{bmatrix}
= \Psi
\begin{bmatrix} MIC(C_1) \\ MIC(C_2) \\ MIC(C_3) \end{bmatrix}
\]
The resulting MIC(STi) values are over-estimated!
23
Time-Frame Partitioning for MIC(ST) Estimation
Expand MIC(Ci) into MIC(Ci,Tj).
[Figure: MIC(Ci,Tj) waveforms of Cluster 1 and Cluster 2 over one clock cycle; x-axis: time frame, y-axis: current. Marked values: MIC(C1,T1), MIC(C2,T1), MIC(C1,T3), MIC(C2,T3), MIC(C1,T6), MIC(C2,T6).]
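Expanding MIC(Ci) into MIC(Ci,Tj) amounts to taking the maximum of each cluster's current waveform inside every time frame. A minimal sketch, assuming the waveform is sampled once per minimum time unit; the function name `frame_mic` and the sample values are hypothetical.

```python
def frame_mic(waveform, boundaries):
    """Return the maximum of `waveform` within each time frame.

    waveform:   MIC samples of one cluster, one per minimum time unit.
    boundaries: start index of each frame, ascending, beginning with 0.
    """
    ends = boundaries[1:] + [len(waveform)]
    return [max(waveform[s:e]) for s, e in zip(boundaries, ends)]


# Hypothetical waveforms (mA) over a clock cycle of 10 time units.
c1 = [1, 2, 4, 3, 5, 9, 6, 2, 1, 1]
c2 = [0, 1, 1, 2, 2, 3, 4, 6, 8, 3]
frames = [0, 5]                    # two frames: units 0-4 and units 5-9

mic_c1 = frame_mic(c1, frames)
mic_c2 = frame_mic(c2, frames)
print(mic_c1)                      # [5, 9]
print(mic_c2)                      # [2, 8]
```

With a single frame (`boundaries=[0]`) the function returns the whole-period MIC(Ci) used by the conventional approach.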
24
Time-Frame Partitioning for MIC(ST) Estimation
For each time frame Tj, use MIC(Ci,Tj) to obtain MIC(STi,Tj). For example, for T1:
\[
\begin{bmatrix} MIC(ST_1, T_1) \\ MIC(ST_2, T_1) \\ MIC(ST_3, T_1) \end{bmatrix}
= \Psi
\begin{bmatrix} MIC(C_1, T_1) \\ MIC(C_2, T_1) \\ MIC(C_3, T_1) \end{bmatrix}
\]
25
Time-Frame Partitioning for MIC(ST) Estimation
For ST1, the maximum MIC(ST1,Tj) among all Tj is the upper bound of MIC(ST1) after partitioning.
[Figure: MIC(STi,Tj) waveforms of ST 1 and ST 2 over one clock cycle; x-axis: time frame, y-axis: current. The peaks are marked MIC(ST1) and MIC(ST2).]
26
Notation Review
MIC(Ci)
– Maximum Instantaneous Current of the i-th cluster
MIC(STi)
– Estimated MIC upper bound flowing through the i-th sleep transistor
MIC(Ci,Tj)
– MIC of Ci in the j-th time frame
MIC(STi,Tj) = Ψ * MIC(Ci,Tj)
– Estimated MIC upper bound through STi in the j-th time frame
MIC(STi) = Ψ * MIC(Ci)
– Without time-frame partitioning
MIC(STi) = max{ MIC(STi,Tj) for all j }
– With time-frame partitioning
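The two MIC(STi) definitions can be compared on a toy case. This is a minimal sketch with a hypothetical 2x2 discharging matrix and hypothetical per-frame cluster currents; the function names are illustrative only.

```python
def mat_vec(psi, vec):
    """Multiply the discharging matrix by a vector of cluster currents."""
    return [sum(p * v for p, v in zip(row, vec)) for row in psi]


def mic_st_whole_period(psi, mic_c_frames):
    """MIC(STi) = Psi * MIC(Ci): uses each cluster's peak over the whole period."""
    n = len(psi)
    mic_c = [max(frame[i] for frame in mic_c_frames) for i in range(n)]
    return mat_vec(psi, mic_c)


def mic_st_partitioned(psi, mic_c_frames):
    """MIC(STi) = max over j of MIC(STi,Tj), with MIC(STi,Tj) = Psi * MIC(Ci,Tj)."""
    per_frame = [mat_vec(psi, frame) for frame in mic_c_frames]
    return [max(f[i] for f in per_frame) for i in range(len(psi))]


# Hypothetical data: mic_c_frames[j][i] is MIC(Ci,Tj) in mA.
psi = [[0.6, 0.4],
       [0.4, 0.6]]
mic_c_frames = [[10.0, 2.0],   # T1: cluster 1 peaks
                [3.0, 8.0]]    # T2: cluster 2 peaks

whole = [round(x, 2) for x in mic_st_whole_period(psi, mic_c_frames)]
parts = [round(x, 2) for x in mic_st_partitioned(psi, mic_c_frames)]
print(whole)   # [9.2, 8.8]
print(parts)   # [6.8, 6.0]
```

Because the entries of Ψ are non-negative, the partitioned bound can never exceed the whole-period bound; it is strictly smaller whenever the clusters peak in different frames, as in this example.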
27
Time-Frame Partitioning for MIC(ST) Estimation
[Figure: MIC(STi,Tj) waveforms of ST 1 and ST 2 over one clock cycle; x-axis: time frame, y-axis: current. The peaks are marked MIC(ST1) and MIC(ST2). ORIGINAL_MIC(ST1) is 37% larger than MIC(ST1); ORIGINAL_MIC(ST2) is 27% larger than MIC(ST2).]
Time-frame partitioning leads to a better MIC(ST) estimation!
28
Reduce the Computation Complexity
More time frames lead to
– more accurate voltage-drop estimation,
– but higher computational complexity.
Reduce the computational complexity:
– dominated time-frame removal
– variable-length time-frame partitioning
29
Dominated Time Frame Removal
T3 is dominated by T6.
– MIC(C1,T6) > MIC(C1,T3),
– MIC(C2,T6) > MIC(C2,T3).
Neglect T3, since it follows that
– MIC(ST1,T6) > MIC(ST1,T3),
– MIC(ST2,T6) > MIC(ST2,T3).
[Figure: cluster MIC waveforms of Cluster 1 and Cluster 2, with MIC(C1,T3), MIC(C1,T6), MIC(C2,T3), and MIC(C2,T6) marked.]
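Dominated time-frame removal can be sketched as a pairwise comparison of frames. This sketch assumes a frame is dominated when another frame is at least as large for every cluster (the slide states the strict case); ties are broken by keeping the earlier frame, and the data values are hypothetical.

```python
def dominates(fb, fa):
    """True if frame fb is at least as large as frame fa for every cluster."""
    return all(b >= a for a, b in zip(fa, fb))


def remove_dominated(frames):
    """Return the indices of the time frames that no other frame dominates.

    frames[j][i] is MIC(Ci,Tj). Of several identical frames, only the
    first one is kept.
    """
    kept = []
    for j, fa in enumerate(frames):
        dominated = any(
            dominates(fb, fa) and (fb != fa or k < j)
            for k, fb in enumerate(frames)
            if k != j
        )
        if not dominated:
            kept.append(j)
    return kept


# Hypothetical MIC(Ci,Tj) values in mA for two clusters and four frames.
frames = [[2.0, 1.0],   # dominated by the next frame (4 >= 2 and 3 >= 1)
          [4.0, 3.0],
          [9.0, 2.0],
          [3.0, 8.0]]
kept = remove_dominated(frames)
print(kept)             # [1, 2, 3]
```

Dropping a dominated frame does not change max{ MIC(STi,Tj) } because the entries of Ψ are non-negative, so the dominating frame yields an MIC(STi,Tj) at least as large for every sleep transistor.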
30
Variable-Length Time-Frame Partitioning
(Tb dominates Tc) and (Tb dominates Td)
=> the estimated upper bound of Fig. (2) will be smaller.
[Figure: (1) uniform two-way partition with time frames Ta and Tb; (2) variable-length two-way partition with time frames Tc and Td. Marked values: MIC(C1,Tb), MIC(C2,Tb), MIC(C1,Tc), MIC(C2,Tc), MIC(C1,Td), MIC(C2,Td).]
31
Variable-Length Time-Frame Partitioning
When the MIC(Ci) peaks of all clusters are separated into different time frames,
- MIC(STi) can be better estimated!
Example with the number of time frames = 3
[Figure: cluster MIC waveforms of Cluster 1, Cluster 2, and Cluster 3 over one clock cycle, partitioned into time frames T1, T2, T3.]
32
Variable-Length Time-Frame Partitioning
Partition one clock period
– with the minimum time unit exhaustively
    Not efficient
    Accurate MIC(STi) estimation
– with a limited number of variable-length time frames
    Efficient
    Only loses slight accuracy
33
Problem Formulation of ST Sizing
Inputs:
1. Voltage-drop constraint.
2. MIC(Ci,Tj): cluster’s MIC information.
Objective:
1. Minimize the total width of sleep transistors.
2. Voltage drops must meet the constraint.
Output:
1. A set of sleep transistor widths.
34
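ST Sizing Algorithm
1. Initialize ST size with a large value.
2. Update the discharging matrix Ψ.
3. Update MIC(STi,Tj) and voltage drops:
   MIC(STi,Tj) = Ψ · MIC(Ci,Tj)
   V(STi,Tj) = MIC(STi,Tj) · R(STi)
Voltage drops ok?
– Yes: return the ST sizes.
– No: 4. Resize the ST with the worst drop, then repeat from step 2.
\[
W^{*}_{ST_i} = \frac{MIC(ST_i, T_j)}{V^{*}_{ST}} \, k
\]
[Figure: flowchart example with four sleep transistors, each initialized to 99Ω; after one resizing step the values are 99, 73, 99, 99. The discharging matrix shown in the example is]
\[
\Psi =
\begin{bmatrix}
0.38 & 0.30 & 0.21 & 0.18 \\
0.27 & 0.30 & 0.21 & 0.18 \\
0.21 & 0.24 & 0.35 & 0.28 \\
0.14 & 0.16 & 0.23 & 0.36
\end{bmatrix}
\]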
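The loop in the flowchart can be sketched as follows. This is only an illustration under several assumptions that are not in the slides: the VGND network is a simple chain with wire resistances `r_v`, sizes are tracked as resistances (widths follow from W = k / R), the loop stops once the worst drop is within a small tolerance `tol` of the constraint, and all names and data values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def discharging_matrix(r_st, r_v):
    """psi[i][k]: share of cluster k's current that leaves through ST i."""
    n = len(r_st)
    g = [[0.0] * n for _ in range(n)]          # nodal conductance matrix
    for i in range(n):
        g[i][i] += 1.0 / r_st[i]
    for i in range(n - 1):
        gv = 1.0 / r_v[i]
        g[i][i] += gv
        g[i + 1][i + 1] += gv
        g[i][i + 1] -= gv
        g[i + 1][i] -= gv
    psi = [[0.0] * n for _ in range(n)]
    for k in range(n):
        unit = [1.0 if i == k else 0.0 for i in range(n)]
        v = solve(g, unit)
        for i in range(n):
            psi[i][k] = v[i] / r_st[i]
    return psi


def mic_and_drops(r_st, r_v, mic_c_frames):
    """Steps 2-3: MIC(STi) as the max over frames, and the voltage drops."""
    psi = discharging_matrix(r_st, r_v)
    n = len(r_st)
    mic_st = [max(sum(psi[i][k] * f[k] for k in range(n)) for f in mic_c_frames)
              for i in range(n)]
    return mic_st, [mic_st[i] * r_st[i] for i in range(n)]


def size_sleep_transistors(mic_c_frames, r_v, v_limit, r_init=99.0,
                           tol=0.01, max_iters=10000):
    r_st = [r_init] * len(mic_c_frames[0])     # step 1: large initial resistance
    for _ in range(max_iters):
        mic_st, drops = mic_and_drops(r_st, r_v, mic_c_frames)
        worst = max(range(len(r_st)), key=lambda i: drops[i])
        if drops[worst] <= v_limit * (1.0 + tol):
            return r_st                        # voltage drops ok
        r_st[worst] = v_limit / mic_st[worst]  # step 4: R* = V* / MIC(ST)
    raise RuntimeError("sizing did not converge")


# Hypothetical MIC(Ci,Tj) in mA; with ohms this gives drops in mV.
frames = [[10.0, 2.0, 1.0], [3.0, 8.0, 2.0], [1.0, 2.0, 6.0]]
r_final = size_sleep_transistors(frames, r_v=[2.0, 2.0], v_limit=65.0)
print([round(r, 2) for r in r_final])
```

The tolerance is a design choice of this sketch: lowering one transistor's resistance pulls more current toward it, so a single resize step generally leaves its drop slightly above the target, and the loop approaches the constraint from above.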
35
Outline
Sleep Transistor Sizing Problem
Maximum Instantaneous Current Estimation
Time-Frame Partitioning for Sizing
Experimental Results
Conclusions
36
Environment Setup
TSMC 130nm CMOS technology.
Vdd = 1.3 V.
Specified tolerable voltage drop: 5% of the ideal supply voltage (0.065 V).
MIC(Ci) is obtained via 10,000-random-pattern PrimePower™ simulations.
Minimum time unit is set to 10 ps.
37
Implementation Flow
[Flowchart: RTL netlist → Synthesis → gate-level netlist and SDF file; Placement → DEF file → Gate Positioning → gate location; Simulation → VCD file → VCD Partitioning → partitioned VCD file; MIC Estimation → Variable-length Partitioning (Optional) → ST Sizing → ST size. The legend distinguishes our tools from commercial tools.]
38
Experimental Results
           Total Area (Width in μm)              Runtime (Sec.)
Circuit    [8]      [2]      TP       V-TP       TP       V-TP
C432       12817    8491     6775     7086       4262     495
C499       10741    8347     6684     7229       3644     568
C880       15050    11296    9233     9676       2561     345
C1355      19352    13056    10591    11496      2514     422
C3540      29808    23020    18650    20282      16856    942
C5315      29794    23773    18785    19534      13830    2190
C7552      41016    29500    24269    25621      17216    2896
dalu       3468     2904     2110     2283       3816     483
frg2       3632     2835     2232     2255       701      136
i8         13247    9931     7836     8141       7720     1080
t481       9405     7389     5024     5402       16289    1514
des        11804    9766     7850     8145       8321     1180
AES        44378    33965    27229    28137      28379    3524
Avg.       1.70     1.26     1        1.06       8.09     1
The Avg. row gives ratios normalized to TP for area and to V-TP for runtime.
Previous works: [2] Chiou et al. DAC’06, [8] Long et al. DAC’03
39
Outline
Sleep Transistor Sizing Problem
Maximum Instantaneous Current Estimation
Time-Frame Partitioning for Sizing
Experimental Results
Conclusions
40
Conclusions
Propose an efficient sleep transistor sizing method for DSTN power-gating designs.
Present theorems based on a temporal perspective to estimate a tight upper bound of the voltage drop.
Achieve a 21% reduction in size, and hence in leakage, on average compared with [2].
- [2] Chiou et al. DAC’06
41
Thanks for your time.
42
Q & A
43
Backup Slides
44
Sleep Transistor (ST) Sizing
In the active mode
– Sleep transistors operate in the linear region.
– WST is inversely proportional to RST:
  WST = k / RST
Relation between WST and VST:
\[
W_{ST} = \frac{I(ST)}{V_{ST}} \, k
\]
I(ST): the current through the sleep transistor
VST: the voltage drop across the sleep transistor
[Figure: sleep transistor between VGND and GND below the VDD-powered logic, carrying I(ST) with voltage drop VST.]
45
Sleep Transistor (ST) Sizing
Determine the minimum required size (WST*) based on:
1. MIC(ST)
2. VST*: IR-drop constraint
\[
W_{ST} = \frac{I(ST)}{V_{ST}} \, k
\quad \Rightarrow \quad
W^{*}_{ST} = \frac{MIC(ST)}{V^{*}_{ST}} \, k
\]
MIC(ST): Maximum Instantaneous Current (MIC) through the ST
[Figure: sleep transistor between VGND and GND below the VDD-powered logic, carrying MIC(ST).]
A smaller MIC(ST) leads to a smaller required ST size!
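As a worked example of the sizing relation, the derivation below combines W = k / R with Ohm's law and then plugs in numbers. The 0.065 V constraint is the one quoted in the environment setup; the MIC values of 10 mA and 7 mA are hypothetical, and k is left symbolic because the slides do not give its value.

```latex
\begin{align*}
W_{ST} &= \frac{k}{R_{ST}}, \qquad R_{ST} = \frac{V_{ST}}{I(ST)}
  \;\;\Rightarrow\;\; W_{ST} = \frac{I(ST)}{V_{ST}}\,k \\[4pt]
R^{*}_{ST} &= \frac{V^{*}_{ST}}{MIC(ST)}
  = \frac{0.065\ \mathrm{V}}{10\ \mathrm{mA}} = 6.5\ \Omega
  \;\;\Rightarrow\;\; W^{*}_{ST} = \frac{k}{6.5\ \Omega} \\[4pt]
R^{*}_{ST} &= \frac{0.065\ \mathrm{V}}{7\ \mathrm{mA}} \approx 9.29\ \Omega
  \;\;\Rightarrow\;\; W^{*}_{ST} \approx \frac{k}{9.29\ \Omega}
\end{align*}
```

Tightening the MIC estimate from 10 mA to 7 mA in this hypothetical case reduces the required width by 30%, since the width scales linearly with MIC(ST).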
46
MIC Waveform
[Figure: current-versus-time waveforms for Pattern 1, Pattern 2, and Pattern 3, and the MIC waveform of the 3 patterns.]
47
RST Initialization
Physical limitation
– The CMOS process limits the width of a sleep transistor.
– Choose the minimum width as the initial RST.