UCSD VLSI CAD Laboratory - ICCAD, Nov. 3, 2009
Timing Yield-Aware Color Reassignment and Detailed Placement Perturbation
for Double Patterning Lithography
Mohit Gupta, Kwangok Jeong and Andrew B. Kahng
UCSD VLSI CAD Laboratory
ECE Department, University of California, San Diego
(2/28)
Outline
• Bimodal CD Distribution in DPL
  • Impact on design timing
• Mitigating Impact of Bimodal CD Distribution
  • Bimodal-Aware Timing Library
  • Optimization 1: Color Reassignment (Max Alternation)
  • Optimization 2: Placement Perturbation (DPL-Correctness)
• Experimental Framework and Results
  • Impact of Color Reassignment
  • Impact of Placement Perturbation
• Conclusion
(3/28)
Bimodal CD distribution in DPL
• Two patterning steps → Two different CDs
• Two different colorings → Two different timings
• C12-type cell: ODD polys in BLUE, EVEN polys in GREEN
• C21-type cell: ODD polys in GREEN, EVEN polys in BLUE
[Figure: lines from 1st patterning and lines from 2nd patterning; gates from CD group1 and gates from CD group2; a C12-type cell and a C21-type cell]
Jeong et al. ASPDAC’09
(4/28)
Impact of Bimodality on Guardband
• Comparison of design guardband (Min-Max delay)
• FACT 1: Unimodal representation is too pessimistic!
[Figure labels: CD mean difference; Large CD group; Small CD group]
Jeong et al. ASPDAC’09
(5/28)
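Impact of Bimodality on Path Delay
• By definition, $\sigma^2(x+y) = \sigma^2(x) + \sigma^2(y) + 2\,\mathrm{cov}(x,y)$
• Delay variation of a timing path whose gates $g_i$ and $q_i$ come from the two CD groups:
$$\sigma^2(d(\mathrm{path})) = \sigma^2\Big(\sum_i d(g_i) + \sum_i d(q_i)\Big)$$
$$= \sum_i \sigma^2(d(g_i)) + \sum_i \sigma^2(d(q_i)) + 2\sum_{i<j}\mathrm{cov}(d(g_i),d(g_j)) + 2\sum_{i<j}\mathrm{cov}(d(q_i),d(q_j)) + 2\sum_{i,j}\mathrm{cov}(d(g_i),d(q_j))$$
• Since $\mathrm{cov}(d(g_i),d(q_j)) \ll \mathrm{cov}(d(g_i),d(g_j))$ or $\mathrm{cov}(d(q_i),d(q_j))$, the variation under a bimodal distribution is smaller than under a unimodal distribution
• Simulation results validated
• FACT 2: Alternate (mixed) coloring has smaller delay variation!
[Figure: Sigma / Mean (%) of path delay]
Jeong et al. ASPDAC’09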
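The covariance expansion of path delay variance can be checked numerically with a small sketch. It assumes an idealized model that is not part of the slides: every stage has the same delay sigma, stages from the same CD group are fully correlated (`rho_same = 1`), and stages from different groups are uncorrelated (`rho_diff = 0`). The function name and parameters are hypothetical.

```python
def path_variance(groups, sigma=1.0, rho_same=1.0, rho_diff=0.0):
    """Variance of the sum of stage delays: sum of all pairwise covariances.

    groups[i] is the CD group (1 or 2) of stage i. The correlation values
    are illustrative assumptions, not measured data.
    """
    n = len(groups)
    var = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                var += sigma ** 2                      # variance term
            else:
                rho = rho_same if groups[i] == groups[j] else rho_diff
                var += rho * sigma ** 2                # covariance term
    return var


uniform = path_variance([1] * 8)        # all stages from group 1
alternate = path_variance([1, 2] * 4)   # alternating groups
print(uniform, alternate)
```

Under these assumptions the alternating sequence halves the path delay variance, which matches the direction of FACT 2.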
(6/28)
Impact of Bimodality on Clock Skew
• Different coloring sequences in a clock network → Clock skew
• FACT 3: Same color on all clock buffers is better!

Case  Launch               Capture
1     C12+C12+C12+…+C12    C12+C12+C12+…+C12
2     C12+C12+C12+…+C12    C21+C21+C21+…+C21

[Figure: clock skew (s) for Case1 and Case2, with launch and capture clock paths]
Jeong et al. ASPDAC’09
(7/28)
Bimodal CD Distribution: 3 Key Facts
1. Design requires bimodal-aware timing models
• Unimodal representation is too pessimistic
2. Data paths benefit from alternate (mixed) coloring
• Exploit existence of two uncorrelated CD populations
• Minimize correlated variations in a given path
3. Clock paths benefit from uniform coloring
• Correlated variation between launch and capture paths
minimizes bimodality-induced clock skew
(8/28)
DPL Layout-to-Mask Flow
• RTL-to-GDS
• DPL Mask Coloring
• Bimodal-Aware Timing Analysis
• Optimization 1: ILP to Maximize Alternate Coloring (Datapaths)
• Optimization 2: Placement Perturbation for Color Conflict Removal (Clock and Datapaths)
(10/28)
Bimodality-Aware Timing Model and Analysis
• Timing model
  • Two timing libraries:
    • G1L-G2S: group1 has larger CD than group2
    • G1S-G2L: group1 has smaller CD than group2
  • Two coloring versions of a cell in each library
    • C12: leftmost poly is in group1
    • C21: leftmost poly is in group2
  • CD mean difference
    • Chosen from process information
    • E.g., 2nm, 4nm and 6nm
• Timing analysis
  • For each CD mean difference, check timing slack using each of the timing libraries G1L-G2S and G1S-G2L
  • The worse timing of the G1L-G2S and G1S-G2L libraries is regarded as the actual worst-case timing
[Figure: cell colorings with poly order G1 G2 and G2 G1]
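The analysis rule above (take the worse of the two library results for each CD mean difference) can be sketched as follows. The slack numbers are hypothetical placeholders, not results from the talk, and `worst_case_slack` is an illustrative helper rather than part of any timing tool.

```python
def worst_case_slack(slack_by_library):
    """Return (library, slack) for the library with the smallest slack."""
    lib = min(slack_by_library, key=slack_by_library.get)
    return lib, slack_by_library[lib]


# Hypothetical slack values in ns, one entry per timing library.
slacks = {
    "2nm": {"G1L-G2S": -0.12, "G1S-G2L": -0.19},
    "4nm": {"G1L-G2S": -0.35, "G1S-G2L": -0.30},
    "6nm": {"G1L-G2S": -0.41, "G1S-G2L": -0.52},
}

report = {diff: worst_case_slack(s) for diff, s in slacks.items()}
for diff, (lib, slack) in report.items():
    print(f"{diff}: worst-case slack {slack} ns from {lib}")
```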
(11/28)
Optimization 1: Maximum Alternate Coloring
• Maximize alternate (mixed) coloring → Minimize delay variation
• How to quantify alternation of coloring sequence?
  → New metric: Coloring Sequence Cost (CSC)
  • Represents delay variation due to the coloring
(12/28)
Delay and Coloring
• Rise delay depends on one PMOS transistor → ~10% variation
• Fall delay depends on both NMOS transistors → ~1% variation

A1 CD (nm)  A2 CD (nm)  Fall: A1 (ps)  Rise: A1 (ps)  Fall: A2 (ps)  Rise: A2 (ps)  Case
51          51          49.79          97.30          54.65          113.1
49          49          48.23          88.25          51.48          102.1
50          50          48.30          92.90          53.08          107.6
51          49          49.05          97.26          52.32          102.2          G1L-G2S
49          51          48.89          88.28          53.79          113.0          G1S-G2L

[Figure: NAND2 schematic and layout with inputs A1 and A2, output ZN, PMOS transistors MP1 and MP2, NMOS transistors MN1 and MN2, between VDD and VSS]
(13/28)
Coloring Sequence Cost (CSC) for NAND2
• Two observations
  • Activated transistors determine the delay
  • The impact on delay is averaged when more than one transistor is activated
• Assign CSC for a single transistor
  • Group1: −1 (CSC_MP1 = CSC_MN1 = −1)
  • Group2: +1 (CSC_MP2 = CSC_MN2 = +1)
• CSC for NAND2 gate
  • A1→ZN rise (by MP1): −1
  • A2→ZN rise (by MP2): +1
  • A2→ZN fall (by MN1 and MN2): (1 + (−1)) / 2 = 0
  • A1→ZN fall (by MN1 and MN2): (−1 + 1) / 2 = 0
[Figure: NAND2 schematic and layout with inputs A1 and A2, output ZN, transistors MP1, MP2, MN1 and MN2, between VDD and VSS]
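The NAND2 values above follow from averaging the CSC of the transistors that a timing arc activates. A minimal sketch of that arithmetic, with hypothetical names (`TRANSISTOR_CSC`, `arc_csc`, `NAND2_ARCS`) chosen for illustration:

```python
# Per-transistor CSC from the slide: group1 = -1, group2 = +1.
TRANSISTOR_CSC = {"MP1": -1, "MN1": -1, "MP2": +1, "MN2": +1}


def arc_csc(activated):
    """Average the CSC of the transistors activated by one timing arc."""
    return sum(TRANSISTOR_CSC[t] for t in activated) / len(activated)


# Transistors activated by each (input, output transition) arc of the NAND2.
NAND2_ARCS = {
    ("A1", "rise"): ["MP1"],
    ("A2", "rise"): ["MP2"],
    ("A1", "fall"): ["MN1", "MN2"],
    ("A2", "fall"): ["MN1", "MN2"],
}

nand2_csc = {arc: arc_csc(trs) for arc, trs in NAND2_ARCS.items()}
print(nand2_csc)
```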
(14/28)
CSC Calculation for Cells - Examples
• AND2 gate
  • A1→Z fall: {−1} + (−1) = −2
  • A1→Z rise: {(−1 + 1) / 2} + (−1) = −1
  • A2→Z fall: {1} + (−1) = 0
  • A2→Z rise: {(−1 + 1) / 2} + (−1) = −1
• BUFFER gate
  • A→Z fall: −1 + 1 = 0
  • A→Z rise: −1 + 1 = 0
[Figure: schematics and layouts of the AND2 gate (inputs A1 and A2, output Z, transistors MP1 to MP3 and MN1 to MN3) and the BUFFER gate (input A, output Z, transistors MP1, MP2, MN1 and MN2)]

Topology         CSC calculation
Parallel         CSC of activated tr.
Series           Average of all series tr.
Fingered         Average of all fingered tr.
Multiple stages  Sum of CSC of each stage
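The topology rules in the table can be combined into one small routine. The sketch below is an illustration under assumptions: each stage is described by a hypothetical `(topology, csc_values)` pair, and a single activated transistor (as in an inverter stage) is treated as the "parallel" case.

```python
def stage_csc(topology, cscs):
    """CSC of one stage, following the topology table."""
    if topology == "parallel":
        (value,) = cscs            # only the activated transistor counts
        return value
    if topology in ("series", "fingered"):
        return sum(cscs) / len(cscs)   # average over all transistors
    raise ValueError(f"unknown topology: {topology}")


def cell_csc(stages):
    """Multiple stages: sum of the CSC of each stage."""
    return sum(stage_csc(topology, cscs) for topology, cscs in stages)


# AND2 = NAND2 stage followed by an inverter stage whose transistors are in group1 (-1).
and2 = {
    "A1 fall": [("parallel", [-1]), ("parallel", [-1])],
    "A1 rise": [("series", [-1, +1]), ("parallel", [-1])],
    "A2 fall": [("parallel", [+1]), ("parallel", [-1])],
    "A2 rise": [("series", [-1, +1]), ("parallel", [-1])],
}
and2_csc = {arc: cell_csc(stages) for arc, stages in and2.items()}

# BUFFER = two inverter stages, one in each group.
buffer_csc = cell_csc([("parallel", [-1]), ("parallel", [+1])])
print(and2_csc, buffer_csc)
```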
(15/28)
Coloring Sequence Cost for Path (CSCP)
• CSCP = Sum of CSC values of stages in path, weighted by stage delay ($D_l$)
  • $CSCP_i = \sum_{l \in i} CSC_l \cdot D_l$, where $l$ is a timing arc in path $i$
• Correlation between CSCP and delay variation
  • 1,300 different colorings of a timing path
  • CSCP metric is strongly correlated with delay variation of timing paths
  • Correlation coefficient: 0.902
• CSCP reduction → Delay variation reduction
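As a worked instance of the delay-weighted sum, the sketch below evaluates CSCP for two colorings of a four-stage path. The stage delays (50 ps each) and the per-arc CSC values are made-up illustration values, and `cscp` is a hypothetical helper name.

```python
def cscp(arcs):
    """Delay-weighted sum of CSC over the timing arcs of one path.

    arcs is a list of (csc, delay) pairs, one per timing arc.
    """
    return sum(csc * delay for csc, delay in arcs)


# Hypothetical four-stage path with 50 ps per stage.
uniform_path = [(-1, 50), (-1, 50), (-1, 50), (-1, 50)]
alternate_path = [(-1, 50), (+1, 50), (-1, 50), (+1, 50)]

print(cscp(uniform_path))    # all arcs driven by group1 transistors
print(cscp(alternate_path))  # contributions from the two groups cancel
```

A CSCP of large magnitude indicates a path dominated by one CD group, while a value near zero indicates a balanced path.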
(16/28)
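Maximization of Alternate Coloring
• Optimal timing path coloring problem:
  • Given a set of timing-critical paths: $P$
  • Color each cell in the union of timing paths to minimize $M = \max_{i \in P} |CSCP_i|$
• ILP to minimize maximum CSCP
  • Objective: Minimize $M$
  • Subject to:
$$CSCP_i \le M, \quad -CSCP_i \le M, \quad 1 \le i \le k$$
$$CSCP_i = \sum_{l \in i,\; l \in C_j} \big( CSC^{12}_l \, x_j + CSC^{21}_l \, y_j \big) D_l$$
$$x_j + y_j = 1, \quad x_j, y_j \in \{0, 1\}$$
  where $x_j = 1$ if cell $j$ uses coloring C12, $y_j = 1$ if it uses coloring C21, and $C_j$ is the set of timing arcs of cell $j$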
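To make the min-max objective concrete, the sketch below solves a toy instance by exhaustive enumeration. This is a stand-in for the ILP, not the formulation used in the talk: it is only practical for a handful of cells, and the cell names, CSC values and delays are hypothetical.

```python
from itertools import product


def cscp(path, coloring):
    """Delay-weighted CSC of one path for a given cell coloring.

    path is a list of (cell, csc_if_c12, csc_if_c21, delay) tuples.
    """
    total = 0.0
    for cell, csc12, csc21, delay in path:
        total += (csc12 if coloring[cell] == "C12" else csc21) * delay
    return total


def best_coloring(cells, paths):
    """Enumerate all colorings and keep the one minimizing max |CSCP|."""
    best = None
    for combo in product(("C12", "C21"), repeat=len(cells)):
        coloring = dict(zip(cells, combo))
        m = max(abs(cscp(p, coloring)) for p in paths)
        if best is None or m < best[0]:
            best = (m, coloring)
    return best


cells = ["u1", "u2", "u3"]
paths = [
    [("u1", -1, +1, 10.0), ("u2", -1, +1, 10.0)],
    [("u2", -1, +1, 10.0), ("u3", -1, +1, 10.0)],
]
m, coloring = best_coloring(cells, paths)
print(m, coloring)
```

In this toy case the minimum is reached when neighbouring cells on each path take opposite colorings, which is the alternation the objective is meant to reward.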
(17/28)
Impact of Alternate Coloring Optimization
• Alternate coloring improves timing slack and reduces timing variation: JPEG 70% utilization case
• TNS improves by 11% ~ 27%
[Figure: TNS (ns) for initial coloring versus alternate coloring]
(18/28)
Optimization 2: Placement Perturbation
• DPL feasibility: distance between same-color polys must be larger than minimum resolution
  • $2 d_{pb} > Res_{min}$
  • $d_{pb}$: distance from poly to cell boundary
  • $Res_{min}$: minimum resolution
• Coloring assignment from Optimization 1 can introduce additional coloring conflicts into an existing layout
  → Placement perturbation for DPL-Correctness
[Figure: (a) Original placement, with logical connection; (b) Alternate coloring, with coloring conflict; (c) Conflict removal, with spacing > $Res_{min}$]
(19/28)
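DP Using Cost of Coloring Conflicts
• HCost: Horizontal placement cost under constraints
  • Cost of placing a cell “a” at a placement site “b”
  • Considers the spacing between poly lines in different cells
$$\mathrm{spacing} = x_a + b + L^{PS}_a - \big(x_{a-1} + i + w_{a-1} - R^{PS}_{a-1}\big)$$
  ($b$: displacement of cell $a$ to site $b$; $i$: displacement of cell $a-1$)
• HCost is defined as:
  • If $(\mathrm{spacing} < R_{min})$ and $(L^{PC}_a = R^{PC}_{a-1})$: $HCost(a, b, a-1, i) = \infty$
  • Otherwise: $HCost(a, b, a-1, i) = 0$
[Figure: adjacent cells $a-1$ and $a$ at positions $x_{a-1}$ and $x_a$ with widths $w_{a-1}$ and $w_a$; $R^{PS}_{a-1}$ marks the rightmost poly of cell $a-1$ and $L^{PS}_a$ the leftmost poly of cell $a$; example colors $R^{PC}_{a-1} = 0$, $L^{PC}_a = 0$, $L^{PC}_{a-1} = 1$, $R^{PC}_a = 1$]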
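The spacing check behind HCost can be written as a short function. In this sketch the conflict cost is taken to be infinite, which is an assumption, and all argument names are hypothetical spellings of the slide's symbols (for example `lps_a` for the leftmost-poly spacing of cell a and `rpc_prev` for the rightmost-poly color of cell a−1).

```python
import math


def hcost(x_a, b, lps_a, lpc_a, x_prev, i, w_prev, rps_prev, rpc_prev, r_min):
    """Conflict cost of placing cell a (shifted by b) next to cell a-1 (shifted by i)."""
    # Distance between the leftmost poly of cell a and the rightmost poly of cell a-1.
    spacing = x_a + b + lps_a - (x_prev + i + w_prev - rps_prev)
    if spacing < r_min and lpc_a == rpc_prev:
        return math.inf   # same-color polys closer than the minimum resolution (assumed cost)
    return 0.0


# Hypothetical abutting cells: facing polys are 4 units apart, minimum resolution is 5.
args = dict(x_a=10, lps_a=2, x_prev=0, i=0, w_prev=10, rps_prev=2, r_min=5)
conflict = hcost(b=0, lpc_a=0, rpc_prev=0, **args)          # same color, too close
shifted = hcost(b=1, lpc_a=0, rpc_prev=0, **args)           # one unit of extra spacing
different_color = hcost(b=0, lpc_a=0, rpc_prev=1, **args)   # different colors
print(conflict, shifted, different_color)
```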
(20/28)
Two Dynamic Programming Approaches
• DP Algorithm 1: SHIFT
  • Minimize total displacement cost, considering HCost
$$Cost(1, b) = s_1 \, |b|$$
$$Cost(a, b) = s_a \, |b| + \min_{x_{a-1} - SRCH \,\le\, i \,\le\, x_{a-1} + SRCH} \big\{ Cost(a-1, i) + HCost(a, b, a-1, i) \big\}$$
  • $s_a$: timing criticality weight for displacement of cell $a$, an exponential function of the cell's slack
• DP Algorithm 2: SHIFT+RECOLOR
  • Necessary when high utilization blocks Algorithm 1
  • Performs simultaneous recoloring of non-timing-critical cells
  • Cost is defined for each color of cell instances, e.g., C12 and C21
• Other DP variants: MAX, FLIP
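A compact sketch of a SHIFT-style dynamic program over one placement row is given below. It is an illustration under several assumptions that do not come from the slides: displacements are searched in a window of ±`srch` sites around each cell's original position, a coloring conflict or a cell overlap costs infinity, and each cell carries a precomputed `weight` for timing criticality. All field names are hypothetical.

```python
import math


def hcost(cur, b, prev, i, r_min):
    """Infinite cost for overlap or a same-color spacing violation (assumed), else 0."""
    if cur["x"] + b < prev["x"] + i + prev["w"]:
        return math.inf   # cells would overlap (assumed extra constraint)
    spacing = cur["x"] + b + cur["lps"] - (prev["x"] + i + prev["w"] - prev["rps"])
    if spacing < r_min and cur["lpc"] == prev["rpc"]:
        return math.inf
    return 0.0


def shift_dp(cells, srch, r_min):
    """Return (total cost, per-cell displacements) for one row, left to right."""
    shifts = range(-srch, srch + 1)
    cost = {b: cells[0]["weight"] * abs(b) for b in shifts}
    back = []
    for a in range(1, len(cells)):
        cur, prev = cells[a], cells[a - 1]
        new_cost, choice = {}, {}
        for b in shifts:
            best, arg = math.inf, None
            for i in shifts:
                c = cost[i] + hcost(cur, b, prev, i, r_min)
                if c < best:
                    best, arg = c, i
            new_cost[b] = cur["weight"] * abs(b) + best
            choice[b] = arg
        cost = new_cost
        back.append(choice)
    b = min(cost, key=cost.get)
    total = cost[b]
    if math.isinf(total):
        return total, None    # no conflict-free solution inside the search window
    result = [b]
    for choice in reversed(back):   # trace the chosen displacements back to cell 1
        b = choice[b]
        result.append(b)
    result.reverse()
    return total, result


# Two abutting cells with a same-color conflict; the first is timing-critical.
row = [
    {"x": 0, "w": 10, "lps": 2, "rps": 2, "lpc": 1, "rpc": 0, "weight": 2.0},
    {"x": 10, "w": 10, "lps": 2, "rps": 2, "lpc": 0, "rpc": 1, "weight": 1.0},
]
total, displacements = shift_dp(row, srch=2, r_min=5)
print(total, displacements)
```

In this toy row the conflict is removed by moving the cell with the lower weight, so the timing-critical cell keeps its original position.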
(22/28)
Experiment Framework
• Placed and routed design (SOC Encounter): orig.def
• Initial Coloring: initial_colored.def
• Timing Analysis (PrimeTime-SI)
• Optimization 1: ILP Instance → Optimal Coloring (Alternate Coloring maximization)
  • Files: slack.list, keep_color.list, opt_colored.def
• Optimization 2: Conflicts Removal (SHIFT, SHIFT+RECOLOR): opt.def
(23/28)
Optimization 1: Max Alternate Coloring
• Testcases with 45nm Nangate Open Cell Library
[Figure: two bar charts comparing initial (Init.) and optimized (Opt.) coloring at 2nm, 4nm and 6nm CD mean difference, annotated with 59% reduction and 85% reduction]
(24/28)
Optimization 2: Placement Perturbation
Legend: #CC (number of coloring conflicts), SDT (sum of displacements of timing-critical cells), SDNT (sum of displacements of non-timing-critical cells), #RC (number of recolored cells)
• All SHIFT runtimes for JPEG are 204-354 seconds• All SHIFT+RECOLOR runtimes are 578-678 seconds
(25/28)
Overall Timing Improvement
• Bimodal timing model → Reduce pessimism
• Alternate coloring → Improve timing
• Placement perturbation → Remove conflicts

                                                     Mean CD Difference
Stage                        #Conflict  Metric       2nm      4nm       6nm
Initial Coloring (Unimodal)  0          WNS (ns)     -1.113   -2.016    -2.902
                                        TNS (ns)     -671.1   -1776.3   -3348.5
Initial Coloring (Bimodal)   0          WNS (ns)     -0.191   -0.354    -0.527
                                        TNS (ns)     -8.17    -26.56    -64.64
Alternate Coloring           219        WNS (ns)     -0.090   -0.145    -0.267
                                        TNS (ns)     -1.48    -3.85     -22.40
DPL-Corr (+ECO Routing)      0          WNS (ns)     -0.104   -0.183    -0.295
                                        TNS (ns)     -3.43    -10.45    -28.42

The impact of bimodality can be effectively mitigated!
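The net gain of the full flow can be read off the table by subtracting the bimodal initial-coloring rows from the DPL-correct rows. The short sketch below does that arithmetic; the dictionary layout and the `improvement` helper are illustrative choices, while the numbers are copied from the table.

```python
# WNS and TNS in ns at 2nm, 4nm and 6nm mean CD difference (from the table).
results = {
    "Initial Coloring (Bimodal)": {
        "WNS": [-0.191, -0.354, -0.527],
        "TNS": [-8.17, -26.56, -64.64],
    },
    "DPL-Corr (+ECO Routing)": {
        "WNS": [-0.104, -0.183, -0.295],
        "TNS": [-3.43, -10.45, -28.42],
    },
}


def improvement(metric):
    """Per-corner gain of the final DPL-correct design over bimodal initial coloring."""
    before = results["Initial Coloring (Bimodal)"][metric]
    after = results["DPL-Corr (+ECO Routing)"][metric]
    return [round(a - b, 3) for b, a in zip(before, after)]


wns_gain = improvement("WNS")
tns_gain = improvement("TNS")
print(wns_gain, tns_gain)
```

The largest gains occur at the 6nm mean CD difference: 0.232 ns in WNS and 36.22 ns in TNS.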
(26/28)
Conclusion
• Contributions
  • New CSC metric to represent the timing variation in double patterning
  • ILP-based color reassignment to improve timing slack and variation
  • DP-based placement perturbation to remove coloring conflicts after color reassignment
• Results (45nm Nangate Open Library)
  • Up to 232ps WNS reduction and 36.22ns TNS reduction
  • WNS variation reduction from 380ps to 84ps
  • TNS variation reduction from 64ns to 22ns
• Ongoing work
  • More accurate metrics for timing path color balancing to enhance timing quality
  • Golden DPL timing and placement optimizer based on simultaneous timing-aware coloring and conflict removal
BACKUP
(29/28)
Property(2): Clock Skew and Timing Slack
• Timing slack calculation
  • Timing slack: $T_{slack} = T_{clock\_path} + T_{cycle} - T_{data\_path}$
  • Timing slack variation: $\sigma^2_{T_{slack}} = \sigma^2_{T_{clock\_path}} + \sigma^2_{T_{data\_path}} - 2\,\mathrm{cov}(T_{clock\_path}, T_{data\_path})$
  → Large correlation is better for timing slack
• Clock skew
  • In particular, clock skew from uncorrelated launching and capturing clock paths is the major source of timing slack variation.
• Example
  • (a) Worst slack in DPL: small delay variation but large negative slack
    • Data (10 ± 2 = 8~12ns), Clock (10 ± 2 = 8~12ns)
    • Worst slack = min(clock) − max(data) = 8 − 12 = −4ns
  • (b) Worst slack in single exposure: large delay variation but zero slack
    • BC: Data (10 − 5 = 5ns), Clock (10 − 5 = 5ns), worst slack = 5 − 5 = 0ns
    • WC: Data (10 + 5 = 15ns), Clock (10 + 5 = 15ns), worst slack = 15 − 15 = 0ns
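The two cases in the example differ only in whether the clock and data delays move together. The sketch below reproduces the slide's arithmetic, taking slack as clock minus data as the example does (the cycle-time term is left out); the function names are hypothetical.

```python
def worst_slack_uncorrelated(clock, data):
    """Clock and data vary independently: pair the fastest clock with the slowest data."""
    return min(clock) - max(data)


def worst_slack_correlated(clock, data):
    """Clock and data track each other: compare them corner by corner."""
    return min(c - d for c, d in zip(clock, data))


# (a) DPL: 10 +/- 2 ns on both paths, uncorrelated.
dpl = worst_slack_uncorrelated(clock=(8, 12), data=(8, 12))

# (b) Single exposure: 10 +/- 5 ns, fully correlated (BC and WC corners).
single = worst_slack_correlated(clock=(5, 15), data=(5, 15))
print(dpl, single)
```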
(30/28)
Simulation Setup: Skew and Slack
• Testcase
  • AES from Opencores, Nangate 45nm library, PTM 45nm
  • Extracted critical path
    • Clock launch: 14 stages
    • Clock capture: 14 stages
    • Data path: 30 stages
• Exhaustive tests (4 × 2^54) not feasible, so we fix the data path coloring.

Case  Launch      Capture
1     M12+M12…    M12+M12…
2     M21+M21…    M21+M21…
3     M12+M12…    M21+M21…
4     M21+M21…    M12+M12…
5     M12+M21…    M12+M21…

CD Mean                  M1              M2
Difference  Model        Mean    3σ      Mean    3σ
            Uni-modal    50.00   2.00    -       -
0nm         Pooled       50.00   2.00    -       -
            Bimodal      50.00   2.00    50.00   2.00
1nm         Pooled       50.00   2.50    -       -
            Bimodal      49.50   2.00    50.50   2.00
2nm         Pooled       50.00   3.61    -       -
            Bimodal      49.00   2.00    51.00   2.00
3nm         Pooled       50.00   4.92    -       -
            Bimodal      48.50   2.00    51.50   2.00
4nm         Pooled       50.00   6.32    -       -
            Bimodal      48.00   2.00    52.00   2.00
5nm         Pooled       50.00   7.76    -       -
            Bimodal      47.50   2.00    52.50   2.00
6nm         Pooled       50.00   9.22    -       -
            Bimodal      47.00   2.00    53.00   2.00
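The pooled 3σ values in the table are consistent with treating the pooled distribution as an equal-weight mixture of the two CD populations, each with 3σ = 2nm and means separated by the mean difference. That mixture model is an assumption made for this sketch, and `pooled_three_sigma` is a hypothetical helper name.

```python
import math


def pooled_three_sigma(mean_diff, three_sigma=2.0):
    """3-sigma of an equal-weight mixture of two populations (assumed model).

    Each population has the given 3-sigma; their means differ by mean_diff (nm).
    """
    sigma = three_sigma / 3.0
    # Mixture variance = within-group variance + variance of the group means.
    pooled_var = sigma ** 2 + (mean_diff / 2.0) ** 2
    return 3.0 * math.sqrt(pooled_var)


pooled = [round(pooled_three_sigma(d), 2) for d in range(7)]
print(pooled)
```

Rounded to two decimals, the results for mean differences of 0nm to 6nm agree with the Pooled rows of the table.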
(31/28)
Experiments on Clock Skew and Timing Slack
• Clock skew
  • Even for the zero mean difference case, clock skew exists and increases with mean difference
  • Pooled unimodal cannot distinguish this clock skew
• Timing slack
  • Originally zero slack turns out to be significant negative slack
  • Pooled unimodal shows very pessimistic slack
[Figure: Timing slack (s) for MAX-MAX combination versus mean difference (0nm to 6nm); series: Unimodal (Pooled), Bimodal (case1) to Bimodal (case5)]
[Figure: Clock skew (s) versus mean difference (0nm to 6nm); series: Launch (G12+G12...), Capture (G12+G12...); Launch (G21+G21...), Capture (G21+G21...); Launch (G12+G12...), Capture (G21+G21...); Launch (G21+G21...), Capture (G12+G12...); Launch (G12+G21...), Capture (G12+G21...); annotations: 22ps and 53ps]