MCMM Clock Tree Optimization based on Slack Redistribution Using a Reduced Slack Graph
Rickard Ewetz and Cheng-Kok Koh
School of Electrical and Computer Engineering, Purdue University
ASP-DAC 2016
Clock Tree Synthesis
• Physical design flow: Placement → Clock Tree Synthesis → Signal Routing
• Clock tree synthesis: Clock Tree Construction → Clock Tree Optimization
• Diagram labels: sink locations, timing constraints; topology, arrival times, timing slack; no timing violations
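Timing Slack
Launching flip-flop FF_i (clock arrival time $t_i$) → Combinational Logic → capturing flip-flop FF_j (clock arrival time $t_j$)
$slack_{ij}^{setup} = T + t_j - t_i - t_i^{CQ} - t_{ij}^{\max} - t_j^{S} - \delta_{ij}$
$slack_{ji}^{hold} = t_i + t_i^{CQ} + t_{ij}^{\min} - t_j - t_j^{H} - \delta_{ij}$
On-chip variations: $\delta_{ij}$ accounts for variations on the clock paths with delays $t_{CCA(i,j),i}$ and $t_{CCA(i,j),j}$, measured from the closest common ancestor $CCA(i,j)$ of sinks i and j
Slack Graph (SG): edge (i, j) with weight $w_{ij} = slack_{ij}^{setup}$ and edge (j, i) with weight $w_{ji} = slack_{ji}^{hold}$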
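The two slack definitions can be evaluated directly. The following is a minimal sketch, assuming a single launch/capture pair and a single lumped OCV margin; the class name `PathTiming`, its field names, and the numeric values are hypothetical and chosen only so that the slacks come out as −4 (setup) and 10 (hold), the edge weights used in the slide examples.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Timing data for one launch/capture flip-flop pair (hypothetical container)."""
    T: float        # clock period
    t_i: float      # clock arrival time at the launching flip-flop FF_i
    t_j: float      # clock arrival time at the capturing flip-flop FF_j
    t_cq: float     # clock-to-Q delay of FF_i
    d_max: float    # maximum combinational delay from FF_i to FF_j
    d_min: float    # minimum combinational delay from FF_i to FF_j
    t_setup: float  # setup time of FF_j
    t_hold: float   # hold time of FF_j
    ocv: float = 0.0  # assumed lumped on-chip-variation margin


def setup_slack(p: PathTiming) -> float:
    # Data launched at t_i must arrive before the next capture edge at t_j + T.
    return p.T + p.t_j - p.t_i - p.t_cq - p.d_max - p.t_setup - p.ocv


def hold_slack(p: PathTiming) -> float:
    # Data launched at t_i must not arrive before the hold window at t_j closes.
    return p.t_i + p.t_cq + p.d_min - p.t_j - p.t_hold - p.ocv


# Illustrative values only (not taken from the slides).
p = PathTiming(T=100.0, t_i=50.0, t_j=50.0, t_cq=5.0,
               d_max=94.0, d_min=8.0, t_setup=5.0, t_hold=3.0)
print(setup_slack(p), hold_slack(p))  # -4.0 10.0
```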
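Problem Formulation
Slack graph example: nodes i and j with edge weights $w_{ij} = -4$ (setup) and $w_{ji} = 10$ (hold)
$slack_{ij}^{setup} = T + t_j - t_i - t_i^{CQ} - t_{ij}^{\max} - t_j^{S} - \delta_{ij} = w_{ij}$
Delay adjustments: $t_i^{new} = t_i + \Delta_i$ and $t_j^{new} = t_j + \Delta_j$
After adjustment: $slack_{ij}^{setup} = w_{ij} + \Delta_j - \Delta_i$, with the requirement $0 \le slack_{ij}^{setup}$
Setup constraint: $\Delta_i - \Delta_j \le w_{ij} = -4$, i.e. $\Delta_j - \Delta_i \ge 4$
Hold constraint: $\Delta_j - \Delta_i \le w_{ji} = 10$
Problem: Find delay adjustments that remove the negative slacks!
Example solution: $\Delta_i = 0$, $\Delta_j = 4$, which gives a setup slack of 0 and a hold slack of 6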
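The effect of a set of delay adjustments on a slack graph can be checked mechanically. The sketch below assumes that an edge (u, v) with weight w encodes the constraint delta[u] - delta[v] <= w, so the adjusted weight is w - delta[u] + delta[v]; the function name `apply_adjustments` and the dictionary layout are hypothetical.

```python
def apply_adjustments(w, delta):
    """Return the edge weights (slacks) after applying the delay adjustments.

    w     : dict mapping an edge (u, v) to its slack
    delta : dict mapping a node to its delay adjustment
    """
    return {(u, v): wuv - delta[u] + delta[v] for (u, v), wuv in w.items()}


# Two-node example: setup edge (i, j) with slack -4, hold edge (j, i) with slack 10.
w = {("i", "j"): -4, ("j", "i"): 10}
delta = {"i": 0, "j": 4}

adjusted = apply_adjustments(w, delta)
print(adjusted)                                # setup slack 0, hold slack 6
print(all(x >= 0 for x in adjusted.values()))  # True: no negative slack remains
```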
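Multiple Corners Multiple Modes
• Scenario parameters shown: supply voltage (1.00 V, 0.70 V) and effective on-current (100 % Eff Ion, 95 % Eff Ion, i.e. −5 %)
• Scenario labels: 1.00 V, 100 % Eff Ion; 0.70 V, 100 % Eff Ion; 0.70 V, 95 % Eff Ion
• One slack graph per scenario, e.g. edge weights 10 and −4 in one scenario, 14 and −5 in another
• A delay adjustment $\Delta_k^1$ in scenario 1 results in adjustments $(\Delta_k^2, \Delta_k^3, \Delta_k^4)$ in the other scenarios
$\Delta_i^s - \Delta_j^s \le w_{ij}^s$ for $s = 1, \ldots, 4$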
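With one slack graph per scenario, every scenario's constraints must hold at once. The sketch below lists the edges that remain violated, assuming an edge (u, v) with weight w encodes delta[u] - delta[v] <= w; the function name `violated_edges` is hypothetical, and the assumption that the same adjustment appears unchanged in both scenarios is made only for illustration.

```python
def violated_edges(w_by_scenario, delta_by_scenario):
    """List (scenario, edge, slack) for every edge whose adjusted slack is negative."""
    violations = []
    for s, w in w_by_scenario.items():
        delta = delta_by_scenario[s]
        for (u, v), wuv in w.items():
            slack = wuv - delta[u] + delta[v]
            if slack < 0:
                violations.append((s, (u, v), slack))
    return violations


# Slack graphs of two scenarios, using the edge weights from the slide example.
w_by_scenario = {
    1: {("i", "j"): -4, ("j", "i"): 10},
    2: {("i", "j"): -5, ("j", "i"): 14},
}
# Illustrative assumption: the adjustment found for scenario 1 is identical in scenario 2.
same_delta = {"i": 0, "j": 4}
print(violated_edges(w_by_scenario, {1: same_delta, 2: same_delta}))
# [(2, ('i', 'j'), -1)]
```

An adjustment that fixes scenario 1 in isolation can therefore leave a violation in scenario 2, which is why the scenarios have to be considered together.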
Previous Work
Clock tree with violations across MCMM → Assignment of delay adjustments → Realization of delay adjustments
MCMM Optimization in [14]
[14] W. Shen, Y. Cai, W. Chen, Y. Lu, Q. Zhou, and J. Hu. Useful clock skew optimization under a multi-corner multi-mode design framework. ISQED’10, pages 62–68, 2010.
Worst negative setup slack
Similar negative setup slack
Timing improvement across MCMM
Clock Tree Resynthesis in [16]
[16] S. Roy, P. Mattheakis, L. Masse-Navette, and D. Pan. Clock tree resynthesis for multi-corner multi-mode timing closure. IEEE Trans. on CAD of IC and Sys., pages 589–602, 2015.
[1] T.-B. Chan, K. Han, A. B. Kahng, J.-G. Lee, and S. Nath. OCV-aware top-level clock tree optimization. GLSVLSI ’14, pages 33–38, 2014.
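SCSM Slack Redistribution in [10]
Slack graph example: nodes i and j with edge weights 10 and −4
$\min \sum_{k \in V} \Delta_k$ subject to $\Delta_k - \Delta_h \le w_{kh}, \; \forall (k, h) \in E$
[10] J. Lu and B. Taskin. Post-CTS clock skew scheduling with limited delay buffering. In Circuits and Systems, pages 224–227, 2009.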
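A linear program made only of difference constraints can be solved without a general LP solver. The sketch below is not the method of [10]; it assumes that delay adjustments are non-negative (delay can only be added) and uses Bellman-Ford style relaxation to find the componentwise smallest feasible adjustments, which under that assumption also minimize their sum. The function name `min_delay_adjustments` is hypothetical.

```python
def min_delay_adjustments(nodes, w):
    """Smallest non-negative delta with delta[k] - delta[h] <= w[(k, h)] on every edge.

    Raises ValueError if the constraints are infeasible, i.e. the slack graph
    contains a cycle whose weights sum to a negative value.
    """
    delta = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for (k, h), wkh in w.items():
            required = delta[k] - wkh  # the constraint forces delta[h] >= required
            if required > delta[h]:
                delta[h] = required
                changed = True
        if not changed:
            return delta
    raise ValueError("infeasible: the slack graph contains a negative cycle")


# Two-node example with edge weights -4 (setup) and 10 (hold).
print(min_delay_adjustments(["i", "j"], {("i", "j"): -4, ("j", "i"): 10}))
# {'i': 0, 'j': 4}
```

Each relaxation pass raises an adjustment only when a constraint forces it, so the result is a lower bound on every feasible non-negative solution.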
Mode Adjustable Delay Buffers
Mode-dependent delay adjustments, e.g. $(\Delta_k^1, \Delta_k^2) = (20, 40)$
[8] J. Kim and T. Kim. Useful clock skew scheduling using adjustable delay buffers in multi-power mode designs. ASP-DAC’15, pages 466–471, Jan 2015.
MCMM Optimization
Corner compression in [15]
• Example slacks: −4 and −5
• The required arrival time at CLK is modified; the slide's equation relates $S_{req}^{*}$ and $S_{req}^{mod}$ to the slacks $S_{slack}^{*}$ and $S_{slack}^{'}$
LP formulation in [12]
• Constraint with terms $c$, $s$, $slack$ and $path$ indexed by i and j
[12] V. Ramachandran. Construction of minimal functional skew clock trees. ISPD’12, pages 119–120, 2012.
[15] M. Shrivastava and C. Park. Compressing scenarios of electronic circuits, Apr. 15 2014. US Patent 8,701,063.
Our Work
Clock tree with violations across MCMM → Assignment of delay adjustments → Realization of delay adjustments
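Linearization of Delay Adjustments
Scenario 1: $\Delta_i^1 - \Delta_j^1 \le w_{ij}^1$
Scenario 2: $\Delta_i^2 - \Delta_j^2 \le w_{ij}^2$
Linearization of delay adjustments: $\Delta^2 = c^{21} \Delta^1$
Substituting into scenario 2: $c^{21} \Delta_i^1 - c^{21} \Delta_j^1 \le w_{ij}^2$
$\Delta_i^1 - \Delta_j^1 \le w_{ij}^2 / c^{21}$ and $\Delta_j^1 - \Delta_i^1 \le w_{ji}^2 / c^{21}$
Combining both scenarios: $\Delta_i^1 - \Delta_j^1 \le \min\{w_{ij}^1, \; w_{ij}^2 / c^{21}\}$
In general: $\Delta_i^1 - \Delta_j^1 \le \min_{s \in S} \, w_{ij}^s / c^{s1}$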
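Proposed Reduced Slack Graph (rSG)
Scenario reduction using linearization of delay adjustments, $\Delta^2 = c^{21} \Delta^1$ with $c^{21} = 2$
• Scenario 1 slack graph: edge weights 10 and −4
• Scenario 2 slack graph: edge weights 14 and −5
• Reduced slack graph: min{10, 14/2} = 7 and min{−4, −5/2} = −4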
|S| slack graphs → one reduced slack graph (rSG)
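The reduction from |S| slack graphs to one can be written in a few lines. This is a minimal sketch, assuming every scenario has the same edge set and that the adjustment in scenario s is a positive constant c[s] times the adjustment in scenario 1 (so c[1] is 1); the function name `reduced_slack_graph` and the dictionary layout are hypothetical.

```python
def reduced_slack_graph(w_by_scenario, c):
    """Merge per-scenario slack graphs into one graph expressed in scenario-1 units.

    w_by_scenario : dict scenario -> dict edge -> slack
    c             : dict scenario -> positive scaling factor relative to scenario 1
    """
    edges = next(iter(w_by_scenario.values())).keys()
    return {e: min(w[e] / c[s] for s, w in w_by_scenario.items()) for e in edges}


# Example from the slide: weights (10, -4) and (14, -5), scaling factor 2 for scenario 2.
w_by_scenario = {
    1: {("i", "j"): -4, ("j", "i"): 10},
    2: {("i", "j"): -5, ("j", "i"): 14},
}
c = {1: 1.0, 2: 2.0}
print(reduced_slack_graph(w_by_scenario, c))
# {('i', 'j'): -4.0, ('j', 'i'): 7.0}
```

Dividing by c[s] is only valid for positive scaling factors, since a negative factor would flip the direction of the inequality.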
Modified CTO Flow
• Create rSG (proposed in this work)
• Formulate LP problem (flow in earlier studies)
• Solve LP problem (flow in earlier studies)
• Find delay adjustments $\Delta_i, \Delta_j, \Delta_k, \ldots$ from the solution of the LP problem
• Realize delay adjustments using Feasible Delay Adjustment Ranges (FDARs) (proposed in this work)
• Improvements of TNS and WNS? Yes: repeat the flow. No: End.
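Feasible Delay Adjustment Ranges (FDARs)
Example: sinks 1 to 4, clock tree nodes a, b, c, d, e and f
• Slack graph before adjustment: edge weights −4, −2, 2 in one direction and 10, 12, 10 in the other
• Delay adjustments: $\Delta_a = 0$, $\Delta_b = 4$, $\Delta_c = 0$, $\Delta_d = 0$, $\Delta_e = 0$, $\Delta_f = 6$
• Slack graph after adjustment: edge weights 0, 0, 2 and 6, 10, 10
$FDAR_f = [\Delta_f - w_{bf}^{out}, \; \Delta_f + w_{bf}^{in}]$
$FDAR_f = [6 - 0, \; 6 + 10] = [6, 16]$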
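The numbers in the FDAR example can be reproduced with a short script. The sketch below assumes a tree in which node e drives a and b, node f drives c and d, and a, b, c, d drive sinks 1 to 4; this structure is inferred from the figure labels and is an assumption, as are the function names. An edge (u, v) with weight w is taken to encode delta[u] - delta[v] <= w.

```python
# Slack graph over the sinks before adjustment.
w = {(1, 2): -4, (2, 1): 10, (2, 3): -2, (3, 2): 12, (3, 4): 2, (4, 3): 10}

# Assumed tree: the adjustable nodes on the path from the root to each sink.
paths = {1: ["e", "a"], 2: ["e", "b"], 3: ["f", "c"], 4: ["f", "d"]}
node_delta = {"a": 0, "b": 4, "c": 0, "d": 0, "e": 0, "f": 6}


def sink_adjustments(paths, node_delta):
    """Total delay added at each sink: the sum of the adjustments on its path."""
    return {s: sum(node_delta[n] for n in p) for s, p in paths.items()}


def adjusted_weights(w, sink_delta):
    """Slack of every edge after the sink arrival times have been shifted."""
    return {(u, v): x - sink_delta[u] + sink_delta[v] for (u, v), x in w.items()}


def fdar(node, paths, node_delta, w_adj):
    """Range of adjustments at `node` that keeps every adjusted slack non-negative."""
    below = {s for s, p in paths.items() if node in p}
    inf = float("inf")
    # Less delay at `node` consumes slack on edges entering its subtree.
    room_down = min((x for (u, v), x in w_adj.items()
                     if u not in below and v in below), default=inf)
    # More delay at `node` consumes slack on edges leaving its subtree.
    room_up = min((x for (u, v), x in w_adj.items()
                   if u in below and v not in below), default=inf)
    return (node_delta[node] - room_down, node_delta[node] + room_up)


w_adj = adjusted_weights(w, sink_adjustments(paths, node_delta))
print(w_adj)                                  # slacks 0, 6, 0, 10, 2, 10
print(fdar("f", paths, node_delta, w_adj))    # (6, 16)
```

Edges between sinks 3 and 4 do not limit the range of f, because both sinks shift by the same amount when the delay at f changes.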
Experimental Evaluation
• Benchmark circuits [3] based on OpenCores Verilog specifications [11]
• Optimization guided by estimates of TNS and WNS
– Linear estimates of OCV on a path
• Evaluation using a Monte Carlo framework
– OCV model similar to that of the ISPD 2010 contest
– Variations generated using a quad tree
– Performance measured in yield
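The metrics named above can be computed from lists of slacks. This is a minimal sketch, assuming TNS and WNS are reported as non-negative magnitudes and that yield is the percentage of Monte Carlo samples without any negative slack; the function names and the sample data are hypothetical, and the sketch does not model how the variations themselves are generated.

```python
def tns_wns(slacks):
    """Total negative slack and worst negative slack, both as magnitudes."""
    negative = [-s for s in slacks if s < 0]
    return (sum(negative), max(negative, default=0))


def yield_percent(samples):
    """Percentage of samples (each a list of slacks) with no timing violation."""
    passing = sum(1 for slacks in samples if all(s >= 0 for s in slacks))
    return 100.0 * passing / len(samples)


print(tns_wns([-4, 10, -5, 14]))  # (9, 5)

# Four hypothetical Monte Carlo samples of two slacks each.
samples = [[1, 2], [-1, 3], [0, 5], [2, 2]]
print(yield_percent(samples))     # 75.0
```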
[11] OpenCores. http://opencores.net/. 2014.
[3] R. Ewetz, S. Janarthanan, and C.-K. Koh. Benchmark circuits for clock scheduling and synthesis. https://purr.purdue.edu/publications/1759, 2015.
CTO guided by TNS and WNS
Circuit     Structure    TNS/WNS (ps/ps)   Cap (pF)   Run-time (min)
mcmm_fpu    pre-CTO      220/22            3.23       15
mcmm_fpu    post-CTO     0/0               3.37       25
mcmm_fpu    CT in [4]    177/27            3.78       8
mcmm_fpu    RCT in [4]   79/14             4.12       6
mcmm_ecg    pre-CTO      184/18            16.76      66
mcmm_ecg    post-CTO     0/0               16.89      92
mcmm_ecg    CT in [4]    1914/30           17.27      53
mcmm_ecg    RCT in [4]   1335/21           17.92      37
mcmm_aes    pre-CTO      2147/29           46.68      224
mcmm_aes    post-CTO     462/9             48.73      326
mcmm_aes    CT in [4]    8702/77           61.59      131
mcmm_aes    RCT in [4]   3208/43           83.06      198
[4] R. Ewetz, S. Janarthanan, and C.-K. Koh. Construction of reconfigurable clock trees for MCMM designs. DAC’15, 2015.
CTO Evaluation in Yield
Circuit     Structure    Yield (%)   Cap (pF)   Run-time (min)
mcmm_fpu    pre-CTO      87.3        3.23       15
mcmm_fpu    post-CTO     100.0       3.37       25
mcmm_fpu    CT in [4]    67.6        3.78       8
mcmm_fpu    RCT in [4]   98.8        4.12       6
mcmm_ecg    pre-CTO      98.4        16.76      66
mcmm_ecg    post-CTO     100.0       16.89      92
mcmm_ecg    CT in [4]    94.0        17.27      53
mcmm_ecg    RCT in [4]   99.6        17.92      37
mcmm_aes    pre-CTO      71.2        46.68      224
mcmm_aes    post-CTO     99.3        48.73      326
mcmm_aes    CT in [4]    16.7        61.59      131
mcmm_aes    RCT in [4]   99.1        83.06      198
Summary and Questions
• Defined a reduced slack graph to capture MCMM constraints
• Used FDARs to realize delay adjustments
• High yield at low capacitive overhead
• Future work
– Investigate making topological changes