Design Automation Group
GPSCP: A General-Purpose Support-Circuit Preconditioning Approach to Large-Scale SPICE-Accurate Nonlinear Circuit Simulations
Authors: Xueqian Zhao, Zhuo Feng
Department of Electrical & Computer Engineering
Michigan Technological University
Large-Scale SPICE-Accurate Nonlinear Circuit Simulation
Motivations
– Modern ICs that integrate billions of transistors and interconnect components need to be accurately modeled and analyzed
– Fast SPICE simulators may introduce errors due to various approximations
Challenges in large-scale SPICE-accurate circuit simulations
– Direct methods may not be runtime and memory efficient
– Iterative solvers (GMRES) require reliable and efficient preconditioners
– The same accuracy as a SPICE simulator is required
[Figure: Original circuit with analog and digital blocks. Analog circuit blocks: LDOs (error amplifier, current amplifier, Vref, Rf1, Rf2, Cf, Cout). Digital circuit blocks.]
Circuit Simulation Background
Problem formulation
– Nonlinear differential equations: $F(x) = f(x(t)) + \frac{d}{dt} q(x(t)) + u(t) = 0$
– f(.) and q(.) denote the static and dynamic nonlinearities, respectively
Standard SPICE simulators rely on the Newton-Raphson (NR) method
– Linearize the nonlinear devices (transistors, etc.): $G(x_k) = \left.\frac{\partial f}{\partial x}\right|_{x = x_k}$, $C(x_k) = \left.\frac{\partial q}{\partial x}\right|_{x = x_k}$
– Obtain the final solution through NR iterations: $x_{k+1} = x_k - \left(\left.\frac{\partial F}{\partial x}\right|_{x = x_k}\right)^{-1} F(x_k)$, where $\partial F / \partial x$ is the Jacobian matrix
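The NR iteration can be illustrated with a minimal scalar sketch. The resistor-diode circuit, its component values, and the function names below are assumptions chosen for illustration and do not come from the slides; a real simulator solves a linear system with the Jacobian matrix at every step instead of dividing by a scalar derivative.

```python
import math

def newton_raphson(F, dF, x0, tol=1e-12, max_iter=100):
    """Scalar Newton-Raphson: x_{k+1} = x_k - F(x_k) / F'(x_k)."""
    x = x0
    for k in range(max_iter):
        step = F(x) / dF(x)
        x -= step
        if abs(step) < tol:
            return x, k + 1
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical DC problem: voltage source VS -> resistor R -> diode to ground.
VS, R = 1.0, 1e3           # volts, ohms (assumed values)
IS, VT = 1e-14, 0.02585    # diode saturation current (A), thermal voltage (V)

def F(v):
    # KCL at the diode node: resistor current plus diode current equals zero.
    return (v - VS) / R + IS * (math.exp(v / VT) - 1.0)

def dF(v):
    # Derivative of F: the linearized conductance seen at the node.
    return 1.0 / R + (IS / VT) * math.exp(v / VT)

v_diode, iterations = newton_raphson(F, dF, x0=0.6)
```

Here `dF` plays the role of the Jacobian: it is the conductance of the linearized circuit at the current operating point.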
Prior Works and Our Previous Approaches
Existing direct and iterative solvers
– Direct solver: LU decomposition (KLU [1])
– Expensive for large-scale and non-sparse problems due to the exponentially increased memory and runtime cost
– Krylov-subspace iterative methods: GMRES [2]
– Achieve better memory efficiency
– Convergence rate depends on the effectiveness and efficiency of preconditioners
Our previous approaches: support-graph (circuit) preconditioned iterative methods
– Support-graph preconditioner for large-scale power grid network simulations
– Support-circuit preconditioner for large-scale interconnect-dominant nonlinear circuit simulations
[1] T. Davis and E. Palamadai Natarajan. Algorithm 907: KLU, a direct sparse solver for circuit simulation problems. ACM Trans. Math. Softw., 2010.
[2] Y. Saad and M. Schultz. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 1986.
Support-Graph Preconditioner [1]
Support-graph preconditioner (SG) for linear networks
– Find maximum weighted (or low stretch) spanning tree in the original graph
– Matrix factors for the spanning tree can be computed in linear time and space
– Highly efficient and effective preconditioner for large circuit simulations
[Figure: a 9-node weighted graph with its matrix G, and its spanning-tree support graph with the preconditioner matrix P]
The condition number of $P^{-1}G$ can be greatly reduced

Matrix      1st      2nd      3rd      4th      5th      6th      cond
G           26.170   23.182   17.572   11.514   9.373    6.673    135.948
P           25.239   23.540   17.579   10.909   9.865    6.822    16.752
P^{-1}G     1.431    1.204    1.062    1.000    1.000    1.000    17.442

[1] X. Zhao, J. Wang, Z. Feng and S. Hu. Power grid analysis with hierarchical support graphs. In Proc. ICCAD, 2011.
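The spanning-tree extraction step can be sketched with Kruskal's algorithm. This is a minimal illustration under assumptions: the 4-node graph and the function name are hypothetical, and Kruskal's algorithm (which sorts the edges, so it is not linear-time) is a stand-in rather than the method used by the authors.

```python
def maximum_spanning_tree(n, edges):
    """Kruskal's algorithm on edges (u, v, weight) with nodes 0..n-1.

    Edges are visited from heaviest to lightest; an edge is kept only if it
    joins two components that are not yet connected.
    """
    parent = list(range(n))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# Hypothetical resistive network: edge weights are conductances.
edges = [(0, 1, 2.0), (1, 2, 4.0), (2, 3, 8.0), (0, 3, 1.0), (0, 2, 3.0)]
tree = maximum_spanning_tree(4, edges)
```

For a connected graph with n nodes the result always has n-1 edges, which is what makes the tree cheap to factorize.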
Support-Circuit Preconditioner [1]
Support-circuit preconditioners (SCP) for interconnect-dominant circuits
– Sparsify the linear networks of the original circuit network
– Take advantage of existing sparse matrix solution techniques (e.g. KLU)
– Limitation: near-linear complexity is achieved only for interconnect-dominant circuits
[Figure: support graph of the original network (LDOs and digital circuit blocks) and the resulting support-circuit preconditioner]
[1] X. Zhao and Z. Feng. Towards efficient SPICE-accurate nonlinear circuit simulation with on-the-fly support-circuit preconditioners. In Proc. DAC, 2012.
Our Proposed GPSCP Method
Our proposed method: general-purpose support-circuit preconditioned (GPSCP) iterative solver
– Effective for solving general large-scale nonlinear circuits
– Scalable linearized circuit sparsification
– Based on support graph and graph sparsification research [1-2]
– Energy-based preconditioner improvement
– Dynamic preconditioner updating
[1] D. A. Spielman and S. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. ACM STOC, 2004.
[2] M. Bern, J. R. Gilbert, B. Hendrickson, N. Nguyen, and S. Toledo. Support-graph preconditioners. SIAM J. Matrix Anal. Appl., 2006.
General-Purpose Support-Circuit Preconditioner
General-purpose support-circuit preconditioners
– Allow for general large-scale nonlinear circuit simulations
– Parasitics-dominant analog circuits such as amplifiers, PLLs, …
– Good scalability of complete circuit sparsification
– Solve large transistor-dominant circuits with near-linear complexity
– Tree-like support-circuit preconditioner
– Near-linear computational and memory cost
[Figure: nonlinear circuit, its linearized circuit, and the resulting support circuit]
Support Circuit Construction (1)
Support graph can be obtained through the following steps:
– 1. Decompose the original graph into a Laplacian graph and a directed graph
– 2. Extract the support graph from the Laplacian graph
[Figure: nonlinear circuit, linearized circuit, original weighted graph, weighted graph Laplacian, and support graph Laplacian]
Support Circuit Construction (2)
Support-circuit preconditioner is subsequently built by
– 1. Combining support graph and active components
– 2. Factorizing the support circuit matrix using sparse matrix solvers
[Figure: the support graph is combined with the active components to form the support circuit (shown with sub support circuits), which yields the support-circuit preconditioner]
Towards A Better Support Graph
Convergence of support-graph preconditioners
– The convergence is determined by the condition number of matrix pencil (G,P): $\kappa(G,P) = \frac{\lambda_{max}(G,P)}{\lambda_{min}(G,P)}$
– The support of pencil (G,P) ($P^{-1}G$) is defined as: $\sigma(G,P) = \min\{\tau \in \mathbb{R} \mid x^T(\tau P - G)x \ge 0, \text{ for all } x \in \mathbb{R}^n\}$
– Eigenvalues of pencil (G,P) ($P^{-1}G$) are bounded by $\sigma(G,P)$
Spanning-tree support graph as a preconditioner
– May not be efficient for ill-conditioned systems
– Reduced overall conductivities of the resistive network
– Mismatched power dissipation between the original graph and the spanning-tree graph
Power dissipated by G: $x^T G x$. Power dissipated by P: $x^T P x$.
Towards A Better Support Graph (cont.)
Graph approximation quality
– A weighted graph P σ-approximates a weighted graph A if $L(P) \preceq L(A) \preceq \sigma \cdot L(P)$, where $L(A)$ is the Laplacian matrix of A
– $L(P) \preceq L(A)$ means $x^T P x \le x^T A x = \sum_{edge\,(i,j)} w_{ij}(x_i - x_j)^2$
Better support graph approximations
– Resistive network A, $x^T A x$: power dissipation
– The spanning tree P of A retains n-1 edges, therefore $x^T P x \le x^T A x$
– If $eigen(P) \approx eigen(A)$ and $power(P) \approx power(A)$, the preconditioner P can be more effective.
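The power-dissipation inequality can be checked numerically with a short sketch. The 4-node graph, its spanning tree, and the helper name are hypothetical; the quadratic form is evaluated edge by edge as the sum of w * (x_u - x_v)^2, which equals x^T L x for a graph Laplacian L.

```python
import random

def dissipated_power(edges, x):
    """x^T L x for a weighted graph: sum of w * (x_u - x_v)^2 over its edges."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)

# Hypothetical resistive network A and one of its spanning trees P.
graph_A = [(0, 1, 2.0), (1, 2, 4.0), (2, 3, 8.0), (0, 3, 1.0), (0, 2, 3.0)]
tree_P = [(2, 3, 8.0), (1, 2, 4.0), (0, 2, 3.0)]

# Because P keeps a subset of the edges of A (with the same weights),
# it can never dissipate more power than A for the same node voltages.
random.seed(0)
for _ in range(5):
    x = [random.uniform(-1.0, 1.0) for _ in range(4)]
    assert dissipated_power(tree_P, x) <= dissipated_power(graph_A, x)
```

The gap between the two values is exactly the power carried by the discarded edges, which is what the later sparsification and scaling steps try to recover.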
Ultra-Sparsifier Support Graph (1)
Graph sparsification (non-tree)
– Ultra-sparsifier [1] contains at most n-1+k edges (spanning tree + extra edges)
[Figure: spanning tree vs. ultra-sparsifier, showing edges of the spanning tree graph and extra edges]
– It is k-ultra-sparse and $(n/k)\,n^{o(1)}$-approximates the original graph with high probability [1]
– Spanning tree is 0-ultra-sparse
– Ultra-sparsifier better approximates the original graph
[1] D. A. Spielman and S. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. ACM STOC, 2004.
Ultra-Sparsifier Support Graph (2)
Maximum weighted degree metric
– Provides trade-offs between the preconditioner quality and the runtime efficiency of matrix factorizations
– The weighted degree of vertex v in a graph A is defined as $wd(v) = \frac{\sum_i w_i}{\max_i w_i}$, and the average weighted degree as $awd(A) = \frac{1}{n}\sum_{v \in V} wd(v)$
[Figure: vertex v with incident edge weights w1, w2, w3, w4]
– In a mesh grid, 1 (1 critical edge) ≤ wd(v) ≤ 4 (4 evenly critical edges)
– Consider nodes with larger weighted degree values
– Each edge is as critical as others
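The two metrics can be written down directly. This is a small sketch with hypothetical function names and made-up edge weights; each vertex is described only by the list of weights of its incident edges.

```python
def weighted_degree(weights):
    """wd(v): sum of incident edge weights divided by the largest one."""
    return sum(weights) / max(weights)

def average_weighted_degree(incident):
    """awd(A): mean of wd(v) over all vertices; `incident` maps vertex -> weights."""
    return sum(weighted_degree(ws) for ws in incident.values()) / len(incident)

# Hypothetical mesh-grid vertices, each with four incident edges.
incident = {
    "a": [2.0, 2.0, 2.0, 2.0],   # four evenly critical edges -> wd = 4
    "b": [4.0, 1.0, 1.0, 2.0],   # one edge dominates        -> wd = 2
}
wd_a = weighted_degree(incident["a"])
wd_b = weighted_degree(incident["b"])
awd = average_weighted_degree(incident)
```

A vertex whose weight is concentrated in a single edge scores close to 1, while a vertex with four equal edges scores 4, matching the mesh-grid bounds on the slide.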
Ultra-Sparsifier Support Graph (3)
Iterative critical node selection
– Define γ as the percentage of weighted degree range
– Define θ as the percentage of graph approximation (power dissipation)
Step 1: Initialize γ and θ (e.g. set to relatively large values such as 0.7)
Step 2: Select all the nodes of A_graph that satisfy wd(v) > γ × awd(A_graph)
Step 3: Use the prior solution to compute Pwr_select and compare it with Pwr_Agraph
Step 4: If Pwr_select > θ × Pwr_Agraph, the selected nodes are the critical nodes; otherwise, reduce γ and repeat steps 2-4
Step 5: For the selected critical nodes, pick their top few most critical edges
[Figure: spanning tree vs. ultra-sparsifier, with critical nodes and extra edges highlighted]
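Steps 1-5 can be sketched as follows. Several details are assumptions made only for this illustration, since the slides do not specify them: Pwr_select is taken to be the power dissipated in edges touching a selected node, γ is reduced by a constant factor `shrink`, "most critical" is read as "largest weight", and every node is assumed to have at least one incident edge. All function and parameter names are hypothetical.

```python
def select_critical_nodes(edges, n, x, gamma=0.7, theta=0.7,
                          shrink=0.9, min_gamma=1e-3):
    """Steps 1-4: lower gamma until the selected nodes cover enough power."""
    incident = [[] for _ in range(n)]
    for u, v, w in edges:
        incident[u].append(w)
        incident[v].append(w)
    wd = [sum(ws) / max(ws) for ws in incident]   # weighted degree per node
    awd = sum(wd) / n                             # average weighted degree

    def power(es):
        # Power dissipated in edges `es` for the prior solution x.
        return sum(w * (x[u] - x[v]) ** 2 for u, v, w in es)

    total = power(edges)
    while gamma > min_gamma:
        selected = {v for v in range(n) if wd[v] > gamma * awd}
        covered = [e for e in edges if e[0] in selected or e[1] in selected]
        if power(covered) > theta * total:
            return selected, gamma
        gamma *= shrink
    return set(range(n)), gamma   # fallback: treat every node as critical

def pick_extra_edges(edges, tree, selected, per_node=1):
    """Step 5: for each critical node, add its heaviest non-tree edges."""
    in_tree = set(tree)
    extra = set()
    for node in selected:
        candidates = [e for e in edges
                      if e not in in_tree and node in (e[0], e[1])]
        candidates.sort(key=lambda e: e[2], reverse=True)
        extra.update(candidates[:per_node])
    return sorted(extra)

# Hypothetical 4-node network, its maximum spanning tree, and a prior solution.
edges = [(0, 1, 2.0), (1, 2, 4.0), (2, 3, 8.0), (0, 3, 1.0), (0, 2, 3.0)]
tree = [(2, 3, 8.0), (1, 2, 4.0), (0, 2, 3.0)]
x = [1.0, 0.5, 0.2, 0.0]
critical, gamma_used = select_critical_nodes(edges, 4, x, gamma=1.2, theta=0.9)
extra_edges = pick_extra_edges(edges, tree, critical)
```

The tree edges plus `extra_edges` then form the ultra-sparsifier: larger γ and θ keep the preconditioner sparser, smaller values make it a closer approximation at a higher factorization cost.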
Energy-based Spanning-graph Scaling
It has been shown that the graph approximation also means the difference of power consumption between different graphs.
– $x^T P_{graph}\, x \le x^T A_{graph}\, x$
– $x^T A_{graph}\, x - x^T P_{st}\, x = \sum g_{ij}(x_i - x_j)^2$, summed over the edges (i,j) of $A_{graph}$ that are not in the spanning tree $P_{st}$
The process of support circuit improvement based on energy-based spanning-graph scaling:
[Figure: spanning tree → scaled spanning tree → ultra-sparsifier. Legend: original edge, scaled edge, extra critical edge w/o scaling]
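One simple way to realize energy-based scaling is to multiply every spanning-tree edge by a common factor so that the tree dissipates the same power as the original graph for a given solution vector. The uniform factor, the function names, and the example data are assumptions for illustration; the slides do not state the exact scaling rule.

```python
def dissipated_power(edges, x):
    """x^T L x for a weighted graph: sum of w * (x_u - x_v)^2 over its edges."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)

def scale_spanning_tree(tree, graph, x):
    """Scale all tree edge weights by s = (x^T A x) / (x^T P x)."""
    p_tree = dissipated_power(tree, x)
    p_graph = dissipated_power(graph, x)
    if p_tree == 0.0:
        return list(tree), 1.0        # nothing to match; leave the tree as is
    s = p_graph / p_tree
    return [(u, v, w * s) for u, v, w in tree], s

# Hypothetical network, spanning tree, and solution vector.
graph_A = [(0, 1, 2.0), (1, 2, 4.0), (2, 3, 8.0), (0, 3, 1.0), (0, 2, 3.0)]
tree_P = [(2, 3, 8.0), (1, 2, 4.0), (0, 2, 3.0)]
x = [1.0, 0.0, 0.0, 0.0]
scaled_tree, factor = scale_spanning_tree(tree_P, graph_A, x)
```

In this example the tree dissipates 3 units against 6 for the full graph, so every tree conductance is doubled; extra critical edges added afterwards are left unscaled, as in the figure legend.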
Dynamic Preconditioner Updating Scheme
During the Newton-Raphson steps or transient steps, the linearized models of transistors may change drastically
Re-computing the support circuit for each Newton-Raphson step may introduce substantial overhead
If the support graph changes significantly:
$\frac{\sum_{node\ i} |g_i^{cur} - g_i^{pre}|}{\sum_{node\ i} |g_i^{pre}|} > tol$
Do: regenerate the support graph, as well as the support circuit
• tol: user defined value
• $g_i^{cur}$: all passive components incident to node i in current step
• $g_i^{pre}$: all passive components incident to node i in previous step
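The update test can be sketched as a relative-change check. It assumes that each g_i is the total passive conductance incident to node i, stored in a list indexed by node; the function name and the sample values are hypothetical.

```python
def needs_regeneration(g_pre, g_cur, tol):
    """True if the summed per-node conductance change exceeds tol (relative)."""
    change = sum(abs(c - p) for c, p in zip(g_cur, g_pre))
    reference = sum(abs(p) for p in g_pre)
    # Written as a product so that an all-zero reference does not divide by zero.
    return change > tol * reference

g_previous = [1.0, 2.0, 3.0]        # per-node conductance in the previous step
small_change = [1.0, 2.0, 3.3]      # relative change 0.3 / 6 = 5%
large_change = [2.0, 2.0, 3.0]      # relative change 1.0 / 6, about 17%

reuse = needs_regeneration(g_previous, small_change, tol=0.1)
rebuild = needs_regeneration(g_previous, large_change, tol=0.1)
```

A larger `tol` reuses the existing preconditioner for more steps at the price of extra GMRES iterations; a smaller `tol` rebuilds it more often.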
Complete Algorithm Flow
[Flowchart: complete algorithm flow]
Main simulation loop: Netlist input → Linearized circuit → Evaluate devices → Create matrix G and P → PGMRES iterative solver → Converge? (No: iterate again; Yes: return solution)
Preconditioner construction: Extract passive networks → for each network: find maximum spanning tree → compute wd(v) and awd → iteratively find critical nodes → then add un-picked critical edges to v → combine ultra-sparsifier with active components
Experimental Setup
CKT     #nunk       #Mos      #nnz         #Pnnz       Memory (MB)
                                                       Direct     GPSCP
test1   202,738     67,451    1,156,428    823,156     204.91     37.21 (5.5X)
test2   202,738     114,664   2,081,570    1,689,365   450.80     73.90 (6.1X)
test3   608,127     192,603   3,127,301    2,626,443   916.78     176.15 (5.2X)
test4   608,127     327,426   5,629,682    4,857,630   1,651.22   250.11 (6.6X)
test5   1,187,452   644,852   10,837,454   9,460,210   3,136.83   468.19 (6.7X)
test6   63,981      -         575,981      494,757     204.53     30.92 (6.6X)

• Tests 1-5: large PDNs with on-chip voltage regulators
• Test 6: industrial analog design (only MNA matrix available)
• #nunk: number of unknowns in the circuits
• #Mos: number of MOSFETs in the circuits
• #nnz: number of non-zero elements in the MNA matrix
• #Pnnz: number of non-zero elements in the preconditioner matrix
• Memory: memory cost during LU factorization of MNA matrix
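The improvement factors in parentheses can be reproduced from the two memory columns. The snippet below only re-derives them from the table values; the dictionary layout and variable names are illustrative.

```python
# (direct solver MB, GPSCP MB) for each test case, copied from the table.
memory_mb = {
    "test1": (204.91, 37.21),
    "test2": (450.80, 73.90),
    "test3": (916.78, 176.15),
    "test4": (1651.22, 250.11),
    "test5": (3136.83, 468.19),
    "test6": (204.53, 30.92),
}

# Memory improvement = direct-solver memory divided by GPSCP memory.
improvement = {name: round(direct / gpscp, 1)
               for name, (direct, gpscp) in memory_mb.items()}

for name, factor in improvement.items():
    print(f"{name}: {factor}X")
```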
Experimental Results (1)
Support-circuit MNA matrix can well preserve the dominant eigenvalues
[Figure: top 20 largest eigenvalues of system matrices (magnitude, ×10^5, roughly 1.7005 to 1.7008) for the original, spanning-tree, and ultra-sparsifier matrices. PDN with on-chip VRs (DC analysis)]
Experimental Results (2)
Support-circuit MNA matrix can well preserve the dominant eigenvalues in TR analysis
[Figure: top 18 largest eigenvalues of systems (magnitude), original vs. ultra-sparsifier. Industrial analog circuit design (TR analysis)]
[Figure: top 20 largest eigenvalues of systems (magnitude, ×10^5), original vs. ultra-sparsifier. PDN with on-chip VRs (TR analysis)]
Experimental Results (3)
Runtime & Memory Efficiency between SCP [1] and GPSCP
[Figure: runtime speedups over the direct solver using SCP [1] and GPSCP algorithms are reported.]
[Figure: memory improvements of matrix factorization over the direct solver using SCP [1] and GPSCP algorithms are reported.]
Nonlinearity: $\frac{\#NonDev}{\#TotDev}$
[1] X. Zhao and Z. Feng. Towards efficient SPICE-accurate nonlinear circuit simulation with on-the-fly support-circuit preconditioners. In Proc. ACM DAC, 2012.
Experimental Results (4)
• Runtime comparison for a single Newton-Raphson step (DC)

CKT     Direct           GPSCP
        Fact     Solve   Setup   Fact   GMRES   #iter.   Speedup   Error(%)
test1   3.31     0.05    0.42    0.22   0.22    10       3.9X      0.05
test2   4.94     0.06    0.52    0.28   0.29    12       4.6X      0.05
test3   23.03    0.18    1.42    0.64   2.02    13       5.7X      0.05
test4   36.85    0.20    1.48    0.91   2.31    14       7.8X      0.05
test5   63.35    0.61    2.15    1.74   2.37    17       10.8X     0.05
test6   18.12    0.02    1.02    0.30   0.74    19       8.8X      0.1

• Runtime comparison for transient analysis
[Figure: runtime (s) of the Direct, Non-dnm, and Dynamic solvers for test1 to test5, plotted together with circuit sizes (K-nodes). Annotated speedups: 3.6X/4.4X, 4.0X/5.1X, 4.7X/5.8X, 6.1X/8.6X, 9.9X/14.0X]
Conclusion
Proposed a general-purpose support-circuit preconditioner (GPSCP) for scalable large-scale nonlinear circuit simulation
Key Ideas:
– 1. Extract ultra-sparsifier support graphs from the passive networks of the linearized circuit
– 2. Combine them with the active components (e.g. controlled sources)
– 3. Use energy-based preconditioner improvement and dynamic preconditioner updating schemes
Our experimental results show that GPSCP can:
– Obtain up to 14X speedups in DC and transient simulations
– Reduce up to 80% memory consumption
THANK YOU!