GPSCP: A General-Purpose Support-Circuit Preconditioning ...zhuofeng/MTU_VLSI_DA_files/... ·...

Design Automation Group

GPSCP: A General-Purpose Support-Circuit Preconditioning Approach to Large-Scale SPICE-Accurate Nonlinear Circuit Simulations

Authors: Xueqian Zhao, Zhuo Feng

Department of Electrical & Computer Engineering, Michigan Technological University


Large-Scale SPICE-Accurate Nonlinear Circuit Simulation

Motivations
– Modern ICs that integrate billions of transistors and interconnect components need to be accurately modeled and analyzed
– Fast SPICE simulators may introduce errors due to various approximations

Challenges in large-scale SPICE-accurate circuit simulations
– Direct methods may not be runtime- and memory-efficient
– Iterative solvers (GMRES) require reliable and efficient preconditioners
– Results must retain the same accuracy as a SPICE simulator

[Figure: Original circuit with analog and digital blocks. The analog circuit blocks are LDO regulators (error amplifier, current amplifier, pass transistor Mp, reference Vref, feedback resistors Rf1 and Rf2, output capacitor Cout); the remaining blocks are digital circuit blocks.]

Circuit Simulation Background

Problem formulation
– Nonlinear differential equations:
$$F(x) = f(x(t)) + \frac{d}{dt}\,q(x(t)) + u(t) = 0$$
– f(·) and q(·) denote the static and dynamic nonlinearities, respectively

Standard SPICE simulators rely on the Newton-Raphson (NR) method
– Linearize the nonlinear devices (transistors, etc.):
$$G(x_k) = \left.\frac{\partial f}{\partial x}\right|_{x_k}, \qquad C(x_k) = \left.\frac{\partial q}{\partial x}\right|_{x_k}$$
– Obtain the final solution through NR iterations, where $\partial F/\partial x$ is the Jacobian matrix:
$$x_{k+1} = x_k - \left(\left.\frac{\partial F}{\partial x}\right|_{x_k}\right)^{-1} F(x_k)$$
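The NR update above can be illustrated on a single-unknown DC problem. The Python sketch below assumes a hypothetical circuit (a 5 V source driving a diode through a 1 kΩ resistor) and adds a simple 0.1 V step limit, an assumption standing in for SPICE-style junction-voltage limiting; at DC the dq/dt term vanishes, so the Jacobian reduces to G(x_k).

```python
import math

# Hypothetical DC example: source VS -> resistor R -> node v -> diode -> ground.
# At DC the dq/dt term vanishes, so F(v) = f(v) + u with
#   f(v) = v / R + IS * (exp(v / VT) - 1)   (static nonlinearity)
#   u    = -VS / R                          (source term)
VS, R = 5.0, 1e3          # source voltage (V), series resistance (ohm)
IS, VT = 1e-14, 0.02585   # diode saturation current (A), thermal voltage (V)
MAX_STEP = 0.1            # assumed step limit (V) that keeps exp() well behaved


def f(v):
    return v / R + IS * (math.exp(v / VT) - 1.0)


def jacobian(v):
    """G(v) = df/dv, the linearized conductance at the current iterate."""
    return 1.0 / R + (IS / VT) * math.exp(v / VT)


def newton_dc(v0=0.0, tol=1e-12, max_iter=100):
    v = v0
    for k in range(max_iter):
        residual = f(v) - VS / R                 # F(v_k)
        dv = -residual / jacobian(v)             # -G(v_k)^{-1} F(v_k)
        dv = max(-MAX_STEP, min(MAX_STEP, dv))   # damped (step-limited) NR update
        v += dv
        if abs(dv) < tol:
            return v, k + 1
    raise RuntimeError("Newton-Raphson did not converge")


v, iters = newton_dc()
print(f"v = {v:.4f} V after {iters} iterations")
```

In a full simulator, v becomes the vector of MNA unknowns and the division becomes a sparse linear solve with the Jacobian, which is exactly the step that GPSCP preconditions.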

Prior Works and Our Previous Approaches

Existing direct and iterative solvers
– Direct solver: LU decomposition (KLU [1])
  – Expensive for large-scale and non-sparse problems because fill-in makes the memory and runtime cost grow rapidly (superlinearly)
– Krylov-subspace iterative methods: GMRES [2]
  – Achieve better memory efficiency
  – Convergence rate depends on the effectiveness and efficiency of the preconditioner

Our previous approaches: support-graph (circuit) preconditioned iterative methods
– Support-graph preconditioner for large-scale power grid network simulations
– Support-circuit preconditioner for large-scale interconnect-dominant nonlinear circuits

[1] T. Davis and E. Palamadai Natarajan. Algorithm 907: KLU, a direct sparse solver for circuit simulation problems. ACM Trans. Math. Softw., 2010.
[2] Y. Saad and M. Schultz. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 1986.

Support-Graph Preconditioner [1]

Support-graph preconditioner (SG) for linear networks
– Find a maximum-weight (or low-stretch) spanning tree in the original graph
– Matrix factors for the spanning tree can be computed in linear time and space
– Highly efficient and effective preconditioner for large circuit simulations

[Figure: a 9-node grid graph G and its maximum-weight spanning tree P (total tree weight 42). Edge weights of G: $w_{12}=2$, $w_{23}=4$, $w_{45}=6$, $w_{56}=5$, $w_{78}=9$, $w_{89}=4$, $w_{14}=1$, $w_{25}=3$, $w_{36}=8$, $w_{47}=4$, $w_{58}=1$, $w_{69}=3$. P drops $w_{14}$, $w_{25}$, $w_{58}$ and $w_{69}$; the diagonal entries are d in G and d′ in P.]

Six largest eigenvalues and condition numbers:

Matrix     1st     2nd     3rd     4th     5th    6th    cond
G          26.170  23.182  17.572  11.514  9.373  6.673  135.948
P          25.239  23.540  17.579  10.909  9.865  6.822  16.752
$P^{-1}G$  1.431   1.204   1.062   1.000   1.000  1.000  17.442

The condition number of $P^{-1}G$ can be greatly reduced.

[1] X. Zhao, J. Wang, Z. Feng, and S. Hu. Power grid analysis with hierarchical support graphs. In Proc. ICCAD, 2011.
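The spanning-tree extraction can be sketched with Kruskal's algorithm applied to edges sorted by decreasing weight. The Python sketch below is a minimal illustration, assuming an undirected edge-list input; the edges are read off the slide's 9-node example (only the off-diagonal edge weights, the diagonals d and d′ are ignored), and low-stretch spanning trees are not covered.

```python
# Edges (i, j, weight) of the slide's 9-node example graph G
EDGES = [(1, 2, 2), (2, 3, 4), (4, 5, 6), (5, 6, 5), (7, 8, 9), (8, 9, 4),
         (1, 4, 1), (2, 5, 3), (3, 6, 8), (4, 7, 4), (5, 8, 1), (6, 9, 3)]


def max_spanning_tree(nodes, edges):
    """Kruskal's algorithm with edges visited in decreasing weight order."""
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find root lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for i, j, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:              # edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


tree = max_spanning_tree(range(1, 10), EDGES)
dropped = sorted(set(EDGES) - set(tree))
print(sum(w for _, _, w in tree), dropped)
# → 42 [(1, 4, 1), (2, 5, 3), (5, 8, 1), (6, 9, 3)]
```

The result matches the tree P in the figure (weight 42). Sorting dominates the cost, O(m log m) for m edges; the union-find structure keeps each merge nearly constant time.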

Support-Circuit Preconditioner [1]

Support-circuit preconditioners (SCP) for interconnect-dominant circuits
– Sparsify the linear networks of the original circuit
– Take advantage of existing sparse matrix solution techniques (e.g., KLU)
– Limitation: efficient (with near-linear complexity) only for interconnect-dominant circuits

[Figure: support graph of the original network (LDOs and digital circuit blocks) and the resulting support-circuit preconditioner]

[1] X. Zhao and Z. Feng. Towards efficient SPICE-accurate nonlinear circuit simulation with on-the-fly support-circuit preconditioners. In Proc. ACM DAC, 2012.

Our Proposed GPSCP Method

Our proposed method: general-purpose support-circuit preconditioned (GPSCP) iterative solver
– Effective for solving general large-scale nonlinear circuits
– Scalable linearized-circuit sparsification
– Based on support-graph and graph-sparsification research [1-2]
– Energy-based preconditioner improvement
– Dynamic preconditioner updating

[1] D. A. Spielman and S. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. ACM STOC, 2004.
[2] M. Bern, J. R. Gilbert, B. Hendrickson, N. Nguyen, and S. Toledo. Support-graph preconditioners. SIAM J. Matrix Anal. Appl., 2006.

General-Purpose Support-Circuit Preconditioner

General-purpose support-circuit preconditioners
– Allow for general large-scale nonlinear circuit simulations
  – Parasitics-dominant analog circuits such as amplifiers, PLLs, …
– Good scalability of complete circuit sparsification
  – Solve large transistor-dominant circuits with near-linear complexity
  – Tree-like support-circuit preconditioner
  – Near-linear computational and memory cost

[Figure: Nonlinear circuit → Linearized circuit (each MOSFET replaced by small-signal elements such as $g_m V_{gs}$, $g_{ds}$, $C_{gs}$ and $C_{gd}$, together with interconnect resistors R1 to R5) → Support circuit]

Support Circuit Construction (1)

The support graph can be obtained through the following steps:
– 1. Decompose the original graph into a Laplacian graph and a directed graph
– 2. Extract the support graph from the Laplacian graph

[Figure: Nonlinear circuit → Linearized circuit → Original weighted graph → Weighted graph Laplacian → Support graph Laplacian]
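Step 1 can be illustrated at the matrix level: two-terminal passive elements contribute symmetric, Laplacian-type stamps, while transconductances such as $g_m V_{gs}$ contribute nonsymmetric (directed) stamps. The Python sketch below assumes a hypothetical three-node linearized circuit, standard MNA stamping rules, and node 0 as ground; it is an illustration, not the authors' implementation.

```python
# Hypothetical linearized circuit, node 0 = ground.
PASSIVE = [(1, 2, 1e-3), (2, 3, 2e-3), (3, 0, 5e-4), (2, 0, 1e-4)]  # (a, b, g)
VCCS = [(3, 0, 1, 0, 4e-3)]  # (out+, out-, ctrl+, ctrl-, gm): i = gm * (v_ctrl+ - v_ctrl-)


def zeros(n):
    return [[0.0] * n for _ in range(n)]


def stamp(M, row, col, val):
    if row > 0 and col > 0:       # the ground row/column is eliminated
        M[row - 1][col - 1] += val


def decompose(n_nodes, passive, vccs):
    """Split G into a symmetric Laplacian part (plus grounded diagonal terms)
    and a nonsymmetric directed part holding the transconductance stamps."""
    lap, directed = zeros(n_nodes), zeros(n_nodes)
    for a, b, g in passive:
        stamp(lap, a, a, g)
        stamp(lap, b, b, g)
        stamp(lap, a, b, -g)
        stamp(lap, b, a, -g)
    for op, on, cp, cn, gm in vccs:
        stamp(directed, op, cp, gm)
        stamp(directed, op, cn, -gm)
        stamp(directed, on, cp, -gm)
        stamp(directed, on, cn, gm)
    return lap, directed


lap, directed = decompose(3, PASSIVE, VCCS)
G = [[l + d for l, d in zip(rl, rd)] for rl, rd in zip(lap, directed)]
print(lap, directed, sep="\n")
```

Only the symmetric part `lap` is sparsified into a support graph in step 2; the directed part is kept intact and reattached when the support circuit is built.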

Support Circuit Construction (2)

The support-circuit preconditioner is subsequently built by
– 1. Combining the support graph and active components
– 2. Factorizing the support-circuit matrix using sparse matrix solvers

[Figure: Support graph + active components → Support circuit (composed of sub support circuits) → Support-circuit preconditioner]
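Step 2 can be sketched as follows: the support-graph Laplacian is summed with the active (directed) stamps, factorized once, and the factors are reused to apply $z = P^{-1}r$ in every GMRES iteration. The Python sketch below uses a small dense LU with partial pivoting as a stand-in for a sparse solver such as KLU; the 3-node matrices S and D are hypothetical.

```python
def lu_factor(A):
    """Dense LU with partial pivoting: P A = L U, with unit-lower L stored below U."""
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))   # pivot row
        if LU[p][k] == 0.0:
            raise ZeroDivisionError("singular matrix")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for r in range(k + 1, n):
            LU[r][k] /= LU[k][k]                 # multiplier stored in L
            for c in range(k + 1, n):
                LU[r][c] -= LU[r][k] * LU[k][c]
    return LU, perm


def lu_solve(LU, perm, b):
    n = len(LU)
    y = [b[perm[i]] for i in range(n)]       # apply the row permutation
    for i in range(n):                       # forward substitution with unit L
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    for i in reversed(range(n)):             # back substitution with U
        for j in range(i + 1, n):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y


# Hypothetical support-graph Laplacian S (grounded) and active stamps D
S = [[1e-3, -1e-3, 0.0], [-1e-3, 3.1e-3, -2e-3], [0.0, -2e-3, 2.5e-3]]
D = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [4e-3, 0.0, 0.0]]
P = [[s + d for s, d in zip(rs, rd)] for rs, rd in zip(S, D)]   # support circuit

LU, perm = lu_factor(P)          # factorize once ...
r = [1.0, 0.0, 0.0]
z = lu_solve(LU, perm, r)        # ... then apply z = P^{-1} r cheaply per iteration
print(z)
```

Because P keeps the active stamps, it is nonsymmetric, which is why a general LU factorization (rather than Cholesky) and a nonsymmetric Krylov method such as GMRES are used.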

Towards A Better Support Graph

Convergence of support-graph preconditioners
– The convergence is determined by the condition number of the matrix pencil (G, P):
$$\kappa(G,P) = \frac{\lambda_{\max}(G,P)}{\lambda_{\min}(G,P)}$$
– The support of the pencil (G, P) is defined as:
$$\sigma(G,P) = \min\{\tau \in \mathbb{R} \mid x^T(\tau P - G)\,x \ge 0 \ \text{for all } x \in \mathbb{R}^n\}$$
– The eigenvalues of the pencil (G, P), i.e., of $P^{-1}G$, are bounded by the support: $\lambda_{\max}(G,P) \le \sigma(G,P)$, so $\kappa(G,P) \le \sigma(G,P)\,\sigma(P,G)$

Spanning-tree support graph as a preconditioner
– May not be efficient for ill-conditioned systems
– Reduced overall conductivity of the resistive network
– Mismatched power dissipation between the original graph and the spanning-tree graph: G dissipates $x^T G x$, whereas P dissipates $x^T P x$
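The power mismatch can be checked numerically with the edge form of the quadratic form, $x^T L x = \sum g_{ij}(x_i - x_j)^2$. The Python sketch below uses the slide's 9-node example graph and its maximum-weight spanning tree, considers only the Laplacian (edge) part, and assumes a hypothetical node-voltage vector $x_v = v$.

```python
# Edge lists (i, j, g_ij) of the slide's 9-node graph G and its spanning tree P
G_EDGES = [(1, 2, 2), (2, 3, 4), (4, 5, 6), (5, 6, 5), (7, 8, 9), (8, 9, 4),
           (1, 4, 1), (2, 5, 3), (3, 6, 8), (4, 7, 4), (5, 8, 1), (6, 9, 3)]
P_EDGES = [(1, 2, 2), (2, 3, 4), (3, 6, 8), (4, 5, 6), (5, 6, 5),
           (4, 7, 4), (7, 8, 9), (8, 9, 4)]


def dissipated_power(edges, x):
    """x^T L x = sum over edges of g_ij * (x_i - x_j)^2."""
    return sum(g * (x[i] - x[j]) ** 2 for i, j, g in edges)


x = {v: float(v) for v in range(1, 10)}   # hypothetical node voltages
p_graph = dissipated_power(G_EDGES, x)
p_tree = dissipated_power(P_EDGES, x)
print(p_graph, p_tree)   # → 210.0 138.0
```

The tree can never dissipate more than the full graph, for any x, because every dropped edge removes a nonnegative term $g_{ij}(x_i - x_j)^2$; here the four dropped edges account for the missing 72.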

Towards A Better Support Graph (cont.)

Graph approximation quality
– A weighted graph P σ-approximates a weighted graph A if
$$L(P) \preceq L(A) \preceq \sigma\, L(P),$$
where L(A) is the Laplacian matrix of A
– $L(P) \preceq L(A)$ means $x^T L(P)\,x \le x^T L(A)\,x$ for all x, where
$$x^T L(A)\,x = \sum_{\text{edges } (i,j)} w_{ij}\,(x_i - x_j)^2$$

Better support graph approximations
– Resistive network A: $x^T A x$ is the power dissipation
– The spanning tree P of A retains only n-1 edges; therefore $x^T P x \le x^T A x$
– If $\mathrm{eigen}(P) \approx \mathrm{eigen}(A)$ and $\mathrm{power}(P) \approx \mathrm{power}(A)$, the preconditioner P can be more effective.

Ultra-Sparsifier Support Graph (1)

Graph sparsification (non-tree)
– An ultra-sparsifier [1] contains at most n-1+k edges (spanning tree + extra edges)
– It is k-ultra-sparse and σ-approximates the original graph with high probability [1], where $\sigma = \frac{n}{k}\, n^{o(1)}$
– A spanning tree is 0-ultra-sparse
– The ultra-sparsifier approximates the original graph better

[Figure: spanning tree vs. ultra-sparsifier, showing the edges of the spanning-tree graph and the extra edges]

[1] D. A. Spielman and S. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. ACM STOC, 2004.


Maximum weighted degree metric
– Provides a trade-off between the preconditioner quality and the runtime efficiency of matrix factorization
– The weighted degree of a vertex v in a graph A is defined as
$$wd(v) = \frac{\sum_i w_i}{\max_i w_i},$$
where $w_i$ are the weights of the edges incident to v (w1 to w4 for a mesh node), and the average weighted degree of A is
$$awd(A) = \frac{1}{n}\sum_{v \in V} wd(v)$$
– In a mesh grid, 1 (one critical edge) ≤ wd(v) ≤ 4 (four evenly critical edges)
– Consider nodes with larger weighted-degree values, whose incident edges are all similarly critical
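Both metrics can be computed in one pass over the edge list. The Python sketch below assumes an undirected edge-list representation and reuses the slide's 9-node grid example; the center node 5, with four comparable edges, gets the largest weighted degree.

```python
from collections import defaultdict

# Edges (i, j, w) of the slide's 9-node example graph
EDGES = [(1, 2, 2), (2, 3, 4), (4, 5, 6), (5, 6, 5), (7, 8, 9), (8, 9, 4),
         (1, 4, 1), (2, 5, 3), (3, 6, 8), (4, 7, 4), (5, 8, 1), (6, 9, 3)]


def weighted_degrees(edges):
    """wd(v) = (sum of incident edge weights) / (largest incident edge weight)."""
    incident = defaultdict(list)
    for i, j, w in edges:
        incident[i].append(w)
        incident[j].append(w)
    return {v: sum(ws) / max(ws) for v, ws in incident.items()}


def average_weighted_degree(wd):
    """awd(A) = (1/n) * sum of wd(v) over all n vertices."""
    return sum(wd.values()) / len(wd)


wd = weighted_degrees(EDGES)
awd = average_weighted_degree(wd)
print(max(wd, key=wd.get), wd[5], round(awd, 4))   # → 5 2.5 1.8148
```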


Iterative critical node selection
– Define γ as the percentage of the weighted-degree range
– Define θ as the percentage of graph approximation (power dissipation)

Step 1 • Initialize γ and θ (e.g., set them to relatively large values such as 0.7)
Step 2 • Select all nodes of $A_{graph}$ that satisfy $wd(v) > \gamma \times awd(A_{graph})$
Step 3 • Use the prior solution to compute $Pwr_{select}$ and compare it with $Pwr_{A_{graph}}$
Step 4 • If $Pwr_{select} > \theta \times Pwr_{A_{graph}}$, the selected nodes are the critical nodes
       • Otherwise, reduce γ and repeat Steps 2-4
Step 5 • For each selected critical node, pick its top few most critical edges

[Figure: spanning tree vs. ultra-sparsifier, highlighting the critical nodes and the extra edges]
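Steps 1-5 can be sketched as a loop over γ. The Python sketch below makes several assumptions that the slide leaves open, all hypothetical: $Pwr_{select}$ is taken as the power dissipated in edges touching the selected nodes, γ is reduced by a fixed step of 0.1, the "most critical" edges are the heaviest off-tree edges, and the prior solution x is invented. The usage example starts from γ = 1.3 instead of 0.7 so the loop actually iterates on this tiny graph.

```python
from collections import defaultdict

# The slide's 9-node example graph and its maximum-weight spanning tree
EDGES = [(1, 2, 2), (2, 3, 4), (4, 5, 6), (5, 6, 5), (7, 8, 9), (8, 9, 4),
         (1, 4, 1), (2, 5, 3), (3, 6, 8), (4, 7, 4), (5, 8, 1), (6, 9, 3)]
TREE = [(1, 2, 2), (2, 3, 4), (3, 6, 8), (4, 5, 6), (5, 6, 5),
        (4, 7, 4), (7, 8, 9), (8, 9, 4)]


def edge_power(edge, x):
    i, j, g = edge
    return g * (x[i] - x[j]) ** 2


def weighted_degrees(edges):
    incident = defaultdict(list)
    for i, j, w in edges:
        incident[i].append(w)
        incident[j].append(w)
    return {v: sum(ws) / max(ws) for v, ws in incident.items()}


def select_critical_nodes(edges, x, gamma=0.7, theta=0.7, step=0.1):
    wd = weighted_degrees(edges)
    awd = sum(wd.values()) / len(wd)
    pwr_graph = sum(edge_power(e, x) for e in edges)            # Pwr_Agraph
    while gamma > 0:
        nodes = {v for v, d in wd.items() if d > gamma * awd}    # Step 2
        pwr_select = sum(edge_power(e, x) for e in edges         # Step 3 (assumed:
                         if e[0] in nodes or e[1] in nodes)      # edges touching nodes)
        if pwr_select > theta * pwr_graph:                       # Step 4
            return nodes
        gamma -= step                                            # reduce gamma, repeat
    return set(wd)


def pick_extra_edges(edges, tree, critical, per_node=1):
    """Step 5 (assumed rule): heaviest off-tree edges incident to each critical node."""
    tree_set = set(tree)
    off_tree = [e for e in edges if e not in tree_set]
    picked = set()
    for v in sorted(critical):
        candidates = sorted((e for e in off_tree if v in e[:2]),
                            key=lambda e: e[2], reverse=True)
        picked.update(candidates[:per_node])
    return sorted(picked)


x = {v: float(v) for v in range(1, 10)}      # hypothetical prior solution
critical = select_critical_nodes(EDGES, x, gamma=1.3, theta=0.5)
print(sorted(critical), pick_extra_edges(EDGES, TREE, critical))
# → [2, 5, 6] [(2, 5, 3), (6, 9, 3)]
```

With two extra edges the result is a 2-ultra-sparse graph with n-1+k = 10 edges.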

Energy-based Spanning-graph Scaling

It has been shown that the graph approximation also reflects the difference in power consumption between graphs. For a graph $A_{graph}$, its spanning tree $P_{st}$ and a solution x:
$$x^T A_{graph}\, x = \sum_{(i,j)} g_{ij}\,(x_i - x_j)^2, \qquad x^T A_{graph}\, x - x^T P_{st}\, x = \sum_{(i,j) \notin P_{st}} g_{ij}\,(x_i - x_j)^2 \ge 0$$

[Figure: The process of support-circuit improvement based on energy-based spanning-graph scaling: spanning tree → scaled spanning tree → ultra-sparsifier (original edges, scaled edges, and extra critical edges without scaling)]
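The slide does not spell out the scaling rule, so the Python sketch below is only one plausible reading, stated as an assumption: using a prior solution x, every spanning-tree edge is multiplied by one factor α chosen so that the scaled tree plus the unscaled extra critical edges dissipates the same power as the original graph, $\alpha = (x^T A_{graph} x - p_{extra}) / x^T P_{st} x$. The extra edges (2,5) and (6,9) and the vector x are hypothetical inputs based on the 9-node example.

```python
# The slide's 9-node example graph, its spanning tree, and assumed extra edges
G_EDGES = [(1, 2, 2), (2, 3, 4), (4, 5, 6), (5, 6, 5), (7, 8, 9), (8, 9, 4),
           (1, 4, 1), (2, 5, 3), (3, 6, 8), (4, 7, 4), (5, 8, 1), (6, 9, 3)]
TREE = [(1, 2, 2), (2, 3, 4), (3, 6, 8), (4, 5, 6), (5, 6, 5),
        (4, 7, 4), (7, 8, 9), (8, 9, 4)]
EXTRA = [(2, 5, 3), (6, 9, 3)]   # hypothetical critical off-tree edges


def power(edges, x):
    """x^T L x = sum over edges of g_ij * (x_i - x_j)^2."""
    return sum(g * (x[i] - x[j]) ** 2 for i, j, g in edges)


def energy_scaled_sparsifier(graph, tree, extra, x):
    """Scale tree edges (extra edges unscaled) so the total power matches the graph."""
    p_graph, p_tree, p_extra = power(graph, x), power(tree, x), power(extra, x)
    alpha = (p_graph - p_extra) / p_tree   # >= 1 when the extra edges are off-tree
    scaled = [(i, j, alpha * g) for i, j, g in tree]
    return scaled + list(extra), alpha


x = {v: float(v) for v in range(1, 10)}   # hypothetical prior solution
sparsifier, alpha = energy_scaled_sparsifier(G_EDGES, TREE, EXTRA, x)
print(round(alpha, 4), power(sparsifier, x))
```

A single global factor is the simplest choice; a per-edge or per-region factor would follow the same energy-matching idea with more bookkeeping.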

Dynamic Preconditioner Updating Scheme

During Newton-Raphson steps or transient steps, the linearized transistor models may change drastically

Recomputing the support circuit at every Newton-Raphson step may introduce substantial overhead

If the support graph changes significantly, i.e.,
$$\frac{\sum_{\text{node } i} |g_i^{cur} - g_i^{pre}|}{\sum_{\text{node } i} g_i^{pre}} > tol$$
• tol: user-defined value
• $g_i^{cur}$: total conductance of all passive components incident to node i in the current step
• $g_i^{pre}$: total conductance of all passive components incident to node i in the previous step

Do: regenerate the support graph, as well as the support circuit
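The update rule can be wrapped around any preconditioner builder so that the support circuit is regenerated only when the criterion fires. The Python sketch below is illustrative: the `build` callable, the threshold tol = 0.1, and the per-node conductances are hypothetical, and, as on the slide, each step is compared with the previous step.

```python
def relative_change(g_cur, g_pre):
    """sum_i |g_i^cur - g_i^pre| / sum_i g_i^pre over all nodes i."""
    return sum(abs(c - p) for c, p in zip(g_cur, g_pre)) / sum(g_pre)


class DynamicPreconditioner:
    """Rebuilds the support circuit only when node conductances change significantly."""

    def __init__(self, build, tol=0.1):
        self.build = build      # callable: node conductances -> preconditioner
        self.tol = tol          # user-defined threshold (value assumed here)
        self.g_pre = None
        self.prec = None
        self.rebuilds = 0

    def get(self, g_cur):
        if self.prec is None or relative_change(g_cur, self.g_pre) > self.tol:
            self.prec = self.build(g_cur)   # regenerate support graph + circuit
            self.rebuilds += 1
        self.g_pre = list(g_cur)            # compare against this step next time
        return self.prec


# Hypothetical per-node conductances g_i over three Newton-Raphson steps
steps = [[1e-3, 2e-3, 5e-4], [1.01e-3, 2e-3, 5e-4], [3e-3, 2e-3, 5e-4]]
dyn = DynamicPreconditioner(build=lambda g: tuple(g), tol=0.1)
for g in steps:
    dyn.get(g)
print(dyn.rebuilds)   # → 2
```

The second step changes the conductances by under 0.3% and reuses the existing preconditioner; the third step changes them by about 57% and triggers a rebuild.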

Complete Algorithm Flow

[Flowchart]
– 1. Netlist input → evaluate devices → linearized circuit
– 2. Extract passive networks
– 3. For each network: find the maximum spanning tree, compute wd(v) and awd, iteratively find critical nodes, then add un-picked critical edges to v
– 4. Combine the ultra-sparsifier with active components
– 5. Create matrices G and P and run the PGMRES iterative solver
– 6. Converge? Yes: return the solution; No: iterate again

Experimental Setup

CKT     #nunk       #Mos      #nnz         #Pnnz       Memory, Direct (MB)   Memory, GPSCP (MB)
test1   202,738     67,451    1,156,428    823,156     204.91                37.21 (5.5X)
test2   202,738     114,664   2,081,570    1,689,365   450.80                73.90 (6.1X)
test3   608,127     192,603   3,127,301    2,626,443   916.78                176.15 (5.2X)
test4   608,127     327,426   5,629,682    4,857,630   1,651.22              250.11 (6.6X)
test5   1,187,452   644,852   10,837,454   9,460,210   3,136.83              468.19 (6.7X)
test6   63,981      -         575,981      494,757     204.53                30.92 (6.6X)

• Tests 1-5: large PDNs with on-chip voltage regulators
• Test 6: industrial analog design (only the MNA matrix is available)
• #nunk: number of unknowns in the circuit
• #Mos: number of MOSFETs in the circuit
• #nnz: number of nonzero elements in the MNA matrix
• #Pnnz: number of nonzero elements in the preconditioner matrix
• Memory: memory cost during LU factorization of the MNA matrix
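The parenthesized factors in the GPSCP memory column are the ratio of direct-solver memory to GPSCP memory. The short Python check below reproduces them from the table's values.

```python
# Memory (MB) of the direct solver vs. GPSCP, copied from the table
MEMORY_MB = {
    "test1": (204.91, 37.21),
    "test2": (450.80, 73.90),
    "test3": (916.78, 176.15),
    "test4": (1651.22, 250.11),
    "test5": (3136.83, 468.19),
    "test6": (204.53, 30.92),
}

# Reduction factor = direct memory / GPSCP memory, rounded as in the table
ratios = {ckt: round(direct / gpscp, 1) for ckt, (direct, gpscp) in MEMORY_MB.items()}
print(ratios)
# → {'test1': 5.5, 'test2': 6.1, 'test3': 5.2, 'test4': 6.6, 'test5': 6.7, 'test6': 6.6}
```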

Experimental Results (1)

Support-circuit MNA matrices preserve the dominant eigenvalues well

[Figure: Top 20 largest eigenvalues of the system matrices (original, spanning tree, ultra-sparsifier); y-axis: magnitude (×10^5), x-axis: eigenvalue index; PDN with on-chip VRs (DC analysis)]


Support-circuit MNA matrices also preserve the dominant eigenvalues well in transient (TR) analysis

[Figure, two panels: (a) top 18 largest eigenvalues of the system matrices (original vs. ultra-sparsifier), industrial analog circuit design (TR analysis); (b) top 20 largest eigenvalues of the system matrices (original vs. ultra-sparsifier; magnitude ×10^5), PDN with on-chip VRs (TR analysis)]


Runtime and Memory Efficiency of SCP [1] and GPSCP

– Runtime speedups over the direct solver using the SCP [1] and GPSCP algorithms are reported.
– Memory improvements of matrix factorization over the direct solver using the SCP [1] and GPSCP algorithms are reported.
– Nonlinearity: $\#NonDev / \#TotDev$, the number of nonlinear devices divided by the total number of devices

[1] X. Zhao and Z. Feng. Towards efficient SPICE-accurate nonlinear circuit simulation with on-the-fly support-circuit preconditioners. In Proc. ACM DAC, 2012.


• Runtime comparison for a single Newton-Raphson step (DC)

CKT     Direct Fact  Direct Solve  GPSCP Setup  GPSCP Fact  GPSCP GMRES  #iter.  Speedup  Error (%)
test1   3.31         0.05          0.42         0.22        0.22         10      3.9X     0.05
test2   4.94         0.06          0.52         0.28        0.29         12      4.6X     0.05
test3   23.03        0.18          1.42         0.64        2.02         13      5.7X     0.05
test4   36.85        0.20          1.48         0.91        2.31         14      7.8X     0.05
test5   63.35        0.61          2.15         1.74        2.37         17      10.8X    0.05
test6   18.12        0.02          1.02         0.30        0.74         19      8.8X     0.1

• Runtime comparison for transient analysis

[Figure: transient runtime (s) of the direct solver, GPSCP without dynamic updating (Non-dnm), and GPSCP with dynamic updating (Dynamic) for test1 to test5, with circuit sizes (K-nodes) on a secondary axis; speedup labels (non-dynamic/dynamic): 3.6X/4.4X, 4.0X/5.1X, 4.7X/5.8X, 6.1X/8.6X, 9.9X/14.0X]

Conclusion

Proposed a general-purpose support-circuit preconditioner (GPSCP) for scalable large-scale nonlinear circuit simulation

Key ideas:
– 1. Extract ultra-sparsifier support graphs from the passive networks of the linearized circuit
– 2. Combine them with the active components (e.g., controlled sources)
– 3. Use energy-based preconditioner improvement and dynamic preconditioner updating schemes

Our experimental results show that GPSCP can:
– Obtain up to 14X speedups in DC and transient simulations
– Reduce memory consumption by up to 80%

THANK YOU!
