
1

Graph Sparsification Approaches to Scalable Integrated Circuit Modeling and Simulations

Zhuo Feng

ICSICT, Oct, 2014

Design Automation Group

Acknowledgements: My PhD students Xueqian Zhao (MTU) and Lengfei Han (MTU)

2

Scalable SPICE-Accurate IC Simulations

[Figure: original circuit with analog and digital blocks. Analog circuit blocks: voltage regulators (VRs) containing an error amplifier and a current amplifier, with labels Vref, Rf1, Rf2, Cf, Cout, Vout, Iout, VG. Digital circuit blocks are surrounded by VRs.]

Motivation
– An integrated circuit (IC) system with billions of transistors and interconnect components needs to be accurately modeled and analyzed

Challenges in large-scale SPICE-accurate IC simulations
– Computational cost grows rapidly with traditional direct solution methods
– Iterative solution methods need to be robust and efficient for general tasks

Power Delivery Network (PDN) w/ Embedded Voltage Regulators (VRs)

3

Background of SPICE Simulation Algorithms

Problem formulation
– Nonlinear differential equations:

$$F(x) = f(x(t)) + \frac{d}{dt} q(x(t)) + u(t) = 0$$

– f(·) and q(·) denote the static and dynamic nonlinearities, respectively

Standard SPICE simulators rely on the Newton-Raphson (NR) method
– Step 1: Linearize the nonlinear devices (transistors, diodes, etc.)
– Step 2: Update the solution through NR iteration

Jacobian of F(x), evaluated at the k-th NR iterate $x_k$:

$$G(x_k) = \left.\frac{\partial f}{\partial x}\right|_{x_k}, \qquad C(x_k) = \left.\frac{\partial q}{\partial x}\right|_{x_k}$$
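A minimal sketch of the two NR steps for a single circuit node, assuming a backward-Euler time discretization with step h (the slides do not state the integration rule). The diode, capacitor and source values, and the names `nr_solve_timestep`, `IS`, `VT`, `CAP`, `I_SRC`, are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical device values, chosen only for illustration.
IS, VT = 1e-14, 0.02585      # diode saturation current (A), thermal voltage (V)
CAP, I_SRC = 1e-12, 1e-3     # node capacitance (F), injected current (A)

def f(x): return IS * (math.exp(x / VT) - 1.0)   # static nonlinearity (diode current)
def q(x): return CAP * x                          # dynamic nonlinearity (charge)
def G(x): return IS / VT * math.exp(x / VT)       # G = df/dx
def C(x): return CAP                              # C = dq/dx

def nr_solve_timestep(x_prev, h, u, tol=1e-12, max_iter=100):
    """Solve f(x) + (q(x) - q(x_prev))/h + u = 0 for x by Newton-Raphson."""
    x = x_prev
    for _ in range(max_iter):
        residual = f(x) + (q(x) - q(x_prev)) / h + u
        jacobian = G(x) + C(x) / h    # Step 1: linearize at the current iterate
        dx = -residual / jacobian     # Step 2: NR update
        x += dx
        if abs(dx) < tol:
            return x
    raise RuntimeError("NR did not converge")

# March 50 time steps of 1 ns; the node voltage settles to the DC solution.
x = 0.0
for _ in range(50):
    x = nr_solve_timestep(x, h=1e-9, u=-I_SRC)
```

In a full simulator x is a vector and the scalar division becomes a linear solve with the Jacobian matrix G + C/h, which is the step the preconditioners in this talk target.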

4

Prior Works

Direct and iterative solvers have been used in SPICE simulations
– Direct solver: LU decomposition (KLU [1])
  – Expensive for large-scale post-layout IC problems due to rapidly growing memory and runtime cost
– Krylov-subspace iterative methods: GMRES [2]
  – Pros: black-box solver, good memory efficiency, high parallelism
  – Cons: problem-dependent convergence properties, worse runtime
  – ILU and domain-decomposition based preconditioners, etc.

References:
[1] T. Davis, et al. Algorithm 907: KLU, a direct sparse solver for circuit simulation problems. ACM Trans. Math. Softw., 2010.
[2] Y. Saad, et al. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 1986.
[3] D. A. Spielman, et al. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. ACM STOC, 2004.
[4] M. Bern, et al. Support-graph preconditioners. SIAM J. Matrix Anal. Appl., 2006.

Our contribution: a circuit-oriented preconditioning approach
– Novel circuit-oriented preconditioners (compared to matrix-oriented ones)
– Rigorous mathematical foundation: graph sparsification research [3-4]
– Consistent performance when solving transistor-level nonlinear circuits

5

Graph Sparsification Techniques

Graph sparsification basics
– Find a subgraph P approximating the original graph G in some measure (pairwise distance, cut values, graph Laplacian, etc.)
– Maintain the same set of vertices such that P can be used as a proxy for G in numerical computations without introducing much error
– A good graph sparsifier should keep very few edges to limit the computation and storage cost

[Figure: original graph G and its sparsified subgraph P. Source: I. Koutis, G. L. Miller and R. Peng. A fast solver for a class of linear systems. Commun. ACM, 2012.]

6

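Support-Graph Preconditioner

Support-graph preconditioner (SGP)
– Example: find a spanning tree of the original graph
– Compute matrix factors of the spanning tree without introducing any fill-ins

The condition number of $P^{-1}G$ can be greatly reduced

[Figure: original graph G, a 3×3 grid of nodes 1 to 9 with weighted edges, and its spanning-tree support graph P.]

Matrix of G (d marks a diagonal entry; off-diagonal entries as printed on the slide):

$$G = \begin{bmatrix}
d & 2 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
2 & d & 4 & 0 & 3 & 0 & 0 & 0 & 0 \\
0 & 4 & d & 0 & 0 & 8 & 0 & 0 & 0 \\
1 & 0 & 0 & d & 6 & 0 & 4 & 0 & 0 \\
0 & 3 & 0 & 6 & d & 5 & 0 & 1 & 0 \\
0 & 0 & 8 & 0 & 5 & d & 0 & 0 & 3 \\
0 & 0 & 0 & 4 & 0 & 0 & d & 9 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 9 & d & 4 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 & 4 & d
\end{bmatrix}$$

Matrix of P (d' marks a diagonal entry):

$$P = \begin{bmatrix}
d' & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2 & d' & 4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 4 & d' & 0 & 0 & 8 & 0 & 0 & 0 \\
0 & 0 & 0 & d' & 6 & 0 & 4 & 0 & 0 \\
0 & 0 & 0 & 6 & d' & 5 & 0 & 0 & 0 \\
0 & 0 & 8 & 0 & 5 & d' & 0 & 0 & 0 \\
0 & 0 & 0 & 4 & 0 & 0 & d' & 9 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 9 & d' & 4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 4 & d'
\end{bmatrix}$$

Eigenvalues (1st to 6th) and condition number (cond):

Matrix       1st      2nd      3rd      4th      5th     6th     cond
G            26.170   23.182   17.572   11.514   9.373   6.673   135.948
P            25.239   23.540   17.579   10.909   9.865   6.822   16.752
P^{-1}G      1.431    1.204    1.062    1.000    1.000   1.000   17.442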
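The spanning-tree example can be reproduced in a few lines. This is a sketch, not the author's code: it uses the off-diagonal weights of the slide's matrix G, extracts a maximum-weight spanning tree with Kruskal's algorithm, and compares conditioning. The slide's diagonal entries d are not recoverable, so a small grounding conductance (`GROUND = 0.2`, a hypothetical value) is added to every node to make both matrices nonsingular; the resulting numbers therefore need not match the table above.

```python
import numpy as np

# Edge weights read from the off-diagonal entries of the slide's matrix G.
EDGES = {(1, 2): 2, (2, 3): 4, (1, 4): 1, (2, 5): 3, (3, 6): 8, (4, 5): 6,
         (5, 6): 5, (4, 7): 4, (5, 8): 1, (6, 9): 3, (7, 8): 9, (8, 9): 4}
N, GROUND = 9, 0.2   # GROUND is an assumed grounding conductance per node

def max_spanning_tree(n, edges):
    """Kruskal's algorithm on edges sorted by decreasing weight."""
    parent = list(range(n + 1))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    tree = {}
    for (u, v), w in sorted(edges.items(), key=lambda e: -e[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree[(u, v)] = w
    return tree

def grounded_laplacian(n, edges, ground):
    """Graph Laplacian plus a grounding term on the diagonal."""
    a = ground * np.eye(n)
    for (u, v), w in edges.items():
        i, j = u - 1, v - 1
        a[i, i] += w
        a[j, j] += w
        a[i, j] -= w
        a[j, i] -= w
    return a

def pencil_eigenvalues(g, p):
    """Eigenvalues of P^{-1} G via the symmetric form L^{-1} G L^{-T}, P = L L^T."""
    low = np.linalg.cholesky(p)
    m = np.linalg.solve(low, np.linalg.solve(low, g).T)
    return np.linalg.eigvalsh((m + m.T) / 2)

tree = max_spanning_tree(N, EDGES)
G = grounded_laplacian(N, EDGES, GROUND)
P = grounded_laplacian(N, tree, GROUND)
eigs = pencil_eigenvalues(G, P)
print("cond(G)      =", np.linalg.cond(G))
print("cond(P^-1 G) =", eigs.max() / eigs.min())
```

The extracted tree keeps the same eight edges as the slide's matrix P, and because P is a tree its Cholesky factor has no fill-in under a leaf-first elimination order.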

7

Support-Circuit Preconditioner

A naïve support-circuit preconditioner (SCP)
– Sparsifies the linear networks of the original circuit
– Takes advantage of existing sparse matrix techniques (Cholesky, LU, etc.)
– Nearly-linear complexity for analyzing nanoscale (parasitics-dominant) ICs, e.g. clock networks, power delivery networks, etc.

[Figure: original network with VRs and digital circuit blocks, and the support graph of the original network that forms the support-circuit preconditioner.]

8

Support-Circuit Preconditioner (Cont.)

General-purpose support-circuit preconditioner (GPSCP)
– Extracts a sparsified network from the linearized version of the original circuit
– Leverages existing sparse matrix solution techniques
– Nearly-linear complexity for analyzing more general nonlinear circuit systems

[Figure: a nonlinear circuit (MOSFET with terminals g, d, s and resistors R1 to R5), its linearized circuit (conductances g1 to g5, g_ds, capacitances C_ds, C_gs, C_gd and the controlled source g_m V_gs), and the resulting support circuit, in which g4 and C_gs no longer appear.]

9

Support-Circuit Preconditioner Extraction (1)

Directed weighted graph corresponding to a linearized circuit
– Can be obtained around a solution point during NR iterations
– Will be sparsified through graph decomposition and sparsification

[Figure: extraction flow starting from the nonlinear circuit (MOSFET with resistors R1 to R5). (1) Linearized circuit with g1 to g5, g_ds, C_ds, C_gs, C_gd and g_m V_gs. (2) Directed weighted graph with edge weights g1 to g5, g_ds, C_ds/h, C_gs/h, C_gd/h and g_m V_gs. (3) Undirected weighted graph, without the g_m V_gs edge. (4) Support graph, without g4 and C_gs/h.]

10

Support-Circuit Preconditioner Extraction (2)

Support-circuit preconditioner extraction
– Combine the support graph with the other components (e.g. controlling sources)
– Factor the Jacobian matrix of the support circuit to create the preconditioner

[Figure: steps (5) to (7). The support graph and the controlling sources (g_m V_gs) are combined into the support circuit (Spt-CKT), yielding the general-purpose support circuit.]

11

Quality Quantification of Support Graph Preconditioners

Convergence of support-graph preconditioners
– The convergence relies on the condition number of the matrix pencil (G, P):

$$\kappa(G, P) = \frac{\lambda_{\max}(G, P)}{\lambda_{\min}(G, P)}$$

– The support of the pencil (G, P) is defined as:

$$\sigma(G, P) = \min\{\tau \in \Re \mid x^T(\tau P - G)x \ge 0, \ \text{for all } x \in \Re^n\}$$

– Eigenvalues of the pencil (G, P) are bounded by σ(G, P)
– A smaller τ means faster convergence

Spanning-tree support graph as a preconditioner
– May require many iterations to converge if τ (mismatch) is too large
– τ can be estimated by comparing the Joule heating of two resistive networks
  – Power dissipated by G: $x^T G x$
  – Power dissipated by P: $x^T P x$
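As a worked relation connecting the two definitions above (a sketch under the assumption, standard in support-graph theory but not stated on the slide, that G and P are symmetric positive semidefinite with the same null space), the extreme generalized eigenvalues can be written in terms of supports:

```latex
\lambda_{\max}(G, P) = \sigma(G, P), \qquad
\lambda_{\min}(G, P) = \frac{1}{\sigma(P, G)}
\quad\Longrightarrow\quad
\kappa(G, P) = \sigma(G, P)\,\sigma(P, G)
```

When P is a subgraph of G with unchanged edge weights, $x^T (G - P) x \ge 0$ for all x, so $\sigma(P, G) \le 1$ and therefore $\kappa(G, P) \le \sigma(G, P)$. This is why bounding the power-dissipation ratio $x^T G x / x^T P x$ is enough to bound the condition number.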

12

Ultra-Sparsifier Support Graph (1)

Ultra-sparsifier (non-tree) support graphs
– An ultra-sparsifier contains at most n-1+k edges (spanning tree + extra edges)
– It is a k-ultra-sparse subgraph that approximates the original graph with high probability [1]
– Adding extra edges to the spanning tree better approximates the original graph (e.g. eigenvalues, power dissipation)

[Figure: a spanning tree (edges of the spanning-tree graph) and an ultra-sparsifier (spanning-tree edges plus extra edges).]

[1] D. A. Spielman and S. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. ACM STOC, 2004.

13


Sparsity control of an ultra-sparsifier support graph
– Provides tradeoffs between the quality and efficiency of preconditioners
– The weighted degree of a vertex v in a graph A is defined as:

$$wd(v) = \frac{vol(v)}{\max_{u \in neighbor(v)} w(u, v)}$$

  where vol(v) is the total weight incident to node v and w(u, v) is the weight of the edge connecting nodes v and u
– Example: for a 2D mesh grid, 1 ≤ wd(v) ≤ 4
  – If wd(v) → 1: one dominant edge
  – If wd(v) → 4: four evenly critical edges
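The weighted degree is straightforward to compute from an adjacency map. The sketch below illustrates the two limiting cases of the 2D-mesh example; the adjacency layout (`adj[v][u] = w(u, v)`) and the node names are assumptions made for illustration.

```python
def weighted_degree(adj, v):
    """wd(v) = vol(v) / max weight of any edge incident to v."""
    weights = list(adj[v].values())
    return sum(weights) / max(weights)

# Interior node of a 2D mesh with four equally weighted edges: wd = 4.
even = {"c": {"n": 2.0, "s": 2.0, "e": 2.0, "w": 2.0}}
# Same node with one dominant edge: wd approaches 1.
dominant = {"c": {"n": 100.0, "s": 1.0, "e": 1.0, "w": 1.0}}

wd_even = weighted_degree(even, "c")
wd_dominant = weighted_degree(dominant, "c")
```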

14


Iterative ultra-sparsifier support graph construction
– Define θ as the matching factor threshold (0 < θ < 1) of the node weighted degree
– Step 1: Compute the weighted degree wd of each node in the original graph A
– Step 2: Compute the support graph A' with weighted degree wd'
– Step 3: Recover edges to A' until wd'/wd > θ for each node in the support graph A'
– Step 4: Return the final ultra-sparsifier support graph A' for support-circuit preconditioning

[Figure: spanning tree with nodes where wd'/wd < θ, and the ultra-sparsifier obtained by adding extra edges until wd'/wd > θ.]
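The four steps can be sketched as follows. Two choices here are assumptions, since the slide does not specify them: the initial support graph is a maximum-weight spanning tree, and off-tree edges are recovered heaviest-first whenever either endpoint still has wd'/wd ≤ θ. All function names are hypothetical.

```python
def max_spanning_tree(n, edges):
    """Kruskal's algorithm on edges sorted by decreasing weight."""
    parent = list(range(n + 1))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    tree = {}
    for (u, v), w in sorted(edges.items(), key=lambda e: -e[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree[(u, v)] = w
    return tree

def adjacency(n, edges):
    adj = {v: {} for v in range(1, n + 1)}
    for (u, v), w in edges.items():
        adj[u][v] = adj[v][u] = w
    return adj

def wd(adj, v):
    return sum(adj[v].values()) / max(adj[v].values())

def ultra_sparsifier(n, edges, theta):
    full = adjacency(n, edges)
    wd_full = {v: wd(full, v) for v in full}            # Step 1
    kept = max_spanning_tree(n, edges)                  # Step 2 (assumed: tree)
    sub = adjacency(n, kept)
    off_tree = sorted((e for e in edges if e not in kept),
                      key=lambda e: -edges[e])
    for (u, v) in off_tree:                             # Step 3
        if any(wd(sub, x) / wd_full[x] <= theta for x in (u, v)):
            kept[(u, v)] = edges[(u, v)]
            sub[u][v] = sub[v][u] = edges[(u, v)]
    return kept                                         # Step 4

# 3x3 grid with the edge weights of the earlier spanning-tree example.
EDGES = {(1, 2): 2, (2, 3): 4, (1, 4): 1, (2, 5): 3, (3, 6): 8, (4, 5): 6,
         (5, 6): 5, (4, 7): 4, (5, 8): 1, (6, 9): 3, (7, 8): 9, (8, 9): 4}
sparse = ultra_sparsifier(9, EDGES, theta=0.5)   # tree alone already passes
denser = ultra_sparsifier(9, EDGES, theta=0.8)   # some edges are recovered
```

Raising θ keeps more edges, which is the knob the performance model on the following slides tunes.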

15

Performance Model Guided Sparsification

A runtime performance model can help find the optimal θ
– Which is better: a denser or a sparser support graph?

Total runtime:

$$T_{tot} = N \cdot T_{GMRES} + T_{LU}$$

where $N \cdot T_{GMRES}$ is the time spent in GMRES iterations and $T_{LU}$ is the LU factorization time.

Denser preconditioner
1. Greater LU factorization time
2. Fewer GMRES iterations

Sparser preconditioner
1. Less LU factorization time
2. More GMRES iterations

Goal: minimize $T_{tot}$ by finding a proper matching factor threshold θ
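A small sketch of how the runtime model selects θ. The per-θ values below are made-up placeholders, not measurements from the talk, and the function names are hypothetical; in practice the three quantities would come from a performance model or profiling.

```python
def total_runtime(n_gmres, t_gmres, t_lu):
    """T_tot = N * T_GMRES + T_LU."""
    return n_gmres * t_gmres + t_lu

def best_theta(candidates):
    """Return the theta whose modeled total runtime is smallest."""
    return min(candidates, key=lambda th: total_runtime(*candidates[th]))

# theta -> (GMRES iterations, seconds per iteration, LU seconds); illustrative only.
candidates = {
    0.2: (400, 0.5, 10.0),    # sparser: cheap LU, many iterations
    0.5: (120, 0.6, 30.0),
    0.9: (40, 0.8, 300.0),    # denser: expensive LU, few iterations
}
theta_opt = best_theta(candidates)
```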

16

Finding the Optimal Weighted Degree Threshold θ

Optimal weighted degree threshold θ
– Exploit symbolic matrix factorization results to quickly identify the optimal θ
– E.g. find the θ that maximizes the change in flops of Cholesky factorizations

17

Performance Modeling Results

Experimental results on IBM power grid benchmarks

[Figure: runtime and flops vs. weighted degree threshold θ]

[Figure: runtime results of manual and automatic sparsification schemes]

18

Test Cases for Experiments

CKT    #nunk   #Mos   #R    #C     #L    #I
ldo1   3M      84K    6M    250K   7K    250K
ldo2   5M      71K    10M   422K   12K   422K
pg1    3M      144    6M    250K   7K    250K
pg2    6M      144    11M   490K   14K   490K
clk1   3M      65K    6M    3M     -     -
clk2   6M      65K    11M   6M     -     -

Circuit design parameters:
• #nunk: number of unknowns in the circuit
• #Mos: number of MOSFETs
• #R: number of resistors
• #C: number of capacitors
• #L: number of inductors
• #I: number of current sources

Three circuit design types:
• ldo: large PDNs with on-chip VRs
• pg: large PDNs with power gating
• clk: clock distribution networks

19

Results of Performance Model Guided Sparsification

Experimental results for a large PDN with multiple VRs
– The performance-guided sparsification approach achieves nearly optimal runtime

[Figure: runtime of a single NR step using different θ]

20

Experimental Results

• Runtime comparison for transient analysis (100 time steps)

CKT    #NR   Direct Time (s)   GPSCP #GMRES   GPSCP Time (s)   Speedup
ldo1   237   279,629           4,130          15,368           18X
ldo2   314   -                 3,979          23,793           -
pg1    222   108,784           3,381          10,204           11X
pg2    421   185,892           3,478          14,206           13X
clk1   132   50,688            1,452          3,493            14X
clk2   219   112,497           2,555          8,001            14X

• Memory comparison

CKT    Direct    GPSCP    Reduction
ldo1   4.2GB     0.8GB    5X
ldo2   -         1.1GB    -
pg1    3.2GB     0.8GB    4X
pg2    7.8GB     1.6GB    5X
clk1   4.3GB     0.8GB    5X
clk2   10.0GB    1.4GB    7X

21

Experimental Results (2)

A large PDN with embedded multiple VRs

22

RF Simulation Methods

For nonlinear RF circuits, the output is usually quasi-periodic
– SPICE may require simulating many periods to reach steady state
– The time-domain shooting method cannot handle distributed devices

Harmonic Balance (HB) analysis for steady-state RF simulation
– HB analysis can capture the steady-state spectral response directly
– Harmonic balance also refers to balancing the current between linear and nonlinear portions at every harmonic frequency

[Figure: a nonlinear circuit driven by $\cos(\omega_0 t)$ with output voltage v; the output may contain frequencies other than $\omega_0$. Plots: voltage (V) vs. time (ps), and spectrum (dB) vs. frequency (MHz).]

23

HB Analysis of RF Circuits

Non-autonomous circuit analysis [1]
– $x(t)$: state variables
– $y(t)$: impulse response function of linear circuit components
– $q(\cdot)$: dynamic nonlinearities
– $f(\cdot)$: static nonlinearities
– $b(t)$: time-dependent excitation sources
– $x(t)$, $q(\cdot)$ and $f(\cdot)$ are typically periodic functions

[1] K. S. Kundert and A. Sangiovanni-Vincentelli. Simulation of nonlinear circuits in the frequency domain. IEEE Trans. CAD, 1986.

24

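HB Analysis of RF Circuits (2)

HB Jacobian matrix (frequency domain):

$$J_{hb} = Y + j 2\pi f_0 \, \Omega \Gamma C \Gamma^{-1} + \Gamma G \Gamma^{-1}$$

– $\Gamma$ and $\Gamma^{-1}$ represent the Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (IFFT), respectively
– C and G denote the linearizations of q(·) and f(·), respectively, at the s time-domain sample points (s = 2k+1, where k is the number of positive frequencies):

$$C = \mathrm{diag}\left(\left.\frac{\partial q}{\partial x}\right|_{t_1}, \left.\frac{\partial q}{\partial x}\right|_{t_2}, \ldots, \left.\frac{\partial q}{\partial x}\right|_{t_s}\right), \qquad
G = \mathrm{diag}\left(\left.\frac{\partial f}{\partial x}\right|_{t_1}, \left.\frac{\partial f}{\partial x}\right|_{t_2}, \ldots, \left.\frac{\partial f}{\partial x}\right|_{t_s}\right)$$

$$\Omega = \mathrm{diag}(-kI, \ldots, 0, \ldots, kI)$$

– $J_{hb}$ includes many dense blocks introduced by $\Gamma C \Gamma^{-1}$ and $\Gamma G \Gamma^{-1}$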

25

Challenges in Harmonic Balance (HB) Analysis

Direct methods for RF HB circuit simulation (A. Mehrotra et al., DAC'09)
– Challenged by solving large yet non-sparse Jacobian matrices
– Cons: computation/memory cost grows quickly with circuit size

Traditional iterative methods for HB analysis (P. Feldmann et al., CICC'96; W. Dong et al., TCAD'09)
– Pros: black-box, matrix-oriented, memory-efficient
  – E.g. ILU preconditioner, domain-decomposition preconditioner
– Cons: inefficient/unreliable for strongly nonlinear RF systems

Dense circulant matrices due to FFT/IFFT operations:

$$\Gamma \cdot G \cdot \Gamma^{-1} =
\begin{bmatrix}
G_1 & G_s & \cdots & G_2 \\
G_2 & G_1 & \cdots & G_3 \\
\vdots & \vdots & \ddots & \vdots \\
G_s & G_{s-1} & \cdots & G_1
\end{bmatrix},
\qquad
G = \mathrm{diag}(g_1, g_2, \ldots, g_s)$$

where $[G_1, G_2, \ldots, G_s]^T$ is the FFT of $[g_1, g_2, \ldots, g_s]^T$.
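The circulant structure can be checked numerically for a single scalar unknown. This sketch assumes the unnormalized DFT convention of `numpy.fft.fft` for Γ, in which case the circulant entries are the FFT of the time samples divided by s; the sample values in `g` are arbitrary.

```python
import numpy as np

s = 5
g = np.array([1.0, 2.0, 0.5, 3.0, 1.5])   # arbitrary time-domain samples g_1..g_s

F = np.fft.fft(np.eye(s))                  # DFT matrix (Gamma)
M = F @ np.diag(g) @ np.linalg.inv(F)      # Gamma * G * Gamma^{-1}

# Circulant matrix built from the spectrum of g: entry (a, b) is c[(a - b) mod s].
c = np.fft.fft(g) / s
circ = np.array([[c[(a - b) % s] for b in range(s)] for a in range(s)])

is_circulant = np.allclose(M, circ)
```

A diagonal matrix in the time domain thus becomes fully dense in the frequency domain, which is the source of the dense blocks in the HB Jacobian.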

26

Graph Sparsification Approach to HB Analysis

From graph sparsification to Jacobian matrix sparsification
– Modified nodal analysis (MNA) matrix reduction: 20% to 38% fewer entries
– Fill-in reduction during LU: 60%; LU factorization speedup: 50X

[Figure: sparsity patterns of the MNA matrix and the HB Jacobian matrix, with fill-ins and block fill-ins during LU marked by ×, before and after graph sparsification.]

27

Conclusion

Graph sparsification approaches to circuit simulations
– MNA matrix decomposition into Laplacian and complement matrices
– Performance-guided graph sparsification of the Laplacian matrix
– Support-circuit preconditioner construction

Our preliminary results
– Highly reliable convergence for time/frequency domain simulations
– Up to 18X (21X) speedup and 7X (6X) memory reduction for time (frequency) domain simulations
– Scalable to large post-layout integrated circuits

Future work
– Will explore spectral graph sparsification methods
– Will exploit heterogeneous CPU-GPU computing platforms

28

Nonlinear Devices Evaluation in HB

Evaluation of nonlinear devices
– Freq → Time: terminal voltage waveforms
– Time domain: evaluate current (derivative) waveforms
– Time → Freq: currents (derivatives) in the frequency domain

[Figure: terminal voltage spectrum → IFFT/IAPDFT → terminal voltage samples → device evaluation → Ids samples → FFT/APDFT (Almost-Periodic DFT) → Ids spectrum]

Terminal voltage samples
– Sampling is needed at 2k+1 time points (k is the number of positive frequencies), according to the Nyquist–Shannon sampling theorem
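The frequency → time → frequency loop can be sketched with a plain FFT for the single-tone (periodic) case; the almost-periodic DFT used for multi-tone inputs is not shown. The square-law device `v**2` and the function name `evaluate_device` are hypothetical stand-ins for a real transistor model.

```python
import numpy as np

k = 3                      # number of positive frequencies kept
s = 2 * k + 1              # time samples per period

def evaluate_device(v_spectrum, device):
    """Spectrum of the device current for a given terminal-voltage spectrum."""
    n = len(v_spectrum)
    v_samples = np.fft.ifft(v_spectrum * n).real   # Freq -> Time
    i_samples = device(v_samples)                  # time-domain evaluation
    return np.fft.fft(i_samples) / n               # Time -> Freq

# Terminal voltage cos(w0 t): Fourier coefficients 0.5 at harmonics +1 and -1.
theta = 2 * np.pi * np.arange(s) / s
v_spectrum = np.fft.fft(np.cos(theta)) / s

# Hypothetical square-law device: i = v^2 = 0.5 + 0.5 cos(2 w0 t).
i_spectrum = evaluate_device(v_spectrum, lambda v: v ** 2)
```

The output spectrum has a DC term and a second harmonic even though the input has only the fundamental, which is the frequency-mixing behavior HB analysis captures directly.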

29

Support-Circuit Preconditioner for HB Analysis

Step 1: MNA matrix decomposition of the linearized RF circuit
– Laplacian matrix (P): passive devices such as resistors, capacitors, etc.
– Complement matrix (A): active devices such as transconductances, etc.

[Figure: an RF circuit (M1, L1, L2, R1, R2, C1, C2) is linearized at each sampled time point t1, ..., ts (adding Cgd, Cgs, gds and gmVgs; nodes 1 to 5), and each linearized circuit is decomposed into a Laplacian matrix P and a complement matrix A at that time point. t1 to ts are the s sampled time points.]

30

Support-Circuit Preconditioner for HB Analysis (2)

Step 2: Representative Laplacian matrix construction
– Different sampled time points have different entry values
– Normalize the scaled Laplacian matrices of all sampled time points

[Figure: the Laplacian matrices P at t1, t2, ..., ts are normalized and averaged into the representative Laplacian matrix.]

31

Support-Circuit Preconditioner for HB Analysis (3)

Step 3: Sparsification pattern extraction
– Convert the matrix to a weighted graph
– Sparsify the weighted graph and convert it back to matrix form
– Combine with the complement matrix

[Figure: representative Laplacian matrix → original weighted graph (nodes 1 to 5; edge weights g1+C2/h, g2, gds+Cds/h, C1/h, Cgd/h, Cgs/h) → ultra-sparsifier (without the Cgs/h edge) → sparsified representative Laplacian matrix, which is combined with the complement matrix into the sparsification pattern matrix.]

32


Step 4: MNA matrix sparsification

[Figure: the sparsification pattern matrix is applied to the system MNA matrices at t1, t2, ..., ts to obtain the sparsified system MNA matrices at t1, t2, ..., ts.]

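33

Step 5: Support-circuit block preconditioner generation
– Original matrix: all variables of a single harmonic grouped together
– Permuted matrix: all the harmonics of a single variable grouped together

Circulant matrix in HB:

$$\Gamma \cdot G \cdot \Gamma^{-1} =
\begin{bmatrix}
G_1 & G_s & \cdots & G_2 \\
G_2 & G_1 & \cdots & G_3 \\
\vdots & \vdots & \ddots & \vdots \\
G_s & G_{s-1} & \cdots & G_1
\end{bmatrix},
\qquad
G = \mathrm{diag}(g_1, g_2, \ldots, g_s)$$

where $[G_1, G_2, \ldots, G_s]^T$ is the FFT of $[g_1, g_2, \ldots, g_s]^T$.

[Figure: sparsified MNA matrix → FFT → permutation → permuted matrix → support-circuit preconditioner.]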

34

Case Study: Double-balanced Gilbert Mixer

MOSFET linearization model

Linearized passive network (Laplacian matrix) extraction

[Figure: double-balanced Gilbert mixer schematic (M1 to M6, R1 to R8, R10, L0 to L3, C0, C1, inputs Vlo+, Vlo-, Vrf+, Vrf-, supply VDD) and its linearized passive network; [xx] denotes node index. The MOSFET linearization model has terminals D, G, S, B and elements Rds, gmVgs, gnVbs, Cgd, Cgs.]

35

Case Study: Double-balanced Gilbert Mixer (cont.)

Ultra-sparsifier support graph construction
– Step 1: Extract the maximum spanning tree
– Step 2: Restore critical edges until reaching a desired approximation

[Figure: Laplacian graph, maximum spanning tree and ultra-sparsifier on nodes 1, 2, 4, 6, 8, 11, 13, 14, 16, 17, 18, 21, 22, 25, 27.]

36

HB Simulation Engine on CPU-GPU Platform

[Figure: simulation flow. Start → device evaluation → support-circuit preconditioner → preconditioner factorization (GPU-based block LU decomposition) → GMRES iterations (matrix-free iterative solver) → convergence checking → End, with the NR loop repeating until convergence.]

Support-circuit preconditioner construction: decompose the MNA matrix into passive and active matrices, then
1. Performance-modeling-based sparsification configuration
2. Construct the representative passive matrix
3. Extract the sparsification pattern
4. Sparsify the MNA matrix
5. Generate the support-circuit preconditioner

37

Runtime Performance Modeling

Lookup table (LUT) for runtime performance modeling
– 2D LUTs predict LU factorization runtime on the GPU
– Two LUTs are created for GPU matrix multiplications and matrix divisions

[Figure: runtime performance lookup table for GPU-based matrix operations, indexed by matrix operation batch size and matrix size, with bilinear interpolation between entries.]
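A minimal sketch of a 2D lookup with bilinear interpolation, as the figure suggests. The grid points and table values below are synthetic placeholders (generated from a simple formula), not GPU measurements, and `bilinear_lookup` is a hypothetical helper name.

```python
from bisect import bisect_right

def bilinear_lookup(xs, ys, table, x, y):
    """Interpolate table[i][j] (defined at xs[i], ys[j]) at the point (x, y)."""
    # Locate the grid cell containing (x, y), clamped to the table range.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect_right(ys, y) - 1, 0), len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * table[i][j]
            + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1]
            + tx * ty * table[i + 1][j + 1])

# Synthetic table: rows indexed by matrix size, columns by batch size.
sizes = [16, 32, 64]
batches = [100, 200, 400]
runtime = [[2 * m + 3 * b + 1 for b in batches] for m in sizes]

estimate = bilinear_lookup(sizes, batches, runtime, 24, 150)
```

With one such table for batched multiplications and one for batched divisions, the LU runtime of a candidate sparsity pattern could be estimated by summing lookups over its levels.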

38

Parallel Sparse Block LU Factorization

Representative sparsified MNA matrix (test matrix)
– Approximates the properties of the block sparse matrix
– Created by averaging all sparsified MNA matrices
– Factorized to get the fill-in locations

[Figure: the sparsified system MNA matrices at t1, t2, ..., ts are averaged into the test matrix, whose LU factorization gives the L factor and U factor, with fill-in locations marked by x.]

39

Parallel Sparse Block LU Factorization (cont.)

Data dependency graph
– Column k depends on column j when U(j, k) != 0 [1]
– Can be derived from the U matrix

[Figure: sparsity pattern of a 9×9 U matrix and the corresponding data dependency graph, with columns arranged in levels 0 to 4.]

[1] J. Gilbert and T. Peierls. Sparse partial pivoting in time proportional to arithmetic operations. SIAM J. Sci. Stat. Comput., 9(5):862–873, 1988.
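The dependency rule leads directly to a levelization: a column with no dependencies sits in level 0, and every other column sits one level above its deepest dependency. The sketch below uses a small hypothetical sparsity pattern with 0-based column indices, not the 9×9 example from the slide.

```python
def levelize(n, u_nonzeros):
    """Group columns 0..n-1 into levels; u_nonzeros holds (j, k) with U(j, k) != 0."""
    deps = {k: [] for k in range(n)}
    for j, k in u_nonzeros:
        if j < k:                       # strictly upper part: column k needs column j
            deps[k].append(j)
    level = {}
    for k in range(n):                  # dependencies always have smaller index
        level[k] = 1 + max((level[j] for j in deps[k]), default=-1)
    grouped = {}
    for k, lv in level.items():
        grouped.setdefault(lv, []).append(k)
    return [grouped[lv] for lv in sorted(grouped)]

# Hypothetical pattern: column 2 needs columns 0 and 1; column 4 needs 2 and 3.
levels = levelize(5, [(0, 2), (1, 2), (2, 4), (3, 4)])
```

Columns within one level are independent of each other and can be factorized in parallel, for example as one batched GPU operation per level.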

40

Parallel Sparse Block LU Factorization (cont.)

Modified data dependency graph
– Identify a "fake" dependency when L(j+1:n, j) == 0
– Eliminate "fake" dependencies

[Figure: for the 9×9 example, the original data dependency graph has levels 0 to 4; after eliminating fake dependencies, the modified graph has levels 0 to 2.]

41

Parallel Sparse Block LU Factorization (cont.)

GPU-based block sparse matrix LU factorizations
– Levelize the factorization according to the data dependency graph
– Each level contains only matrix multiplication and division operations
– Use the batched matrix multiplication and inversion functions provided by cuBLAS

[Figure: levels 0 to n of the data dependency graph mapped to batches of small dense blocks; the division (÷) and multiplication (×) operations of each level are executed as batched operations to produce the result.]

42

Experiment Setup

Widely used RF circuits as the benchmark

CKT   Name            Nodes   Tones   Freqs   Nunk
1     mixer 1         302     2       25      14798
2     mixer 2         1988    2       41      161028
3     mixer 3         5262    2       5       47358
4     mixer 4         7532    2       13      188300
5     LNA + mixer 1   343     3       63      42875
6     LNA + mixer 2   5303    3       14      143181
7     LNA + mixer 3   7573    3       14      204471

Note:
• Freqs: number of harmonics
• Nunk: number of unknowns

43

Runtime and Memory Efficiency on CPU

Support-circuit preconditioned HB (SCPHB) method
– High robustness and efficiency
– Runtime speedup: 21X (compared with the direct solver in DAC'09)
– Memory reduction: 6X (compared with the direct solver in DAC'09)

      Direct solver         BD preconditioner   SCPHB preconditioner
CKT   Time(s)    Mem(GB)    Time(s)   K-Its     Time(s)   Mem(GB)   K-Its   Speedup
1     471.9      0.23       24.9      821       145.5     0.10      204     3.24X
2     19263.1    7.95       5637.6    6731      1408      1.72      383     13.7X
3     686.4      0.36       92.2      165       69.5      0.06      229     9.8X
4     14153.5    4.26       1072.3    273       1035.6    0.73      355     21.3X
5     2561.6     1.92       DNF       DNF       821.5     1         194     3.1X
6     4040.9     3.34       DNF       DNF       414.7     0.67      328     9.74X
7     6633.6     5.21       DNF       DNF       791       0.83      255     8.38X

K-Its: number of GMRES iterations; DNF: did not finish within 1000 Newton iterations

44

Runtime Efficiency for Strong Nonlinearities

Simulation runtime vs. input power of LNA+Mixer
– BD preconditioner: runtime increases exponentially
– SCPHB preconditioner: runtime remains nearly constant

45

Scalability

Nearly-linear runtime and memory scalability

[Figure: (a) runtime scalability; (b) memory scalability]
