Solving Linear Equations in Interior-Point Methods
Tongryeol Seol
Computing and Software Department, McMaster University
March 1, 2004
Optimization Seminar
1 / 21
Outline
• Linear Equations in IPMs
• How to Solve Sparse Linear Equations
• Introduction to McSML
• Conclusion
• References
2 / 21
Linear Equations in IPMs for LO and QO
• The key implementation issue of IPMs is the solution of the linear systems of equations arising from the Newton system:
• At every iteration, we solve this system with a different H.
• Solving these equations takes on average 60–90% of the total time of solving a problem by an IPM.
Augmented system in LO:
$$\begin{bmatrix} -H^{-1} & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} f_L \\ h_L \end{bmatrix}$$

Normal equation in LO:
$$A H A^T \, \Delta y = A H f_L + h_L$$

Augmented system in QO:
$$\begin{bmatrix} -Q - H^{-1} & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} f_Q \\ h_Q \end{bmatrix}$$

Normal equation in QO:
$$A (Q + H^{-1})^{-1} A^T \, \Delta y = A (Q + H^{-1})^{-1} f_Q + h_Q$$
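To see where the normal equation comes from, eliminate Δx from the augmented system (shown for the LO case; the QO case is identical with Q + H⁻¹ in place of H⁻¹):
$$-H^{-1} \Delta x + A^T \Delta y = f_L \;\Longrightarrow\; \Delta x = H (A^T \Delta y - f_L),$$
$$A \Delta x = h_L \;\Longrightarrow\; A H A^T \Delta y = A H f_L + h_L.$$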
3 / 21
How to Solve Linear Equations in IPMs
Direct Methods
• Cholesky factorization (LLᵀ) is the most popular choice for the normal equation approach.
– Cholesky factorization is a symmetric variant of LU factorization.
– The data structures for L can be determined in advance and fixed.
• LDLᵀ factorization is used for the augmented system approach.
– D is a block diagonal matrix if 2×2 pivots are applied, otherwise just a diagonal matrix.

Iterative Methods
• The conjugate gradient method is considered as an alternative in IPMs for network flow optimization.
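Below is a minimal MATLAB sketch of both direct approaches on a synthetic instance (the data, the sizes, and the built-in chol/ldl calls are illustrative assumptions, not McSML code):

  m = 40; n = 100;
  A = [speye(m), sprandn(m, n - m, 0.1)];   % full row rank by construction
  H = spdiags(rand(n, 1) + 0.1, 0, n, n);   % positive diagonal scaling matrix
  b = randn(m, 1);
  % Normal equation approach: AHA' is SPD, so Cholesky applies.
  S  = A * H * A';
  L  = chol(S, 'lower');
  dy = L' \ (L \ b);
  % Augmented system approach: indefinite, so use LDL' instead.
  K = [-spdiags(1 ./ diag(H), 0, n, n), A'; A, sparse(m, m)];
  [Lk, Dk, Pk] = ldl(K);                    % P'*K*P = L*D*L'
  sol = Pk * (Lk' \ (Dk \ (Lk \ (Pk' * [zeros(n, 1); b]))));
  dy2 = sol(n+1:end);                       % agrees with dy up to rounding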
4 / 21
Normal Equation Approach
• In IPMs for LO, the normal equation approach is popular because AHAᵀ is positive definite and H is a diagonal matrix.
• If A has one or more dense columns, AHAᵀ loses sparsity.
• What about the normal equation in QO? Even when Q is sparse, (Q + H⁻¹)⁻¹ is generally dense.

$$A H A^T \, \Delta y = A H f_L + h_L \quad \text{(normal equation in LO)}$$
$$A (Q + H^{-1})^{-1} A^T \, \Delta y = A (Q + H^{-1})^{-1} f_Q + h_Q \quad \text{(normal equation in QO)}$$

[Figure: nonzero pattern of A for fit1p (nz = 9,868) vs. nonzero pattern of AAᵀ (nz = 393,129)]
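The dense-column effect is easy to reproduce in MATLAB (a hypothetical toy instance, not one of the Netlib problems):

  n = 1000;
  A = speye(n);                 % perfectly sparse constraint matrix
  A(:, 1) = 1;                  % a single dense column
  fprintf('nnz(A) = %d, nnz(A*A'') = %d\n', nnz(A), nnz(A * A'));
  % The one dense column makes A*A' completely dense: nnz grows from 1,999 to 1,000,000.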
5 / 21
Augmented System Approach
• The augmented system is free from the dense-column and Q-matrix problems, but it loses the positive definiteness of the normal equation.
• There are two approaches to solving the augmented system:

Symmetric Block Factorization with 2×2 Pivoting
– Theory: Bunch-Parlett (or Bunch-Kaufman) pivoting strategy
– Implementation: LINPACK, Harwell Library (MA27, MA47), Fourer and Mehrotra (fo1aug)

LDLᵀ Factorization with Regularization
– Theory: quasidefinite matrices, proximal point algorithm
– Implementation: Mészáros, Gondzio
6 / 21
Block Factorization with 2× 2 Pivoting
• We cannot apply the Cholesky factorization to indefinite matrices, e.g.
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
• The Bunch-Kaufman pivoting strategy uses a 2×2 pivot when no 1×1 pivot is acceptable.
– First it tries a partial pivoting search (cases 1–3).
– If no 1×1 pivot is acceptable, it performs a 2×2 pivot.

For the current column, let d be its diagonal entry and λ its largest off-diagonal absolute value, attained in row r; let c be the diagonal entry of column r and σ the largest off-diagonal absolute value in column r. With α ≈ 0.6404 (= (1 + √17)/8):

case 1. |d| ≥ α|λ|: use d as a 1×1 pivot.
case 2. |d|σ ≥ α|λ|²: use d as a 1×1 pivot.
case 3. |c| ≥ ασ: use c as a 1×1 pivot.
case 4. otherwise: use $\begin{bmatrix} d & \lambda \\ \lambda & c \end{bmatrix}$ as a 2×2 pivot.
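The case analysis can be written down directly. The following MATLAB function is a hypothetical sketch of the pivot selection for one column k < n (selection only; the elimination step and the symmetric permutations are omitted):

  function piv = bk_select(K, k)
  % Bunch-Kaufman pivot selection for column k of a symmetric matrix K (sketch).
  alpha = (1 + sqrt(17)) / 8;               % ~0.6404, bounds element growth
  n = size(K, 1);
  d = K(k, k);
  [lam, r] = max(abs(K(k+1:n, k)));         % lambda and the row r where it occurs
  r = k + r;
  if abs(d) >= alpha * lam
      piv = 'case 1: use d as a 1x1 pivot';
  else
      sig = max(abs(K([k:r-1, r+1:n], r))); % largest off-diagonal in column r
      if abs(d) * sig >= alpha * lam^2
          piv = 'case 2: use d as a 1x1 pivot';
      elseif abs(K(r, r)) >= alpha * sig
          piv = 'case 3: use c = K(r,r) as a 1x1 pivot';
      else
          piv = 'case 4: use [d lam; lam c] as a 2x2 pivot';
      end
  end
  end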
7 / 21
Solving Linear Equations by Regularization
• Quasidefinite matrices are strongly factorizable.
– A symmetric matrix K is called quasidefinite if it has the form
$$K = \begin{bmatrix} -H & A^T \\ A & F \end{bmatrix},$$
where H and F are positive definite and A has full row rank.
– Factorizable: there exist D and L such that K = LDLᵀ.
– Strongly factorizable: for any permutation matrix P, the factorization PKPᵀ = LDLᵀ exists.
• Regularization can be applied to achieve stability in the linear algebra kernel.
– Primal-dual regularization:
$$M = \begin{bmatrix} -Q - H^{-1} & A^T \\ A & 0 \end{bmatrix} + \begin{bmatrix} -R_p & 0 \\ 0 & R_d \end{bmatrix},$$
where −R_p makes the (1,1) block more negative definite and R_d makes the (2,2) block more positive definite.
– However, we keep the regularizations small, since we do not want to change the behaviour of the IPM greatly.
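A small MATLAB sketch of building and factorizing the regularized system (the sizes, the stand-in Q, and the 1e-8 regularization weights are assumptions for illustration):

  m = 40; n = 100;
  A    = [speye(m), sprandn(m, n - m, 0.05)];  % full row rank
  H    = spdiags(rand(n, 1) + 0.1, 0, n, n);   % positive diagonal
  Hinv = spdiags(1 ./ diag(H), 0, n, n);
  Q    = speye(n);                             % stand-in QO Hessian
  Rp   = 1e-8 * speye(n);                      % primal regularization
  Rd   = 1e-8 * speye(m);                      % dual regularization
  M    = [-(Q + Hinv) - Rp, A'; A, Rd];        % regularized augmented system
  [L, D, P] = ldl(M);                          % quasidefinite, safely factorizable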
8 / 21
Sparse Gaussian Elimination
• First step of Gaussian elimination:
$$M = \begin{bmatrix} \alpha & w^T \\ v & C \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ v/\alpha & I \end{bmatrix} \begin{bmatrix} \alpha & w^T \\ 0 & C - v w^T / \alpha \end{bmatrix}$$
– Repeating Gaussian elimination on C results in a factorization M = LU, with L unit lower triangular and U upper triangular.
– In symmetric cases, M = L(U = DLᵀ) = LDLᵀ, with D a diagonal matrix.
• For sparse M: if v and w are not zero, C − vwᵀ/α is not zero where C is zero, so new nonzeros (fill-ins) appear. A different order of pivoting can reduce the number of fill-ins.

[Figure: elimination on an arrow matrix. With the dense row and column eliminated first, one step fills the entire remaining submatrix; with them ordered last, no fill-in occurs. A demo follows.]
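The arrow-matrix example in the figure can be reproduced in MATLAB (a hypothetical demo; the SPD arrow matrix is an assumption chosen so that chol applies):

  n = 6;
  M = speye(n); M(1, :) = 1; M(:, 1) = 1; M(1, 1) = n;  % dense first row/column
  L1 = chol(M, 'lower');          % dense row/column first: L fills in completely
  p  = n:-1:1;                    % reverse ordering puts the dense row/column last
  L2 = chol(M(p, p), 'lower');    % no fill-in at all
  fprintf('nnz(L): natural %d vs reordered %d\n', nnz(L1), nnz(L2));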
9 / 21
Control of Nonzeros (Fill-ins)
• When numerical stability is not an issue, ordering, symbolic factorization, and numerical factorization can be performed as separate steps (a sketch of the three steps follows).
• Ordering permutes equations and variables to reduce the fill-in in L.
– Finding an optimal permutation is an NP-complete problem.
– Local heuristics (e.g., minimum degree ordering)
– Global heuristics (e.g., nested dissection)
• Symbolic factorization determines the locations of the nonzeros in L, so the data structures for storing them can be set up in advance.
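A minimal MATLAB sketch of the three separate steps (the built-ins symamd/symbfact/chol stand in for McSML's own routines; the grid Laplacian is an assumed test matrix):

  S = delsq(numgrid('S', 60));    % SPD 5-point Laplacian on a square grid
  p = symamd(S);                  % 1. ordering (minimum degree heuristic)
  c = symbfact(S(p, p));          % 2. symbolic factorization: column counts of L
  fprintf('predicted nnz(L): %d with ordering vs %d natural\n', ...
          sum(c), sum(symbfact(S)));
  L = chol(S(p, p), 'lower');     % 3. numerical factorization into fixed structures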
10 / 21
Local Ordering Heuristics
• The Markowitz criterion in Gaussian elimination: a candidate pivot in row i and column j gets the count (rᵢ − 1)(cⱼ − 1), where rᵢ and cⱼ are the numbers of nonzeros in row i and column j. The count estimates the number of elements that become nonzero, and the pivot with the smallest count is chosen.
• Minimum degree ordering is the symmetric variant of Markowitz ordering: the count becomes (rᵢ − 1)², so minimizing it amounts to picking the node of minimum degree in the graph.
• Graph representation of the sparsity pattern of a matrix:
– Nodes: diagonal elements
– Edges: off-diagonal elements

[Figure: Markowitz counts on a small example before and after one elimination step, and the elimination graph of a 5×5 matrix with its nodes numbered ①–⑤ in minimum degree order. A sketch of the count computation follows.]
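A hypothetical MATLAB sketch of computing Markowitz counts for all candidate pivots (the random test matrix is an assumption):

  M = sprand(8, 8, 0.3) + speye(8);   % example sparse matrix with nonzero diagonal
  r = full(sum(spones(M), 2));        % r_i: nonzeros in row i
  c = full(sum(spones(M), 1)).';      % c_j: nonzeros in column j
  [i, j] = find(M);                   % candidate pivots = nonzero entries
  mc = (r(i) - 1) .* (c(j) - 1);      % Markowitz count (r_i - 1)(c_j - 1)
  [~, k] = min(mc);
  fprintf('pivot (%d,%d) has minimum count %d\n', i(k), j(k), mc(k));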
11 / 21
Global Ordering Heuristics
• Nested dissection ordering
– Its roots are in finite-element substructuring.
– It brings a matrix into doubly-bordered block diagonal form.
– It finds a small separator that disconnects a given graph into components M1 and M2 of approximately equal size.

[Figure: a 9×9 grid split by a separator into components M1 and M2, and the corresponding 81×81 matrix permuted into doubly-bordered block diagonal form with blocks M11, M22 and the separator border.]
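MATLAB's built-in dissect (R2017b+) can reproduce the grid example; a hypothetical comparison against minimum degree:

  S  = delsq(numgrid('S', 11));   % Laplacian on the 9x9 interior grid: an 81x81 matrix
  pd = dissect(S);                % nested dissection ordering
  pm = symamd(S);                 % minimum degree ordering, for comparison
  fprintf('nnz of Cholesky factor: dissect %d, symamd %d\n', ...
          nnz(chol(S(pd, pd))), nnz(chol(S(pm, pm))));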
12 / 21
Notion of Supernode
• Consecutive columns with an identical sparsity structure can often be found in L.
• A supernode is a group of consecutive columns {j, j+1, …, j+t} such that
– columns j to j+t have a dense diagonal block, and
– columns j to j+t have an identical sparsity pattern below row j+t.
• Benefits of supernodes (a detection sketch follows):
– They reduce inefficient indirect addressing,
– take advantage of cache memory, and
– allow a compact representation of the sparsity structure of L.
– We can expect about a 30% speed-up by using supernode structures.
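A hypothetical MATLAB sketch of detecting supernode start columns from the pattern of a sparse factor L (McSML's actual detection may differ):

  function starts = snodes(L)
  % Columns j-1 and j belong to one supernode iff their patterns on rows
  % j..n are identical (this also forces the dense diagonal-block entry L(j,j-1)).
  n = size(L, 1);
  starts = 1;                     % column 1 always starts a supernode
  for j = 2:n
      if ~isequal(find(L(j:n, j)), find(L(j:n, j-1)))
          starts(end+1) = j;      %#ok<AGROW>
      end
  end
  end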
13 / 21
Introduction to McSML
• McSML consists of 11 C-MEX files for the solution of linear equations in IPMs.
• ANALYSIS
– Minimum (external) degree ordering with multiple elimination
• FACTORIZE and SOLVE
– Left-looking Cholesky/LDLᵀ factorization
– Supernodal techniques: compact row indices, loop unrolling, etc.
– Iterative refinement: conjugate gradient method (see the sketch below)
• MISC.
– We maintain A and Aᵀ together.
– Normal equation builder and augmented system builder.
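A plain-MATLAB sketch of CG-based refinement using the Cholesky factor as preconditioner (the names S, L, P, rhs and the tolerances are assumptions; McSML performs this inside its MEX routines):

  S    = A * H * A';                       % normal equation matrix
  prec = @(v) P' * (L' \ (L \ (P * v)));   % apply (P'LL'P)^{-1}, assuming P*S*P' = L*L'
  [x, flag, relres] = pcg(S, rhs, 1e-12, 20, prec);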
14 / 21
Normal Equation Solver of McSML
AT=sm_trans(A);
P=sf_mmd_ne(A);                    % minimum degree ordering; P holds the ordering info
[L,SINFO]=sf_scc_ne(A,AT,P);       % builds the data structures for L and creates the supernode structure (SINFO)
D=nf_aat(A,AT,H,P,L,SINFO);        % computes AHA'
nf_scc_ne(L,SINFO,D);              % performs the numerical factorization
x=nf_sub_ne(A,H,L,SINFO,D,P,rhs);  % solves AHA'x = rhs and refines the solution

We assume that every matrix is sparse and every vector is full.
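For testing, the result can be checked against a plain MATLAB backslash solve of the same system (a hypothetical cross-check, not part of McSML):

  x_ref = (A * H * A') \ rhs;      % reference solve of AHA'x = rhs
  fprintf('||x - x_ref|| = %e\n', norm(x - x_ref));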
15 / 21
Augmented System Solver of McSML
AT=sm_trans(A);
P=sf_mmd_as(A,AT,H,F);                  % minimum degree ordering for the augmented system
[L,SINFO]=sf_scc_as(A,AT,H,F,P);        % builds the data structures for L and the supernode structure (SINFO)
D=nf_aug(A,AT,H,F,P,L,SINFO,Rp,Rd);     % builds [-H A'; A F] + [-Rp 0; 0 Rd]
nf_scc_as(L,SINFO,D);                   % performs the numerical factorization
x=nf_sub_as(A,AT,H,F,L,SINFO,D,P,rhs);  % solves the equation and refines the solution
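The same kind of cross-check works here, forming the regularized matrix that nf_aug builds (hypothetical, for testing only):

  M = [-(H + Rp), A'; A, F + Rd];  % i.e. [-H A'; A F] + [-Rp 0; 0 Rd]
  x_ref = M \ rhs;
  fprintf('||x - x_ref|| = %e\n', norm(x - x_ref));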
16 / 21
Performance of McSML – Accuracy Benchmarking
Solution Error 1 = ‖b − AAᵀx‖∞ / ‖b‖∞,  Solution Error 2 = ‖b − AAᵀx‖₂

Problem     McSMLne                      LIPSOL (ORNL Cholesky)       MATLAB (LAPACK-DPOTRF)
            Error 1       Error 2        Error 1       Error 2        Error 1       Error 2
qap15       2.026405E-12  7.620719E-09   4.664558E-08  1.120950E-04   1.824042E-13  2.108489E-09
cre_d       2.185784E-15  3.458121E-07   3.715834E-15  4.624334E-07   5.607903E-15  3.656559E-06
ken_13      2.963212E-13  6.074663E-07   1.235207E-13  2.726310E-07   2.846440E-13  5.722383E-07
pds_20      1.302376E-14  3.723433E-08   5.002513E-15  1.846481E-08   9.263912E-15  3.371902E-08
qap12       2.762969E-12  4.836873E-09   3.819078E-09  5.972575E-06   4.186702E-12  2.697597E-08
pds_10      4.556325E-15  1.229784E-08   2.779174E-15  1.540756E-08   4.614586E-15  2.471212E-08
osa_60      1.608902E-13  3.667600E-04   1.062017E-11  2.755767E-02   9.793590E-12  2.436846E-02
cre_b       1.994308E-15  5.262877E-07   1.994308E-15  5.997905E-07   2.446411E-13  2.453067E-04
stocfor3    2.118858E-13  8.061074E-09   1.850241E-13  6.462990E-09   2.457402E-13  7.965795E-09
maros_r7    2.895742E-15  4.792537E-10   3.474891E-15  4.535023E-10   2.779912E-15  4.963837E-10
dfl001      6.693688E-14  7.146214E-06   4.457410E-14  7.685944E-06   3.434812E-14  1.264779E-05
osa_30      5.873346E-14  5.723799E-05   2.874662E-12  3.308398E-03   2.700201E-12  3.061775E-03
pilot87     1.323670E-14  2.168695E-10   1.613404E-14  2.526719E-10   3.186449E-14  3.405521E-10
80bau3b     3.993961E-16  6.414922E-12   1.774429E-15  1.293002E-11   1.927925E-15  1.305830E-11

Selected Netlib Problems, IBM RS/6000 W/S
17 / 21
Performance of McSML – Speed Benchmarking
Times in seconds; Total* = Symbolic Comp. Time + 20 × Numeric Comp. Time.

Problem     McSMLne                      LIPSOL (ORNL Cholesky)       MATLAB (LAPACK-DPOTRF)
            Sym.    Num.    Total*       Sym.    Num.    Total*       Sym.   Num.    Total*
qap15       2.98    140.69  2,816.78     9.46    70.23   1,414.06     1.31   314.91  6,299.51
cre_d       0.81    1.47    30.21        2.17    1.66    35.37        1.60   146.63  2,934.20
ken_13      1.62    0.29    7.42         0.81    0.56    12.01        0.81   0.77    16.21
pds_20      8.00    65.92   1,326.40     210.08  24.90   708.08       1.77   818.06  16,362.97
qap12       0.76    16.58   332.36       1.52    8.73    176.12       0.40   52.17   1,043.80
pds_10      1.69    7.99    161.49       25.58   3.16    88.78        0.61   9.66    193.81
osa_60      2.73    1.51    32.93        52.86   3.65    125.86       51.46  2.46    100.66
cre_b       0.86    2.01    41.06        1.74    1.87    39.14        1.70   218.29  4,367.50
stocfor3    0.61    0.11    2.81         0.15    0.45    9.15         0.45   0.54    11.25
maros_r7    0.52    2.45    49.52        0.40    2.50    50.40        1.35   5.79    117.15
dfl001      0.81    8.56    172.01       1.15    3.40    69.15        0.29   54.32   1,086.69
osa_30      0.92    0.57    12.32        9.95    1.50    39.95        9.82   0.95    28.82
pilot87     0.27    0.92    18.67        0.82    1.02    21.22        0.73   2.05    41.73
80bau3b     0.10    0.07    1.50         0.09    0.11    2.29         0.13   0.09    1.93

Selected Netlib Problems, IBM RS/6000 W/S
18 / 21
Performance of McSML – N.E. vs. A.S.
Selected Netlib Problems, IBM RS/6000 W/S
For each problem, the slide reports per-phase times (in seconds) for the augmented system solver (McSMLas: ordering*, symbolic factorization, building the augmented system, numeric factorization, solve with refinement – refinement iterations in parentheses – and accuracy) and the same phases for the normal equation solver (McSMLne).

McSMLne:
Problem     Ord.   NE build   Accuracy
qap15       1.83   0.39       7.620719E-09
cre_d       0.47   0.09       3.458121E-07
ken_13      0.37   0.05       6.074663E-07
pds_20      4.58   0.45       3.723433E-08
qap12       0.42   0.10       4.836873E-09
pds_10      1.01   0.11       1.229784E-08
osa_60      1.35   0.56       3.667600E-04
cre_b       0.50   0.10       5.262877E-07
stocfor3    0.15   0.03       8.061074E-09
maros_r7    0.28   0.22       4.792537E-10
dfl001      0.51   0.09       7.146214E-06
osa_30      0.22   0.52       5.723799E-05
pilot87     0.10   0.15       2.168695E-10
80bau3b     0.00   0.03       6.414922E-12

[Table: the remaining per-phase columns (S.F., N.F., S.R.) for McSMLne and all McSMLas columns (Ord.*, S.F., AS, N.F., S.R., Accuracy). The McSMLas accuracies range from 8.280160E-11 to 4.787706E+00 (one clear failure), with 3 to 37 refinement iterations per solve.]

* Tentative version of minimum degree ordering for augmented system
19 / 21
Performance of McSML – McIPM with McSML
McIPM with McSMLne:
Problem     It.  Time  Primal Feas.  Dual Feas.    Duality Gap
lotfi       23*  0.98  1.51093E-06   9.61793E-08   5.89444E-08
scagr7      3*   0.74  3.27941E+03   2.00629E-11   7.39163E-09
stocfor1    18*  0.44  1.69120E-06   2.59837E-07   9.22093E-08
share2b     13*  0.36  5.82778E-05   1.59447E-11   2.85713E-08
vtp_base    17   0.77  7.08901E-09   6.11396E-12   1.21127E-09
recipe      12   0.45  1.44868E-08   3.60120E-09   7.17196E-10
sc205       13   0.48  1.08636E-10   1.30478E-05   6.31605E-08
adlittle    21*  0.55  6.31986E-05   3.17591E+02   1.34671E+00
sc105       12   0.31  1.32147E-09   4.65285E-05   6.13231E-08
sc50a       11   0.23  1.19769E-09   1.14794E-10   1.81016E-08
sc50b       10   0.24  2.11146E-09   3.97911E-09   1.88798E-11
blend       11   0.35  1.65276E-09   3.08955E-09   3.89770E-09
kb2         17   0.48  2.96139E-11   3.44373E-09   2.17643E-09
afiro       10   0.67  3.75415E-09   3.33908E-09   1.43719E-10

McIPM with LIPSOL's equation solver:
Problem     It.  Time  Primal Feas.  Dual Feas.    Duality Gap
lotfi       23   1.07  2.07036E-12   1.99680E-10   2.65988E-09
scagr7      15   0.47  8.44625E-09   2.02139E-11   7.46069E-09
stocfor1    15   0.47  2.28081E-10   1.03668E-08   7.69573E-09
share2b     12   0.40  8.67057E-09   1.59437E-11   2.85714E-08
vtp_base    17   1.02  7.15516E-09   8.48931E-13   1.21127E-09
recipe      12   0.53  1.44868E-08   4.24861E-09   7.28516E-10
sc205       13   0.51  7.68271E-12   1.75965E-09   9.67817E-12
adlittle    14   0.46  8.43804E-10   1.15036E-09   2.33736E-11
sc105       12   0.38  1.31704E-09   6.43470E-10   2.96888E-11
sc50a       11   0.30  1.06181E-09   1.14724E-10   1.80975E-08
sc50b       10   0.27  1.95050E-09   3.97786E-09   1.88593E-11
blend       11   0.81  1.04876E-09   2.93585E-09   3.89769E-09
kb2         17   0.41  2.94229E-11   3.31878E-09   2.17643E-09
afiro       10   0.20  8.93729E-10   7.34164E-10   1.43661E-10

* Numerical difficulty occurred.
Selected Netlib Problems, IBM RS/6000 W/S
20 / 21
Conclusion and Future Work
• We implemented sparse linear equation solvers for IPMs.
• McIPMne is competitive with LIPSOL's linear equation solver on most Netlib problems, but it is slower on some big problems. We need to improve the ordering quality.
• McIPMas is not numerically stable yet, and refinement by PCG does not work well either. We need to (1) check the operations in factorization and substitution and (2) implement the Bunch-Parlett method.
• On some problems, McIPM with McSML fails due to numerical instability in the last iterations (H = X⁻¹Z has a big range in values).
21 / 21
Selected References
Books
• Duff, I. S., Erisman, A. M. and Reid, J. K. (1989) Direct Methods for Sparse Matrices, Oxford University Press, New York.
• George, A. and Liu, J. W. H. (1981) Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall, Englewood Cliffs.

Papers
• Gondzio, J. (1993) Implementing Cholesky Factorization for IPMs of LP, Optimization.
• Altman, A. and Gondzio, J. (1998) Regularized Symmetric Indefinite Systems in IPMs for Linear and Quadratic Optimization, Opt. Methods & Soft.
• Mészáros, C. (1996) Fast Cholesky Factorization for IPMs of LP, Comp. & Math. Appl.
• Mészáros, C. (1997) The Augmented System Variant of IPMs in Two-Stage Stochastic LP Computation, EJOR.
• Maros, I. and Mészáros, C. (1998) The Role of the Augmented System in IPMs, EJOR.
• Vanderbei, R. J. (1995) Symmetric Quasidefinite Matrices, SIAM J. Opt.
• Liu, J. W. H. (1990) The Multifrontal Method for Sparse Matrix Solution: Theory and Practice, SIAM Review.
• Fourer, R. and Mehrotra, S. (1993) Solving Symmetric Indefinite Systems in an IPM for LP, Math. Prog.
• Duff, I. S. and Reid, J. K. (1995) Exploiting Zeros on the Diagonal in the Direct Solution of Indefinite Sparse Symmetric Linear Systems, ACM Trans. Math. Soft.