Preconditioning Techniques for Large Linear Systems
Part III: General-Purpose Algebraic Preconditioners
Michele Benzi
Department of Mathematics and Computer Science
Emory University
Atlanta, Georgia, USA
Scuola di Dottorato di Ricerca in Scienze Matematiche
Dipartimento di Matematica
Università degli Studi di Padova
Outline
1 Introduction
2 Generalities about preconditioning
3 Basic concepts of algebraic preconditioning
4 Incomplete factorizations
5 Sparse approximate inverses
6 IF via approximate inverses
7 Balanced Incomplete Factorization (BIF)
8 Conclusions
Preconditioned iterative methods

Solving large linear systems by Krylov-type methods:

$$Ax = b$$

Preconditioning may be viewed as a transformation:

$$M^{-1}Ax = M^{-1}b, \quad \text{or} \quad AM^{-1}y = b, \; x = M^{-1}y$$

Examples: Matrix Splittings (block Jacobi, Gauss-Seidel, SSOR); Incomplete Factorizations; Sparse Approximate Inverses; AMG ...

The preconditioner M (or $M^{-1}$) should be cheap and fast to compute, and should result in rapid convergence of the preconditioned iterative method,

but also: sufficiently robust,

and sparse (i.e., low storage requirements).

The case of sequences of linear systems $A^{(k)}x^{(k)} = b^{(k)}$, $k = 0, 1, 2, \ldots$
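As a concrete illustration of the transformed system, here is a minimal sketch in SciPy (the 2D Poisson test matrix and the incomplete-LU choice of M are assumptions for demonstration, not from the lecture). SciPy's gmres applies $M^{-1}$ through a LinearOperator, which corresponds to left preconditioning:

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Assumed test problem: 2D Poisson (5-point stencil); any sparse A would do.
    m = 32
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
    A = (sp.kron(sp.eye(m), T) + sp.kron(T, sp.eye(m))).tocsc()
    b = np.ones(A.shape[0])

    # An incomplete LU factorization plays the role of M; applying M^{-1}
    # amounts to one forward and one backward triangular solve.
    ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
    Minv = spla.LinearOperator(A.shape, matvec=ilu.solve)

    # Krylov solver on the (left-)preconditioned system M^{-1} A x = M^{-1} b:
    x, info = spla.gmres(A, b, M=Minv)
    print(info, np.linalg.norm(A @ x - b))   # info == 0 means convergence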
Preconditioned iterative methods

Structure of this lecture:

1 Brief discussion of algebraic vs. problem-specific preconditioning
2 Description of guiding principles behind algebraic preconditioning (IF and SAI). Robustness problems of standard techniques
3 Some recent approaches which exploit information on the matrix inverse
4 An approach based on a novel decomposition of the input matrix
5 Other recent developments: hybrid and multi-level methods (briefly)
A quote

In ending this book with the subject of preconditioners, we find ourselves at the philosophical center of the scientific computing of the future... Nothing will be more central to computational science in the next century than the art of transforming a problem that appears intractable into another whose solution can be approximated rapidly. For Krylov subspace matrix iterations, this is preconditioning.

From L. N. Trefethen and D. Bau, III, Numerical Linear Algebra, SIAM, 1997.
Algebraic vs. Problem-Specific Preconditioning

Algebraic preconditioners only use information extracted from the input matrix A, usually supplemented by some user-provided tuning parameters, like drop tolerances or limits on the amount of fill-in allowed.

Main examples include:

Preconditioners based on classical (block) splittings $A = M - N$

Incomplete factorizations: $M = LU \approx A$

Approximate inverse preconditioners: $G = M^{-1} \approx A^{-1}$

Algebraic Multi-Grid (AMG)

Hybrids obtained by combining some of the above

Such preconditioners are good candidates for inclusion in general-purpose software packages. Although they are rarely "optimal" for any particular problem, they are widely applicable and have proven to be reasonably robust in countless applications. Also, they are being continually improved.
Algebraic vs. Problem-Specific Preconditioning

Discretization of a continuous problem (a system of PDEs, an integral equation, etc.) leads to a sequence of linear systems $A_n x_n = b_n$ where $A_n$ is $n \times n$ and $n \to \infty$ as the discretization is refined (that is, "as $h \to 0$").

Definition: A preconditioner is optimal if it results in a rate of convergence of the preconditioned iteration that is asymptotically constant as the problem size increases, and if the cost of each preconditioned iteration scales linearly in the size of the problem.

For integral equations, the scaling of each iteration may instead be $O(n \log n)$ or similar.

For example, in the SPD case, if $\kappa_2(M_n^{-1} A_n) \le C$ where $C$ is some constant independent of $n$, then $M_n$ is an optimal preconditioner provided the action of $M_n^{-1} A_n$ on a vector can be computed in $O(n)$ work.
Algebraic vs. Problem-Specific Preconditioning

In contrast, problem-specific preconditioners, which are designed to solve a narrow class of problems, are often optimal. These methods make extensive use of the developer's knowledge of the application at hand, including information about the physics, the geometry, and the particular discretization technique used.

These preconditioners are usually not suitable for other types of problems, so their range of applicability is limited.

Many PDE-based (or physics-based) preconditioners belong to this class. An example is Diffusion Synthetic Acceleration (DSA) in radiation transport.

Other examples of problem-specific preconditioners, especially for incompressible flow problems, will be discussed later in these lectures.
Algebraic vs. Problem-Specific Preconditioning

The two approaches, algebraic and problem-specific, are not necessarily mutually exclusive (similar to "direct vs. iterative methods").

Most problem-specific preconditioners use algebraic ones as building blocks, e.g., to solve or to approximate subproblems arising within the overall preconditioning strategy.

Some algebraic preconditioners are flexible enough that they can be tailored to specific applications.

Conversely, there has been a trend in recent years to build algebraic preconditioners that mimic the properties of specialized preconditioners; for instance, algebraic multilevel methods.
Implicit vs. explicit preconditioners

An implicit, or direct, preconditioner is an approximation of the input matrix: $M \approx A$.

An explicit, or inverse, preconditioner is an approximation of the inverse of the input matrix: $G = M^{-1} \approx A^{-1}$. This is motivated by the observation that even though $A^{-1}$ is a dense matrix, many of its entries are negligibly small.

Examples of implicit preconditioners include classical splittings, incomplete factorizations, and their block and multilevel variants.

Examples of explicit preconditioners include polynomial preconditioners, sparse approximate inverses, and data-sparse approximate inverses.

Both factored and non-factored forms are in use.
Implicit vs. explicit preconditioners

Application of an implicit preconditioner within a Krylov method (like CG or GMRES) requires solving one or more linear systems, often with triangular or block triangular matrices.

In contrast, application of an explicit preconditioner requires one or more matrix-vector products.

Explicit preconditioners are easier to parallelize. Generally speaking, however, the construction of an explicit preconditioner tends to be more costly than that of an implicit one. This is to be expected, since A (or its action) is known but $A^{-1}$ is not.

Also, convergence rates are usually better with implicit preconditioners than with explicit ones. But there are exceptions!
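A sketch of the difference in how the two kinds of preconditioner are applied (the toy tridiagonal matrix is an assumption, and exact LU factors stand in for incomplete ones):

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 1000
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csc')
    r = np.random.default_rng(0).standard_normal(n)   # a residual vector

    # Implicit preconditioner: z = M^{-1} r costs two sparse triangular solves,
    # which are inherently sequential (each unknown depends on earlier ones).
    lu = spla.splu(A)                  # exact factors, standing in for an ILU
    z_implicit = lu.solve(r)

    # Explicit preconditioner: z = G r is a single sparse mat-vec, easy to
    # parallelize. Here G is a crude, purely illustrative approximation of A^{-1}.
    G = sp.diags([0.25, 0.5, 0.25], [-1, 0, 1], shape=(n, n), format='csr')
    z_explicit = G @ r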
Incomplete Factorization (IF) methods

When a sparse matrix is factored by Gaussian elimination, fill-in usually takes place. This means that the triangular factors L and U of the coefficient matrix A are considerably less sparse than A.

Even though sparsity-preserving reordering techniques can be used to reduce fill-in, sparse direct methods are not considered viable for solving very large linear systems, such as those arising from the discretization of three-dimensional boundary value problems, due to time and space constraints.

However, by discarding part of the fill-in in the course of the factorization process, simple but powerful preconditioners can be obtained in the form $M = LU$ where L and U are the incomplete (approximate) LU factors.
Incomplete Factorization (IF) methods

Incomplete factorization algorithms differ in the rules that govern the dropping of fill-in in the incomplete factors. Fill-in can be discarded based on several different criteria, such as position, value, or a combination of the two.

Letting $\mathbf{n} = \{1, 2, \ldots, n\}$, one can fix a subset $S \subseteq \mathbf{n} \times \mathbf{n}$ of positions in the matrix, usually including the main diagonal and all $(i, j)$ such that $a_{ij} \ne 0$, and allow fill-in in the LU factors only in positions which are in S.

Formally, an incomplete factorization step can be described as

$$a_{ij} \leftarrow \begin{cases} a_{ij} - a_{ik}\, a_{kk}^{-1}\, a_{kj} & \text{if } (i, j) \in S, \\ a_{ij} & \text{otherwise,} \end{cases}$$

for each $k$ and for $i, j > k$.
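A dense sketch of this pattern-restricted elimination (illustrative only; production codes work on sparse data structures):

    import numpy as np

    def ilu_static_pattern(A, S):
        # Incomplete LU over a fixed pattern S (a set of (i, j) pairs):
        # apply a_ij <- a_ij - a_ik a_kk^{-1} a_kj only for (i, j) in S,
        # leaving all other positions untouched (fill-in is discarded).
        F = np.array(A, dtype=float)
        n = F.shape[0]
        for k in range(n - 1):
            for i in range(k + 1, n):
                if (i, k) in S:
                    F[i, k] /= F[k, k]                 # multiplier l_ik
                    for j in range(k + 1, n):
                        if (i, j) in S:
                            F[i, j] -= F[i, k] * F[k, j]
        return F   # strict lower part holds L (unit diagonal), upper part holds U

    # ILU(0): S is exactly the nonzero pattern of A (including the diagonal).
    A = np.array([[ 4., -1.,  0., -1.],
                  [-1.,  4., -1.,  0.],
                  [ 0., -1.,  4., -1.],
                  [-1.,  0., -1.,  4.]])
    S = {(i, j) for i in range(4) for j in range(4) if A[i, j] != 0}
    F = ilu_static_pattern(A, S)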
Incomplete Factorization (IF) methods

Very simple patterns for cheap / cache-efficient preconditioners?

Example: banded pattern. BCSSTK38, n = 8032, nnz = 181,746; SPD (small structural analysis problem from Boeing).

bandwidth (full)    PCG its
1                   426
3                   821
5                   648
9                   1638
15                  792
1011                105
1311                56
1511                nc
3111                35
4111                18

(nc: no convergence)
Incomplete Factorization (IF) methods

Notice that the incomplete factorization may fail due to division by zero or near-zero (this is usually referred to as a pivot breakdown), even if A admits an LU factorization without pivoting.

Partial pivoting can help, but it is costly and does not always suffice in the incomplete case.

If S coincides with the set of positions which are nonzero in A, we obtain the no-fill ILU factorization, or ILU(0). For SPD matrices the same concept applies to the Cholesky factorization $A = LL^T$, resulting in the no-fill IC factorization, or IC(0).

When used with the conjugate gradient algorithm, this preconditioner leads to the ICCG method (Meijerink & van der Vorst, 1977).
Incomplete Factorization (IF) methods

The no-fill ILU and IC preconditioners are very simple to implement, their computation is inexpensive, and they are reasonably effective for significant problems, such as low-order discretizations of scalar elliptic PDEs leading to M-matrices or to diagonally dominant ones. No pivot breakdown can occur in these cases (Meijerink & van der Vorst, 1977; Manteuffel, 1980).

However, for more difficult and realistic problems the no-fill factorizations result in too crude an approximation of A, and more sophisticated preconditioners, which allow some fill-in in the incomplete factors, are needed. For instance, this is the case for highly nonsymmetric and indefinite matrices such as those arising in many CFD applications.
Incomplete Factorization (IF) methods

A hierarchy of ILU preconditioners may be obtained based on the "levels of fill-in" concept. A level of fill is attributed to each matrix entry that occurs in the incomplete factorization process. Fill-ins are dropped based on the value of the level of fill. The formal definition is as follows.

The initial level of fill of a matrix entry $a_{ij}$ is defined to be

$$\mathrm{lev}_{ij} = \begin{cases} 0 & \text{if } a_{ij} \ne 0 \text{ or } i = j, \\ \infty & \text{otherwise.} \end{cases}$$

Each time this element is modified by the ILU process, its level of fill must be updated according to

$$\mathrm{lev}_{ij} = \min\{\mathrm{lev}_{ij},\ \mathrm{lev}_{ik} + \mathrm{lev}_{kj} + 1\}.$$

Let $\ell$ be a nonnegative integer. With ILU($\ell$), all fill-ins whose level is greater than $\ell$ are dropped. Note that for $\ell = 0$ we recover the no-fill ILU(0) preconditioner.
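A small sketch of the symbolic computation (dense and illustrative; efficient graph-based algorithms exist, see the Hysom & Pothen reference below):

    import numpy as np

    def ilu_level_pattern(A, ell):
        # lev_ij = 0 where a_ij != 0 or i == j, infinity elsewhere; each update
        # sets lev_ij = min(lev_ij, lev_ik + lev_kj + 1). Entries whose final
        # level exceeds ell are dropped, giving the ILU(ell) pattern.
        n = A.shape[0]
        lev = np.where((A != 0) | np.eye(n, dtype=bool), 0.0, np.inf)
        for k in range(n - 1):
            for i in range(k + 1, n):
                if lev[i, k] <= ell:          # l_ik is kept, so it spreads fill
                    for j in range(k + 1, n):
                        lev[i, j] = min(lev[i, j], lev[i, k] + lev[k, j] + 1)
        return lev <= ell                     # boolean ILU(ell) sparsity pattern

    # ell = 0 reproduces the pattern of A itself (no fill), as noted above.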
Example

Level-based incomplete LU factorizations ILU(ℓ)

Motivated by decay in factors of diagonally dominant matrices

Structure of incomplete factors can be predicted using matrix graph

[Spy plots of the incomplete factors for a small 50 × 50 example: ILU(0) has nz = 217, while ILU(1) through ILU(7) have nz = 289, 349, 457, 541, 601, 637, and 649, respectively.]
Numerical Example

Fast symbolic construction (Hysom & Pothen, SISC 2001)

But, typically expensive to apply even for a modest number of levels

Example: Matrix ENGINE, n = 143,571, nnz = 2,424,822; SPD.

levels    prec size     PCG its
0         2,424,822     523
1         4,458,588     300
2         7,595,466     199
3         12,128,289    115
4         18,078,603    87
5         25,474,380    54
6         34,153,746    45
7         43,861,328    46
8         54,276,063    36
Preprocessing incomplete factorizations

Preprocessing originally designed for direct solvers is often very useful to improve the robustness of ILU preconditioners:

Symmetric reorderings (RCM, MD, ND, etc.); a small sketch follows this list

"Static pivoting": nonsymmetric permutations and scalings aimed at increasing diagonal dominance (Duff & Koster, SIMAX 1999, 2001; B., Haws & Tůma, SISC 2000; Saad, SISC 2005; Mayer, SISC 2008)

Extension to symmetric indefinite problems (Duff & Pralet, SIMAX 2005; Hagemann & Schenk, SISC 2006)

Block variants (many authors)

But, for very tough problems this is still not enough to guarantee convergence of the preconditioned iteration.
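A sketch of symmetric reordering before the factorization, using SciPy's reverse Cuthill-McKee (the Poisson test matrix is an assumption; MMD/ND orderings and MC64-style static pivoting are not available in SciPy):

    import scipy.sparse as sp
    import scipy.sparse.linalg as spla
    from scipy.sparse.csgraph import reverse_cuthill_mckee

    # Assumed test matrix: 2D Poisson; in practice, the application matrix.
    m = 32
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
    A = (sp.kron(sp.eye(m), T) + sp.kron(T, sp.eye(m))).tocsr()

    # Symmetric permutation P A P^T with a bandwidth-reducing RCM ordering:
    perm = reverse_cuthill_mckee(A, symmetric_mode=True)
    A_rcm = A[perm, :][:, perm].tocsc()

    ilu = spla.spilu(A_rcm, drop_tol=1e-3)
    # Compare against spilu(A.tocsc(), drop_tol=1e-3) to see the effect on fill:
    print(ilu.L.nnz + ilu.U.nnz)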
Example (cont.)

Preprocessing: the matrix is reordered with Multiple Minimum Degree, a fill-reducing ordering. Matrix ENGINE, n = 143,571, nnz = 2,424,822, MMD ordering.

levels    size (orig.)   its    size (MMD)    its
0         2,424,822      523    2,424,822     439
1         4,458,588      300    4,394,040     214
2         7,595,466      199    6,509,826     159
3         12,128,289     115    8,859,522     96
4         18,078,603     87     11,292,927    66
5         25,474,380     54     13,664,157    49
6         34,153,746     45     15,891,321    34
7         43,861,328     46     –             nc
8         54,276,063     36     19,590,303    18

Some improvement observed, but not entirely robust.
The use of drop tolerances

In many cases, an efficient preconditioner can be obtained from an incomplete factorization where new fill-ins are accepted or discarded on the basis of their size. In this way, only fill-ins that contribute significantly to the quality of the preconditioner are stored and used.

A drop tolerance is a positive number τ which is used in a dropping criterion. An absolute dropping strategy can be used, whereby new fill-ins are accepted only if greater than τ in absolute value. This criterion may work poorly if the matrix is badly scaled, in which case it is better to use a relative drop tolerance.

For example, when eliminating row i, a new fill-in is accepted only if it is greater in absolute value than $\tau \|a_i\|_2$, where $a_i$ denotes the ith row of A. Other criteria are also in use.
The use of drop tolerances

A drawback of this approach is that it is difficult to choose a good value of the drop tolerance: usually, this is done by trial-and-error for a few sample matrices from a given application, until a satisfactory value of τ is found. In many cases, good results are obtained for values of τ in the range $10^{-4}$ to $10^{-2}$, but the optimal value is strongly problem-dependent.

Another difficulty is that it is impossible to predict the amount of storage that will be needed to store the incomplete LU factors. An efficient, predictable algorithm is obtained by limiting the number of nonzeros allowed in each row of the triangular factors. Saad (1994) has proposed the following dual threshold strategy:

Fix a drop tolerance τ and a number p of fill-ins to be allowed in each row of the incomplete L/U factors; at each step of the elimination process, drop all fill-ins that are smaller than τ times the 2-norm of the current row; of all the remaining ones, keep (at most) the p largest ones in magnitude.
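A sketch of just the dual-threshold dropping rule applied to one candidate row (the surrounding ILUT elimination is omitted, and the row vector is an assumed example):

    import numpy as np

    def dual_threshold_drop(w, tau, p):
        # ILUT-style rule: drop entries of the candidate row w that are smaller
        # than tau * ||w||_2, then keep at most the p largest survivors.
        w = w.copy()
        w[np.abs(w) < tau * np.linalg.norm(w)] = 0.0
        idx = np.flatnonzero(w)
        if idx.size > p:
            drop = idx[np.argsort(np.abs(w[idx]))[:-p]]   # all but the p largest
            w[drop] = 0.0
        return w

    w = np.array([3.0, -0.01, 0.4, 0.002, -2.2, 0.05])
    print(dual_threshold_drop(w, tau=0.01, p=3))   # keeps 3.0, 0.4, -2.2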
The use of drop tolerances

A variant of this approach allows in each row of the incomplete factors p nonzeros in addition to the positions that were already nonzero in the original matrix A. This makes sense for irregular problems in which the nonzeros in A are not distributed uniformly.

The resulting preconditioner, denoted by ILUT(τ, p), is quite powerful. If it fails on a problem for a given choice of the parameters τ and p, it will often succeed by taking a smaller value of τ and/or a larger value of p. The corresponding incomplete Cholesky preconditioner for SPD matrices, denoted ICT, can also be defined.

ILUT(τ, p) and the variant with partial pivoting, ILUTP(τ, p), are quite effective and widely used in many industrial applications. However, failures can still occur.
Example

IC(0)/ICT may fail while simple diagonal scaling works!

Matrix LDOOR (structural analysis of a car door), n = 952,203, nnz = 23,737,339.

precond. / precond. size    PCG its
Jacobi / 952,203            810
IC(0) / 23,737,339          > 1000
ICT / 23,838,704            > 1000
ICT / 24,614,381            > 1000
ICT / 26,167,321            > 1000
ICT / 30,047,027            > 1000
ICT / 37,809,756            > 1000
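Diagonal scaling of the kind that succeeds here takes one line to set up. A sketch with an assumed SPD test matrix (LDOOR itself is available from the SuiteSparse collection and would be loaded from file):

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Assumed SPD test matrix (2D Poisson), standing in for LDOOR.
    m = 32
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
    A = (sp.kron(sp.eye(m), T) + sp.kron(T, sp.eye(m))).tocsr()
    b = np.ones(A.shape[0])

    # Jacobi preconditioning: M = diag(A), so M^{-1} r is an elementwise division.
    d = A.diagonal()
    M = spla.LinearOperator(A.shape, matvec=lambda r: r / d)
    x, info = spla.cg(A, b, M=M)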
Stability considerations

ILU preconditioners attempt to make the residual matrix

$$R := A - M$$

small in some norm. However, this does not always result in good preconditioners.

As observed by several authors (Elman, Saad, ...), a more meaningful approximation measure is based on the size of the error matrix

$$E := I - AM^{-1}.$$

Approximate inverse preconditioners attempt to make $\|E\|$ small, but this may require a huge number of nonzeros in the preconditioner (unless the entries of $A^{-1}$ exhibit fast off-diagonal decay).

Note that $\|E\| = \|RM^{-1}\| \le \|R\|\,\|M^{-1}\|$. Hence, if M is very ill-conditioned ($\|M^{-1}\|$ is very large), then a very large error matrix may occur even if $\|A - M\|$ is small. This often results in failure to converge.
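A dense sketch that measures both quantities for a no-fill incomplete factorization (the toy Poisson matrix is an assumption; the mini-ILU(0) mirrors the pattern-restricted elimination shown earlier):

    import numpy as np

    def ilu0_dense(A):
        # No-fill ILU: eliminate, but update only positions where A is nonzero.
        F = np.array(A, dtype=float)
        nz = A != 0
        n = F.shape[0]
        for k in range(n - 1):
            for i in range(k + 1, n):
                if nz[i, k]:
                    F[i, k] /= F[k, k]
                    for j in range(k + 1, n):
                        if nz[i, j]:
                            F[i, j] -= F[i, k] * F[k, j]
        return np.tril(F, -1) + np.eye(n), np.triu(F)

    # Dense 2D Poisson matrix: ILU(0) genuinely discards fill here.
    m = 8
    T = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    A = np.kron(np.eye(m), T) + np.kron(T, np.eye(m))
    L, U = ilu0_dense(A)
    M = L @ U
    N1 = np.linalg.norm(A - M, 'fro')                                 # residual matrix
    N2 = np.linalg.norm(np.eye(m * m) - A @ np.linalg.inv(M), 'fro')  # error matrix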
Stability considerations

Example (B., Szyld & van Duin, SISC 1999):

The system Ax = b is a discretization of a convection-dominated convection-diffusion equation. Solver: Bi-CGSTAB. Orderings: lexicographic and MMD.

Let $N_1 := \|A - LU\|_F$ and $N_2 := \|I - A(LU)^{-1}\|_F$.

ILU(0)    Lexicogr.        MMD
N1        4.06 · 10^{-1}   4.53 · 10^{0}
N2        3.26 · 10^{6}    2.00 · 10^{2}
Its       nc               59

ILUT(0.01,5)    Lexicogr.        MMD
N1              1.78 · 10^{-1}   7.39 · 10^{1}
N2              2.79 · 10^{1}    5.81 · 10^{6}
Its             11               nc
Permuting large entries of A to the main diagonal

[Two spy plots (nz = 25,407 each): the original matrix and the permuted one.]

Jacobian from the Navier-Stokes equations (original and permuted with MC64 + RCM). After preprocessing, ILUT with Bi-CGSTAB converges in 24 iterations. No convergence on the original system.
Sparse approximate inverses

Idea: directly approximate the inverse with a sparse matrix $G \approx A^{-1}$; then preconditioner application only needs mat-vecs with G.

Mostly motivated by parallel processing; also, less prone to instabilities than ILU, and easy to update when solving a sequence of linear systems.

Also useful for constructing robust smoothers for multigrid, and for other purposes like approximating Schur complements.

By now, a large body of literature exists (hundreds of papers since the 1990s).

Successfully used in numerous applications, including

solution of dense linear systems from BEM in electromagnetics, acoustics, and elastodynamics problems

solution of sparse linear systems from photon and neutron transport, CFD, Markov chains, eigenproblems, etc.

quantum chemistry applications

image processing (restoration, deblurring, inpainting)
Sparse approximate inverses

Main approaches: sparse approximate inverses (SAIs) can be factored or unfactored.

Factored forms are of the type $G = ZW$ where, for instance, $Z \approx U^{-1}$ and $W \approx L^{-1}$.

Factored forms are especially useful if A is SPD. In this case $W = Z^T$ and the approximate inverse $G = ZZ^T$ is guaranteed to be SPD. This allows for the use of the conjugate gradient (CG) method.

Another advantage is that factored forms contain more information for the same number of nonzeros than unfactored ones. However, application of the preconditioner requires two mat-vecs (with Z and W) rather than just one (with G).

Sparse approximate inverses can be computed by different methods.
Sparse approximate inverses
Frobenius norm minimization: SPAI

This class of approximate inverse techniques was the first to be proposed and investigated, back in the early 1970s (Benson et al.).

The basic idea is to compute a sparse matrix $G \approx A^{-1}$ as the solution of the following constrained minimization problem:

$$\min_{G \in \mathcal{S}} \|I - AG\|_F,$$

where $\mathcal{S}$ is a set of matrices with a given sparsity pattern.

Since

$$\|I - AG\|_F^2 = \sum_{j=1}^{n} \|e_j - Ag_j\|_2^2,$$

where $e_j$ denotes the jth column of the identity matrix, the computation of G reduces to solving n independent linear least squares problems, subject to sparsity constraints.

These (small) LS problems can be solved efficiently by QR factorization.
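A sketch of the column-by-column construction with a fixed pattern (the tridiagonal test matrix is an assumption; small least-squares solves via NumPy stand in for the QR updating used by real SPAI codes):

    import numpy as np
    import scipy.sparse as sp

    def spai_static(A, pattern_cols):
        # For each column j, minimize ||e_j - A[:, J] g||_2 over the allowed
        # index set J = pattern_cols[j]; only the rows touched by columns J
        # enter the (small) least-squares problem.
        A = sp.csc_matrix(A)
        n = A.shape[0]
        G = sp.lil_matrix((n, n))
        for j in range(n):
            J = pattern_cols[j]
            AJ = A[:, J].toarray()                 # small n x |J| block (demo only)
            rows = np.flatnonzero(AJ.any(axis=1))  # rows actually involved
            rhs = (rows == j).astype(float)        # e_j restricted to those rows
            g, *_ = np.linalg.lstsq(AJ[rows], rhs, rcond=None)
            for idx, val in zip(J, g):
                G[idx, j] = val
        return G.tocsr()

    # Static pattern: allow G to be nonzero exactly where A is (pattern of A).
    A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(30, 30), format='csc')
    cols = [list(A[:, j].nonzero()[0]) for j in range(30)]
    G = spai_static(A, cols)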
Sparse approximate inverses
Frobenius norm minimization: SPAI (cont.)

The main issue is the choice of the sparsity pattern $\mathcal{S}$ for G.

Two options: fixed (static), or adaptive (dynamic).

Static sparsity patterns are usually based on the pattern of A or of some power $A^k$ of A, with k small. This is motivated by the Neumann series expansion of $A^{-1}$.

Small entries $a_{ij}$ (with $i \ne j$) are usually removed from A prior to determining the pattern of $A^2$, $A^3$, ...

Heuristics for dynamically determining the sparsity pattern have been proposed by Cosgrove, Diaz & Griewank (IJCM 1992) and by Grote & Huckle (SISC 1997). Several user-defined parameters are needed in input.
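A sketch of building a static pattern from a sparsified power of A (the threshold theta and the exponent k are assumed tuning parameters):

    import numpy as np
    import scipy.sparse as sp

    def power_pattern(A, k=2, theta=0.1):
        # Drop small entries of A first (keeping the diagonal), then take the
        # sparsity pattern of the k-th power of the sparsified matrix.
        B = sp.csr_matrix(A, copy=True)
        d = B.diagonal()
        B.data[np.abs(B.data) < theta] = 0.0
        B.eliminate_zeros()
        B.setdiag(d)                      # the diagonal is always kept
        C = B
        for _ in range(k - 1):
            C = C @ B                     # pattern of B^k via repeated products
        return C.astype(bool)             # boolean pattern for the SPAI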
Sparse approximate inverses
Frobenius norm minimization: SPAI (cont.)

There is a trade-off: dynamic sparsity patterns give better preconditioners, but they are expensive and harder to parallelize.

Available implementations include:

ParaSAILS (Chow, 2000), based on fixed sparsity patterns. Available in the hypre software package, see https://computation.llnl.gov/casc/hypre/software.html

SPAI (Grote & Huckle, 1997), based on dynamic sparsity patterns; see http://www.computational.unibas.ch/software/spai

MSPAI (Modified SPAI: Huckle et al., 2008), see http://www5.in.tum.de/wiki/index.php/MSPAI

NOTE: All these implementations are MPI-based.
Sparse approximate inverses
Factorized forms: FSAI, AINV

Factorized sparse approximate inverses approximate the inverse Cholesky or LU factors directly from A.
Two main approaches: Frobenius norm minimization (FSAI) and biconjugation (AINV).
FSAI (Kolotilina & Yeremin, SIMAX 1994): compute Z by minimizing ‖I − L^T Z‖_F over all triangular matrices Z with a given sparsity pattern. Remarkably, this can be done without knowing the Cholesky factor L. Inherently parallel construction.
AINV (B., Meyer & Tuma, SISC 1996; B. & Tuma, SISC 1998): compute Z and W by A-biconjugation of the standard basis vectors e_1, e_2, ..., e_n.
If A is SPD, this is just Gram-Schmidt orthogonalization with respect to the inner product ⟨x, y⟩_A := x^T A y. Sparsity is preserved by dropping small entries in Z, W.
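As an illustration of the biconjugation process, the following sketch (dense Python/NumPy for clarity, with a simple absolute drop tolerance, which is an assumption; the actual codes use more refined dropping) A-orthogonalizes e_1, ..., e_n for SPD A, producing Z ≈ L^{-T} and pivots d with A^{-1} ≈ Z D^{-1} Z^T:

    import numpy as np

    def ainv_spd(A, droptol=0.1):
        # Right-looking Gram-Schmidt in the A-inner product <x, y>_A = x^T A y.
        # Columns of Z start as e_1, ..., e_n and end (approximately) A-conjugate.
        n = A.shape[0]
        Z = np.eye(n)
        d = np.zeros(n)
        for i in range(n):
            u = A @ Z[:, i]
            d[i] = Z[:, i] @ u                          # pivot <z_i, z_i>_A
            for j in range(i + 1, n):
                Z[:, j] -= ((u @ Z[:, j]) / d[i]) * Z[:, i]
                Z[np.abs(Z[:, j]) < droptol, j] = 0.0   # dropping keeps Z sparse
        return Z, d                                     # A^{-1} ~ Z diag(1/d) Z^T

Without dropping, Z is exactly L^{-T} and d is the diagonal of D in A = LDL^T; the nonsymmetric version runs two such recurrences coupled through A to produce both Z and W.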
Sparse approximate inverses
Factorized forms: FSAI, AINV (cont.)

If A is not SPD, the bilinear form ⟨x, y⟩_A := x^T A y does not define an inner product and breakdowns can occur, due to division by zero (since x^T A x = 0 can happen even if x ≠ 0). Moreover, due to the dropping of small entries, the incomplete process can break down even when the complete one does not.
However, we have proved that the incomplete process does not break down if A is an M-matrix or a diagonally dominant matrix (more generally, an H-matrix: B., Meyer & Tuma, SISC 1996).
Furthermore, there exists a stabilized variant of AINV (SAINV: B., Cullum & Tuma, SISC 2000) that does not break down if A is positive definite.
In practice, the robustness of (S)AINV is essentially the same as for ILUT preconditioning.
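A minimal concrete instance of the breakdown: for the nonsingular symmetric indefinite matrix A = [0 1; 1 0], the very first pivot of the biconjugation process is e_1^T A e_1 = 0, so the process fails immediately with a division by zero.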
Sparse approximate inverses
Factorized forms: Parallel Block AINV

The construction phase of AINV is sequential, but it can be parallelized using graph partitioning and the fact that the inverse factors of

    A = | A_1                    B_1 |
        |      A_2               B_2 |
        |           .  .  .      ... |
        |                  A_p   B_p |
        | C_1  C_2   ...   C_p   A_S |

have the same block structure as the lower and upper block triangular parts of A, allowing for considerable parallelism in the construction of the preconditioner (B., Marín & Tuma, 1999).
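The structural claim is easy to check numerically. The small sketch below (Python/NumPy, dense for simplicity; the block sizes p, m, ms are arbitrary illustrative choices) builds an SPD matrix with this bordered block-diagonal pattern and verifies that the inverse Cholesky factor has no fill outside the pattern:

    import numpy as np

    rng = np.random.default_rng(0)
    p, m, ms = 3, 4, 2                 # p diagonal blocks of size m, border of size ms
    n = p * m + ms
    mask = np.zeros((n, n), dtype=bool)
    for k in range(p):
        mask[k*m:(k+1)*m, k*m:(k+1)*m] = True    # diagonal blocks A_1, ..., A_p
    mask[-ms:, :] = True                         # border rows C_1 ... C_p, A_S
    mask[:, -ms:] = True                         # border columns B_1 ... B_p
    W = rng.standard_normal((n, n))
    A = np.where(mask, W + W.T, 0.0)             # symmetric, bordered pattern
    A += (np.abs(A).sum(axis=1).max() + 1.0) * np.eye(n)   # diagonal shift: A is SPD
    Linv = np.linalg.inv(np.linalg.cholesky(A))  # inverse Cholesky factor
    print(np.abs(Linv[~mask]).max())             # ~1e-16: no fill outside the pattern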
Sparse approximate inverses
Example
[Figure: sparsity patterns of the coefficient matrix (left) and of the factorized sparse approximate inverse (right).]

Coefficient matrix and sparse approximate inverse: FEM for a fluid-structure interaction problem; nnz(Z + W)/nnz(A) ≈ 1.56. Preconditioned Bi-CGSTAB converges in 39 iterations.
Sparse approximate inverses
Sample parallel AINV-PCG results

Table: 2D neutron diffusion problem; FEM, n = 804,609.

p            2      4      8      16     32     64
Prec-Time    11.6   5.89   3.24   1.79   1.12   0.94
It-Time      227.1  113.3  56.9   26.7   13.6   11.7
PCG Its      157    157    156    157    156    157

Table: Barotropic equation; FDM, n = 370,000.

p            2      4      8      16     32
Prec-Time    13.4   7.0    3.72   1.91   1.15
It-Time      119.2  59.3   25.9   11.9   6.8
PCG Its      189    189    189    202    187

Note: Computations done 12 years ago on an SGI Origin 2000 (LANL).
Sparse approximate inverses

Implementations of AINV and the stabilized variant SAINV are available at
http://www2.cs.cas.cz/~tuma/sparslab.html
and
http://www.dmsa.unipd.it/~sartoret/Pdacg/pdacg.htm
and also in the CEA library SLOOP (Meurant, 2006).
Sparse approximate inverses share some of the limitations of IF methods:
factored form may suffer breakdowns, especially if A is highly indefinite
convergence rate may be unsatisfactory for sparse G
in general, performance is unpredictable and failures may occur
like IF, lack of scalability for increasing problem size
Several authors have addressed these issues in the last few years.
Sparse approximate inverses

The lack of scalability for increasing problem size has motivated the development of multilevel methods based on SAIs, including:
Wavelet-based SPAI (Chan, Wang & Tang, BIT 1997)
Multilevel SPAI (Bollhöfer & Mehrmann, SIMAX 2002)
MLAINV (Meurant, Numer. Alg. 2002)
Multiresolution AINV (Bridson & Tang, SISC 2002)
SPAI as smoothers for (A)MG (Bröker and Grote, APNUM 2002)
Spectral preconditioners based on SPAI (Carpentieri, Duff, Giraud et al. 2005)
Data-sparse approximate inverses (Bebendorf, SIMAX 2006)
Multilevel SPAI for AMR (Wang and de Sturler, LAA 2009)
Although not always h-independent, these preconditioners exhibit much better scalability than one-level SAIs.
Outline
1 Introduction
2 Generalities about preconditioning
3 Basic concepts of algebraic preconditioning
4 Incomplete factorizations
5 Sparse approximate inverses
6 IF via approximate inverses
7 Balanced Incomplete Factorization (BIF)
8 Conclusions
IF via approximate inverses
RIF motivation

RIF (Robust Incomplete Factorization; B. & Tuma, NLAA 2003)
Based on the factorized approximate inverse SAINV
Consider the triangular decomposition A^{-1} ≈ L^{-T} D^{-1} L^{-1}
Notation: L = (l_ij); Z = L^{-T}, with columns ℓ_1, ..., ℓ_n, so that Z^T = L^{-1}
Compare with the (exact) LDL^T decomposition of A: the factor L of A = LDL^T is L = A L^{-T} D^{-1}, so it can easily be retrieved from this inverse factorization:

AZ = A L^{-T} = LD, lower triangular
⇓
l_kj = ⟨e_k, A ℓ_j⟩ / d_j for k ≥ j

Using Z^T = L^{-1}, we thus obtain L from L^{-1} at no extra cost.
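A compact way to see RIF in code: run the SAINV orthogonalization and harvest the entries of L from the stabilized inner products. The sketch below (dense Python/NumPy with naive absolute-value dropping; these are illustrative assumptions, not the NLAA 2003 code) returns L and d with A ≈ L diag(d) L^T:

    import numpy as np

    def rif_spd(A, droptol=0.1):
        n = A.shape[0]
        Z = np.eye(n)               # columns l_k, progressively A-orthogonalized
        L = np.eye(n)
        d = np.zeros(n)
        for j in range(n):
            u = A @ Z[:, j]
            d[j] = Z[:, j] @ u      # stabilized pivot <l_j, A l_j> > 0 for SPD A
            for k in range(j + 1, n):
                lkj = (u @ Z[:, k]) / d[j]       # l_kj = <l_k, A l_j> / d_j
                if abs(lkj) > droptol:
                    L[k, j] = lkj
                Z[:, k] -= lkj * Z[:, j]         # same update as in SAINV
                Z[np.abs(Z[:, k]) < droptol, k] = 0.0
        return L, d                 # A ~ L diag(d) L^T

In exact arithmetic (no dropping) this reproduces the LDL^T factor; with dropping it yields the RIF preconditioner, and column j of Z can be discarded as soon as step j is complete.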
IF via approximate inverses
RIF implementation

Note: l_kj = ⟨e_k, A ℓ_j⟩ / d_j ≡ ⟨ℓ_k, A ℓ_j⟩ / d_j for k ≥ j
⇓
The latter equivalence provides a breakdown-free implementation, since d_k = ⟨ℓ_k, A ℓ_k⟩ > 0 for A SPD (B. & Tuma, NLAA 2003). The equivalence holds because ℓ_k differs from e_k by a combination of earlier columns, which are already A-orthogonal to ℓ_j. Experimentally, this is often more space-efficient for the same iteration counts.

[Diagram: the already computed columns of L are done and not used again, while the already computed columns of L^{-1} are done and reused.]

One-way transfer of information
Outline
1 Introduction
2 Generalities about preconditioning
3 Basic concepts of algebraic preconditioning
4 Incomplete factorizations
5 Sparse approximate inverses
6 IF via approximate inverses
7 Balanced Incomplete Factorization (BIF)
8 Conclusions
IF with approximate inverses
(I − A^{-1}) biconjugation

Consider

A = I + Σ_{k=1}^{n} e_k (a_k − e_k)^T,

where a_k^T denotes the k-th row of A. Apply n Sherman-Morrison updates to get A^{-1} (Bru, Cerdán, Marín, Mas, SISC 2003).

The process, for R = (r_k), V = (v_k), D = diag(d_1, ..., d_n) and k = 1, 2, ..., n:

r_k = e_k − Σ_{i=1}^{k−1} (⟨v_i, e_k⟩ / d_i) r_i,
v_k = (a_k − e_k) − Σ_{i=1}^{k−1} (⟨a_k − e_k, r_i⟩ / d_i) v_i,
d_k = 1 + ⟨a_k − e_k, r_k⟩ = 1 + ⟨v_k, e_k⟩.

Then I − A^{-1} = R D^{-1} V^T, with R unit upper triangular.
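The recurrences are easy to verify numerically. Below is a direct transcription (exact and dense, in Python/NumPy; the name ism_decomposition is ours) together with a check of the identity I − A^{-1} = R D^{-1} V^T and of diag(V) = D − I on a small SPD example:

    import numpy as np

    def ism_decomposition(A):
        # Exact inverse Sherman-Morrison process for A = I + sum_k e_k (a_k - e_k)^T,
        # with a_k^T the k-th row of A; returns R, V, d as in the recurrences above.
        n = A.shape[0]
        R = np.zeros((n, n)); V = np.zeros((n, n)); d = np.zeros(n)
        for k in range(n):
            y = A[k, :].copy(); y[k] -= 1.0              # y_k = a_k - e_k
            r = np.zeros(n); r[k] = 1.0                  # r_k starts from e_k
            v = y.copy()
            for i in range(k):
                r -= (V[k, i] / d[i]) * R[:, i]          # (<v_i, e_k> / d_i) r_i
                v -= ((y @ R[:, i]) / d[i]) * V[:, i]    # (<a_k - e_k, r_i> / d_i) v_i
            d[k] = 1.0 + y @ r
            R[:, k] = r; V[:, k] = v
        return R, V, d

    A = np.array([[2.0, 1.0], [1.0, 3.0]])               # small SPD example
    R, V, d = ism_decomposition(A)
    print(np.allclose(np.eye(2) - np.linalg.inv(A), R @ np.diag(1.0 / d) @ V.T))  # True
    print(np.allclose(np.diag(V), d - 1.0))                                       # True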
IF with approximate inverses
Balancing L and L^{-1}

Theorem
(Bru, Marín, Mas & Tuma, SISC 2008) For A SPD, let there exist the decomposition

A^{-1} = I − R D^{-1} V^T    (1)

and let A = L∆L^T be the LDL^T decomposition of A. Then

V = L∆ − L^{-T},  R = L^{-T},  ∆ = D.

Pictorially: the lower triangular part of V is that of LD, the strictly upper triangular part is that of −L^{-T}, and diag(V) = D − I.    (2)
IF with approximate inverses
Balancing L and L^{-1} (cont.)

That is, by (2) we compute L and L^{-1} at the same time, by columns; to get L, only V is needed.
Can be extended to nonsymmetric matrices (Bru, Marín, Mas & Tuma, SIMAX 2010).
Sparse case used for preconditioning: the factors L and L^{-1} influence (balance) each other during the computation and can be connected via dropping (Bru, Marín, Mas & Tuma, SISC 2008).
Note that this preconditioner is based on a novel matrix factorization.
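Continuing the small dense sketch above, both factors can indeed be read off V alone; this only verifies the algebra in (2) on the example, it is not the BIF algorithm itself (which applies dropping during the process):

    import numpy as np          # reuses A, V, d from the sketch above

    n = len(d)
    L    = (np.tril(V) + np.eye(n)) @ np.diag(1.0 / d)   # lower part of V is L D - I
    Linv = (np.eye(n) - np.triu(V, 1)).T                 # strict upper part is -L^{-T}
    print(np.allclose(L @ np.diag(d) @ L.T, A))          # A = L D L^T   -> True
    print(np.allclose(Linv, np.linalg.inv(L)))           # Linv = L^{-1} -> True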
IF with approximate inverses
BIF experiments

Example: matrix PWTK, n = 217,918, nnz = 5,926,171

[Figure: time to compute the preconditioner (in seconds) versus size of the preconditioner (in number of nonzeros), for RIF and BIF.]
IF with approximate inverses
BIF experiments (cont.)

[Figure: total time (in seconds) versus size of the preconditioner (in number of nonzeros), for RIF and BIF.]
IF with approximate inverses
BIF pros and cons

Generally much faster and smoother preconditioner construction than RIF, for similar or even better preconditioner quality.
Since approximate inverses are involved, dropping must always be aggressive. Prefiltration of the entries of A seems to be the standard strategy.
BIF uses the inverse-based dropping rules of Bollhöfer & Saad, 2002. These need to be investigated further: they often seem to influence the entries of the factors nonuniformly, and the dropping often forces skipping a lot of updates in the decomposition. Is this really the right way to go?
IF with approximate inverses
Other recent work

Monitoring the growth of entries in the inverse factors can be used to devise new and improved dropping and diagonal pivoting strategies in ILU (Bollhöfer, LAA 2001; Bollhöfer & Saad, SIMAX 2002; Bollhöfer, SISC 2003).
The resulting algorithm keeps the size of the error matrix I − A(LU)^{-1} bounded.
Combined with preprocessing (nonsymmetric permutations/scalings, reorderings), this approach results in preconditioners that are much more robust and effective than standard ILUs.
Inverse-based multilevel ILUs have been developed by Bollhöfer and Saad and implemented in ILUPACK, see
http://www-public.tu-bs.de/~bolle/ilupack/
ILUPACK can handle symmetric indefinite and complex symmetric matrices. A parallel version is still under development.
IF with approximate inverses
Other recent work (cont.)

Another multilevel ILU package that incorporates a number of recent improvements is ILU++, developed by Jan Mayer (ACM TOMS, 2009):
http://www.iluplusplus.de/
Finally, we mention recent work by Raghavan & Teranishi (SISC 2010) combining parallel IC factorization with SAI. Here the IC factorization A ≈ LL^T is computed in parallel using a nested dissection ordering, and SAI is used to approximately invert the diagonal blocks of L.
The resulting hybrid algorithm, ICT-SSAI, achieves good convergence rates (close to those of IC) and scales very well on parallel architectures (like SAI). Parallel code available at
http://www.cse.psu.edu/~teranish/dscpack-ic.html
Outline
1 Introduction
2 Generalities about preconditioning
3 Basic concepts of algebraic preconditioning
4 Incomplete factorizations
5 Sparse approximate inverses
6 IF via approximate inverses
7 Balanced Incomplete Factorization (BIF)
8 Conclusions
Conclusions

Many advances in algebraic preconditioning in the last few years
‘Old’ methods, like ILUs, are continually being improved
New methods are often hybrids, taking the best features of existing methods
Better robustness by borrowing techniques designed for direct solvers
Better scalability by borrowing features of PDE solvers (multilevel schemes)
Also: many excellent software packages available
Many challenges remain. Highly indefinite problems?
References
M. Benzi, Preconditioning techniques for large linear systems: a survey, Journal of Computational Physics, 182 (2002), pp. 418–477.
K. Chen, Matrix Preconditioning Techniques and Applications, Cambridge University Press, 2005.
G. Meurant, Computer Solution of Large Linear Systems, North-Holland/Elsevier, 1999.
Y. Saad, Iterative Methods for Sparse Linear Systems, Second Edition, SIAM, Philadelphia, 2003.
P. S. Vassilevski, Multilevel Block Factorization Preconditioners, Springer, 2008.