Preconditioning Methods for Iterative Solvers
Kengo Nakajima, Information Technology Center
FEM3D-Part3 2
Preconditioning for Iterative Solvers
• The convergence rate of iterative solvers strongly depends on the spectral properties (eigenvalue distribution) of the coefficient matrix A.
– Convergence is fast when the eigenvalue distribution is narrow and the eigenvalues are close to 1.
– In "ill-conditioned" problems, the "condition number" (ratio of the largest to the smallest eigenvalue in magnitude if A is symmetric) is large.
• A preconditioner M (whose properties are similar to those of A) transforms the linear system into one with more favorable spectral properties.
– M transforms $Ax=b$ into $A'x=b'$, where $A'=M^{-1}A$, $b'=M^{-1}b$.
– If $M \approx A$, $M^{-1}A$ is close to the identity matrix.
– If $M^{-1}=A^{-1}$, this is the best preconditioner (Gaussian elimination).
• Generally, $A'x'=b'$, where $A'=M_L^{-1}AM_R^{-1}$, $b'=M_L^{-1}b$, $x'=M_Rx$.
– $M_L$/$M_R$: left/right preconditioning
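The effect of a preconditioner on the eigenvalue distribution can be checked by hand on a tiny system. The sketch below assumes a hypothetical 2×2 symmetric matrix (not taken from the lecture) and uses M = diag(A) as a left preconditioner; `eig2` is an illustrative helper that relies on the closed-form eigenvalues of a 2×2 matrix.

```python
import math

def eig2(m):
    """Eigenvalues (min, max) of a 2x2 matrix with real eigenvalues."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(tr * tr - 4.0 * det)
    return (tr - disc) / 2.0, (tr + disc) / 2.0

# Hypothetical badly scaled symmetric matrix (illustration only).
A = [[100.0, 1.0],
     [1.0, 1.0]]

# Left preconditioning with M = diag(A): row i of M^{-1}A is row i of A / a_ii.
MinvA = [[A[i][j] / A[i][i] for j in range(2)] for i in range(2)]

lam_min, lam_max = eig2(A)
ratio_A = lam_max / lam_min

mu_min, mu_max = eig2(MinvA)
ratio_MinvA = mu_max / mu_min

print(f"max/min eigenvalue of A      : {ratio_A:.2f}")
print(f"max/min eigenvalue of M^-1 A : {ratio_MinvA:.2f}")
```

For this matrix the eigenvalues of M^{-1}A are 0.9 and 1.1, clustered around 1, whereas those of A are spread over two orders of magnitude.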
Preconditioned CG Solver (PCG)
Compute r(0)= b-[A]x(0)
for i= 1, 2, …
    solve [M]z(i-1)= r(i-1)
    ρ(i-1)= r(i-1)·z(i-1)
    if i=1
        p(1)= z(0)
    else
        β(i-1)= ρ(i-1)/ρ(i-2)
        p(i)= z(i-1) + β(i-1) p(i-1)
    endif
    q(i)= [A]p(i)
    α(i)= ρ(i-1)/(p(i)·q(i))
    x(i)= x(i-1) + α(i) p(i)
    r(i)= r(i-1) - α(i) q(i)
    check convergence |r|
end
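Derivation: split the preconditioner as $[M]=[M_1][M_2]$ and apply CG to the transformed system
$[A']x'=b'$, $[A']=[M_1]^{-1}[A][M_2]^{-1}$, $x'=[M_2]x$, $b'=[M_1]^{-1}b$
with $p' \Rightarrow [M_2]p$, $r' \Rightarrow [M_1]^{-1}r$:
$p'^{(i)} = r'^{(i-1)} + \beta'_{i-1}\, p'^{(i-1)}$
$[M_2]p^{(i)} = [M_1]^{-1}r^{(i-1)} + \beta'_{i-1}\, [M_2]p^{(i-1)}$
$p^{(i)} = [M_2]^{-1}[M_1]^{-1}r^{(i-1)} + \beta'_{i-1}\, p^{(i-1)}$
$p^{(i)} = [M]^{-1}r^{(i-1)} + \beta'_{i-1}\, p^{(i-1)}$
$\beta'_{i-1} = ([M]^{-1}r^{(i-1)}, r^{(i-1)}) / ([M]^{-1}r^{(i-2)}, r^{(i-2)})$
$\alpha'_{i} = ([M]^{-1}r^{(i-1)}, r^{(i-1)}) / (p^{(i)}, [A]p^{(i)})$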
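The PCG loop can be sketched in a few lines of Python. This is a minimal dense-matrix sketch, not the course code: the function name `pcg`, the callback `solve_m` (which returns z for a given r, i.e. applies [M]^{-1}), the relative-residual stopping test and the 3×3 test system are all assumptions made for illustration.

```python
import math

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pcg(a, b, solve_m, tol=1e-8, max_iter=1000):
    """Preconditioned CG for a symmetric positive definite matrix a.

    solve_m(r) must return z with [M]z = r (hypothetical callback).
    Returns (x, number of iterations).
    """
    x = [0.0] * len(b)
    r = b[:]                          # r(0) = b - [A]x(0) with x(0) = 0
    b_norm = math.sqrt(dot(b, b)) or 1.0
    p, rho_old = None, None
    for it in range(1, max_iter + 1):
        z = solve_m(r)                # solve [M]z = r
        rho = dot(r, z)
        if it == 1:
            p = z[:]
        else:
            beta = rho / rho_old
            p = [zi + beta * pi for zi, pi in zip(z, p)]
        q = matvec(a, p)
        alpha = rho / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_old = rho
        if math.sqrt(dot(r, r)) / b_norm < tol:   # check convergence |r|
            return x, it
    return x, max_iter

# Hypothetical SPD test system with known solution (1, 2, 3).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]

# Diagonal scaling as the preconditioner: z_i = r_i / a_ii.
diag = [A[i][i] for i in range(3)]
x, n_iter = pcg(A, b, lambda r: [ri / di for ri, di in zip(r, diag)])
print(x, n_iter)
```

Passing the preconditioner as a callback keeps the loop unchanged when diagonal scaling is later swapped for an ILU(0) or LU-GS solve.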
Preconditioned CG Solver (PCG)
Solving the following equation (the step "solve [M]z(i-1)= r(i-1)" of the PCG algorithm):
$\{z\} = [M]^{-1}\{r\}$
$[M]^{-1} \approx [A]^{-1}$, $[M] \approx [A]$
• Diagonal Scaling: simple but weak
$[M]^{-1} = [D]^{-1}$, $[M] = [D]$
• Ultimate Preconditioning: inverse matrix
$[M]^{-1} = [A]^{-1}$, $[M] = [A]$
• In practice, $[M]^{-1}$ is an "approximate inverse matrix".
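Diagonal Scaling, Point-Jacobi
$$
[M] = \begin{bmatrix}
D_1 & 0 & \cdots & 0 & 0 \\
0 & D_2 & & & 0 \\
\vdots & & \ddots & & \vdots \\
0 & & & D_{N-1} & 0 \\
0 & 0 & \cdots & 0 & D_N
\end{bmatrix}
$$
• solve [M]z(i-1)= r(i-1) is very easy.
• Provides fast convergence for simple problems.
• 1d.f, 1d.c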
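Solving [M]z = r with a diagonal [M] reduces to one division per unknown. A minimal sketch, with the function name `diagonal_scaling` and the sample numbers chosen only for illustration:

```python
def diagonal_scaling(diag, r):
    """Solve [M]z = r for M = diag(D_1, ..., D_N): z_i = r_i / D_i."""
    return [ri / di for ri, di in zip(r, diag)]

# Illustrative values (not from the lecture).
D = [2.0, 4.0, 5.0]
r = [1.0, 2.0, 10.0]
z = diagonal_scaling(D, r)
print(z)
```

The cost is O(N) and there is no data dependency between entries, which is why this preconditioner is trivial to apply even though it is weak.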
ILU(0), IC(0)
• Widely used preconditioners for sparse matrices
– Incomplete LU factorization
– Incomplete Cholesky factorization (for symmetric matrices)
• Incomplete direct method
– Even if the original matrix is sparse, the inverse matrix is not necessarily sparse.
– fill-in
– ILU(0)/IC(0) without fill-in have the same non-zero pattern as the original (sparse) matrices.
LU Factorization/Decomposition: Complete LU Factorization
• A kind of direct method for solving linear equations
– in effect, computes the "inverse matrix" directly (in factored form)
– Information on the "inverse matrix" can be saved, so it is efficient for cases with multiple right-hand sides (RHS).
• "Fill-in" may occur during factorization/decomposition
– entries which change from an initial zero to a non-zero value during the execution of factorization/decomposition
Incomplete LU Factorization
• ILU factorization: incomplete LU factorization
– Preconditioning method using "incomplete" inverse matrices, where generation of "fill-in" is controlled
– Approximate/incomplete inverse matrix, weak direct method
– ILU(0): NO fill-in is allowed
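Solving Linear Equations by LU Factorization
LU factorization of matrix A (n×n):
$$
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
l_{21} & 1 & 0 & \cdots & 0 \\
l_{31} & l_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l_{n1} & l_{n2} & l_{n3} & \cdots & 1
\end{bmatrix}
\begin{bmatrix}
u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\
0 & u_{22} & u_{23} & \cdots & u_{2n} \\
0 & 0 & u_{33} & \cdots & u_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & u_{nn}
\end{bmatrix}
= LU
$$
L: lower triangular factor of matrix A (unit diagonal)
U: upper triangular factor of matrix A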
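Matrix Form of Linear Equation
General form of linear equations with "n" unknowns:
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\ \,\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned}
$$
Matrix form:
$$
\underbrace{\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}}_{x}
= \underbrace{\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}}_{b},
\qquad Ax = b
$$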
Solving Ax=b by LU Factorization
1. $A = LU$: LU factorization of A
2. $Ly = b$: compute {y} (easy)
3. $Ux = y$: compute {x} (easy)
This {x} satisfies $Ax = b$, because $Ax = LUx = Ly = b$.
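Forward Substitution: Solving Ly=b
$$
Ly = b: \quad
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
l_{21} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
l_{n1} & l_{n2} & \cdots & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
$$
$$
\begin{aligned}
y_1 &= b_1 \\
l_{21}y_1 + y_2 &= b_2 \\
&\ \,\vdots \\
l_{n1}y_1 + l_{n2}y_2 + \cdots + y_n &= b_n
\end{aligned}
\quad\Rightarrow\quad
\begin{aligned}
y_1 &= b_1 \\
y_2 &= b_2 - l_{21}y_1 \\
&\ \,\vdots \\
y_n &= b_n - l_{n1}y_1 - l_{n2}y_2 - \cdots = b_n - \sum_{i=1}^{n-1} l_{ni}\,y_i
\end{aligned}
$$
row-by-row substitution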
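The row-by-row formula translates directly into a loop. The sketch below assumes L is stored as a dense list of rows with a unit diagonal; the function name `forward_substitution` and the 4×4 sample data are illustrative choices.

```python
def forward_substitution(l, b):
    """Solve Ly = b for a unit lower triangular matrix L, row by row."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        # y_i = b_i - sum_{j<i} l_ij * y_j  (the diagonal of L is 1)
        y[i] = b[i] - sum(l[i][j] * y[j] for j in range(i))
    return y

# Illustrative unit lower triangular matrix and right-hand side.
L = [[1.0, 0.0, 0.0, 0.0],
     [2.0, 1.0, 0.0, 0.0],
     [2.0, -1.0, 1.0, 0.0],
     [0.0, -2.0, 3.0, 1.0]]
b = [10.0, 25.0, 19.0, 4.0]
y = forward_substitution(L, b)
print(y)
```

Each y_i depends only on entries already computed, which is what makes the step "easy" but also inherently sequential.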
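Backward Substitution: Solving Ux=y
$$
Ux = y: \quad
\begin{bmatrix}
u_{11} & u_{12} & \cdots & u_{1n} \\
0 & u_{22} & \cdots & u_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & u_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
$$
$$
\begin{aligned}
u_{nn}x_n &= y_n \\
u_{n-1,n-1}x_{n-1} + u_{n-1,n}x_n &= y_{n-1} \\
&\ \,\vdots \\
u_{11}x_1 + u_{12}x_2 + \cdots + u_{1n}x_n &= y_1
\end{aligned}
\quad\Rightarrow\quad
\begin{aligned}
x_n &= y_n / u_{nn} \\
x_{n-1} &= (y_{n-1} - u_{n-1,n}x_n) / u_{n-1,n-1} \\
&\ \,\vdots \\
x_1 &= \Bigl(y_1 - \sum_{j=2}^{n} u_{1j}\,x_j\Bigr) / u_{11}
\end{aligned}
$$
row-by-row substitution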
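Backward substitution runs the same idea from the last row upwards. A minimal sketch with a dense upper triangular U; the function name `backward_substitution` and the sample data are assumptions for illustration.

```python
def backward_substitution(u, y):
    """Solve Ux = y for an upper triangular matrix U, from the last row up."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # x_i = (y_i - sum_{j>i} u_ij * x_j) / u_ii
        s = sum(u[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / u[i][i]
    return x

# Illustrative upper triangular matrix and right-hand side.
U = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 2.0, 1.0, 2.0],
     [0.0, 0.0, 3.0, 1.0],
     [0.0, 0.0, 0.0, 2.0]]
y = [10.0, 5.0, 4.0, 2.0]
x = backward_substitution(U, y)
print(x)
```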
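Computation of LU Factorization
$A = LU$ with unit lower triangular $L$ and upper triangular $U$. Comparing the entries of both sides, rows of $U$ and columns of $L$ are obtained alternately in the order ①, ②, ③, ④, …:
① 1st row of U: $a_{11} = u_{11},\ a_{12} = u_{12},\ \ldots,\ a_{1n} = u_{1n} \ \Rightarrow\ u_{11}, u_{12}, \ldots, u_{1n}$
② 1st column of L: $a_{21} = l_{21}u_{11},\ a_{31} = l_{31}u_{11},\ \ldots,\ a_{n1} = l_{n1}u_{11} \ \Rightarrow\ l_{21}, l_{31}, \ldots, l_{n1}$
③ 2nd row of U: $a_{22} = l_{21}u_{12} + u_{22},\ \ldots,\ a_{2n} = l_{21}u_{1n} + u_{2n} \ \Rightarrow\ u_{22}, u_{23}, \ldots, u_{2n}$
④ 2nd column of L: $a_{32} = l_{31}u_{12} + l_{32}u_{22},\ \ldots \ \Rightarrow\ l_{32}, l_{42}, \ldots, l_{n2}$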
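Example
$$
A = \begin{bmatrix}
1 & 2 & 3 & 4 \\
2 & 6 & 7 & 10 \\
2 & 2 & 8 & 7 \\
0 & -4 & 7 & 1
\end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & 0 \\
l_{21} & 1 & 0 & 0 \\
l_{31} & l_{32} & 1 & 0 \\
l_{41} & l_{42} & l_{43} & 1
\end{bmatrix}
\begin{bmatrix}
u_{11} & u_{12} & u_{13} & u_{14} \\
0 & u_{22} & u_{23} & u_{24} \\
0 & 0 & u_{33} & u_{34} \\
0 & 0 & 0 & u_{44}
\end{bmatrix}
$$
1st row: $u_{11} = 1,\ u_{12} = 2,\ u_{13} = 3,\ u_{14} = 4$
1st col.: $2 = l_{21}u_{11} \Rightarrow l_{21} = 2/1 = 2$; $2 = l_{31}u_{11} \Rightarrow l_{31} = 2/1 = 2$; $0 = l_{41}u_{11} \Rightarrow l_{41} = 0/1 = 0$
2nd row: $6 = l_{21}u_{12} + u_{22} \Rightarrow u_{22} = 2$; $7 = l_{21}u_{13} + u_{23} \Rightarrow u_{23} = 1$; $10 = l_{21}u_{14} + u_{24} \Rightarrow u_{24} = 2$
2nd col.: $2 = l_{31}u_{12} + l_{32}u_{22} \Rightarrow l_{32} = -1$; $-4 = l_{41}u_{12} + l_{42}u_{22} \Rightarrow l_{42} = -2$
3rd row: $8 = l_{31}u_{13} + l_{32}u_{23} + u_{33} \Rightarrow u_{33} = 3$; $7 = l_{31}u_{14} + l_{32}u_{24} + u_{34} \Rightarrow u_{34} = 1$
3rd col.: $7 = l_{41}u_{13} + l_{42}u_{23} + l_{43}u_{33} \Rightarrow l_{43} = 3$
4th row/col.: $1 = l_{41}u_{14} + l_{42}u_{24} + l_{43}u_{34} + u_{44} \Rightarrow u_{44} = 2$
Solving according to 1st row-column, 2nd row-column, 3rd row-column, …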
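Example (cont.)
Finally:
$$
\underbrace{\begin{bmatrix}
1 & 2 & 3 & 4 \\
2 & 6 & 7 & 10 \\
2 & 2 & 8 & 7 \\
0 & -4 & 7 & 1
\end{bmatrix}}_{A}
= \underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 \\
2 & 1 & 0 & 0 \\
2 & -1 & 1 & 0 \\
0 & -2 & 3 & 1
\end{bmatrix}}_{L}
\underbrace{\begin{bmatrix}
1 & 2 & 3 & 4 \\
0 & 2 & 1 & 2 \\
0 & 0 & 3 & 1 \\
0 & 0 & 0 & 2
\end{bmatrix}}_{U}
$$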
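The row/column procedure (1st row of U, 1st column of L, 2nd row of U, …) can be sketched as follows. This is a minimal version without pivoting, so it assumes every pivot u_kk is non-zero; the function name `lu_rowcol` is illustrative, and the 4×4 matrix is written with the signs that make A = LU hold exactly.

```python
def lu_rowcol(a):
    """LU factorization with unit lower triangular L, no pivoting.

    Step k computes the k-th row of U, then the k-th column of L.
    """
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    u = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for j in range(k, n):          # k-th row of U
            u[k][j] = a[k][j] - sum(l[k][m] * u[m][j] for m in range(k))
        for i in range(k + 1, n):      # k-th column of L
            s = sum(l[i][m] * u[m][k] for m in range(k))
            l[i][k] = (a[i][k] - s) / u[k][k]
    return l, u

A = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 6.0, 7.0, 10.0],
     [2.0, 2.0, 8.0, 7.0],
     [0.0, -4.0, 7.0, 1.0]]
L, U = lu_rowcol(A)
for row in L:
    print(row)
for row in U:
    print(row)
```

Multiplying the printed factors back together reproduces A, which is a quick way to check a hand calculation.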
Example: 5-Point Stencil (FDM)
[Figure: 3×4 grid of 12 points, numbered 1 to 12 row by row; with the 5-point stencil each point is coupled to its neighbours in the same row and column, e.g. point 5 to points 2, 4, 6 and 8]
  1  2  3
  4  5  6
  7  8  9
 10 11 12
Coef. Matrix: Diag. Component=6.00
[A]{x} = {b} on the 3×4 grid, with unknowns {x} = (x1, x2, …, x12)^T and right-hand side
{b} = (0.00, 3.00, 10.00, 11.00, 10.00, 19.00, 20.00, 16.00, 28.00, 42.00, 36.00, 52.00)^T
[A] =
6.00 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
-1.00 6.00 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 -1.00 6.00 0.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00
-1.00 0.00 0.00 6.00 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00
0.00 -1.00 0.00 -1.00 6.00 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00
0.00 0.00 -1.00 0.00 -1.00 6.00 0.00 0.00 -1.00 0.00 0.00 0.00
0.00 0.00 0.00 -1.00 0.00 0.00 6.00 -1.00 0.00 -1.00 0.00 0.00
0.00 0.00 0.00 0.00 -1.00 0.00 -1.00 6.00 -1.00 0.00 -1.00 0.00
0.00 0.00 0.00 0.00 0.00 -1.00 0.00 -1.00 6.00 0.00 0.00 -1.00
0.00 0.00 0.00 0.00 0.00 0.00 -1.00 0.00 0.00 6.00 -1.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 -1.00 0.00 -1.00 6.00 -1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -1.00 0.00 -1.00 6.00
Solution
With the coefficient matrix [A] (diagonal 6.00) and the right-hand side {b} = (0.00, 3.00, 10.00, 11.00, 10.00, 19.00, 20.00, 16.00, 28.00, 42.00, 36.00, 52.00)^T, the solution of [A]{x} = {b} is
{x} = (1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00, 8.00, 9.00, 10.00, 11.00, 12.00)^T
Complete LU Factorization: type "./lu1"
LU factorization of the original matrix:
• Both [L] and [U] are shown.
• Diag. of [L] are "1" (not shown).
• Fill-in occurs: some of the zero components became non-zero.
6.00 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
-0.17 5.83 -1.00 -0.17 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 -0.17 5.83 -0.03 -0.17 -1.00 0.00 0.00 0.00 0.00 0.00 0.00
-0.17 -0.03 0.00 5.83 -1.03 0.00 -1.00 0.00 0.00 0.00 0.00 0.00
0.00 -0.17 -0.03 -0.18 5.64 -1.03 -0.18 -1.00 0.00 0.00 0.00 0.00
0.00 0.00 -0.17 0.00 -0.18 5.64 -0.03 -0.18 -1.00 0.00 0.00 0.00
0.00 0.00 0.00 -0.17 -0.03 -0.01 5.82 -1.03 -0.01 -1.00 0.00 0.00
0.00 0.00 0.00 0.00 -0.18 -0.03 -0.18 5.63 -1.03 -0.18 -1.00 0.00
0.00 0.00 0.00 0.00 0.00 -0.18 0.00 -0.18 5.63 -0.03 -0.18 -1.00
0.00 0.00 0.00 0.00 0.00 0.00 -0.17 -0.03 -0.01 5.82 -1.03 -0.01
0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.18 -0.03 -0.18 5.63 -1.03
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.18 0.00 -0.18 5.63
Original matrix: the coefficient matrix [A] of the 5-point stencil example (diagonal 6.00, off-diagonal -1.00).
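Fill-in can be observed directly by factorizing the 12×12 matrix in place and listing the positions that were zero in [A] but are non-zero in the factors. The sketch below is a dense, unpivoted version written for illustration; the helper names `five_point_matrix` and `full_lu` are assumptions, not the contents of the `lu1` program.

```python
def five_point_matrix(nx=3, ny=4, diag=6.0):
    """Dense matrix of the 5-point stencil on an nx-by-ny grid."""
    n = nx * ny
    a = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            i = iy * nx + ix
            a[i][i] = diag
            if ix > 0:
                a[i][i - 1] = -1.0
            if ix < nx - 1:
                a[i][i + 1] = -1.0
            if iy > 0:
                a[i][i - nx] = -1.0
            if iy < ny - 1:
                a[i][i + nx] = -1.0
    return a

def full_lu(a):
    """Complete LU; L (unit diagonal, not stored) and U share one array."""
    n = len(a)
    lu = [row[:] for row in a]
    for i in range(1, n):
        for k in range(i):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu

A = five_point_matrix()
LU = full_lu(A)

# Positions that are zero in A but non-zero after factorization.
fill_in = [(i + 1, j + 1) for i in range(12) for j in range(12)
           if A[i][j] == 0.0 and abs(LU[i][j]) > 1e-14]
print(len(fill_in), "fill-in entries, e.g.", fill_in[:4])
print(" ".join(f"{v:6.2f}" for v in LU[1]))   # second row of the factors
```

The second printed row starts with -0.17 5.83 -1.00 -0.17, matching the listing of the factors: the entry in column 4 is fill-in.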
Incomp. LU fact. with no fill-in's: type "./lu2"
Incomplete LU factorization without fill-in's (both [L] and [U] are shown; diag. of [L] are "1", not shown):
6.00 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
-0.17 5.83 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 -0.17 5.83 0.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00
-0.17 0.00 0.00 5.83 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00
0.00 -0.17 0.00 -0.17 5.66 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00
0.00 0.00 -0.17 0.00 -0.18 5.65 0.00 0.00 -1.00 0.00 0.00 0.00
0.00 0.00 0.00 -0.17 0.00 0.00 5.83 -1.00 0.00 -1.00 0.00 0.00
0.00 0.00 0.00 0.00 -0.18 0.00 -0.17 5.65 -1.00 0.00 -1.00 0.00
0.00 0.00 0.00 0.00 0.00 -0.18 0.00 -0.18 5.65 0.00 0.00 -1.00
0.00 0.00 0.00 0.00 0.00 0.00 -0.17 0.00 0.00 5.83 -1.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.18 0.00 -0.17 5.65 -1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.18 0.00 -0.18 5.65
For comparison, in the complete LU factorization fill-in occurs: some of the zero components become non-zero (e.g. the entries -0.03 and -0.01), whereas the incomplete factorization keeps the non-zero pattern of the original matrix.
Slightly "Inaccurate" Solution
Solution of [A]{x} = {b} computed with the incomplete and the complete LU factors:
  i   Incomplete LU   Complete LU
  1        0.92           1.00
  2        1.75           2.00
  3        2.76           3.00
  4        3.79           4.00
  5        4.46           5.00
  6        5.57           6.00
  7        6.66           7.00
  8        7.25           8.00
  9        8.46           9.00
 10        9.66          10.00
 11       10.54          11.00
 12       11.83          12.00
ILU(0), IC(0)
• "Incomplete" factorization without fill-in's
– Reduced memory, computation
• Solving equations by ILU(0)/IC(0) factorization provides a slightly "inaccurate" solution, although it is not far from the exact one.
– "Accuracy" depends on the problem (features of the equations).
Full LU and ILU(0)/IC(0)
Full LU
do i= 2, n
  do k= 1, i-1
    aik := aik/akk
    do j= k+1, n
      aij := aij - aik*akj
    enddo
  enddo
enddo
ILU(0): keep the non-zero pattern of the original coefficient matrix
do i= 2, n
  do k= 1, i-1
    if ((i,k)∈ NonZero(A)) then
      aik := aik/akk
    endif
    do j= k+1, n
      if ((i,j)∈ NonZero(A)) then
        aij := aij - aik*akj
      endif
    enddo
  enddo
enddo
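The two loops differ only in the pattern test, so a single routine with a switch can cover both. The sketch below is a dense illustration on the 12×12 example; the names `five_point_matrix`, `lu_inplace` and `lu_solve` are assumptions, and skipping the whole j-loop when (i,k) is outside the pattern is equivalent to the pseudocode because a_ik is zero there.

```python
def five_point_matrix(nx=3, ny=4, diag=6.0):
    n = nx * ny
    a = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            i = iy * nx + ix
            a[i][i] = diag
            if ix > 0:
                a[i][i - 1] = -1.0
            if ix < nx - 1:
                a[i][i + 1] = -1.0
            if iy > 0:
                a[i][i - nx] = -1.0
            if iy < ny - 1:
                a[i][i + nx] = -1.0
    return a

def lu_inplace(a, ilu0):
    """Full LU (ilu0=False) or ILU(0) (ilu0=True); L and U share one array."""
    n = len(a)
    nonzero = [[v != 0.0 for v in row] for row in a]
    lu = [row[:] for row in a]
    for i in range(1, n):
        for k in range(i):
            if ilu0 and not nonzero[i][k]:
                continue                     # a_ik stays zero: nothing to do
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                if ilu0 and not nonzero[i][j]:
                    continue                 # fill-in position: dropped
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu

def lu_solve(lu, b):
    """Forward (unit L) and backward (U) substitution with packed factors."""
    n = len(b)
    y = b[:]
    for i in range(n):
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    x = y[:]
    for i in range(n - 1, -1, -1):
        x[i] = (x[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x

A = five_point_matrix()
b = [0.0, 3.0, 10.0, 11.0, 10.0, 19.0, 20.0, 16.0, 28.0, 42.0, 36.0, 52.0]
x_full = lu_solve(lu_inplace(A, ilu0=False), b)
x_ilu0 = lu_solve(lu_inplace(A, ilu0=True), b)
print(" ".join(f"{v:5.2f}" for v in x_full))
print(" ".join(f"{v:5.2f}" for v in x_ilu0))
```

The first line reproduces the exact solution 1 to 12; the second is the slightly "inaccurate" ILU(0) result, which is why ILU(0) is used as a preconditioner rather than as a solver.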
Deep Fill-in: ILU(p)/IC(p)
p: level of fill-in. As "p" increases, ILU(p)/IC(p) become closer to the complete LU/Cholesky factorization and provide more robust preconditioners, but become more expensive: trade-off.
LEVij=0 if ((i,j)∈ NonZero(A)), otherwise LEVij= p+1
do i= 2, n
  do k= 1, i-1
    if (LEVik≦p) then
      aik := aik/akk
    endif
    do j= k+1, n
      LEVij := min(LEVij, 1+LEVik+LEVkj)
      if (LEVij≦p) then
        aij := aij - aik*akj
      endif
    enddo
  enddo
enddo
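The level-of-fill bookkeeping can be tried on the same 12×12 example. The sketch below follows the pseudocode with dense arrays for both the values and the levels; the names `five_point_matrix` and `ilu_p` are illustrative assumptions.

```python
def five_point_matrix(nx=3, ny=4, diag=6.0):
    n = nx * ny
    a = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            i = iy * nx + ix
            a[i][i] = diag
            if ix > 0:
                a[i][i - 1] = -1.0
            if ix < nx - 1:
                a[i][i + 1] = -1.0
            if iy > 0:
                a[i][i - nx] = -1.0
            if iy < ny - 1:
                a[i][i + nx] = -1.0
    return a

def ilu_p(a, p):
    """ILU(p): keep fill-in whose level is at most p. Returns (factors, levels)."""
    n = len(a)
    lev = [[0 if a[i][j] != 0.0 else p + 1 for j in range(n)] for i in range(n)]
    lu = [row[:] for row in a]
    for i in range(1, n):
        for k in range(i):
            if lev[i][k] <= p:
                lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lev[i][j] = min(lev[i][j], 1 + lev[i][k] + lev[k][j])
                if lev[i][j] <= p:
                    # Dropped entries are still zero, so they contribute nothing.
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, lev

def count_nonzero(m):
    return sum(1 for row in m for v in row if v != 0.0)

A = five_point_matrix()
lu0, lev0 = ilu_p(A, 0)
lu1, lev1 = ilu_p(A, 1)
print("non-zeros in A      :", count_nonzero(A))
print("non-zeros in ILU(0) :", count_nonzero(lu0))
print("non-zeros in ILU(1) :", count_nonzero(lu1))
```

With p = 0 the pattern of [A] is preserved; with p = 1 the first generation of fill-in (for example position (2,4)) is kept, at the price of more storage and work.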
LU Gauss-Seidel (LU-GS), LU Symmetric GS (LU-SGS) in this class: More Simplified Version of ILU(0)
Starting from ILU(0):
do i= 2, n
  do k= 1, i-1
    if ((i,k)∈ NonZero(A)) then
      aik := aik/akk          <- Only do this
    endif
    do j= k+1, n
      if ((i,j)∈ NonZero(A)) then
        aij := aij - aik*akj
      endif
    enddo
  enddo
enddo
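LU Gauss-Seidel (LU-GS), LU Symmetric GS (LU-SGS) in this class: More Simplified Version of ILU(0)
$$
\tilde{U} = \begin{bmatrix}
u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\
0 & u_{22} & u_{23} & \cdots & u_{2n} \\
0 & 0 & u_{33} & \cdots & u_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & u_{nn}
\end{bmatrix}
= \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & a_{22} & a_{23} & \cdots & a_{2n} \\
0 & 0 & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{nn}
\end{bmatrix}
$$
$$
\tilde{L} = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
l_{21} & 1 & 0 & \cdots & 0 \\
l_{31} & l_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l_{n1} & l_{n2} & l_{n3} & \cdots & 1
\end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
a_{21}/a_{11} & 1 & 0 & \cdots & 0 \\
a_{31}/a_{11} & a_{32}/a_{22} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1}/a_{11} & a_{n2}/a_{22} & a_{n3}/a_{33} & \cdots & 1
\end{bmatrix}
$$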
ILU, LU-GS: type "./lu3"
LU-GS without fill-in's (the diagonal stays 6.00; only the lower part is divided by the diagonal):
6.00 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
-0.17 6.00 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 -0.17 6.00 0.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00
-0.17 0.00 0.00 6.00 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00 0.00
0.00 -0.17 0.00 -0.17 6.00 -1.00 0.00 -1.00 0.00 0.00 0.00 0.00
0.00 0.00 -0.17 0.00 -0.17 6.00 0.00 0.00 -1.00 0.00 0.00 0.00
0.00 0.00 0.00 -0.17 0.00 0.00 6.00 -1.00 0.00 -1.00 0.00 0.00
0.00 0.00 0.00 0.00 -0.17 0.00 -0.17 6.00 -1.00 0.00 -1.00 0.00
0.00 0.00 0.00 0.00 0.00 -0.17 0.00 -0.17 6.00 0.00 0.00 -1.00
0.00 0.00 0.00 0.00 0.00 0.00 -0.17 0.00 0.00 6.00 -1.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.17 0.00 -0.17 6.00 -1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 -0.17 0.00 -0.17 6.00
Incomplete LU factorization without fill-in's: same non-zero pattern, but the diagonal entries are updated (5.83, 5.66, 5.65, …) and the lower entries are -0.17 or -0.18.
Solution is more "inaccurate"
Solution of [A]{x} = {b} computed with the ILU(0) and the LU-GS factors:
  i    ILU(0)    LU-GS
  1     0.92      0.86
  2     1.75      1.60
  3     2.76      2.60
  4     3.79      3.54
  5     4.46      3.99
  6     5.57      5.09
  7     6.66      6.26
  8     7.25      6.52
  9     8.46      7.73
 10     9.66      9.22
 11    10.54      9.70
 12    11.83     10.96
Forward/Backward Substitution in LU-GS
$\{z\} = (\tilde{L}\tilde{U})^{-1}\{r\}$
$[M]\{z\} = \tilde{L}\tilde{U}\{z\} = \{r\}$
$\tilde{L}\{y\} = \{r\}$
$\tilde{U}\{z\} = \{y\}$
where $\tilde{U}$ is the upper triangular part of A including the diagonal ($\tilde{u}_{ij} = a_{ij}$ for $i \le j$), and $\tilde{L}$ is unit lower triangular with $\tilde{l}_{ij} = a_{ij}/a_{jj}$ for $i > j$.
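Splitting of A into strictly lower, strictly upper and diagonal parts:
$$
L = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 \\
a_{21} & 0 & 0 & \cdots & 0 \\
a_{31} & a_{32} & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & 0
\end{bmatrix},\quad
U = \begin{bmatrix}
0 & a_{12} & a_{13} & \cdots & a_{1n} \\
0 & 0 & a_{23} & \cdots & a_{2n} \\
0 & 0 & 0 & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0
\end{bmatrix},\quad
D = \begin{bmatrix}
a_{11} & 0 & 0 & \cdots & 0 \\
0 & a_{22} & 0 & \cdots & 0 \\
0 & 0 & a_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{nn}
\end{bmatrix}
$$
$$
M = \tilde{L}\tilde{U} = (L+D)\,D^{-1}\,(D+U) = (L+D)\,(I + D^{-1}U)
$$
$$
(I + D^{-1}U) = \begin{bmatrix}
1 & a_{12}/a_{11} & a_{13}/a_{11} & \cdots & a_{1n}/a_{11} \\
0 & 1 & a_{23}/a_{22} & \cdots & a_{2n}/a_{22} \\
0 & 0 & 1 & \cdots & a_{3n}/a_{33} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix},\quad
(L + D) = \begin{bmatrix}
a_{11} & 0 & 0 & \cdots & 0 \\
a_{21} & a_{22} & 0 & \cdots & 0 \\
a_{31} & a_{32} & a_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn}
\end{bmatrix}
$$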
Forward/Backward Subst. in LU-GS
$$
M = \tilde{L}\tilde{U} = (L+D)\,D^{-1}\,(D+U) = (L+D)\,(I + D^{-1}U)
$$
Forward Substitution
$$
(L+D)\{y\} = \{r\} \ \Rightarrow\ \{y\} = D^{-1}(\{r\} - L\{y\}) \ \Rightarrow\ y_i = D_i^{-1}\Bigl(r_i - \sum_{j=1}^{i-1} L_{ij}\,y_j\Bigr)
$$
Backward Substitution
$$
(I + D^{-1}U)\{z\} = \{y\} \ \Rightarrow\ \{z\} = \{y\} - D^{-1}U\{z\} \ \Rightarrow\ z_i = y_i - D_i^{-1}\sum_{j=i+1}^{N} U_{ij}\,z_j
$$
Forward/Backward Subst. in LU-GS
!C
!C +----------------+
!C | {z}= [Minv]{r} |
!C +----------------+
!C===
      do i= 1, N
        W(i,Z)= W(i,R)
      enddo

!C-- forward substitution (in place): z_i = (z_i - SUM_{j<i} L_ij*z_j) / D_i
      do i= 1, N
!C      WVAL = z_i - SUM_{j=1..i-1} L_ij*z_j
        WVAL= W(i,Z)
        do k= indexL(i-1)+1, indexL(i)
          WVAL= WVAL - AL(k) * W(itemL(k),Z)
        enddo
        W(i,Z)= WVAL / D(i)
      enddo

!C-- backward substitution (in place): z_i = z_i - (SUM_{j>i} U_ij*z_j) / D_i
      do i= N, 1, -1
!C      SW = SUM_{j=i+1..N} U_ij*z_j
        SW = 0.0d0
        do k= indexU(i), indexU(i-1)+1, -1
          SW= SW + AU(k) * W(itemU(k),Z)
        enddo
        W(i,Z)= W(i,Z) - SW / D(i)
      enddo
!C===
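For experimenting outside Fortran, the same forward/backward substitution can be sketched in Python. The code below assumes 0-based CRS-like arrays (`index_l`, `item_l`, `al` for the strictly lower part, `index_u`, `item_u`, `au` for the strictly upper part, `d` for the diagonal) that mirror indexL/itemL/AL, indexU/itemU/AU and D; the helper names `five_point_matrix`, `to_crs` and `lugs_apply` are illustrative.

```python
def five_point_matrix(nx=3, ny=4, diag=6.0):
    n = nx * ny
    a = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            i = iy * nx + ix
            a[i][i] = diag
            if ix > 0:
                a[i][i - 1] = -1.0
            if ix < nx - 1:
                a[i][i + 1] = -1.0
            if iy > 0:
                a[i][i - nx] = -1.0
            if iy < ny - 1:
                a[i][i + nx] = -1.0
    return a

def to_crs(a):
    """Split a dense matrix into diagonal, strictly lower and strictly upper CRS parts."""
    n = len(a)
    d = [a[i][i] for i in range(n)]
    index_l, item_l, al = [0], [], []
    index_u, item_u, au = [0], [], []
    for i in range(n):
        for j in range(n):
            if a[i][j] != 0.0 and j < i:
                item_l.append(j)
                al.append(a[i][j])
            elif a[i][j] != 0.0 and j > i:
                item_u.append(j)
                au.append(a[i][j])
        index_l.append(len(item_l))
        index_u.append(len(item_u))
    return d, index_l, item_l, al, index_u, item_u, au

def lugs_apply(d, index_l, item_l, al, index_u, item_u, au, r):
    """{z} = [Minv]{r} with M = (L+D) D^-1 (D+U)."""
    n = len(r)
    z = r[:]
    for i in range(n):                       # forward substitution
        wval = z[i]
        for k in range(index_l[i], index_l[i + 1]):
            wval -= al[k] * z[item_l[k]]
        z[i] = wval / d[i]
    for i in range(n - 1, -1, -1):           # backward substitution
        sw = 0.0
        for k in range(index_u[i], index_u[i + 1]):
            sw += au[k] * z[item_u[k]]
        z[i] -= sw / d[i]
    return z

A = five_point_matrix()
b = [0.0, 3.0, 10.0, 11.0, 10.0, 19.0, 20.0, 16.0, 28.0, 42.0, 36.0, 52.0]
z = lugs_apply(*to_crs(A), b)
print(" ".join(f"{v:5.2f}" for v in z))
```

Applied to the right-hand side of the 12-point example, the last two components come out as 9.70 and 10.96, in agreement with the LU-GS column of the solution comparison.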
Parallel Preconditioning Method using MPI
Parallel FEM 3D-2 39
Localized SGS/SSOR Preconditioning
• SGS/SSOR: the forward/backward substitutions ($L\{z\} = \{r\}$, $U\{z\} = \{z\}$) are global operations and are NOT suitable for parallel computing.
• Localized preconditioning: the effects of external points are ignored in preconditioning.
– Block-Jacobi localized preconditioning
– WEAKER than the original SGS/SSOR
– More PE's, more iterations
• The {z}= [Minv]{r} code is the same forward/backward substitution as in LU-GS (arrays W, D, AL, AU, indexL, itemL, indexU, itemU), with the loops "do i= 1, N" and "do i= N, 1, -1" running over the points of each domain.
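The localization can be sketched with dense matrices: each domain applies the forward/backward substitution to its own diagonal block and simply ignores couplings to points owned by other domains. The helper names (`five_point_matrix`, `sgs_apply`, `localized_apply`) and the two-domain split of the 12-point example are assumptions for illustration; no MPI is involved.

```python
def five_point_matrix(nx=3, ny=4, diag=6.0):
    n = nx * ny
    a = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            i = iy * nx + ix
            a[i][i] = diag
            if ix > 0:
                a[i][i - 1] = -1.0
            if ix < nx - 1:
                a[i][i + 1] = -1.0
            if iy > 0:
                a[i][i - nx] = -1.0
            if iy < ny - 1:
                a[i][i + nx] = -1.0
    return a

def sgs_apply(a, r):
    """z = M^-1 r with M = (L+D) D^-1 (D+U), dense storage."""
    n = len(r)
    z = r[:]
    for i in range(n):
        z[i] = (z[i] - sum(a[i][j] * z[j] for j in range(i))) / a[i][i]
    for i in range(n - 1, -1, -1):
        z[i] -= sum(a[i][j] * z[j] for j in range(i + 1, n)) / a[i][i]
    return z

def localized_apply(a, r, parts):
    """Block-Jacobi localized version: each part uses only its own block of a."""
    z = [0.0] * len(r)
    for idx in parts:
        block = [[a[i][j] for j in idx] for i in idx]   # external columns dropped
        z_loc = sgs_apply(block, [r[i] for i in idx])
        for loc, i in enumerate(idx):
            z[i] = z_loc[loc]
    return z

A = five_point_matrix()
r = [0.0, 3.0, 10.0, 11.0, 10.0, 19.0, 20.0, 16.0, 28.0, 42.0, 36.0, 52.0]
z_global = sgs_apply(A, r)
z_one = localized_apply(A, r, [list(range(12))])
z_two = localized_apply(A, r, [list(range(0, 6)), list(range(6, 12))])
print(" ".join(f"{v:5.2f}" for v in z_global))
print(" ".join(f"{v:5.2f}" for v in z_two))
```

With a single domain the result is identical to the global operation; with two domains the blocks can be processed independently, but the result deviates from the global one, which is the reason the localized preconditioner is weaker.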
Localized SGS/SSOR Preconditioning
[Figure: coefficient matrix A distributed among PE#1 to PE#4, with a legend marking which components are considered and which are ignored in localized preconditioning]
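Overlapped Additive Schwarz Domain Decomposition Method
Stabilization of Localized Preconditioning: ASDD
Global Operation
$Mz = r$
Local Operation (subdomains $\Omega_1$, $\Omega_2$)
$z_{\Omega_1} = M_{\Omega_1}^{-1} r_{\Omega_1}, \quad z_{\Omega_2} = M_{\Omega_2}^{-1} r_{\Omega_2}$
Global Nesting Correction: Repeating -> Stable
$z_{\Omega_1}^{n} = z_{\Omega_1}^{n-1} + M_{\Omega_1}^{-1}\bigl(r_{\Omega_1} - M_{\Omega_1} z_{\Omega_1}^{n-1} - M_{\Gamma_1} z_{\Gamma_1}^{n-1}\bigr)$
$z_{\Omega_2}^{n} = z_{\Omega_2}^{n-1} + M_{\Omega_2}^{-1}\bigl(r_{\Omega_2} - M_{\Omega_2} z_{\Omega_2}^{n-1} - M_{\Gamma_2} z_{\Gamma_2}^{n-1}\bigr)$
where $\Gamma_i$ denotes the external points coupled to $\Omega_i$.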
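Overlapped Additive Schwarz Domain Decomposition Method
Stabilization of Localized Preconditioning: ASDD
Global Nesting Correction: Repeating -> Stable
$z_{\Omega_1}^{n} = z_{\Omega_1}^{n-1} + M_{\Omega_1}^{-1}\bigl(r_{\Omega_1} - M_{\Omega_1} z_{\Omega_1}^{n-1} - M_{\Gamma_1} z_{\Gamma_1}^{n-1}\bigr)$
$z_{\Omega_2}^{n} = z_{\Omega_2}^{n-1} + M_{\Omega_2}^{-1}\bigl(r_{\Omega_2} - M_{\Omega_2} z_{\Omega_2}^{n-1} - M_{\Gamma_2} z_{\Gamma_2}^{n-1}\bigr)$
For $\Omega_1$ the correction can be written in residual form:
$\Delta z_{\Omega_1} = z_{\Omega_1}^{n} - z_{\Omega_1}^{n-1} = M_{\Omega_1}^{-1}\,\Delta r_{\Omega_1}$
$\Delta r_{\Omega_1} = r_{\Omega_1} - M_{\Omega_1} z_{\Omega_1}^{n-1} - M_{\Gamma_1} z_{\Gamma_1}^{n-1}$
$M_{\Omega_1}\,\Delta z_{\Omega_1} = \Delta r_{\Omega_1}$, where $\Delta z_{\Omega_1} = z_{\Omega_1}^{n} - z_{\Omega_1}^{n-1}$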
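The correction cycle can be tried on a small dense system. The sketch below makes several simplifying assumptions that are not in the slides: the preconditioning matrix M is taken to be the 12×12 five-point matrix itself, local solves with M_Ω are done exactly by Gaussian elimination, the two subdomains do not overlap, and all names (`five_point_matrix`, `solve_dense`, `asdd`) are illustrative.

```python
def five_point_matrix(nx=3, ny=4, diag=6.0):
    n = nx * ny
    a = [[0.0] * n for _ in range(n)]
    for iy in range(ny):
        for ix in range(nx):
            i = iy * nx + ix
            a[i][i] = diag
            if ix > 0:
                a[i][i - 1] = -1.0
            if ix < nx - 1:
                a[i][i + 1] = -1.0
            if iy > 0:
                a[i][i - nx] = -1.0
            if iy < ny - 1:
                a[i][i + nx] = -1.0
    return a

def solve_dense(a, b):
    """Gaussian elimination without pivoting (adequate for this diagonally dominant case)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def asdd(m, r, parts, cycles):
    """Repeat z_new = z_old + M_loc^-1 (r - M z_old) on every subdomain."""
    n = len(r)
    z = [0.0] * n
    for _ in range(cycles):
        z_old = z[:]
        for idx in parts:
            # Local residual: r - M_Omega z_Omega - M_Gamma z_Gamma (previous cycle).
            dr = [r[i] - sum(m[i][j] * z_old[j] for j in range(n)) for i in idx]
            block = [[m[i][j] for j in idx] for i in idx]
            dz = solve_dense(block, dr)
            for loc, i in enumerate(idx):
                z[i] = z_old[i] + dz[loc]
    return z

M = five_point_matrix()
r = [0.0, 3.0, 10.0, 11.0, 10.0, 19.0, 20.0, 16.0, 28.0, 42.0, 36.0, 52.0]
parts = [list(range(0, 6)), list(range(6, 12))]
exact = [float(i + 1) for i in range(12)]      # solution of the global Mz = r

z1 = asdd(M, r, parts, cycles=1)
z40 = asdd(M, r, parts, cycles=40)
err1 = max(abs(a - b) for a, b in zip(z1, exact))
err40 = max(abs(a - b) for a, b in zip(z40, exact))
print(f"error after  1 cycle : {err1:.3e}")
print(f"error after 40 cycles: {err40:.3e}")
```

One cycle starting from zero is exactly the localized operation; repeating the correction drives the result towards that of the global operation, which is the stabilizing effect referred to above.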
Overlapped Additive Schwarz Domain Decomposition Method
Effect of additive Schwarz domain decomposition for a solid mechanics example with 3×44^3 DOF on Hitachi SR2201; number of ASDD cycles per iteration = 1, ε = 10^-8

         NO Additive Schwarz           WITH Additive Schwarz
PE #   Iter. #    Sec.   Speed Up    Iter. #    Sec.   Speed Up
   1     204     233.7      -          144     325.6      -
   2     253     143.6     1.63        144     163.1     1.99
   4     259      74.3     3.15        145      82.4     3.95
   8     264      36.8     6.36        146      39.7     8.21
  16     262      17.4    13.52        144      18.7    17.33
  32     268       9.6    24.24        147      10.2    31.80
  64     274       6.6    35.68        150       6.5    50.07