FreeFem++ workshop 2016, 08 Dec.
Direct solver and domain decomposition preconditioner for indefinite finite element matrices
Atsushi Suzuki1
1Cybermedia Center, Osaka [email protected]
http://www.ljll.math.upmc.fr/∼suzukia/
Outline
- examples of indefinite finite element stiffness matrices: sparse, symmetric/unsymmetric, singular
- overview of sparse direct solvers
- pivoting strategy to solve indefinite and/or singular matrices
- parallel efficiency for a matrix from the 3D Navier-Stokes equations on super-scalar and vector CPUs
- coarse space to accelerate convergence of the iterative solver
- conclusion
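indefinite and singular matrix in solving PDE
2D stationary cavity driven flow problem in $\Omega = (0,1)\times(0,1)$, $\mu$ : viscosity coefficient
\[
\begin{aligned}
-2\mu\,\nabla\cdot D(u) + u\cdot\nabla u + \nabla p &= 0 && \text{in } \Omega,\\
\nabla\cdot u &= 0 && \text{in } \Omega,\\
u &= g && \text{on } \partial\Omega.
\end{aligned}
\]
$g = [4x(1-x)\ \ 0]^T$ on the top, $g = 0$ elsewhere.
$(u, p)$ : sol. $\Rightarrow$ $(u, p+1)$ : sol.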
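Newton iteration to solve nonlinear system
\[
K \begin{bmatrix}\vec u^{\,n}\\ \vec p^{\,n}\end{bmatrix}
= \begin{bmatrix} A(\vec u^{\,n-1}) & B^T\\ B & 0\end{bmatrix}
\begin{bmatrix}\vec u^{\,n}\\ \vec p^{\,n}\end{bmatrix}
= \begin{bmatrix}\vec f^{\,n}\\ \vec 0\end{bmatrix},
\qquad
\operatorname{Ker} K = \begin{bmatrix}\vec 0\\ \vec 1\end{bmatrix}
\]
find $u \in H^1(\Omega)$; $u = g$ on $\partial\Omega$, $p \in L^2_0(\Omega)$ is not implemented.
$L^2_0(\Omega) = \{p \in L^2(\Omega) \,;\, \int_\Omega p = 0\}$
- fixing one point for pressure
- penalization for pressure
- matrix solver detects the kernel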
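indefinite matrix from electro-magnetic problem
constraints on the external force: $\nabla\cdot f = 0$ in $\Omega \subset \mathbb{R}^3$.
\[
\begin{aligned}
\nabla\times(\nabla\times u) &= f && \text{in } \Omega,\\
\nabla\cdot u &= 0 && \text{in } \Omega,\\
u\times n &= 0 && \text{on } \partial\Omega.
\end{aligned}
\]
$H_0(\mathrm{curl};\Omega) = \{u \in L^2(\Omega)^3 \,;\, \nabla\times u \in L^2(\Omega)^3 \,;\, u\times n = 0\}$
find $(u, p) \in H_0(\mathrm{curl};\Omega)\times H^1_0(\Omega)$
\[
\begin{aligned}
(\nabla\times u, \nabla\times v) + (v, \nabla p) &= (f, v) && \forall v \in H_0(\mathrm{curl};\Omega),\\
(u, \nabla q) &= 0 && \forall q \in H^1_0(\Omega)
\end{aligned}
\]
has a unique solution.
- $(\nabla\times\cdot, \nabla\times\cdot)$ : coercive on $W$, $W = H_0(\mathrm{curl};\Omega) \cap \{u \in H(\mathrm{div};\Omega) \,;\, \mathrm{div}\, u = 0\}$.
- stiffness matrix is symmetric but indefinite.
- $H_0(\mathrm{curl};\Omega) = \mathrm{grad}\, H^1_0(\Omega) \oplus W$.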
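indefinite and singular matrix from electro-magnetic problem
constraints on the external force: $\nabla\cdot f = 0$ in $\Omega$.
\[
\begin{aligned}
\nabla\times(\nabla\times u) &= f && \text{in } \Omega,\\
\nabla\cdot u &= 0 && \text{in } \Omega,\\
(\nabla\times u)\times n &= 0 && \text{on } \partial\Omega,\\
u\cdot n &= 0 && \text{on } \partial\Omega.
\end{aligned}
\]
$L^2_0(\Omega) = \{p \in L^2(\Omega) \,;\, (p, 1) = 0\}$
find $(u, p) \in H(\mathrm{curl};\Omega)\times \left(H^1(\Omega)\cap L^2_0(\Omega)\right)$
\[
\begin{aligned}
(\nabla\times u, \nabla\times v) + (v, \nabla p) &= (f, v) && \forall v \in H(\mathrm{curl};\Omega),\\
(u, \nabla q) &= 0 && \forall q \in H^1(\Omega)\cap L^2_0(\Omega)
\end{aligned}
\]
finite element approximation : Nédélec element of degree 0 and P1
$N_0(K) = (P_0(K))^3 \oplus [x\times(P_0(K))^3]$, $P_1(K)$
\[
\ker\begin{bmatrix} A & B^T\\ B & 0\end{bmatrix} = \begin{bmatrix}\vec 0\\ \vec 1\end{bmatrix},
\qquad \exists A^{-1} \text{ on } \ker B
\]
not an easy problem for usual direct solvers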
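semi-conductor problem with Drift-Diffusion model : 1/3
hole concentration $p$ : unknown
potential $\varphi$ : given, $\log(n_i/n_d)$ in N-region, $\log(n_a/n_i)$ in P-region.
\[
\begin{aligned}
-\mathrm{div}(\nabla p + p\nabla\varphi) &= 0 && \text{in } \Omega,\\
p &= g && \text{on } \Gamma_D,\\
\partial_\nu p &= 0 && \text{on } \Gamma_N
\end{aligned}
\]
following Maxwell-Boltzmann statistics : $p = n_i \exp\left(\dfrac{\varphi_p - \varphi}{V_{th}}\right)$
- $\varphi_p$ : quasi-Fermi level
- $n_i$ : intrinsic concentration of the semiconductor
- $V_{th} = K_B T/q$ : thermal voltage
- $K_B$ : Boltzmann constant
- $q$ : positive electron charge
- $T$ : lattice temperature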
[Figure: device domain with boundary data $p = n_i/n_d$ and $p = n_a/n_i$ on the Dirichlet parts and $\partial_\nu p = 0$ on the Neumann parts]
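semi-conductor problem with Drift-Diffusion model : 2/3
Slotboom variable $\xi$ : $p = \xi e^{-\varphi}$, $-J_p = \nabla p + p\nabla\varphi = e^{-\varphi}\nabla\xi$
\[
\begin{aligned}
-\mathrm{div}(-J_p) &= 0 && \text{in } \Omega,\\
-J_p e^{\varphi} &= \nabla\xi && \text{in } \Omega
\end{aligned}
\]
function space : $H(\mathrm{div}) = \{\tau \in L^2(\Omega)^2 \,;\, \mathrm{div}\,\tau \in L^2(\Omega)\}$, $\Sigma = \{\tau \in H(\mathrm{div}) \,;\, \tau\cdot\nu = 0 \text{ on } \Gamma_N\}$
integration by parts leads to
\[
-\int_\Omega e^{\varphi} J_p\cdot\tau = \int_\Omega \nabla\xi\cdot\tau = -\int_\Omega \xi\,\nabla\cdot\tau + \int_{\partial\Omega} \xi\,\tau\cdot\nu
\]
F. Brezzi, L. D. Marini, S. Micheletti, P. Pietra, R. Sacco, S. Wang. Discretization of semiconductor device problems (I), Handbook of Numerical Analysis vol. XIII, Elsevier, 2005.
hybridization of mixed formulation + mass lumping ⇒ FVM
mixed formulation + higher order approximation ⇒ indefinite matrix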
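semi-conductor problem with Drift-Diffusion model : 2/3
mixed-type weak formulation
find $(J_p, \xi) \in \Sigma\times L^2(\Omega)$
\[
\begin{aligned}
\int_\Omega e^{\varphi} J_p\cdot\tau - \int_\Omega \xi\,\nabla\cdot\tau &= -\int_{\Gamma_D} g\,e^{\varphi}\,\tau\cdot\nu && \forall\tau \in \Sigma,\\
\int_\Omega \nabla\cdot J_p\, v &= 0 && \forall v \in L^2(\Omega)
\end{aligned}
\]
symmetric indefinite
replacing $\xi = e^{\varphi} p$ again,
find $(J_p, p) \in \Sigma\times L^2(\Omega)$
\[
\begin{aligned}
\int_\Omega e^{\varphi} J_p\cdot\tau - \int_\Omega e^{\varphi} p\,\nabla\cdot\tau &= -\int_{\Gamma_D} g\,e^{\varphi}\,\tau\cdot\nu && \forall\tau \in \Sigma,\\
\int_\Omega \nabla\cdot J_p\, v &= 0 && \forall v \in L^2(\Omega)
\end{aligned}
\]
unsymmetric indefinite, cf. exponential fitting with FVM
Raviart-Thomas element for $H(\mathrm{div})$ : $RT_1(K) = (P_1(K))^2 + \vec x\,P_1(K)$,
piecewise linear element for $L^2(\Omega)$ : $P_1(K)$.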
abstract framework
$V$ : Hilbert space with inner product $(\cdot,\cdot)$ and norm $\|\cdot\|$.
bilinear form $a(\cdot,\cdot) : V\times V \to \mathbb{R}$
- continuous : $\exists\gamma > 0$, $|a(u,v)| \le \gamma\|u\|\,\|v\|$ $\forall u, v \in V$.
- $\exists\alpha_1 > 0$, $\sup_{v\in V, v\ne 0} \dfrac{a(u,v)}{\|v\|} \ge \alpha_1\|u\|$ $\forall u \in V$.
- $\exists\alpha_2 > 0$, $\sup_{u\in V, u\ne 0} \dfrac{a(u,v)}{\|u\|} \ge \alpha_2\|v\|$ $\forall v \in V$.
find $u \in V$ s.t. $a(u,v) = F(v)$ $\forall v \in V$ has a unique solution.
$\forall U \subset V$ subspace
find $u \in U$ s.t. $a(u,v) = F(v)$ $\forall v \in U$
in general, inf-sup condition in subspace $U$ is unclear.
- in discretized problem : $V_h \subset V$ ?
- in linear solver (subspace of $V_h$) ?
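A two-dimensional worked example, which is not from the original slides, may make the last remark concrete. The form below satisfies both inf-sup conditions on $V$, yet vanishes identically on a one-dimensional subspace. This is the same situation a direct solver faces when a leading sub-matrix of an invertible indefinite matrix is singular.

```latex
V = \mathbb{R}^2, \qquad
a(u,v) = u_1 v_2 + u_2 v_1
       = v^T \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} u .

% inf-sup holds on the whole space with constants equal to one
\sup_{v \ne 0} \frac{a(u,v)}{\|v\|}
  = \left\| \begin{bmatrix} u_2 \\ u_1 \end{bmatrix} \right\|
  = \|u\|
  \quad\Rightarrow\quad \alpha_1 = \alpha_2 = 1 .

% but the restriction to U = span[e_1] is the zero form
U = \operatorname{span}[e_1] : \qquad
a(u,v) = 0 \quad \forall u, v \in U .
```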
State of the art : software for sparse direct solver

Software      parallel env.  elimination strategy  data management  pivoting                kernel detection
UMFPACK       -              multi-frontal         static           yes                     no
SuperLU_MT    shared         super-nodal           dynamic          yes                     no
Pardiso       shared         super-nodal           dynamic          yes + √ε-perturbation   no
SuperLU_DIST  distributed    super-nodal           static           no, √ε-perturbation     no
MUMPS         distributed    multi-frontal         dynamic          yes                     yes
Dissection    shared         multi-frontal         static           yes                     yes
T. A. Davis, I. S. Duff. A combined unifrontal/multifrontal method for unsymmetric sparse matrices, ACM Trans. Math. Software, 25 (1999), 1–20.
J. W. Demmel, S. C. Eisenstat, J. R. Gilbert, X. S. Li, J. W. H. Liu. A supernodal approach to sparse partial pivoting, SIAM J. Matrix Anal. Appl., 20 (1999), 720–755.
O. Schenk, K. Gärtner. Solving unsymmetric sparse systems of linear equations with PARDISO, Future Generation of Computer Systems, 20 (2004), 475–487.
X. S. Li, J. W. Demmel. SuperLU_DIST : A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Software, 29 (2003), 110–140.
P. R. Amestoy, I. S. Duff, J.-Y. L'Excellent. Multifrontal parallel distributed symmetric and unsymmetric solvers, Comput. Methods Appl. Mech. and Engrg, 184 (2000), 501–520.
A. Suzuki, F.-X. Roux. A dissection solver with kernel detection for symmetric finite element matrices on shared memory computers, Int. J. Numer. Meth. in Engng, 100 (2014), 136–164.
ordering of sparse matrix
sparse matrix needs to be re-ordered
- to reduce fill-in
- to increase parallelization of factorization → multi-front
- to increase size of block structure → supernode
example: 7-point stencil of Poisson equation, $11^3$ nodes.
[Figure: sparsity patterns of the original matrix, the reverse Cuthill-McKee ordering, and the nested-dissection ordering (3 layers)]
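The effect of ordering on fill-in can be sketched with a symbolic elimination on the matrix graph. The example below is an illustration only and is not taken from the slides: it uses a small "arrow" matrix (one hub node coupled to all others) as a stand-in for a separator, and the helper name `count_fill_in` is hypothetical. Eliminating the hub first couples every remaining pair, while eliminating it last creates no fill-in, which is the reason nested dissection orders separators last.

```python
def count_fill_in(n, edges, order):
    """Count new off-diagonal pairs created by symmetric elimination in `order`."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    eliminated = set()
    fill = 0
    for v in order:
        nbrs = [w for w in adj[v] if w not in eliminated]
        # eliminating v couples all of its remaining neighbours pairwise
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                p, q = nbrs[a], nbrs[b]
                if q not in adj[p]:
                    adj[p].add(q)
                    adj[q].add(p)
                    fill += 1
        eliminated.add(v)
    return fill

# arrow matrix: node 0 (the "separator") is coupled to nodes 1..5
n = 6
edges = [(0, j) for j in range(1, n)]
fill_hub_first = count_fill_in(n, edges, list(range(n)))
fill_hub_last = count_fill_in(n, edges, list(range(n - 1, -1, -1)))
print(fill_hub_first, fill_hub_last)
```

The count is per symmetric pair, so the number of new matrix entries is twice the printed value.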
nested-dissection by graph decomposition
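[Figure: domain decomposed into leaf subdomains 8, 9, a, b, c, d, e, f with separators 4, 5, 6, 7 (level 2), 2, 3 (level 1) and 1 (root), shown together with the corresponding binary tree; the leaves are handled by the sparse solver and the three upper levels by dense solvers]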
A. George. Numerical experiments using dissection methods to solve n by n grid problems, SIAM J. Num. Anal., 14 (1977), 161–179.
software packages:
METIS : G. Karypis, V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs, SIAM J. Sci. Comput., 20 (1998), 359–392.
SCOTCH : F. Pellegrini, J. Roman, P. Amestoy. Hybridizing nested dissection and halo approximate minimum degree for efficient sparse matrix ordering, Concurrency: Pract. Exper., 12 (2000), 69–84.
- each leaf can be computed in parallel ⇐ multi-front
- load imbalance? because of different sizes of subdomains
- parallel computation of higher levels? # cores > # subdomains
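recursive generation of Schur complement
\[
\begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}
= \begin{bmatrix} A_{11} & 0\\ A_{21} & S_{22}\end{bmatrix}
\begin{bmatrix} I_1 & A_{11}^{-1}A_{12}\\ 0 & I_2\end{bmatrix}
\]
\[
S_{22} = A_{22} - A_{21}A_{11}^{-1}A_{12} = A_{22} - (A_{21}U_{11}^{-1})\,D_{11}^{-1}\,(L_{11}^{-1}A_{12}) \quad : \text{recursively computed}
\]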
[Figure: block structure of the matrix in nested-dissection ordering (8 9 a b c d e f | 4 5 6 7 | 2 3 | 1); the Schur complement is formed level by level, with the leaf blocks eliminated by the sparse solver and the blocks of levels 4-7, 2-3 and 1 eliminated by dense factorization]
sparse part : completely in parallel
dense part : better use of BLAS 3; dgemm, dtrsm
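The block factorization behind the recursive Schur complement can be checked on a small dense matrix. The following is a minimal sketch in exact arithmetic, assuming a 4×4 test matrix and helper names (`matmul`, `inverse`, `schur_factors`) that are not part of the slides or of any solver; it forms $S_{22} = A_{22} - A_{21}A_{11}^{-1}A_{12}$ and multiplies the two block factors back together.

```python
from fractions import Fraction

def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse(M):
    """Gauss-Jordan inverse in exact arithmetic (assumes nonzero pivots)."""
    n = len(M)
    W = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        piv = W[c][c]
        W[c] = [v / piv for v in W[c]]
        for r in range(n):
            if r != c:
                f = W[r][c]
                W[r] = [a - f * b for a, b in zip(W[r], W[c])]
    return [row[n:] for row in W]

def schur_factors(A, n1):
    """Return A11, A21, X = A11^{-1} A12 and S22 = A22 - A21 X."""
    A11 = [row[:n1] for row in A[:n1]]
    A12 = [row[n1:] for row in A[:n1]]
    A21 = [row[:n1] for row in A[n1:]]
    A22 = [row[n1:] for row in A[n1:]]
    X = matmul(inverse(A11), A12)
    T = matmul(A21, X)
    S22 = [[A22[i][j] - T[i][j] for j in range(len(A22))]
           for i in range(len(A22))]
    return A11, A21, X, S22

A = [[4, 1, 2, 0],
     [1, 3, 0, 1],
     [2, 0, 5, 1],
     [0, 1, 1, 2]]
n1 = 2
n2 = len(A) - n1
A11, A21, X, S22 = schur_factors(A, n1)

# assemble [A11 0; A21 S22] and [I1 X; 0 I2], then multiply them back
left = [A11[i] + [0] * n2 for i in range(n1)] + \
       [A21[i] + S22[i] for i in range(n2)]
right = [[int(i == j) for j in range(n1)] + X[i] for i in range(n1)] + \
        [[0] * n1 + [int(i == j) for j in range(n2)] for i in range(n2)]
reconstructed = matmul(left, right)
print(reconstructed == A)
```

In a nested-dissection solver the same step is applied level by level, and the dense products map to `dgemm` and the triangular solves to `dtrsm`.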
pivoting strategy
- full pivoting : $A = \Pi_L^T L U \Pi_R$, find $\max_{k<i,j\le n} |A(i,j)|$
- partial pivoting : $A = \Pi L U$, find $\max_{k<i\le n} |A(i,k)|$
- symmetric pivoting : $A = \Pi^T L D U \Pi$, find $\max_{k<i\le n} |A(i,i)|$
- 2×2 pivoting : $A = \Pi^T L D U \Pi$, find $\max_{k<i,j\le n} \left|\det\begin{bmatrix} A(i,i) & A(i,j)\\ A(j,i) & A(j,j)\end{bmatrix}\right|$
sym. pivoting is mathematically not always possible → 2×2 pivoting
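A small experiment illustrates why symmetric pivoting alone can fail. The sketch below is not from the slides: the 3×3 matrix and the helper names `ldu_pivots` and `symmetric_permute` are assumptions chosen for illustration. It tries every symmetric permutation with 1×1 pivots and then evaluates the determinant of the 2×2 block that a 2×2 pivot would use.

```python
from fractions import Fraction
from itertools import permutations

def ldu_pivots(A):
    """Diagonal of D for elimination with 1x1 pivots in the given order,
    or None as soon as a zero pivot is met."""
    n = len(A)
    S = [[Fraction(v) for v in row] for row in A]
    pivots = []
    for k in range(n):
        if S[k][k] == 0:
            return None
        pivots.append(S[k][k])
        for i in range(k + 1, n):
            f = S[i][k] / S[k][k]
            for j in range(k + 1, n):
                S[i][j] -= f * S[k][j]
    return pivots

def symmetric_permute(A, perm):
    """Apply the same permutation to rows and columns."""
    return [[A[i][j] for j in perm] for i in perm]

# invertible and symmetric, but with a zero 2x2 diagonal
A = [[0, 1, 0],
     [1, 0, 0],
     [0, 0, 2]]

symmetric_pivoting_fails = all(
    ldu_pivots(symmetric_permute(A, p)) is None
    for p in permutations(range(3)))

# the 2x2 block on indices (0, 1) is a valid block pivot
det_2x2 = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(symmetric_pivoting_fails, det_2x2)
```

Every ordering meets a zero diagonal entry, while the 2×2 block has a nonzero determinant, so a block-diagonal $D$ with one 2×2 block and one 1×1 block factorizes the matrix.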
understanding pivoting strategy by solution in subspaces
$A = \Pi^T L D U \Pi$ : symmetric pivoting
$D$ : diagonal, $L$ : lower triangular, $L(i,i) = 1$, $U$ : upper triangular, $U(i,i) = 1$.
- index set $\{i_1, i_2, \cdots, i_m\}$, $V_m = \operatorname{span}[\vec e_{i_1}, \vec e_{i_2}, \cdots, \vec e_{i_m}] \subset \mathbb{R}^N$
- $P_m : \mathbb{R}^N \to V_m$ orthogonal projection.
find $\vec u \in V_m$ : $(A\vec u - \vec f, \vec v) = 0$ $\forall \vec v \in V_m$.
$\exists\Pi : A = \Pi^T L D U \Pi \Rightarrow \exists\, i_1, i_2, \cdots, i_N$ s.t. $P_m A P_m^T$ : invertible on $V_m$, $1 \le \forall m \le N$.
2×2 pivoting : $V_{m-1}, V_m, V_{m+1}, V_{m+2}, V_{m+3}$, by skipping $V_{m+1}$.
J. R. Bunch, L. Kaufman. Some stable methods for calculating inertia and solving symmetric linear systems, Math. Comput., 31 (1977), 163–179.
R. Bank, T.-F. Chan. An analysis of the composite step biconjugate gradient method, Numer. Math., 66 (1993), 295–320.
symmetric pivoting with postponing for block strategy
- nested-dissection decomposition may produce singular sub-matrix for indefinite matrix
$\tau$ : given threshold for null pivot
$|A(i,i)| / |A(i-1,i-1)| < \tau \Rightarrow A(i,i)$ is null pivot.
[Figure: block matrix in nested-dissection ordering (blocks 4-7, 2-3, 1) where, inside each diagonal block, the invertible entries are separated from the candidates of null pivot at the position between $i-1$ and $i$]
Schur complement matrix from suspicious (postponed) null pivots and additional nodes ⇒ kernel detection algorithm
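The threshold test can be sketched on a single dense block. This is a strong simplification under stated assumptions and is not the Dissection implementation: it works entry by entry rather than on blocks, it assumes the first pivot is nonzero, and the function name `factor_with_postponing` and the test matrix (a 1D Laplacian with Neumann ends, whose kernel is the constant vector) are hypothetical. A pivot whose ratio to the last accepted pivot falls below $\tau$ is not used for elimination; its index is postponed.

```python
def factor_with_postponing(A, tau=1.0e-2):
    """Eliminate with 1x1 pivots in natural order; postpone candidate null pivots.

    Returns the accepted pivot indices, the postponed indices and the
    partially eliminated matrix (its postponed rows and columns hold the
    Schur complement passed on to kernel detection).
    """
    n = len(A)
    S = [row[:] for row in A]
    accepted, postponed = [], []
    prev = None                       # last accepted pivot value
    for k in range(n):
        d = S[k][k]
        if prev is not None and abs(d) / abs(prev) < tau:
            postponed.append(k)       # suspicious null pivot: skip elimination
            continue
        accepted.append(k)
        prev = d
        for i in range(k + 1, n):
            f = S[i][k] / d
            for j in range(k + 1, n):
                S[i][j] -= f * S[k][j]
    return accepted, postponed, S

# singular matrix: 1D Laplacian with Neumann boundary conditions
A = [[ 1.0, -1.0,  0.0,  0.0],
     [-1.0,  2.0, -1.0,  0.0],
     [ 0.0, -1.0,  2.0, -1.0],
     [ 0.0,  0.0, -1.0,  1.0]]

accepted, postponed, S = factor_with_postponing(A, tau=1.0e-2)
print(accepted, postponed, S[3][3])
```

The last diagonal entry becomes zero after eliminating the first three, so index 3 is postponed and the 1×1 Schur complement on the postponed index is zero.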
kernel detection (rank deficient problem)
\[
\begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}
= \begin{bmatrix} A_{11} & 0\\ A_{21} & S_{22}\end{bmatrix}
\begin{bmatrix} I_1 & A_{11}^{-1}A_{12}\\ 0 & I_2\end{bmatrix},
\qquad
S_{22} = 0 \Rightarrow \operatorname{Ker}A = \begin{bmatrix} A_{11}^{-1}A_{12}\\ -I_2\end{bmatrix}
\]
symmetric semi-positive definite, $m + k = 4 + 6 = 10$
by Householder-QR factorization:

    4.60e-02 -1.20e-02  2.91e-03  1.16e-02  2.24e-02 -9.33e-05 -3.60e-02  8.22e-03 -7.77e-03 -2.90e-02
    0.0       3.84e-02  4.84e-03 -2.21e-02  1.87e-02  1.30e-03 -9.14e-03 -2.74e-02  1.48e-02 -1.91e-02
    0.0       0.0       2.96e-02  1.68e-03 -2.55e-02  1.11e-04  1.28e-02 -1.04e-04 -1.12e-03 -1.20e-02
    0.0       0.0       0.0       1.28e-02 -1.66e-03  8.48e-04 -4.29e-05 -7.90e-04 -8.56e-03  2.51e-03
    0.0       0.0       0.0       0.0       1.23e-11 -5.49e-13 -8.30e-12  1.67e-13  2.10e-14 -7.08e-12
    0.0       0.0       0.0       0.0       0.0       6.70e-13 -1.02e-13 -5.33e-13 -1.62e-13  1.18e-13
    0.0       0.0       0.0       0.0       0.0       0.0       3.33e-13 -2.48e-14 -6.61e-14 -3.18e-13
    0.0       0.0       0.0       0.0       0.0       0.0       0.0       1.22e-13  4.46e-15 -4.34e-14
    0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       3.05e-14  8.16e-15
    0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0      -1.09e-14

how to set the threshold to distinguish between non-zero (1.28e-02) and zero (1.23e-11) values ?
- Pardiso : no capability of kernel detection.
- MUMPS : user has to choose this value.
- Dissection : + an algorithm by measuring dimension of residual of matrix with a projection onto the image space.
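The formula $\operatorname{Ker}A = [A_{11}^{-1}A_{12};\ -I_2]$ can be verified on a tiny singular matrix. The sketch below uses exact arithmetic and an assumed test case, a 1D Laplacian with Neumann ends whose kernel is the constant vector (an analogue of the pressure ambiguity); the helper `solve` is hypothetical and does no pivoting.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M x = b by Gaussian elimination (exact, assumes nonzero pivots)."""
    n = len(M)
    W = [[Fraction(v) for v in row] + [Fraction(b[i])]
         for i, row in enumerate(M)]
    for c in range(n):
        for r in range(c + 1, n):
            f = W[r][c] / W[c][c]
            W[r] = [a - f * p for a, p in zip(W[r], W[c])]
    x = [Fraction(0)] * n
    for r in range(n - 1, -1, -1):
        s = sum(W[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (W[r][n] - s) / W[r][r]
    return x

A = [[ 1, -1,  0,  0],
     [-1,  2, -1,  0],
     [ 0, -1,  2, -1],
     [ 0,  0, -1,  1]]
n1 = 3                                   # size of the invertible block A11
A11 = [row[:n1] for row in A[:n1]]
a12 = [row[n1] for row in A[:n1]]        # A12 is a single column here
a21 = A[n1][:n1]

x = solve(A11, a12)                      # A11^{-1} A12
s22 = A[n1][n1] - sum(a * v for a, v in zip(a21, x))   # Schur complement
kernel = x + [Fraction(-1)]              # [A11^{-1} A12; -I2]
residual = [sum(A[i][j] * kernel[j] for j in range(len(A)))
            for i in range(len(A))]
print(s22, kernel, residual)
```

In floating point the Schur complement is only zero up to rounding, which is exactly the thresholding difficulty raised above.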
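kernel detection algorithm based on LDU : 3/4
$A$ : $N\times N$ unsymmetric, $\dim\operatorname{Ker}A = k \ge 1$, $\dim\operatorname{Im}A \ge m$.
two parameters : $l$, $n$, which define size of factorization (block sizes $N-n$ and $n$ for the projection, $N-l$ and $l$ for the solution in subspace),
\[
\begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix}
= \begin{bmatrix} A_{11} & 0\\ A_{21} & S_{22}\end{bmatrix}
\begin{bmatrix} I_1 & A_{11}^{-1}A_{12}\\ 0 & I_2\end{bmatrix},
\qquad
\widetilde{\operatorname{Im}}_n = \operatorname{span}\begin{bmatrix}\widetilde{A}_{11}^{-1}A_{12}\\ -I_2\end{bmatrix}^{\perp}.
\]
- projection : $P_n^{\perp} : \mathbb{R}^N \to \widetilde{\operatorname{Im}}_n$
- solution in subspace : $\widetilde{A}_{N-l}^{\dagger}\, b = \begin{bmatrix}\widetilde{A}_{11}^{-1} b_1\\ 0\end{bmatrix}$, $b = \begin{bmatrix} b_1\\ b_2\end{bmatrix}$
Theoretically $\neg\exists A_{N-k+1}^{-1}$
perturbed solution with machine epsilon of double precision $\varepsilon_0$ :
\[
\widetilde{A}_{11}^{-1} b_1 = U_{11}^{-1} D_{11}^{-1} L_{11}^{-1} b_1 + \varepsilon_0\, e_m
\]
\[
\mathrm{err}_l^{(n)} := \max\left\{
\max_{x = [0\ x_l] \ne 0} \frac{\|P_n^{\perp}(\widetilde{A}_{N-l}^{\dagger} A\,x - x)\|}{\|x\|},\
\max_{x = [x_{N-l}\ 0] \ne 0} \frac{\|\widetilde{A}_{N-l}^{\dagger} A\,x - x\|}{\|x\|}
\right\}
\]
\[
\begin{aligned}
n = k+1 &\Leftrightarrow \mathrm{err}_{k}^{(k+1)} \approx 0 \ \wedge\ \mathrm{err}_{k+1}^{(k+1)} \approx 0 \ \wedge\ \mathrm{err}_{k+2}^{(k+1)} \sim 1\\
n = k &\Leftrightarrow \mathrm{err}_{k-1}^{(k)} \gg 0 \ \wedge\ \mathrm{err}_{k}^{(k)} \approx 0 \ \wedge\ \mathrm{err}_{k+1}^{(k)} \sim 1\\
n = k-1 &\Leftrightarrow \mathrm{err}_{k-2}^{(k-1)} \gg 0 \ \wedge\ \mathrm{err}_{k-1}^{(k-1)} \gg 0 \ \wedge\ \mathrm{err}_{k}^{(k-1)} \sim 1
\end{aligned}
\]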
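Exchange of 1×1 and 2×2 pivot entries
\[
B =
\begin{bmatrix} 1 & & \\ l_2 & 1 & \\ l_3 & 0 & 1\end{bmatrix}
\begin{bmatrix} d_1 & & \\ & d_2 & d_0\\ & d_0 & d_3\end{bmatrix}
\begin{bmatrix} 1 & u_2 & u_3\\ & 1 & 0\\ & & 1\end{bmatrix}
=
\begin{bmatrix}
d_1 & d_1 u_2 & d_1 u_3\\
d_1 l_2 & d_2 + d_1 l_2 u_2 & d_0 + d_1 l_2 u_3\\
d_1 l_3 & d_0 + d_1 l_3 u_2 & d_3 + d_1 l_3 u_3
\end{bmatrix}
\]
find $(i, j)$ : $|b_{ii}\, b_{jj} - b_{ji}\, b_{ij}| \ge |b_{kk}\, b_{mm} - b_{mk}\, b_{km}|$ for 2×2 block,
$(i, j, h), (k, m, n) \in \{(1,2,3), (2,3,1), (3,1,2)\}$.
permutation $\Pi(1,2,3) = (i, j, h)$,
\[
\Pi B \Pi^T =
\begin{bmatrix} 1 & & \\ 0 & 1 & \\ l'_1 & l'_2 & 1\end{bmatrix}
\begin{bmatrix} d'_1 & d'_0 & \\ d'_0 & d'_2 & \\ & & d'_3\end{bmatrix}
\begin{bmatrix} 1 & 0 & l'_1\\ & 1 & l'_2\\ & & 1\end{bmatrix}.
\]
\[
d_2 \ne 0 \Rightarrow
\begin{vmatrix} d_1 & d_1 u_2\\ d_1 l_2 & d_2 + d_1 l_2 u_2\end{vmatrix}
= d_1\,(d_2 + d_1 l_2 u_2) - (d_1 l_2)(d_1 u_2) = d_1 d_2 \ne 0 .
\]
\[
d_2 = 0 \wedge d_3 = 0 \Rightarrow d_0 \ne 0,\quad
\begin{vmatrix} d_1 l_2 u_2 & d_0 + d_1 l_2 u_3\\ d_0 + d_1 l_3 u_2 & d_1 l_3 u_3\end{vmatrix}
= d_1 l_2 u_3 \cdot d_1 l_3 u_2 - (d_0 + d_1 l_2 u_3)(d_0 + d_1 l_3 u_2) \ne 0 .
\]
\[
d_{-4}\, d_{-3}\, d_{-2}\, (d_{-1}\, d_0)(d_1\, d_2)
\to d_{-4}\, d_{-3}\, (d'_{-2}\, d'_{-1})\, d'_0\, (d_1\, d_2)
\to (d'_{-4}\, d''_{-3})(d''''_{-2}\, d''''_{-1})\, d''''_0\, d''_1\, d'_2 .
\]
Kernel detection algorithm assumes $\dim\operatorname{Im}A \ge m \ge 4$.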
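The two determinant statements above can be checked numerically. The sketch below uses arbitrary integer values for $d_i$, $l_i$, $u_i$ (assumptions for illustration only) and hypothetical helper names; it builds $B = LDU$, confirms that the leading 2×2 determinant equals $d_1 d_2$, and selects the 2×2 block of largest determinant among the cyclic index pairs when $d_2 = d_3 = 0$.

```python
def matmul(X, Y):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def build_B(d1, d2, d3, d0, l2, l3, u2, u3):
    """B = L D U with a 1x1 pivot d1 followed by the 2x2 block [d2 d0; d0 d3]."""
    L = [[1, 0, 0], [l2, 1, 0], [l3, 0, 1]]
    D = [[d1, 0, 0], [0, d2, d0], [0, d0, d3]]
    U = [[1, u2, u3], [0, 1, 0], [0, 0, 1]]
    return matmul(matmul(L, D), U)

def det2(B, i, j):
    """Determinant of the 2x2 principal block on indices (i, j)."""
    return B[i][i] * B[j][j] - B[j][i] * B[i][j]

def best_block(B):
    """Pick the cyclic pair (0-based) with the largest |2x2 determinant|."""
    candidates = [(0, 1), (1, 2), (2, 0)]
    return max(candidates, key=lambda ij: abs(det2(B, *ij)))

# case d2 != 0: leading 2x2 determinant equals d1 * d2
B1 = build_B(d1=2, d2=3, d3=4, d0=1, l2=2, l3=3, u2=5, u3=7)
lead_det = det2(B1, 0, 1)

# case d2 = d3 = 0 and d0 != 0: only the trailing block is usable
B2 = build_B(d1=2, d2=0, d3=0, d0=1, l2=2, l3=3, u2=5, u3=7)
trail_det = det2(B2, 1, 2)
chosen = best_block(B2)
print(lead_det, trail_det, chosen)
```

For these values the blocks on indices (1,2) and (3,1) of the second matrix are singular, so the exchange has to pick the trailing block.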
example of kernel detection algorithm
stationary Navier-Stokes equations, $Re = 12{,}800$, $N = 43{,}998$,
$\tau = 10^{-2}$, $(m = 4) + 1 + 1$ : kernel ∼ pressure ambiguity
6×6 matrix by Householder QR factorization

    4.221911e-2  5.065337e-3  5.137137e-3  1.493815e-3  3.874611e-2   1.218166e-2
                 4.060156e-2  3.228548e-2  5.466190e-3  2.174984e-3   6.749120e-4
                              1.389616e-2  6.729308e-3  8.980537e-3   1.813681e-3
                                           1.708745e-3  1.640027e-15  6.871814e-1
                                                        1.203270e-15  1.788546e-13
                                                                      1.674888e-16

computed residuals with orthogonal projection:

    k   err_{k-1}^{(k)}        err_k^{(k)}             err_{k+1}^{(k)}
    2   1.57098143 · 10^-3     2.69712366 · 10^-16     8.11624415 · 10^-1

    β1 = 2.22044604 · 10^-16
    β4 = 8.88178419 · 10^-16
    β6 = 1.50415172 · 10^-3
    γ0 = 1.15583524 · 10^-9

residual of kernel vectors:

    dim. of kernel = 1      dim. of kernel = 2
    2.23779349 · 10^-15     1.92578081 · 10^-3
                            1.84833445 · 10^-3
stiffness matrix of electro-magnetic equations by FreeFem++

FreeFem++ script

    load "msh3"
    load "Dissection"
    defaulttoDissection;
    mesh3 Th=cube(20,20,20);
    fespace VQh(Th, [Edge03d, P1]); // Nedelec element
    VQh [u1, u2, u3, p], [v1, v2, v3, q];
    // curl-curl term and the coupling (v, grad p) + (u, grad q)
    varf aa([u1, u2, u3, p], [v1, v2, v3, q]) =
      int3d(Th)((dy(u3)-dz(u2)) * (dy(v3) - dz(v2)) +
                (dz(u1)-dx(u3)) * (dz(v1) - dx(v3)) +
                (dx(u2)-dy(u1)) * (dx(v2) - dy(v1)) +
                dx(p) * v1 + dy(p) * v2 + dz(p) * v3 +
                dx(q) * u1 + dy(q) * u2 + dz(q) * u3);
    matrix A = aa(VQh, VQh, solver=sparsesolver,
                  tolpivot=1.0e-2, strategy=102);

    solver          elapsed time (sec.)   algebraic error
    UMFPACK         32.348                3.55790
    Intel Pardiso    9.698                4.07409 × 10^-7
    Dissection      10.534                5.89406 × 10^-15
parallel performance on Xeon [email protected] 14cores ×2
[Figure: log-log plot of elapsed time (sec.) versus # cores for Dissection, MUMPS and Intel Pardiso, with an O(1/p) reference line]
unsymmetric matrix $N = 1{,}032{,}183$, $nnz = 97{,}961{,}089$, dim ker = 1,
from 3D Navier-Stokes eqs., P2/P1, h = 1/35, Re = 300. 57GB mem.
parallel performance on Xeon v3
$n = 945{,}164$, $nnz = 89{,}588{,}848$.
36.8 GFlop/s × 28 cores, 64GB shared memory
[Figure legend: purple = sparse LDU, yellow = sparse Schur, light green = DTRSM, light blue = DGEMM, dark blue = dense LDU]

    # of cores   CPU time (sec.)   elapsed (sec.)   GFlop/s of DGEMM
     1           1,268.0           1,268.9          36.35
     2           1,108.3             659.39         36.27
     4           1,178.5             356.22         34.06
     8           1,469.2             201.24         31.23
    16           1,813.2             129.63         25.25
    28           2,002.0              94.43         22.90
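Speedup and parallel efficiency follow directly from the elapsed times in this table. The short script below is only a convenience for reading the numbers; the variable names are arbitrary and the definitions used are the usual ones, speedup $= T_1/T_p$ and efficiency $= T_1/(p\,T_p)$.

```python
# elapsed times (sec.) on Xeon v3, copied from the table above
cores = [1, 2, 4, 8, 16, 28]
elapsed = [1268.9, 659.39, 356.22, 201.24, 129.63, 94.43]

speedup = [elapsed[0] / t for t in elapsed]
efficiency = [s / p for s, p in zip(speedup, cores)]

for p, s, e in zip(cores, speedup, efficiency):
    print(f"{p:2d} cores: speedup {s:5.2f}, efficiency {e:4.2f}")
```

On 28 cores the speedup is about 13.4, that is, a parallel efficiency a little below one half.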
parallel performance on NEC SX-ACE
$n = 945{,}164$, $nnz = 89{,}588{,}848$.
64 GFlop/s × 4 cores, 64GB shared memory
[Figure legend: purple = sparse LDU, yellow = sparse Schur, light green = DTRSM, light blue = DGEMM, dark blue = dense LDU]

    # of cores   CPU time (sec.)   elapsed (sec.)   GFlop/s of DGEMM
    1            1,080.4           1,081.9          44.85
    2            1,108.3             590.96         43.76
    4            1,178.5             345.84         41.31
Additive Schwarz preconditioner for 3D computation : 1/4
$R_p$ : overlapping decomposition, $D_p$ : a partition of unity (discrete)
\[
\sum_{p=1}^{M} R_p^T D_p R_p = I_N ,
\]
coarse space by Nicolaides
$\{\vec z_p\} \subset \mathbb{R}^N$ : basis of coarse space, $Z = [\vec z_1, \cdots, \vec z_M]$, $R_0 = Z^T$.
\[
\vec z_p = R_p^T D_p R_p \vec 1 ,
\]
2-level ASM preconditioner
\[
Q_{ASM,2}^{-1} = R_0^T (R_0 A R_0^T)^{-1} R_0 + \sum_{p=1}^{M} R_p^T (R_p A R_p^T)^{-1} R_p
\]
hybrid version of 2-level ASM preconditioner
\[
Q_0 = R_0^T (R_0 A R_0^T)^{-1} R_0, \qquad P_0 = I - Q_0 A
\]
\[
Q_{ASM,hybrid}^{-1} = Q_0 + P_0^T \sum_{p=1}^{M} R_p^T (R_p A R_p^T)^{-1} R_p\, P_0
\]
cf. V. Dolean, P. Jolivet, F. Nataf. An Introduction to Domain Decomposition Methods: Algorithms, Theory, and Parallel Implementation, SIAM, 2015.
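The discrete partition of unity and the Nicolaides coarse vectors can be written out for a toy 1D problem. The sketch below is an illustration under stated assumptions, not the setting of the slides: 6 nodes, 2 overlapping subdomains given as index lists (playing the role of $R_p$), weights $D_p$ equal to the inverse multiplicity of each node, and a 1D Laplacian with Neumann ends as $A$.

```python
from fractions import Fraction

N = 6
subdomains = [[0, 1, 2, 3], [2, 3, 4, 5]]     # overlapping index sets (R_p)
multiplicity = [sum(i in s for s in subdomains) for i in range(N)]

def coarse_vector(sub):
    """z_p = R_p^T D_p R_p 1 with D_p = diag(1 / multiplicity)."""
    z = [Fraction(0)] * N
    for i in sub:
        z[i] = Fraction(1, multiplicity[i])
    return z

Z = [coarse_vector(s) for s in subdomains]

# partition of unity: the coarse vectors sum to the constant vector
partition_sum = [sum(z[i] for z in Z) for i in range(N)]

# 1D Laplacian with Neumann ends (singular, kernel = constants)
A = [[0] * N for _ in range(N)]
for i in range(N - 1):
    A[i][i] += 1
    A[i + 1][i + 1] += 1
    A[i][i + 1] -= 1
    A[i + 1][i] -= 1

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

# coarse matrix R0 A R0^T with R0 = Z^T
AZ = [matvec(A, z) for z in Z]
coarse = [[sum(Z[p][i] * AZ[q][i] for i in range(N)) for q in range(len(Z))]
          for p in range(len(Z))]
coarse_det = coarse[0][0] * coarse[1][1] - coarse[0][1] * coarse[1][0]
print(partition_sum, coarse, coarse_det)
```

Because the constant vector lies in the span of the $\vec z_p$, the coarse matrix inherits the kernel of $A$, which is why the coarse solver also needs kernel detection.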
Additive Schwarz preconditioner for 3D computation : 2/4
is the stiffness matrix restricted on the coarse space, $R_0 A R_0^T$, invertible for an indefinite problem ?
Stokes eqs : coarse space ⇐ rigid body modes + pressure constant
penalty-type stabilized finite element method
$V_h \subset V$ : P1 finite element
$Q_h \subset Q$ : P1 finite element + $\int_\Omega p_h\,dx = 0$.
Find $(u_h, p_h) \in V_h(g)\times Q_h$ s.t.
\[
\begin{aligned}
a(u_h, v_h) + b(v_h, p_h) &= (f, v_h) && \forall v_h \in V_h,\\
b(u_h, q_h) - \delta\, d(p_h, q_h) &= 0 && \forall q_h \in Q_h.
\end{aligned}
\]
$\delta > 0$ : stability parameter, $d(p_h, q_h) = \sum_{K\in\mathcal{T}_h} h_K^2 \int_K \nabla p_h\cdot\nabla q_h\,dx$.
$|p_h|_h^2 = d(p_h, p_h)$ : mesh dependent norm on $Q_h$.
- uniform weak inf-sup condition : Franca-Stenberg [1991]
\[
\exists\beta_0, \beta_1 > 0\ \ \forall h > 0 \quad
\sup_{v_h\in V_h} \frac{b(v_h, q_h)}{\|v_h\|_1} \ge \beta_0\|q_h\|_0 - \beta_1|q_h|_h \quad \forall q_h \in Q_h.
\]
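Additive Schwarz preconditioner for 3D computation : 3/4
matrix formulation of the stabilized FEM for the Stokes eqs.
$V = \mathbb{R}^{N_v}$, $Q \subset \mathbb{R}^{N_p}$, $(\vec q, \vec 1) = 0$ for $\vec q \in Q$.
find $(\vec u, \vec p) \in V\times Q$ s.t.
\[
\left(
\begin{bmatrix} A & B^T\\ B & -\delta D\end{bmatrix}
\begin{bmatrix}\vec u\\ \vec p\end{bmatrix}
- \begin{bmatrix}\vec f\\ \vec 0\end{bmatrix},
\begin{bmatrix}\vec v\\ \vec q\end{bmatrix}
\right) = 0 \quad \forall(\vec v, \vec q) \in V\times Q
\]
subspace : $U\times R$, $U \subset V$, $R \subset Q$
$\Rightarrow \begin{bmatrix} A & B^T\\ B & -\delta D\end{bmatrix}$ is invertible on $U\times R$.
proof : it suffices to show
\[
\left(
\begin{bmatrix} A & B^T\\ B & -\delta D\end{bmatrix}
\begin{bmatrix}\vec u\\ \vec p\end{bmatrix},
\begin{bmatrix}\vec v\\ \vec q\end{bmatrix}
\right) = 0 \quad \forall(\vec v, \vec q) \in U\times R
\ \Rightarrow\ \vec u = \vec 0,\ \vec p = \vec 0.
\]
$(\vec u, \vec p) \in U\times R \Rightarrow (\vec u, -\vec p) \in U\times R$, and for $(\vec u, \vec p) \ne (\vec 0, \vec 0)$
\[
\left(
\begin{bmatrix} A & B^T\\ B & -\delta D\end{bmatrix}
\begin{bmatrix}\vec u\\ \vec p\end{bmatrix},
\begin{bmatrix}\vec u\\ -\vec p\end{bmatrix}
\right) = (A\vec u, \vec u) + \delta\,(D\vec p, \vec p) > 0.
\]
cf. S., Proceedings of ALGORITMY 2009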
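The identity used in the proof, where the mixed terms $(B^T\vec p, \vec u)$ and $(B\vec u, \vec p)$ cancel when testing with $(\vec u, -\vec p)$, can be checked numerically. The matrices and vectors below are small made-up integers chosen only for illustration ($A$ symmetric positive definite, $D$ positive).

```python
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

A = [[2, 1], [1, 3]]      # symmetric positive definite (assumed example)
B = [[1, -1]]             # discrete divergence, 1 x 2 (assumed example)
D = [[1]]                 # stabilization matrix, positive (assumed example)
delta = 2

# K = [A B^T; B -delta D]
K = [A[0] + [B[0][0]],
     A[1] + [B[0][1]],
     B[0] + [-delta * D[0][0]]]

u, p = [1, 2], [3]
lhs = dot(matvec(K, u + p), u + [-q for q in p])          # (K [u;p], [u;-p])
rhs = dot(matvec(A, u), u) + delta * dot(matvec(D, p), p) # (Au,u) + delta (Dp,p)
print(lhs, rhs)
```

Since the right-hand side is a sum of two nonnegative terms that vanish only for $\vec u = \vec 0$ and $\vec p = \vec 0$, the restricted matrix has a trivial kernel on any such product subspace.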
Additive Schwarz preconditioner for 3D computation : 4/4
50×50×50 FEM nodes, overlap = 1, p = 8, 16, 32
[Figure: relative residual (log scale, 1 down to 1e-12) versus iteration (0 to 140) for diagonal scaling and for 8, 16 and 32 subdomains]
$R_0 A R_0^T$ : $7p\times 7p$, $\dim\ker(R_0 A R_0^T) = 1$ is detected by Dissection.
conclusion
- indefinite matrix is factorized by postponing strategy for suspicious null pivots
- combination of 1x1 and 2x2 pivoting can factorize finite element matrices without adding perturbation
- new kernel detection algorithm resolves rank deficient problem from FEM matrices
- 1M DOF is factorized with 57GB memory on a shared memory computer within 2 minutes
- direct solver is efficiently used as sub-domain solver
- stabilized term for the Stokes equations ensures solvability of the coarse problem