Sparse least squares problems with dense rows and preconditioned iterative methods
Jennifer Scott
University of Reading and STFC Rutherford Appleton Laboratory
Miroslav Tůma
Faculty of Mathematics and Physics, Charles University, Prague
Padua, February 2020
Outline
1 Introduction
2 Possible research directions
3 Some historical comments
4 Arbitrary sparse-dense (ASD) approach
5 Sparse stretching
6 Schur complement approach
7 Null-space approach
8 Conclusions
Introduction

The Linear Least Squares Problem (LS):

\[ \min_{x \in \mathbb{R}^n} \| Ax - b \|_2 , \]

where A ∈ ℝ^{m×n} with m ≥ n is large and sparse, and b ∈ ℝ^m.

Interested in solving the problem by preconditioned iterations.

Many obstacles on the way to constructing efficient and general preconditioners:
▸ Enormous variability of LS problems (and we like to consider the algebraic problem).
▸ The sparsity structure of AᵀA is always behind the scenes in the Cholesky/QR approaches.
Introduction

QR is always behind the scenes even when the normal equations are not formed.

Undergraduate stuff:

\[ A = QR, \qquad A^T A = R^T Q^T Q R = R^T R . \]

The fill is exactly as predicted by the Cholesky factorization of AᵀA if A has the strong Hall property.
Introduction

What if A contains a small but problematic part?

An example of such a trivial problematic part is a chunk of dense rows:

[Figure: sparsity patterns of A and of the corresponding normal matrix.]

Dense rows may represent a standard global coupling on top of local sparse relations, and we need to treat them in the same way as the other constraints.
Introduction

Sometimes, some constraints have to be satisfied with a higher accuracy than the rest: the linearly constrained problem (LEP):

\[ \min_{x \in \mathbb{R}^n} \| Ax - b \|_2 \quad \text{subject to} \quad Ex = f . \]

These constraint rows (Ex = f) are often less sparse than the other rows.

Dense rows may also appear due to a transformation/decomposition. For example, after projecting the problem onto a subspace of given constraints and trying to express the transformation explicitly:

\[ Ex = f \;\to\; \text{find } Z \text{ such that } EZ = 0 \;\to\; Z^T A Z . \]

In some cases, Z is known to contain dense rows (Arioli, Maryška, Rozložník, T., 2004).

Schur complement systems look like weighted normal equations ...
Introduction

But dense rows are not the only problematic part in incomplete factorizations; the problem is more general:

▸ Rank-1 (rank-k) modifications of (approximate) factorizations from some rows of AᵀA (Tismenetsky (1991); Kaporin (1998); Scott, T. (2014)) may generate dense contributions to the Schur complement.

[Figure: sparsity pattern before the update.]

[Figure: sparsity pattern after the update.]

How to handle AᵀA to minimize this effect? Split.
Research directions inside our framework (1)

This talk mentions the following approaches to solving the problem:

1 Combining the sparse and dense parts
  ▸ Arbitrary sparse-dense (ASD) approach (Scott, T., 2017)
  ▸ Iterative approach based on CG (CGLS1)
  ▸ If rank(A) > rank(sparse part of A): specific modifications needed.
  ▸ We rely on an implicit combination of the sparse and dense parts of the approximate inverse, coupled together inside CG to get z = M⁻¹r.

2 Transforming dense to sparse at the expense of making the problem larger
  ▸ Sparsifying the dense part by matrix stretching (Scott, T., 2019)
  ▸ Hoping to get overall "uniform problem sparsity"
  ▸ Traps on the way: size increase / ill-conditioning
Research directions inside our framework (2)

The approaches (continued):

3 Schur complement approach → saddle-point formulation (Scott, T., 2018)
  ▸ The approach enables the use of a direct factorization or an incomplete factorization.
  ▸ An acceleration is also needed in the case of regularization when running a direct solver.
  ▸ Can use an indefinite or positive definite factorization depending on the problem.

4 Null-space approach → saddle-point formulation (Scott, T., in preparation)
  ▸ Transformation of the saddle-point formulation
  ▸ Computation of null-space bases of wide dense matrices, and more interesting problems.

All the approaches have specific strengths, weaknesses, and potential for further development.
History

Some historical comments (a few of many; we apologize that only a sketch is shown):

▸ Many authors have studied direct methods for this or a related problem since the 1980s (many more interesting papers than mentioned): George, Heath (1980) (sparse Givens rotations); Björck, Duff (1980) (updates, modifications of the Peters-Wilkinson LU-based method, rank-deficiency, weights); Heath (1982) (LS extensions including linear constraints); Björck (1984) (general updating scheme with both sparse and dense constraints); Ng (1992) (an interesting scheme to handle rank-deficiency); Sun (1995, 1997) (dense rows in sequential or parallel computing environments); and many more.

▸ An authoritative summary of this research is in the monograph Björck (1996); see also Björck (2015).
History

Some historical comments (continued):

▸ A lot of work on methods using randomization (Rokhlin, Tygert, 2008; Avron, Maymounkov, Toledo, 2010; Drineas, Mahoney, Muthukrishnan, Sarlós, 2011; Meng, Saunders, Mahoney, 2014).

▸ A lot of work in applications such as numerical optimization; see, e.g., Wright (1992); Lustig et al. (1992); Goldfarb, Shanno (2004); Oliveira, Sorensen (2005); Gondzio (1991); Andersen et al. (1996).

▸ Much less work using preconditioned iterative methods. Preconditioners are often put on top of the pioneering work on LSQR (Paige, Saunders, 1982) or LSMR (Fong, Saunders, 2011); Li, Saad (2006) (MIQR, multilevel IQR); Avron, Ng, Toledo (2009) (low-rank perturbations of A); interesting ideas within the randomized approach (Avron, 2010; Drineas et al., 2011, above).

▸ New research continues to use ideas from classical papers such as Peters, Wilkinson (1970), Sautter (1978), and Woodbury (1949, 1950).
Notation

Notation for the mixed sparse-dense problem: a sparse problem with a few dense rows.

\[
A = \begin{pmatrix} A_s \\ A_d \end{pmatrix}, \qquad
C = \begin{pmatrix} A_s^T & A_d^T \end{pmatrix} \begin{pmatrix} A_s \\ A_d \end{pmatrix}
  = A_s^T A_s + A_d^T A_d \equiv C_s + C_d
\]

▸ A_s ∈ ℝ^{m_s×n} is sparse, A_d ∈ ℝ^{m_d×n} is dense (m_s ≫ m_d).
▸ A has full column rank (not necessarily A_s).
ASD: Arbitrary sparse-dense preconditioning

Iterative approach based on CG (CGLS1) (Scott, T., 2017): the approach uses approximations of both

\[ C_s = A_s^T A_s \quad\text{and}\quad C_d = A_d^T A_d \]

in one preconditioner M. The parts are merged together and applied to the residual as

\[ z = M^{-1} r . \]

Woodbury formulas (1949, 1950), updating by the dense part and by the sparse part, are behind this merge. Examples of (hidden) Woodbury-like formulas follow.

Theorem
If C_s = L_s L_sᵀ and ξ₁ minimizes ‖A_s L_s^{-T} z − b_s‖₂ exactly, the exact least squares solution of our problem can be written as x = L_s^{-T}(ξ₁ + Γ₁), where ρ_d = b_d − A_d L_s^{-T} ξ₁ and

\[
\Gamma_1 = L_s^{-1} A_d^T \left( I_{m_d} + A_d L_s^{-T} L_s^{-1} A_d^T \right)^{-1} \rho_d .
\]
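Below is a minimal numerical sketch of the theorem above, using dense numpy algebra and a complete Cholesky factor for readability (in the actual ASD setting A_s stays sparse and L_s is an incomplete factor applied inside preconditioned CGLS); the function name is mine.

```python
import numpy as np
from scipy.linalg import solve_triangular

def ls_with_dense_rows(As, Ad, bs, bd):
    """Exact LS solution of min ||[As; Ad] x - [bs; bd]||_2 via the
    dense-row Woodbury correction of the sparse-part solution."""
    # Cholesky factor of the sparse-part normal matrix: Cs = As^T As = Ls Ls^T
    Ls = np.linalg.cholesky(As.T @ As)
    # xi1 minimizes ||As Ls^{-T} z - bs||_2 exactly; here xi1 = Ls^{-1} As^T bs
    xi1 = solve_triangular(Ls, As.T @ bs, lower=True)
    # dense-row residual rho_d = bd - Ad Ls^{-T} xi1
    AdL = solve_triangular(Ls, Ad.T, lower=True).T        # Ad Ls^{-T}
    rho_d = bd - AdL @ xi1
    # Gamma1 = Ls^{-1} Ad^T (I_md + Ad Ls^{-T} Ls^{-1} Ad^T)^{-1} rho_d
    W = np.eye(Ad.shape[0]) + AdL @ AdL.T
    Gamma1 = solve_triangular(Ls, Ad.T @ np.linalg.solve(W, rho_d), lower=True)
    # x = Ls^{-T} (xi1 + Gamma1)
    return solve_triangular(Ls.T, xi1 + Gamma1, lower=False)
```

Since ξ₁ is exact here, the returned x agrees with np.linalg.lstsq applied to the full stacked problem, up to rounding.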
ASD: Arbitrary sparse-dense preconditioning

Another example of a Woodbury-like formula:

Theorem
If C_s = L_s L_sᵀ and ξ₁ is an approximate solution to the problem min_z ‖A_s L_s^{-T} z − b_s‖₂, the exact least squares solution of the equivalent problem above can be written as z = ξ₁ + Γ₁, where ρ_s = b_s − A_s L_s^{-T} ξ₁, ρ_d = b_d − A_d L_s^{-T} ξ₁ and

\[
\Gamma_1 = L_s^{-1} A_s^T \rho_s
 + L_s^{-1} A_d^T \left( I_{m_d} + A_d L_s^{-T} L_s^{-1} A_d^T \right)^{-1}
   \left( \rho_d - A_d L_s^{-T} L_s^{-1} A_s^T \rho_s \right).
\]

The following examples use only the first formula, which was always reliable.
ASD: Arbitrary sparse-dense preconditioning

SCSD8-2r_a (m = 60,550; n = 8,650): size of C_s

[Figure: size of the matrix of normal equations versus the number of dense rows.]

This plot determines the choice of m_d.
ASD: Moving rows one by one from A_s to A_d

SCSD8-2r_a: iteration counts + preconditioner size / size(AᵀA)

Figure: Problem Meszaros/scsd8-2r. Iteration counts (left), and ratio of the preconditioner size to the size of AᵀA (right), as the number of dense rows that are removed from A is increased.
ASD: Moving rows one by one from A_s to A_d

SCSD8-2r_a: timings

Figure: Problem Meszaros/scsd8-2r. Time to compute the preconditioner (left) and time for CGLS (right) as the number of dense rows that are removed from A is increased.
ASD: Moving rows one by one from A_s to A_d

stormg2_1000 (m = 1,377,306; n = 528,185): size of C_s

Figure: Problem Mittelmann/stormg2_1000. Size of A_sᵀA_s versus the number of dense rows.
ASD: Moving rows one by one from A_s to A_d

stormg2_1000: large problem: iteration counts + preconditioner size / size(AᵀA)

Figure: Problem Mittelmann/stormg2_1000. Iteration counts (left), and ratio of the preconditioner size to the size of AᵀA (right), as the number of dense rows that are removed from A is increased.
Moving rows one by one from A_s to A_d

stormg2_1000: timings

Figure: Problem Mittelmann/stormg2_1000. Time to compute the preconditioner (left), time for the preconditioned iterations (right).
Experimental evaluation of ASD

                       Dense rows not exploited        |        Dense rows exploited
Identifier        size_p      T_p    Its   T_i         |  md     size_ps      T_p    Its   T_i
lp_fit2p          17,985     0.26      ‡     ‡         |  25       4,940     0.09      1   0.01
scsd8-2r          51,885     0.25     90   0.11        |  50      51,855     0.05      7   0.02
scagr7-2r        197,067     3.34    244   0.53        |   7     152,977     0.06      1   0.01
scfxm1-2r        227,835     0.59    187   0.51        |  58     227,823     0.14     33   0.23
neos1            789,471        †      †     †         |  74     789,471     5.27    132   3.71
neos2                  †        †      †     †         |  90     795,323     5.46    157   4.84
stormg2-125      395,595     0.27      ‡     ‡         | 121   7,978,135     0.22     16   0.29
PDE1                   †        †      †     †         |   1   1,623,531     12.7    696   1.28
neos                   †        †      †     †         |  20   2,874,699     4.93    232   15.0
stormg2_1000   3,157,095     19.1      ‡     ‡         | 121   3,125,987     19.1     18   2.92
cont1_l                †        †      †     †         |   1  11,510,370     4.82      1   0.33
ASD: conclusions

More possibilities to merge formulas.

It is not clear why some of the formulas work very well while some of them can be "destroyed" by slight perturbations of the factorizations; a careful analysis is needed.

As for the experiments, we intentionally treated the dense part as dense, without any enhancements, to see the qualitative behaviour.

What if A_s has null columns? A possible way is to solve the problem with more right-hand sides:

\[
A = \begin{pmatrix} A_1 & A_2 \end{pmatrix}
  \equiv \begin{pmatrix} A_{s1} & 0 \\ A_{d1} & A_{d2} \end{pmatrix} . \tag{1}
\]

▸ The solution can then be expressed as a combination of partial solutions of min_z ‖A_1 z − b‖₂ and min_W ‖A_1 W − A_2‖_F.
Matrix stretching for the LS problems

Stretching: a specific sparsification obtained by splitting dense rows into sparse pieces. The problem is then also augmented by columns.

[Figure: sparsity pattern of A before and after stretching.]

Such a strategy, called stretching, was discussed (among others) by Grcar (1990), Vanderbei (1991), Gondzio (1991), Alvarado (1997), Adlers (2000), Adlers, Björck (2000), Duff, Scott (2005).

Up to now it has not been an approach of choice.
Matrix stretching for the LS problems

Derivation: special case of one dense row

▸ Split the dense row into two pieces:

\[ A_d = \begin{pmatrix} e & f \end{pmatrix} \tag{2} \]

▸ Add one variable s and a modified dense row, getting a problem that is equivalent in the sense of giving the same x as the original problem:

\[
\min_{(x^T\, s)^T}
\left\|
\begin{pmatrix} A_{se} & A_{sf} & 0 \\ e & f & 0 \\ e & -f & \sqrt{2} \end{pmatrix}
\begin{pmatrix} x_e \\ x_f \\ s \end{pmatrix}
-
\begin{pmatrix} b_s \\ b_d \\ 0 \end{pmatrix}
\right\|_2 \tag{3}
\]

▸ Orthogonally transforming the problem by applying a Givens rotation G, we get

\[ \min_z \| \hat{A} z - \hat{b} \|_2 \]

with

\[
G = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad
\hat{A} = \begin{pmatrix} A_{se} & A_{sf} & 0 \\ \sqrt{2}\, e & 0 & 1 \\ 0 & \sqrt{2}\, f & -1 \end{pmatrix}, \quad
z = \begin{pmatrix} x_e \\ x_f \\ s \end{pmatrix}, \quad
\hat{b} = \begin{pmatrix} b_s \\ b_d/\sqrt{2} \\ b_d/\sqrt{2} \end{pmatrix}.
\]
Matrix stretching for the LS problems

And this is the sparsified (stretched) matrix:

\[
\hat{A} = \begin{pmatrix} A_{se} & A_{sf} & 0 \\ \sqrt{2}\, e & 0 & 1 \\ 0 & \sqrt{2}\, f & -1 \end{pmatrix}
\]

The transformation can be used for more rows and more parts (a constructive sketch for one row follows below).

But there are problems with stretching. The first of them: how many parts? Grcar (1990): "the main challenge ... lies in determining the appropriate choice of the number of rows ... to split into ..."

We will illustrate this problem by examples. They show that sparsity can very soon decrease again.
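A small constructive sketch of the one-row stretching above, assuming the dense row is split at a single column index (dense numpy for brevity; the helper name stretch_one_row is mine):

```python
import numpy as np

def stretch_one_row(As, ad, bs, bd, split):
    """Build the stretched problem (A_hat, b_hat) for one dense row ad,
    split into two pieces e = ad[:split], f = ad[split:]."""
    ms, n = As.shape
    s2 = np.sqrt(2.0)
    A_hat = np.zeros((ms + 2, n + 1))
    A_hat[:ms, :n] = As                        # sparse part, unchanged
    A_hat[ms, :split] = s2 * ad[:split]        # sqrt(2) e, linked via +s
    A_hat[ms, n] = 1.0
    A_hat[ms + 1, split:n] = s2 * ad[split:]   # sqrt(2) f, linked via -s
    A_hat[ms + 1, n] = -1.0
    b_hat = np.concatenate([bs, [bd / s2, bd / s2]])
    return A_hat, b_hat
```

Solving the stretched problem, np.linalg.lstsq(A_hat, b_hat)[0][:n] reproduces the solution of the original problem: minimizing over the free variable s annihilates the extra residual row, leaving exactly the original dense-row residual.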
Matrix stretching for the LS problems: very trivial model

Figure: Structure of the stretched A (nz = 142, left) and C (nz = 755, right) for a matrix A with a single dense row and A_s diagonal, with n = 64 and the dense row stretched into 8 parts.

This seems to be very tempting.
Matrix stretching for the LS problems: very simple model

Figure: Number of entries in the stretched A and C when A has a single dense row that is split into an increasing number of parts and A_s is diagonal (n = 64).

No fill-in in the Cholesky factorization. Seems to be very promising.
Matrix stretching for the LS problems: less trivial model

Figure: Matrices based on a discrete 2D Laplacian: unstretched (nz = 352, left), stretched (nz = 382, right).

Seems to be promising again.
Matrix stretching for the LS problems: generic example

Figure: AᵀA (nz = 991, left) and its Cholesky factor (nz = 3077, right).

Oops, this is not sparse at all ...
Matrix stretching for the LS problems: fill-in

Figure: Fill-in in AᵀA and its Cholesky factor (triangular parts) as the number of stretched parts increases.

Reorderings do not help; straightforward stretching is not possible!

More general cases: problems with the fill-in and also with ill-conditioning (despite the bounds by Adlers and Björck, 2000).

The problem with fill-in lies not primarily in the need for additional memory, but in its contribution to the loss of information in preconditioners!
Matrix stretching for the LS problems: fill-in

The normal matrix:

\[
A^T A = \sum_{k=1}^{m} u_k^T u_k , \qquad A^T = \begin{pmatrix} u_1^T, \ldots, u_m^T \end{pmatrix}.
\]

That is, u_k, k = 1, ..., m, are the rows of A.

What if the pattern of a row u_j is contained in the pattern of a row u_i (dominated by u_i)?

⇒ The pattern of u_j is not needed to get the pattern of AᵀA. Schematically:

\[
A = \begin{pmatrix}
 & 1 & 2 & 3 & 4 & 5 & 6 \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_j: & & * & * & & & \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_i: & & * & * & * & & \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots
\end{pmatrix}
\]
Matrix stretching for the LS problems: fill-in

Let us see this: consider

\[ \begin{pmatrix} A_s \\ u_i \end{pmatrix} . \]

The normal matrix

\[
\begin{pmatrix} A_s^T & u_i^T \end{pmatrix} \begin{pmatrix} A_s \\ u_i \end{pmatrix}
= A_s^T A_s + u_i^T u_i
\]

has the same sparsity pattern as A_sᵀA_s (when the pattern of u_i is dominated by a row of A_s).

\[ u_i \to F, \qquad A \to \begin{pmatrix} A_s \\ F^T \end{pmatrix} \]

The idea: stretch Fᵀ into blocks dominated by rows in A_s!

\[
\bar{A}^T \bar{A} =
\begin{pmatrix} A^T & F \\ 0 & S^T \end{pmatrix}
\begin{pmatrix} A & 0 \\ F^T & S \end{pmatrix}
=
\begin{pmatrix} A^T A + F F^T & F S \\ S^T F^T & S^T S \end{pmatrix}
\]

\[ \operatorname{Struct}(A^T A + F F^T) \subseteq \operatorname{Struct}(A^T A) \]
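A toy illustration of the domination test behind this idea (entirely my own sketch): a stretched piece adds no new entries to Struct(AᵀA) whenever its pattern is contained in the pattern of some row of A_s.

```python
import numpy as np

def adds_no_fill(As, piece):
    """True if the pattern of `piece` is contained in the pattern of some
    row of As, so piece^T piece adds no new entries to Struct(As^T As)."""
    p = piece != 0
    return any(np.all(p <= (row != 0)) for row in As)

As = np.array([[1., 1., 0., 0.],
               [0., 0., 1., 1.]])
print(adds_no_fill(As, np.array([2., 3., 0., 0.])))  # True: dominated by row 0
print(adds_no_fill(As, np.array([2., 0., 3., 0.])))  # False: spans both rows
```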
Matrix stretching for the LS problems: sparse stretching

Finding the stretched and dominated pieces:

Figure: A matrix of order 15 × 12 with one dense row.
Matrix stretching for the LS problems: sparse stretching

[Figure: bipartite model of the dense row, with the rows of A as edges and the entries of the dense row as nodes.]

Minimum set cover problem + postprocessing:
First step: minimum set cover problem.
Second step: find disjoint segments.
Matrix stretching for the LS problems

The matrix is transformed by the stretching fully automatically and parameter-free, using
▸ a minimum set cover heuristic,
▸ sparsification to minimize the total fill-in.

Based purely on combinatorial techniques. We call this technique sparse stretching.

Much better than the standard (old) stretching in:
▸ sparsity of the normal equations,
▸ Cholesky factor size,
▸ normal equations conditioning,
▸ black-box algebraic construction,
▸ iteration counts.
Matrix stretching for the LS problems: old and sparse stretching

Number of entries in AᵀA and the Cholesky factor versus the number of parts. The red dot marks the result for sparse stretching.

Figure: Comparison of the entries in the stretched normal matrix (left) and its Cholesky factor (right) for problem WM1 with one dense row appended.
Matrix stretching for the LS problems: old and sparse stretching

Number of entries in AᵀA and the Cholesky factor versus the number of parts. The red dot marks the result for sparse stretching.

Figure: Comparison of the entries in the stretched normal matrix (left) and its Cholesky factor (right) for problem LP_AGG with one dense row appended.
Matrix stretching for the LS problems: old and sparse stretching

Figure: For problem LP_AGG with one dense row appended, the sparsity pattern of the stretched normal matrix for standard stretching (nz = 29268, left) and sparse stretching (nz = 24878, right).
Matrix stretching for the LS problems: old and sparse stretching
Figure: For problem LP_AGG with one dense row appended, the sparsity pattern of L + Lᵀ, where L is the Cholesky factor of the stretched normal matrix, for standard stretching (left) and sparse stretching (right).
Matrix stretching for the LS problems: old and sparse stretching

The sizes really translate into the iteration counts:

Figure: Comparison of the iteration counts (left) and preconditioner size (right) for the matrix LP_AGG. The curves show the variation with the number of parts into which the dense row is stretched. CGLS preconditioned by HSL_MI35.
Matrix stretching: old and sparse stretching

Identifier   strategy    mds   nnz(C)        nnz(L)        nflops
aircraft     none          0   1.421×10^6    7.048×10^6    1.764×10^10
             standard     17   5.474×10^4    4.911×10^5    1.759×10^7
             sparse       17   5.474×10^4    4.911×10^5    1.759×10^7
sc205-2r     none          0   6.510×10^6    8.002×10^7    1.995×10^11
             standard      8   1.408×10^5    1.841×10^6    8.043×10^7
             sparse        8   1.408×10^5    1.852×10^6    8.150×10^7
scagr7-2r    none          0   2.215×10^7    9.502×10^7    4.369×10^11
             standard      7   1.979×10^6    9.269×10^6    1.423×10^10
             sparse        7   1.841×10^5    1.564×10^6    6.538×10^7
scrs8-2r     none          0   6.217×10^6    1.718×10^7    3.215×10^10
             standard     22   8.013×10^5    2.675×10^6    1.244×10^9
             sparse       22   8.968×10^4    1.258×10^6    7.303×10^7
scsd8-2r     none          0   1.957×10^6    1.196×10^7    1.960×10^10
             standard     50   6.287×10^5    6.924×10^6    5.467×10^9
             sparse       50   1.794×10^5    4.357×10^6    2.414×10^9
south31      none          0   1.536×10^8    1.581×10^8    1.850×10^12
             standard    381   3.224×10^5    4.366×10^6    2.546×10^8
             sparse      381   3.224×10^5    4.260×10^6    2.319×10^8

HSL_MI35 (Scott, T., 2016)
Matrix stretching for the LS problems: old and sparse stretching

What about conditioning?

Figure: The condition number (condest) of the stretched normal matrix for problems WM1 (left) and LP_AGG (right) with one dense row appended.

The condition number goes steadily up in practice.
Matrix stretching for the LS problems: old and sparse stretching

Ill-conditioning in practice is in agreement with the theoretical bounds.

Adlers-Björck theory (see Adlers, Björck, 2000; Scott, T., 2019):

Theorem
An upper bound for the condition number of the stretched matrix \hat{A} (p stretched rows, k parts) with γ = \frac{1}{2}\sqrt{pk}\,\|A_d\|_2 is

\[
\kappa_2(\hat{A}) \le \kappa_2(A)\, k
\left( 1 + \frac{2 p k\, \|A_d\|_2^2}{\|A\|_2^2} \right)
\left( k + 1 + \frac{\sigma_n(A)^2}{\|A_d\|_2^2} \right).
\]

And this is really not optimistic ...
Matrix stretching for the LS problems: old and sparse stretching

Condition number increase when stretching more rows:

Figure: Condition number estimate (right) and iteration count (left) for problem ctap1-2b as the number of dense rows increases.

Do we really need to stretch everything?
Matrix stretching for the LS problems: old and sparse stretching

Remedy: partial stretching
▸ Stretch only when needed, for example, when there are null columns in A_s.
▸ The rest is treated by the ASD approach.

Identifier   m       n       md    mds   nnz(C)        nnz(L)        nflops
aircraft     7517    3754     17     4   1.575×10^4    9.984×10^4    2.552×10^6
sc205-2r     62423   35213     8     1   9.602×10^4    9.084×10^5    2.875×10^7
scagr7-2r    46679   32847     7     1   1.443×10^5    8.531×10^5    2.572×10^7
scrs8-2r     27691   14364    22     7   6.698×10^4    6.716×10^5    2.797×10^7
scsd8-2r     60550   8650     50     5   5.895×10^4    †             †
south31      36321   18425   381     5   1.884×10^4    2.306×10^4    1.565×10^5

Much better in many cases, but not always.
Matrix stretching for the LS problems: old and sparse stretching

What about a more sophisticated graph preprocessing: a weighted variant of the set cover?

\[
A = \begin{pmatrix} A_s \\ A_d \end{pmatrix} =
\begin{pmatrix}
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 \\
5 & 6 & 0 & 0 & 0 \\
0 & 0 & 7 & 8 & 9 \\
3/\sqrt{2} & 3/\sqrt{2} & 3/\sqrt{2} & 3/\sqrt{2} & 3/\sqrt{2}
\end{pmatrix}. \tag{4}
\]

\[
\begin{pmatrix} A_s \\ F^T \end{pmatrix} =
\begin{pmatrix}
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 \\
5 & 6 & 0 & 0 & 0 \\
0 & 0 & 7 & 8 & 9 \\
3 & 3 & 3 & 3 & 0 \\
0 & 0 & 0 & 0 & 3
\end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix}
1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 \\
5 & 6 & 0 & 0 & 0 \\
0 & 0 & 7 & 8 & 9 \\
3 & 3 & 0 & 0 & 0 \\
0 & 0 & 3 & 3 & 3
\end{pmatrix} \tag{5}
\]
Matrix stretching for the LS problems: old and sparse stretching

For the matrix A in (4), the two splittings in (5) give

\[
A_s^T A_s + F F^T =
\begin{pmatrix}
36 & 41 & 10 & 10 & 0 \\
41 & 47 & 10 & 10 & 0 \\
10 & 10 & 59 & 66 & 63 \\
10 & 10 & 66 & 74 & 72 \\
0 & 0 & 63 & 72 & 91
\end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix}
36 & 41 & 1 & 1 & 0 \\
41 & 47 & 1 & 1 & 0 \\
1 & 1 & 59 & 66 & 72 \\
1 & 1 & 66 & 74 & 81 \\
0 & 0 & 72 & 81 & 91
\end{pmatrix}.
\]

Apparently, the second choice is intuitively better.
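The two normal matrices above can be checked with a few lines of numpy (the pieces are exactly the two Fᵀ options from (5)):

```python
import numpy as np

As = np.array([[1, 1, 1, 1, 0],
               [0, 0, 0, 0, 1],
               [1, 1, 0, 0, 0],
               [5, 6, 0, 0, 0],
               [0, 0, 7, 8, 9]], dtype=float)

# the two splittings of the dense row into pieces (rows of F^T), as in (5)
F1t = np.array([[3, 3, 3, 3, 0], [0, 0, 0, 0, 3]], dtype=float)
F2t = np.array([[3, 3, 0, 0, 0], [0, 0, 3, 3, 3]], dtype=float)

for Ft in (F1t, F2t):
    print(As.T @ As + Ft.T @ Ft)   # As^T As + F F^T with F = Ft^T
```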
Matrix stretching for the LS problems: old and sparse stretching

Figure: Iteration counts for problem lp_agg as the number of appended dense rows increases. Results are for sparse stretching using the weighted and non-weighted vertex set cover with lsize = 50 (left) and lsize = 60 (right).
Matrix stretching for the LS problems: old and sparse stretching

Figure: Iteration counts for problem pltexpa with one appended dense row as the parameter lsize increases. Results are for sparse stretching using the weighted and non-weighted vertex set cover for density ρ = 0.1 (left) and ρ = 0.5 (right).
Matrix stretching for the LS problems: old and sparse stretching

What if the ill-conditioning can be overcome by treating the problem as having saddle-point structure?

\[
\bar{A}^T \bar{A} =
\begin{pmatrix} A^T & F \\ 0 & S^T \end{pmatrix}
\begin{pmatrix} A & 0 \\ F^T & S \end{pmatrix}
=
\begin{pmatrix} A^T A + F F^T & F S \\ S^T F^T & S^T S \end{pmatrix}
\]

Note that (in the unscaled case) SᵀS is tridiagonal:

\[
S^T S =
\begin{pmatrix}
2 & -1 & & & \\
-1 & 2 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 2 & -1 \\
 & & & -1 & 2
\end{pmatrix}
\]
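For reference, a sketch of where the tridiagonal block comes from, assuming the usual chain of link variables so that S is bidiagonal with entries 1 and −1 (this concrete form of S is my assumption, extrapolated from the one-row derivation earlier):

```python
import numpy as np

k = 5                                  # number of link variables
S = np.zeros((k + 1, k))               # chain coupling consecutive pieces
for i in range(k):
    S[i, i], S[i + 1, i] = 1.0, -1.0
print(S.T @ S)                         # tridiag(-1, 2, -1)
```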
Matrix stretching for the LS problems: old and sparse stretching

Very preliminary: preconditioners based on

\[
P_0 = \begin{pmatrix} C_s + F F^T & F S \\ S^T F^T & S^T S \end{pmatrix}, \qquad
P_1 = \begin{pmatrix} \operatorname{diag}(C_s) & F S \\ S^T F^T & S^T S \end{pmatrix}.
\]

Figure: Iteration counts (left) and preconditioner size (right) for P_0 and P_1 applied to problem lp_agg as the number of appended dense rows increases. Here lsize = 5.

Many more possibilities to get modified constraint preconditioners.
Schur complement approach

Fully embedded in the Schur complement approach that combines a direct solver, modifications, and regularization to get a preconditioner. The system matrix, varying α:

\[
K(\alpha) = \begin{pmatrix} C_s(\alpha) & A_d^T \\ A_d & -I_{m_d} \end{pmatrix}.
\]

Once the dense rows are clearly detected, the preconditioned iterative method can be extremely successful in solving some hard problems.
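A block-elimination sketch of how systems with K(α) can be solved once C_s(α) has been factorized (dense numpy for brevity; taking C_s(α) = C_s + αI and the function name are my illustrative choices):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_K_solver(As, Ad, alpha=0.0):
    """Solver for K(alpha) [x; y] = [r1; r2] with
    K(alpha) = [[Cs(alpha), Ad^T], [Ad, -I]], via block elimination."""
    n = As.shape[1]
    Cs = cho_factor(As.T @ As + alpha * np.eye(n))    # Cs(alpha) = Ls Ls^T
    # dense Schur complement -(I + Ad Cs(alpha)^{-1} Ad^T), only md x md
    Sd = -(np.eye(Ad.shape[0]) + Ad @ cho_solve(Cs, Ad.T))
    def solve(r1, r2):
        t = cho_solve(Cs, r1)                 # Cs(alpha)^{-1} r1
        y = np.linalg.solve(Sd, r2 - Ad @ t)  # Schur complement solve
        x = cho_solve(Cs, r1 - Ad.T @ y)      # back-substitute
        return x, y
    return solve
```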
Standard null space approach

Optimization motivation of the null-space approach:

\[
\text{minimize } f(u) \quad \text{subject to} \quad B u = g, \tag{7}
\]

The saddle-point problem for the direction vector u (H is the local Hessian):

\[
\begin{pmatrix} H & B^T \\ B & 0 \end{pmatrix}
\begin{pmatrix} u \\ v \end{pmatrix}
=
\begin{pmatrix} f - Hu \\ g \end{pmatrix}, \tag{8}
\]

Algorithm (dual variable method for solving the saddle-point problem)
1. Find Z with columns forming a basis for N(B).
2. Find u such that Bu = g.
3. Solve ZᵀHZ z = Zᵀ(f − Hu).
4. Set x = u + Zz.
5. Find y such that Bᵀy = f − Hx.
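A direct transcription of the five steps above in dense numpy (the SVD-based null-space basis and the rank tolerance are my choices for the sketch):

```python
import numpy as np

def dual_variable_method(H, B, f, g):
    """Steps 1-5 above for [[H, B^T], [B, 0]] [x; y] = [f; g]."""
    # 1. columns of Z form a basis for N(B) (here via SVD)
    _, s, Vt = np.linalg.svd(B)
    r = int(np.sum(s > 1e-12 * s.max()))
    Z = Vt[r:].T
    # 2. particular solution of B u = g
    u = np.linalg.lstsq(B, g, rcond=None)[0]
    # 3. reduced system on the null space (Z^T H Z is SPD there)
    z = np.linalg.solve(Z.T @ H @ Z, Z.T @ (f - H @ u))
    # 4. primal solution
    x = u + Z @ z
    # 5. multipliers: f - H x lies in range(B^T), so this is consistent
    y = np.linalg.lstsq(B.T, f - H @ x, rcond=None)[0]
    return x, y
```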
Standard null space approach

The dual variable method heavily relies on the fact that the bottom right block of the saddle-point matrix is zero.

Saddle-point from the LS problem: the LS problem can be written as solving the system

\[
\begin{pmatrix} C_s & A_d^T \\ A_d & -I \end{pmatrix}
\begin{pmatrix} x \\ A_d x \end{pmatrix}
=
\begin{pmatrix} c \\ 0 \end{pmatrix}, \tag{9}
\]

and the problem with the nonzero bottom right block has to be overcome. Denote the problem, in more general notation, as

\[
\mathcal{A} \begin{pmatrix} u \\ v \end{pmatrix}
= \begin{pmatrix} H & B^T \\ B & -C \end{pmatrix}
\begin{pmatrix} u \\ v \end{pmatrix}
= \begin{pmatrix} f \\ g \end{pmatrix}, \tag{10}
\]
The null space approach to solve sparse/dense LS problems

Theorem
Consider the saddle-point problem above with rank(B) = r ≤ k and C SPD. Assume that E = (Z Y) ∈ ℝ^{n×n} is nonsingular, and Z ∈ ℝ^{n×(n−r)} is such that BE = (0_{k,n−r}  B_r), B_r ∈ ℝ^{k×r}, B_r ≠ 0. Set

\[
\mathcal{E} = \begin{pmatrix} E & 0_{n,k} \\ 0_{k,n} & I_k \end{pmatrix} \in \mathbb{R}^{(n+k)\times(n+k)}.
\]

If H is positive definite on the null space of B, then \bar{\mathcal{A}} = \mathcal{E}^T \mathcal{A} \mathcal{E} is symmetric and invertible, with SPD leading principal submatrix ZᵀHZ of order n − r. The transformed saddle-point system is

\[
\bar{\mathcal{A}} \begin{pmatrix} \bar{u} \\ \bar{v} \end{pmatrix} = \mathcal{E}^T \begin{pmatrix} f \\ g \end{pmatrix},
\qquad
\begin{pmatrix} u \\ v \end{pmatrix} = \mathcal{E} \begin{pmatrix} \bar{u} \\ \bar{v} \end{pmatrix}.
\]
The null space approach to solve sparse/dense LS problems

Recall the LS problem in the saddle-point formulation:

\[
\begin{pmatrix} C_s & A_d^T \\ A_d & -I \end{pmatrix}
\begin{pmatrix} x \\ A_d x \end{pmatrix}
=
\begin{pmatrix} c \\ 0 \end{pmatrix}. \tag{11}
\]

Lemma
Consider A = (A_s; A_d), A_s ∈ ℝ^{m_s×n}, A_d ∈ ℝ^{m_d×n}. If A is of full rank, then C_s = A_sᵀA_s is positive definite on the null space of A_d.

It does not matter whether A_s itself has full column rank (a bunch of specific treatments are needed in other solution approaches).

Using a sparse basis for the range space is enough to have a well-defined solver.
The null space approach to solve sparse/dense LS problems

The solution components of the null-space approach:

1 Null-space basis computation for B ≡ A_d: the sparse null-space basis of a wide dense matrix.

2 Preconditioner construction for the transformed system: the Hessian is transformed by a matrix E = (Z Y) ∈ ℝ^{n×n} containing both a null-space basis and a range-space basis:

\[
\begin{pmatrix}
Z^T H Z & Z^T H Y & 0_{n-r,k} \\
Y^T H Z & Y^T H Y & B_r^T \\
0_{k,n-r} & B_r & -C
\end{pmatrix}.
\]
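A small sketch of assembling this transformed matrix for the LS saddle point (H = C_s, B = A_d, C = I); the basis construction via a complete QR of A_dᵀ is my illustrative choice, not necessarily the sparse basis used in the talk:

```python
import numpy as np

def transformed_saddle_point(As, Ad):
    """Assemble E^T A E for the LS saddle point (H = Cs, B = Ad, C = I),
    assuming Ad has full row rank md, so r = k = md."""
    md, n = Ad.shape
    H = As.T @ As
    # complete QR of Ad^T: first md columns span range(Ad^T), rest span N(Ad)
    Q, _ = np.linalg.qr(Ad.T, mode='complete')
    Y, Z = Q[:, :md], Q[:, md:]
    E = np.hstack([Z, Y])                 # Ad @ E = (0 | Br)
    Br = Ad @ Y
    top = np.hstack([E.T @ H @ E,
                     np.vstack([np.zeros((n - md, md)), Br.T])])
    bottom = np.hstack([np.zeros((md, n - md)), Br, -np.eye(md)])
    return np.vstack([top, bottom])
```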
The null space approach to solve sparse/dense LS problems

Threshold thresh = 0.73.

[Figure: sparsity pattern, nz = 5258.]

LP_AGG (615 × 488, UFL Sparse Matrix Collection) + 10 dense rows.
Conclusions

We will comment mainly on the space for subsequent research.

ASD: still a lot of theoretical challenges.

Stretching implies new concepts to find dense rows, based on the ways to split and embed their row structures.

Combination with the structure of the factorization?

Stretching implies a new approach to solving the problem of null columns of A_s: stretching builds a bridge between A_s and A_d and keeps A_s of full column rank.

Another stretching problem: the stretched rows may be processed differently from the rest (?)

Null-space approach: a viable strategy as well as a way to get over the singularity of A_s (Scott, T., 2019, in preparation).

Many more questions than expected at the beginning.
Last but not least
Thank you for your attention!