Bregman Iterative Algorithms for L1 Minimization with
Applications to Compressed Sensing
W. Yin, S. O., D. Goldfarb, J. Darbon
Problem:
Let A ∈ R^{m×n}, u ∈ R^n, f ∈ R^m, with m < n (usually m << n).
Basis Pursuit (S. Chen, D. Donoho, M. A. Saunders):
(BP)  u_OPT := argmin_u { ||u||_1 : Au = f }
Basis Pursuit Arises in Compressed Sensing:
(Candes, Romberg, Tao, Donoho, Tanner, Tsaig, Rudelson, Vershynin, Tropp)
Fundamental principle:
Through optimization, the sparsity of a signal can be exploited for signal recovery from incomplete measurements
Let u ∈ R^n be highly sparse,
i.e. k := #{i : u_i ≠ 0} << n.
Principle:
Encode u by f = Au ∈ R^m,
with k < m << n.
Then recover u from f by solving basis pursuit:
u_OPT := argmin_u { ||u||_1 : Au = f }
Proven [Candes, Tao]:
Recovery is perfect, u_OPT = u, whenever k, m, n satisfy certain conditions.
Types of matrices A allowing high compression ratios (m << n) include
(a) Random matrices with i.i.d. entries
(b) Random ensembles of orthonormal transforms (e.g. matrices formed from random sets of the rows of Fourier transforms)
Huge number of potential applications of compressive sensing
See e.g. Rich Baraniuk’s website:
www.dsp.ece.rice.edu/cs/
L1 minimization is widely used for compressive imaging, MRI and CT, multisensor networks and distributed sensing, analog-to-information conversion, and biosensing.
(BP) can be transformed into a linear program and then solved by conventional methods. These are not tailored for A that is large scale and dense;
they also don't exploit orthonormality (for a Fourier matrix, etc.).
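As a concrete illustration of the LP route, here is a minimal sketch (sizes, seed, and tolerances are illustrative; assumes NumPy and SciPy). (BP) is rewritten with u = p - q, p, q ≥ 0, so that ||u||_1 = 1^T(p + q) and Au = f becomes Ap - Aq = f:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, k = 30, 60, 4                      # illustrative sizes: m << n, k-sparse signal

A = rng.standard_normal((m, n))          # random Gaussian measurement matrix
u_true = np.zeros(n)
u_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
f = A @ u_true                           # incomplete measurements

# Split u = p - q with p, q >= 0; then ||u||_1 = 1^T(p + q),
# and Au = f becomes A p - A q = f.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=f, bounds=[(0, None)] * (2 * n))
u_hat = res.x[:n] - res.x[n:]            # typically recovers u_true exactly
```

The LP has 2n variables and m equality constraints, which is why this route does not scale well to large dense A.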
One might solve the unconstrained problem
(UNC)  min_u μ||u||_1 + (1/2)||Au - f||_2^2
(the norm in the fidelity term is the L2 norm).
μ needs to be small to heavily weight the fidelity term.
Also, the solution to (UNC) is never that of (BP) unless f = 0.
Here: Using Bregman iteration regularization we solve (BP) by a very small number of solutions to (UNC) with different values of f.
Method involves only
(a) Matrix-vector multiplications
(b) Component-wise shrinkages
Method generalizes to the constrained problem
min_u { J(u) : Au = f }
for other convex J.
Can solve this through a finite number of Bregman iterations of
min_u J(u) + (1/2)||Au - f||_2^2
(again, with a sequence of "f" values).
Also: we have a two-line algorithm, involving only matrix-vector multiplications and shrinkage operators, that generates {u^k} converging rapidly to an approximate solution of (BP).
In fact, the numerical evidence is overwhelming that it converges to a true solution if μ is large enough.
Also: Algorithms are robust with respect to noise, both experimentally and with theoretical justification.
Background
To solve (UNC):
Figueiredo, Nowak and Wright
Kim, Koh, Lustig and Boyd
van den Berg and Friedlander
Shrinkage (soft thresholding) with iteration used by:
Chambolle, DeVore, Lee and Lucier
Figueiredo and Nowak
Daubechies, Defrise and De Mol
Elad, Matalon and Zibulevsky
Hale, Yin and Zhang
Darbon and Osher
Combettes and Pesquet
The shrinkage people developed an algorithm to solve
min_u μ||u||_1 + H(u)
for convex differentiable H(·) and get an iterative scheme:
u^{k+1} = argmin_u μ||u||_1 + (1/(2δ)) ||u - (u^k - δ∇H(u^k))||_2^2,
with step size δ > 0, starting from u^0 = 0, for k = 0, 1, ...
Since this minimization is component-wise separable, we can solve it by scalar shrinkage.
Crucial for the speed!
u_i^{k+1} = shrink((u^k - δ∇H(u^k))_i, μδ),   for i = 1, ..., n,
where for y, α ∈ R we define
shrink(y, α) := sign(y) max(|y| - α, 0) =
  y - α,   y > α
  0,       -α ≤ y ≤ α
  y + α,   y < -α
{i.e., make this a semi-implicit method (in numerical analysis terms)}
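A one-line NumPy version of this operator (the function name `shrink` just mirrors the notation above):

```python
import numpy as np

def shrink(y, alpha):
    """Soft thresholding: sign(y) * max(|y| - alpha, 0), applied component-wise."""
    return np.sign(y) * np.maximum(np.abs(y) - alpha, 0.0)

out = shrink(np.array([3.0, -0.5, 1.2]), 1.0)   # roughly [2.0, 0.0, 0.2]
```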
Or replace H(u) by its first order Taylor expansion at u^k,
H(u^k) + ⟨∇H(u^k), u - u^k⟩,
and force u to be close to u^k by the penalty term ||u - u^k||_2^2/(2δ):
u^{k+1} = argmin_u { μ||u||_1 + H(u^k) + ⟨∇H(u^k), u - u^k⟩ + (1/(2δ))||u - u^k||_2^2 }
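For H(u) = (1/2)||Au - f||_2^2, so that ∇H(u) = A^T(Au - f), the scheme above is plain iterative soft thresholding; a minimal sketch (problem size, μ, and iteration count are illustrative):

```python
import numpy as np

def ista(A, f, mu, steps=500):
    """Iterative shrinkage for min_u mu*||u||_1 + 0.5*||Au - f||_2^2."""
    delta = 1.0 / np.linalg.norm(A, 2) ** 2      # step size delta <= 1/||A||^2
    u = np.zeros(A.shape[1])
    for _ in range(steps):
        w = u - delta * (A.T @ (A @ u - f))      # gradient step on H
        u = np.sign(w) * np.maximum(np.abs(w) - mu * delta, 0.0)  # shrink(w, mu*delta)
    return u

# illustrative random instance with a 3-sparse signal
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 25))
f = A @ np.concatenate([rng.standard_normal(3), np.zeros(22)])
u = ista(A, f, mu=0.1)
```

With this step size the objective decreases monotonically from u^0 = 0.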
This was adapted for solving
min_u TV(u) + H(u),
and the resulting "linearized" approach was solved by a graph/network based algorithm, very fast.
Darbon and Osher; Wang, Yin and Zhang.
Also: Darbon and Osher did the linearized Bregman approach described here, but for TV deconvolution:
H(u) = (1/2)||j ∗ u - f||_2^2,   j a convolution kernel.
Bregman Iterative Regularization (Bregman 1967)
Introduced by Osher, Burger, Goldfarb, Xu and Yin in an image processing context.
Extended the Rudin-Osher-Fatemi model
(ROF):  minimize_u |u|_TV + (λ/2)||u - b||_2^2,
where b is a noisy measurement of a clean image u, and λ is a tuning parameter.
They used the Bregman distance based on J(u) = TV(u) = |u|_TV:
D_J^p(u, v) := J(u) - J(v) - ⟨p, u - v⟩,   p ∈ ∂J(v).
Not really a distance: D_J^p(u, v) ≠ D_J^p(v, u)
(unless J is quadratic).
However, D_J^p(u, v) ≥ 0, and D_J^p(u, v) ≥ D_J^p(w, v) for all w on the
line segment connecting u and v.
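These properties are easy to check numerically for J(u) = ||u||_1; a small sketch (the vectors u and v are arbitrary illustrative choices):

```python
import numpy as np

def bregman_dist(u, v, p):
    """D_J^p(u, v) = J(u) - J(v) - <p, u - v>, for J(u) = ||u||_1 and p in dJ(v)."""
    J = lambda x: np.abs(x).sum()
    return J(u) - J(v) - p @ (u - v)

u = np.array([1.0, -2.0])
v = np.array([3.0, 0.5])
d_uv = bregman_dist(u, v, np.sign(v))   # sign(v) is a subgradient of ||.||_1 at v
d_vu = bregman_dist(v, u, np.sign(u))
# both are nonnegative, but d_uv != d_vu: D is not symmetric
```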
Instead of solving (ROF) once, our Bregman iterative regularization procedure solves
(BROF)  u^{k+1} = argmin_u D_J^{p^k}(u, u^k) + (λ/2)||u - b||_2^2
for k = 0, 1, ..., starting with u^0 = 0, p^0 = 0 (which gives (ROF) for u^1).
The p^k is automatically chosen from the optimality condition:
0 ∈ ∂J(u^{k+1}) - p^k + λ(u^{k+1} - b),
so
p^{k+1} = p^k + λ(b - u^{k+1}).
The difference is in the use of regularization:
Bregman iterative regularization regularizes by minimizing the total variation based Bregman distance from u to the previous iterate u^k.
Earlier results:
(a) ||u^k - b|| converges monotonically to zero.
(b) u^k gets closer to the unknown clean image ũ, in the sense that the Bregman distance D_J^{p^k}(ũ, u^k) diminishes in k, at least as long as ||u^k - b|| ≥ ||ũ - b||.
Numerically, it’s a big improvement.
For each k, (BROF), the iterative procedure, can be reduced to (ROF) with the input
b^{k+1} = b + (b^k - u^k),
i.e., add back the noise.
This is totally general.
Algorithm: Bregman iterative regularization (for J(u), H(u) convex, H differentiable):
u^{k+1} = argmin_u D_J^{p^k}(u, u^k) + H(u),   p^{k+1} = p^k - ∇H(u^{k+1})
Results: The iterative sequence {u^k} satisfies:
(1) Monotonic decrease in H:
H(u^{k+1}) + D_J^{p^k}(u^{k+1}, u^k) ≤ H(u^k)
(2) Convergence to the original in H with exact data:
If ũ minimizes H(·) and J(ũ) < ∞, then H(u^k) ≤ H(ũ) + J(ũ)/k.
(3) Approach towards the original in D with noisy data:
Let H(·) = H(·; f) and suppose H(ũ; f) ≤ ε^2 and H(ũ; g) = 0
(f, g, ũ, and ε represent noisy data, noiseless data, perfect recovery, and noise level); then
D_J^{p^{k+1}}(ũ, u^{k+1}) ≤ D_J^{p^k}(ũ, u^k)
as long as H(u^{k+1}; f) > ε^2.
Motivation: Xu, Osher (2006)
Wavelet based denoising:
min_u μ|u|_{1,1} + (1/2)||u - f||_2^2,
where u = Σ_j ũ_j φ_j and |u|_{1,1} = Σ_j |ũ_j|,
with {φ_j} a wavelet basis.
Then solve
min_{ũ_j} μ Σ_j |ũ_j| + (1/2) Σ_j (ũ_j - f̃_j)^2.
This decouples: ũ_j = shrink(f̃_j, μ)
(observed in 1998 by Chambolle, DeVore, Lee and Lucier).
This is soft thresholding.
Interesting: Bregman iterations give
ũ_j^k =
  0,                          if |f̃_j| ≤ μ/k
  sign(f̃_j)(k|f̃_j| - μ),     if μ/k ≤ |f̃_j| ≤ μ/(k - 1)
  f̃_j,                       if |f̃_j| ≥ μ/(k - 1)
i.e., firm thresholding.
So for Bregman iterations it takes
k̄_j = the smallest integer k ≥ 1 + μ/|f̃_j|
iterations to recover ũ_j^k = f̃_j.
Spikes return in decreasing order of their magnitudes, and sparse data comes back very quickly.
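The scalar firm-thresholding formula is easy to verify with the add-back-the-residual iteration; a small sketch (the values of μ and f̃_j are illustrative):

```python
def shrink(y, alpha):
    """Scalar soft thresholding: sign(y) * max(|y| - alpha, 0)."""
    return (1 if y > 0 else -1) * max(abs(y) - alpha, 0.0)

mu, f_j = 1.0, 0.3           # illustrative: recovery needs k >= 1 + mu/|f_j| = 4.33..., i.e. k = 5
fk, history = f_j, []
for k in range(1, 8):
    u = shrink(fk, mu)       # scalar (UNC) solve: soft thresholding
    history.append(u)
    fk = f_j + (fk - u)      # add back the residual ("noise")
```

The iterates are 0, 0, 0, 0.2, 0.3, 0.3, 0.3, matching the firm-thresholding branches at each k.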
Next: a simple case, namely
min { ||u||_1 : a^T u = f },
where 0 ≠ a ∈ R^n and f ∈ R.
Obvious solution:
u_OPT = (f/a_j) e_j,   e_j the j-th unit vector,
where a_j is the component of a with largest magnitude.
W.l.o.g. assume a_j = a_1 > 0, f > 0, and a_1 strictly greater than all the other |a_i|. Then
u_OPT = (f/a_1) e_1.
It is easy to see that the Bregman iterations give an exact solution in at most
⌈μ/(f max_i a_i)⌉ + 2
steps!
This helps explain our success in the general case.
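A sketch of this toy case (values illustrative): each subproblem min_u μ||u||_1 + (1/2)(a^T u - f^k)^2 puts all of its mass on the coordinate with the largest |a_i|, so it reduces to a scalar shrinkage in s = a^T u.

```python
a1, f, mu = 2.0, 1.0, 1.0    # a = (2, 1, 0.5): a_1 is strictly largest; u_OPT = (f/a1) e_1
fk, u1 = f, 0.0
for k in range(5):
    # scalar subproblem in s = a^T u: min mu*|s|/a1 + 0.5*(s - fk)^2
    s = (1 if fk > 0 else -1) * max(abs(fk) - mu / a1, 0.0)
    u1 = s / a1              # candidate first coordinate of u
    fk = f + (fk - s)        # Bregman: add back the residual
```

Here μ/(f·max_i a_i) = 0.5, and indeed u1 equals f/a1 = 0.5 exactly from the second iteration on.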
Convergence results:
Again, the procedure:
u^0 = 0, p^0 = 0, f^0 = 0;
for k = 0, 1, 2, ...:
  f^{k+1} = f^k + (f - Au^k)
  u^{k+1} = argmin_u μJ(u) + (1/2)||Au - f^{k+1}||_2^2
Here J(u) = ||u||_1.
The recent fast method (FPC) of Hale, Yin and Zhang is used to compute u^{k+1}.
This is nonlinear Bregman. It converges in a few iterations. However, even faster is linearized Bregman (Darbon-Osher, used for TV deblurring), described below.
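A sketch of the outer Bregman loop; the slides use FPC as the inner solver for (UNC), and this sketch substitutes a plain iterative-shrinkage inner loop for it (sizes, μ, and iteration counts are illustrative):

```python
import numpy as np

def shrink(y, alpha):
    return np.sign(y) * np.maximum(np.abs(y) - alpha, 0.0)

def solve_unc(A, fk, mu, delta, steps=400, u0=None):
    """Stand-in inner solver for min_u mu*||u||_1 + 0.5*||Au - fk||_2^2
    (the slides use FPC here; this is plain iterative shrinkage)."""
    u = np.zeros(A.shape[1]) if u0 is None else u0.copy()
    for _ in range(steps):
        u = shrink(u - delta * (A.T @ (A @ u - fk)), mu * delta)
    return u

rng = np.random.default_rng(1)
m, n, k = 20, 40, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n)
u_true[rng.choice(n, size=k, replace=False)] = 1.0
f = A @ u_true

mu, delta = 0.1, 1.0 / np.linalg.norm(A, 2) ** 2
fk, u = f.copy(), np.zeros(n)
for _ in range(10):              # outer Bregman iterations
    u = solve_unc(A, fk, mu, delta, u0=u)
    fk = fk + (f - A @ u)        # f^{k+1} = f^k + (f - A u^{k+1})
```

The outer loop drives the residual Au - f toward zero even though each inner (UNC) solve is biased by μ.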
2 LINE CODE
For nonlinear Bregman
Theorem: Suppose an iterate u^k satisfies Au^k = f. Then u^k solves (BP).
Proof: By nonnegativity of the Bregman distance, with p^k = (1/μ)A^T(f^k - Au^k) ∈ ∂J(u^k), for any u satisfying Au = f:
J(u) ≥ J(u^k) + ⟨p^k, u - u^k⟩
     = J(u^k) + (1/μ)⟨f^k - Au^k, Au - Au^k⟩
     = J(u^k),
since Au = Au^k = f.
Theorem:
There exists an integer K < ∞ such that any u^k with k ≥ K is a solution of (BP).
Idea: uses the fact that p^k ∈ ∂J(u^k), so ||p^k||_∞ ≤ 1, for J(u) = ||u||_1.
Works if we replace μ by μ_k ≤ M, for all k.
For dense Gaussian matrices A, we can solve large scale problem instances with more than 8 × 10^6 nonzeros in A, e.g. (n, m) = (4096, 2045), in 11 seconds. For partial DCT matrices, much faster:
(n, m) = (1,000,000, 600,000) in 7 minutes,
but more like 40 seconds for the linearized Bregman approach!
Also, one can't just use the minimizer of
μ||u||_1 + (1/2)||Au - f||_2^2
for μ very small: it takes too long.
Need Bregman.
Extensions
Finite Convergence
Let J(·) and H(·) be convex on a Hilbert space H, with ũ minimizing H(u) and J(ũ) < ∞.
Bregman iterates:
u^{k+1} = argmin_{u ∈ H} D_J^{p^k}(u, u^k) + H(u)
Thm:
Let H(u) = h(Au - f), with h convex, differentiable, nonnegative, and vanishing only at 0. Then Bregman iteration returns a solution of
min { J(u) : Au = f }
under very general conditions.
Idea:
∇H(u^k) = A^T ∇h(Au^k - f), so p^k = -Σ_{j=1}^{k} A^T ∇h(Au^j - f), and
J(u) ≥ J(u^k) + ⟨p^k, u - u^k⟩ = J(u^k) - Σ_{j=1}^{k} ⟨∇h(Au^j - f), A(u - u^k)⟩,
etc.
Strictly convex cases
e.g., regularize:
J(u) = ||u||_1 + (ε/2)||u||_2^2,   for ε > 0.
Then J(u) ≥ J(v) + ⟨p, u - v⟩ + (ε/2)||u - v||_2^2, p ∈ ∂J(v).
Let J(u) be C^2.
Simple to prove:
Theorem:
If ⟨AA^T u, u⟩ ≥ θ⟨u, u⟩ for some θ > 0, then ||Au^k - f|| decays exponentially
to zero, and w = lim_{k→∞} u^k solves
min { J(u) : Au = f }.
The proof is easy.
Linearized Bregman
Started with Osher-Darbon:
min_u { TV(u) : Au = f },   let J(u) = TV(u).
u^{k+1} = argmin_u D_J^{p^k}(u, u^k) + (1/(2δ)) ||u - (u^k - δA^T(Au^k - f))||_2^2,   k = 0, 1, 2, ...
This differs from standard Bregman because we replace (1/2)||Au - f||_2^2 by the sum of its first order approximation at u^k
and an L^2 proximity term at u^k.
Then we can use fast methods, either graph cuts for TV or shrinkage for L^1, to solve the above!
Starting from u^0 = 0, p^0 = 0, the optimality condition gives
p^{k+1} = p^k - A^T(Au^k - f) - (1/δ)(u^{k+1} - u^k),
which yields
p^{k+1} + (1/δ)u^{k+1} = Σ_{j=0}^{k} A^T(f - Au^j).
Consider (BP). Let J(u) = μ||u||_1 and let
v^k := Σ_{j=0}^{k} A^T(f - Au^j).
Get a 2 line code:
Linearized Bregman:
Let u = (u_1, ..., u_n), v = (v_1, ..., v_n), and set u^0 = 0, v^0 = A^T f. Then, for k = 0, 1, ...:
u_i^{k+1} = δ · shrink(v_i^k, μ),   i = 1, ..., n
v^{k+1} = v^k + A^T(f - Au^{k+1})
Two Lines
Matrix multiplication and scalar shrinkage.
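A direct transcription of the two-line algorithm (sizes, μ, δ, and the iteration count are illustrative; δ is chosen so that δ||AA^T|| ≤ 1):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 20, 40, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n)
u_true[rng.choice(n, size=k, replace=False)] = 1.0
f = A @ u_true

mu = 1.0
delta = 1.0 / np.linalg.norm(A, 2) ** 2       # keeps delta * ||A A^T|| <= 1
u, v = np.zeros(n), A.T @ f                   # v^0 = A^T f
for _ in range(5000):
    u = delta * np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)  # u^{k+1} = delta*shrink(v^k, mu)
    v = v + A.T @ (f - A @ u)                                 # v^{k+1} = v^k + A^T(f - A u^{k+1})
```

Each pass really is just two lines of matrix-vector multiplication and component-wise shrinkage, and the residual Au - f shrinks as the iteration proceeds.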
Theorem:
Let J be strictly convex and C^2, and let u_OPT be an optimal solution of min { J(u) : Au = f }. Then if u^k → w, we have
J(w) ≤ J(u_OPT) + (1/δ)⟨w, u_OPT - w⟩,
and ||Au^k - f|| decays exponentially if δ||AA^T|| ≤ 1 and AA^T is positive definite.
The proof is easy.
So for J(u) = μ||u||_1 this would mean that w approaches a minimizer of ||u||_1 subject to Au = f, as μ → ∞.
Theorem (don't need strict convexity and smoothness of J for this):
If δAA^T ≤ I, then
||Au^{k+1} - f||_2 ≤ ||Au^k - f||_2.
The proof follows easily from Osher, Burger, Goldfarb, Xu, Yin.
(Again, strict convexity and smoothness are not needed.)
NOISE:
Theorem (follows Bachmayr):
Let J(ũ) and ||ũ|| be finite, and let δAA^T ≤ 2I. Then the generalized Bregman distance
D̃_J^{p^k}(ũ, u^k) := J(ũ) - J(u^k) - ⟨p^k, ũ - u^k⟩ + (1/(2δ))||ũ - u^k||_2^2
diminishes with increasing k, as long as
||Au^k - f||_2 ≥ ||(2I - δAA^T)^{1/2}(Aũ - f)||_2,
i.e., as long as the error Au^k - f is not too small compared to the error in the "denoised" solution, Aũ - f.
Of course, if ũ is the solution of the Basis Pursuit problem, then this Bregman distance monotonically decreases.
Note, this means that for Basis Pursuit
||ũ||_1 - ⟨sign(u^k), ũ⟩ + (1/(2δ))||ũ - u^k||_2^2
is diminishing for these values of k.
Here sign(u_i^k) is the usual sign(u_i^k) when u_i^k ≠ 0; otherwise it
belongs to [-1, 1], with its value determined through p_i^k by the
iterative procedure.