Bregman Iterative Algorithms for L1 Minimization with
Applications to Compressed Sensing
W. Yin, S. O., D. Goldfarb, J. Darbon
Problem:
Let A ∈ R^{m×n}, u ∈ R^n, f ∈ R^m, with m < n (usually m << n).
Basis Pursuit (S. Chen, D. Donoho, M. A. Saunders):
(BP)  u_OPT := argmin_u { ||u||_1 : Au = f }
Basis Pursuit Arises in Compressed Sensing:
(Candes, Romberg, Tao, Donoho, Tanner, Tsaig, Rudelson, Vershynin, Tropp)
Fundamental principle:
Through optimization, the sparsity of a signal can be exploited for signal recovery from incomplete measurements
Let u ∈ R^n be highly sparse,
i.e. k := #{i : u_i ≠ 0} << n.
Principle:
Encode u by f = Au ∈ R^m,
with k < m << n.
Then recover u from f by solving basis pursuit:
u_OPT := argmin_u { ||u||_1 : Au = f }
Proven [Candes, Tao]:
Recovery is perfect, u_OPT = u, whenever k, m, n satisfy certain conditions.
Types of matrices A allowing high compression ratios (m << n) include
(a) Random matrices with i.i.d. entries
(b) Random ensembles of orthonormal transforms (e.g. matrices formed from random sets of the rows of Fourier transforms)
Huge number of potential applications of compressive sensing
See e.g. Rich Baraniuk’s website:
www.dsp.ece.rice.edu/cs/
L1 minimization is widely used for compressive imaging, MRI and CT, multisensor networks and distributed sensing, analog-to-information conversion, and biosensing.
(BP) can be transformed into a linear program and then solved by conventional methods. These are not tailored for A that is large scale and dense;
they also don't exploit orthonormality (for a Fourier matrix, etc.).
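As a concrete illustration of the LP route, here is a minimal sketch (sizes, seed, and tolerances are illustrative; assumes NumPy and SciPy). (BP) is rewritten with u = p - q, p, q ≥ 0, so that ||u||_1 = 1^T(p + q) and Au = f becomes Ap - Aq = f:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, k = 30, 60, 4                      # illustrative sizes: m << n, k-sparse signal

A = rng.standard_normal((m, n))          # random Gaussian measurement matrix
u_true = np.zeros(n)
u_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
f = A @ u_true                           # incomplete measurements

# Split u = p - q with p, q >= 0; then ||u||_1 = 1^T(p + q),
# and Au = f becomes A p - A q = f.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=f, bounds=[(0, None)] * (2 * n))
u_hat = res.x[:n] - res.x[n:]            # typically recovers u_true exactly
```

The LP has 2n variables and m equality constraints, which is why this route does not scale well to large dense A.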
One might solve the unconstrained problem
(UNC)  min_u μ||u||_1 + (1/2)||Au - f||_2^2
(the norm in the fidelity term is the L2 norm).
μ needs to be small to heavily weight the fidelity term.
Also, the solution to (UNC) is never that of (BP) unless f = 0.
Here: Using Bregman iteration regularization we solve (BP) by a very small number of solutions to (UNC) with different values of f.
Method involves only
(a) Matrix-vector multiplications
(b) Component-wise shrinkages
Method generalizes to the constrained problem
min_u { J(u) : Au = f }
for other convex J.
Can solve this through a finite number of Bregman iterations of
min_u J(u) + (1/2)||Au - f||_2^2
(again, with a sequence of "f" values).
Also: we have a two-line algorithm, involving only matrix-vector multiplications and shrinkage operators, that generates {u^k} converging rapidly to an approximate solution of (BP).
In fact, the numerical evidence is overwhelming that it converges to a true solution if μ is large enough.
Also: Algorithms are robust with respect to noise, both experimentally and with theoretical justification.
Background
To solve (UNC):
Figueiredo, Nowak and Wright
Kim, Koh, Lustig and Boyd
van den Berg and Friedlander
Shrinkage (soft thresholding) with iteration used by:
Chambolle, DeVore, Lee and Lucier
Figueiredo and Nowak
Daubechies, Defrise and De Mol
Elad, Matalon and Zibulevsky
Hale, Yin and Zhang
Darbon and Osher
Combettes and Pesquet
The shrinkage people developed an algorithm to solve
min_u μ||u||_1 + H(u)
for convex differentiable H(·) and get an iterative scheme:
u^{k+1} = argmin_u μ||u||_1 + (1/(2δ)) ||u - (u^k - δ∇H(u^k))||_2^2,
with step size δ > 0, starting from u^0 = 0, for k = 0, 1, ...
Since this minimization is component-wise separable, we can solve it by scalar shrinkage.
Crucial for the speed!
u_i^{k+1} = shrink((u^k - δ∇H(u^k))_i, μδ),   for i = 1, ..., n,
where for y, α ∈ R we define
shrink(y, α) := sign(y) max(|y| - α, 0) =
  y - α,   y > α
  0,       -α ≤ y ≤ α
  y + α,   y < -α
{i.e., make this a semi-implicit method (in numerical analysis terms)}
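A one-line NumPy version of this operator (the function name `shrink` just mirrors the notation above):

```python
import numpy as np

def shrink(y, alpha):
    """Soft thresholding: sign(y) * max(|y| - alpha, 0), applied component-wise."""
    return np.sign(y) * np.maximum(np.abs(y) - alpha, 0.0)

out = shrink(np.array([3.0, -0.5, 1.2]), 1.0)   # roughly [2.0, 0.0, 0.2]
```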
Or replace H(u) by its first order Taylor expansion at u^k,
H(u^k) + ⟨∇H(u^k), u - u^k⟩,
and force u to be close to u^k by the penalty term ||u - u^k||_2^2/(2δ):
u^{k+1} = argmin_u { μ||u||_1 + H(u^k) + ⟨∇H(u^k), u - u^k⟩ + (1/(2δ))||u - u^k||_2^2 }
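For H(u) = (1/2)||Au - f||_2^2, so that ∇H(u) = A^T(Au - f), the scheme above is plain iterative soft thresholding; a minimal sketch (problem size, μ, and iteration count are illustrative):

```python
import numpy as np

def ista(A, f, mu, steps=500):
    """Iterative shrinkage for min_u mu*||u||_1 + 0.5*||Au - f||_2^2."""
    delta = 1.0 / np.linalg.norm(A, 2) ** 2      # step size delta <= 1/||A||^2
    u = np.zeros(A.shape[1])
    for _ in range(steps):
        w = u - delta * (A.T @ (A @ u - f))      # gradient step on H
        u = np.sign(w) * np.maximum(np.abs(w) - mu * delta, 0.0)  # shrink(w, mu*delta)
    return u

# illustrative random instance with a 3-sparse signal
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 25))
f = A @ np.concatenate([rng.standard_normal(3), np.zeros(22)])
u = ista(A, f, mu=0.1)
```

With this step size the objective decreases monotonically from u^0 = 0.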
This was adapted for solving
min_u TV(u) + H(u),
and the resulting "linearized" approach was solved by a graph/network based algorithm, very fast.
Darbon and Osher; Wang, Yin and Zhang.
Also: Darbon and Osher did the linearized Bregman approach described here, but for TV deconvolution:
H(u) = (1/2)||j ∗ u - f||_2^2,   j a convolution kernel.
Bregman Iterative Regularization (Bregman 1967)
Introduced by Osher, Burger, Goldfarb, Xu and Yin in an image processing context.
Extended the Rudin-Osher-Fatemi model
(ROF):  minimize_u |u|_TV + (λ/2)||u - b||_2^2,
where b is a noisy measurement of a clean image u, and λ is a tuning parameter.
They used the Bregman distance based on J(u) = TV(u) = |u|_TV:
D_J^p(u, v) := J(u) - J(v) - ⟨p, u - v⟩,   p ∈ ∂J(v).
Not really a distance: D_J^p(u, v) ≠ D_J^p(v, u)
(unless J is quadratic).
However, D_J^p(u, v) ≥ 0, and D_J^p(u, v) ≥ D_J^p(w, v) for all w on the
line segment connecting u and v.
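These properties are easy to check numerically for J(u) = ||u||_1; a small sketch (the vectors u and v are arbitrary illustrative choices):

```python
import numpy as np

def bregman_dist(u, v, p):
    """D_J^p(u, v) = J(u) - J(v) - <p, u - v>, for J(u) = ||u||_1 and p in dJ(v)."""
    J = lambda x: np.abs(x).sum()
    return J(u) - J(v) - p @ (u - v)

u = np.array([1.0, -2.0])
v = np.array([3.0, 0.5])
d_uv = bregman_dist(u, v, np.sign(v))   # sign(v) is a subgradient of ||.||_1 at v
d_vu = bregman_dist(v, u, np.sign(u))
# both are nonnegative, but d_uv != d_vu: D is not symmetric
```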
Instead of solving (ROF) once, our Bregman iterative regularization procedure solves
(BROF)  u^{k+1} = argmin_u D_J^{p^k}(u, u^k) + (λ/2)||u - b||_2^2
for k = 0, 1, ..., starting with u^0 = 0, p^0 = 0 (which gives (ROF) for u^1).
The p^k is automatically chosen from the optimality condition:
0 ∈ ∂J(u^{k+1}) - p^k + λ(u^{k+1} - b),
so
p^{k+1} = p^k + λ(b - u^{k+1}).
The difference is in the use of regularization:
Bregman iterative regularization regularizes by minimizing the total variation based Bregman distance from u to the previous iterate u^k.
Earlier results:
(a) ||u^k - b|| converges monotonically to zero.
(b) u^k gets closer to the unknown clean image ũ, in the sense that the Bregman distance D_J^{p^k}(ũ, u^k) diminishes in k, at least as long as ||u^k - b|| ≥ ||ũ - b||.
Numerically, it’s a big improvement.
For each k, (BROF), the iterative procedure, can be reduced to (ROF) with the input
b^{k+1} = b + (b^k - u^k),
i.e., add back the noise.
This is totally general.
Algorithm: Bregman iterative regularization (for J(u), H(u) convex, H differentiable):
u^{k+1} = argmin_u D_J^{p^k}(u, u^k) + H(u),   p^{k+1} = p^k - ∇H(u^{k+1})
Results: The iterative sequence {u^k} satisfies:
(1) Monotonic decrease in H:
H(u^{k+1}) + D_J^{p^k}(u^{k+1}, u^k) ≤ H(u^k)
(2) Convergence to the original in H with exact data:
If ũ minimizes H(·) and J(ũ) < ∞, then H(u^k) ≤ H(ũ) + J(ũ)/k.
(3) Approach towards the original in D with noisy data:
Let H(·) = H(·; f) and suppose H(ũ; f) ≤ ε^2 and H(ũ; g) = 0
(f, g, ũ, and ε represent noisy data, noiseless data, perfect recovery, and noise level); then
D_J^{p^{k+1}}(ũ, u^{k+1}) ≤ D_J^{p^k}(ũ, u^k)
as long as H(u^{k+1}; f) > ε^2.
Motivation: Xu, Osher (2006)
Wavelet based denoising:
min_u μ|u|_{1,1} + (1/2)||u - f||_2^2,
where u = Σ_j ũ_j φ_j and |u|_{1,1} = Σ_j |ũ_j|,
with {φ_j} a wavelet basis.
Then solve
min_{ũ_j} μ Σ_j |ũ_j| + (1/2) Σ_j (ũ_j - f̃_j)^2.
This decouples: ũ_j = shrink(f̃_j, μ)
(observed in 1998 by Chambolle, DeVore, Lee and Lucier).
This is soft thresholding.
Interesting: Bregman iterations give
ũ_j^k =
  0,                          if |f̃_j| ≤ μ/k
  sign(f̃_j)(k|f̃_j| - μ),     if μ/k ≤ |f̃_j| ≤ μ/(k - 1)
  f̃_j,                       if |f̃_j| ≥ μ/(k - 1)
i.e., firm thresholding.
So for Bregman iterations it takes
k̄_j = the smallest integer k ≥ 1 + μ/|f̃_j|
iterations to recover ũ_j^k = f̃_j.
Spikes return in decreasing order of their magnitudes, and sparse data comes back very quickly.
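The scalar firm-thresholding formula is easy to verify with the add-back-the-residual iteration; a small sketch (the values of μ and f̃_j are illustrative):

```python
def shrink(y, alpha):
    """Scalar soft thresholding: sign(y) * max(|y| - alpha, 0)."""
    return (1 if y > 0 else -1) * max(abs(y) - alpha, 0.0)

mu, f_j = 1.0, 0.3           # illustrative: recovery needs k >= 1 + mu/|f_j| = 4.33..., i.e. k = 5
fk, history = f_j, []
for k in range(1, 8):
    u = shrink(fk, mu)       # scalar (UNC) solve: soft thresholding
    history.append(u)
    fk = f_j + (fk - u)      # add back the residual ("noise")
```

The iterates are 0, 0, 0, 0.2, 0.3, 0.3, 0.3, matching the firm-thresholding branches at each k.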
Next: a simple case, namely
min { ||u||_1 : a^T u = f },
where 0 ≠ a ∈ R^n and f ∈ R.
Obvious solution:
u_OPT = (f/a_j) e_j,   e_j the j-th unit vector,
where a_j is the component of a with largest magnitude.
W.l.o.g. assume a_j = a_1 > 0, f > 0, and a_1 strictly greater than all the other |a_i|. Then
u_OPT = (f/a_1) e_1.
It is easy to see that the Bregman iterations give an exact solution in at most
⌈μ/(f max_i a_i)⌉ + 2
steps!
This helps explain our success in the general case.
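A sketch of this toy case (values illustrative): each subproblem min_u μ||u||_1 + (1/2)(a^T u - f^k)^2 puts all of its mass on the coordinate with the largest |a_i|, so it reduces to a scalar shrinkage in s = a^T u.

```python
a1, f, mu = 2.0, 1.0, 1.0    # a = (2, 1, 0.5): a_1 is strictly largest; u_OPT = (f/a1) e_1
fk, u1 = f, 0.0
for k in range(5):
    # scalar subproblem in s = a^T u: min mu*|s|/a1 + 0.5*(s - fk)^2
    s = (1 if fk > 0 else -1) * max(abs(fk) - mu / a1, 0.0)
    u1 = s / a1              # candidate first coordinate of u
    fk = f + (fk - s)        # Bregman: add back the residual
```

Here μ/(f·max_i a_i) = 0.5, and indeed u1 equals f/a1 = 0.5 exactly from the second iteration on.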
Convergence results:
Again, the procedure:
u^0 = 0, p^0 = 0, f^0 = 0;
for k = 0, 1, 2, ...:
  f^{k+1} = f^k + (f - Au^k)
  u^{k+1} = argmin_u μJ(u) + (1/2)||Au - f^{k+1}||_2^2
Here J(u) = ||u||_1.
The recent fast method (FPC) of Hale, Yin and Zhang is used to compute u^{k+1}.
This is nonlinear Bregman. It converges in a few iterations. However, even faster is linearized Bregman (Darbon-Osher, used for TV deblurring), described below.
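A sketch of the outer Bregman loop; the slides use FPC as the inner solver for (UNC), and this sketch substitutes a plain iterative-shrinkage inner loop for it (sizes, μ, and iteration counts are illustrative):

```python
import numpy as np

def shrink(y, alpha):
    return np.sign(y) * np.maximum(np.abs(y) - alpha, 0.0)

def solve_unc(A, fk, mu, delta, steps=400, u0=None):
    """Stand-in inner solver for min_u mu*||u||_1 + 0.5*||Au - fk||_2^2
    (the slides use FPC here; this is plain iterative shrinkage)."""
    u = np.zeros(A.shape[1]) if u0 is None else u0.copy()
    for _ in range(steps):
        u = shrink(u - delta * (A.T @ (A @ u - fk)), mu * delta)
    return u

rng = np.random.default_rng(1)
m, n, k = 20, 40, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n)
u_true[rng.choice(n, size=k, replace=False)] = 1.0
f = A @ u_true

mu, delta = 0.1, 1.0 / np.linalg.norm(A, 2) ** 2
fk, u = f.copy(), np.zeros(n)
for _ in range(10):              # outer Bregman iterations
    u = solve_unc(A, fk, mu, delta, u0=u)
    fk = fk + (f - A @ u)        # f^{k+1} = f^k + (f - A u^{k+1})
```

The outer loop drives the residual Au - f toward zero even though each inner (UNC) solve is biased by μ.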
2 LINE CODE
For nonlinear Bregman
Theorem: Suppose an iterate u^k satisfies Au^k = f. Then u^k solves (BP).
Proof: By nonnegativity of the Bregman distance, with p^k = (1/μ)A^T(f^k - Au^k) ∈ ∂J(u^k), for any u satisfying Au = f:
J(u) ≥ J(u^k) + ⟨p^k, u - u^k⟩
     = J(u^k) + (1/μ)⟨f^k - Au^k, Au - Au^k⟩
     = J(u^k),
since Au = Au^k = f.
Theorem:
There exists an integer K < ∞ such that any u^k with k ≥ K is a solution of (BP).
Idea: uses the fact that p^k ∈ ∂J(u^k), so ||p^k||_∞ ≤ 1, for J(u) = ||u||_1.
Works if we replace μ by μ_k ≤ M, for all k.
For dense Gaussian matrices A, we can solve large scale problem instances with more than 8 × 10^6 nonzeros in A, e.g. (n, m) = (4096, 2045), in 11 seconds. For partial DCT matrices, much faster:
(n, m) = (1,000,000, 600,000) in 7 minutes,
but more like 40 seconds for the linearized Bregman approach!
Also, one can't just use the minimizer of
μ||u||_1 + (1/2)||Au - f||_2^2
for μ very small: it takes too long.
Need Bregman.
Extensions
Finite Convergence
Let J(·) and H(·) be convex on a Hilbert space H, with ũ minimizing H(u) and J(ũ) < ∞.
Bregman iterates:
u^{k+1} = argmin_{u ∈ H} D_J^{p^k}(u, u^k) + H(u)
Thm:
Let H(u) = h(Au - f), with h convex, differentiable, nonnegative, and vanishing only at 0. Then Bregman iteration returns a solution of
min { J(u) : Au = f }
under very general conditions.
Idea:
∇H(u^k) = A^T ∇h(Au^k - f), so p^k = -Σ_{j=1}^{k} A^T ∇h(Au^j - f), and
J(u) ≥ J(u^k) + ⟨p^k, u - u^k⟩ = J(u^k) - Σ_{j=1}^{k} ⟨∇h(Au^j - f), A(u - u^k)⟩,
etc.
Strictly convex cases
e.g., regularize:
J(u) = ||u||_1 + (ε/2)||u||_2^2,   for ε > 0.
Then J(u) ≥ J(v) + ⟨p, u - v⟩ + (ε/2)||u - v||_2^2, p ∈ ∂J(v).
Let J(u) be C^2.
Simple to prove:
Theorem:
If ⟨AA^T u, u⟩ ≥ θ⟨u, u⟩ for some θ > 0, then ||Au^k - f|| decays exponentially
to zero, and w = lim_{k→∞} u^k solves
min { J(u) : Au = f }.
The proof is easy.
Linearized Bregman
Started with Osher-Darbon:
min_u { TV(u) : Au = f },   let J(u) = TV(u).
u^{k+1} = argmin_u D_J^{p^k}(u, u^k) + (1/(2δ)) ||u - (u^k - δA^T(Au^k - f))||_2^2,   k = 0, 1, 2, ...
This differs from standard Bregman because we replace (1/2)||Au - f||_2^2 by the sum of its first order approximation at u^k
and an L^2 proximity term at u^k.
Then we can use fast methods, either graph cuts for TV or shrinkage for L^1, to solve the above!
Starting from u^0 = 0, p^0 = 0, the optimality condition gives
p^{k+1} = p^k - A^T(Au^k - f) - (1/δ)(u^{k+1} - u^k),
which yields
p^{k+1} + (1/δ)u^{k+1} = Σ_{j=0}^{k} A^T(f - Au^j).
Consider (BP). Let J(u) = μ||u||_1 and let
v^k := Σ_{j=0}^{k} A^T(f - Au^j).
Get a 2 line code:
Linearized Bregman:
Let u = (u_1, ..., u_n), v = (v_1, ..., v_n), and set u^0 = 0, v^0 = A^T f. Then, for k = 0, 1, ...:
u_i^{k+1} = δ · shrink(v_i^k, μ),   i = 1, ..., n
v^{k+1} = v^k + A^T(f - Au^{k+1})
Two Lines
Matrix multiplication and scalar shrinkage.
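A direct transcription of the two-line algorithm (sizes, μ, δ, and the iteration count are illustrative; δ is chosen so that δ||AA^T|| ≤ 1):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 20, 40, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
u_true = np.zeros(n)
u_true[rng.choice(n, size=k, replace=False)] = 1.0
f = A @ u_true

mu = 1.0
delta = 1.0 / np.linalg.norm(A, 2) ** 2       # keeps delta * ||A A^T|| <= 1
u, v = np.zeros(n), A.T @ f                   # v^0 = A^T f
for _ in range(5000):
    u = delta * np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)  # u^{k+1} = delta*shrink(v^k, mu)
    v = v + A.T @ (f - A @ u)                                 # v^{k+1} = v^k + A^T(f - A u^{k+1})
```

Each pass really is just two lines of matrix-vector multiplication and component-wise shrinkage, and the residual Au - f shrinks as the iteration proceeds.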
Theorem:
Let J be strictly convex and C^2, and let u_OPT be an optimal solution of min { J(u) : Au = f }. Then if u^k → w, we have
J(w) ≤ J(u_OPT) + (1/δ)⟨w, u_OPT - w⟩,
and ||Au^k - f|| decays exponentially if δ||AA^T|| ≤ 1 and AA^T is positive definite.
The proof is easy.
So for J(u) = μ||u||_1 this would mean that w approaches a minimizer of ||u||_1 subject to Au = f, as μ → ∞.
Theorem (don't need strict convexity and smoothness of J for this):
If δAA^T ≤ I, then
||Au^{k+1} - f||_2 ≤ ||Au^k - f||_2.
The proof follows easily from Osher, Burger, Goldfarb, Xu, Yin.
(Again, strict convexity and smoothness are not needed.)
NOISE:
Theorem (follows Bachmayr):
Let J(ũ) and ||ũ|| be finite, and let δAA^T ≤ 2I. Then the generalized Bregman distance
D̃_J^{p^k}(ũ, u^k) := J(ũ) - J(u^k) - ⟨p^k, ũ - u^k⟩ + (1/(2δ))||ũ - u^k||_2^2
diminishes with increasing k, as long as
||Au^k - f||_2 ≥ ||(2I - δAA^T)^{1/2}(Aũ - f)||_2,
i.e., as long as the error Au^k - f is not too small compared to the error in the "denoised" solution, Aũ - f.
Of course, if ũ is the solution of the Basis Pursuit problem, then this Bregman distance monotonically decreases.
Note, this means that for Basis Pursuit
||ũ||_1 - ⟨sign(u^k), ũ⟩ + (1/(2δ))||ũ - u^k||_2^2
is diminishing for these values of k.
Here sign(u_i^k) is the usual sign(u_i^k) when u_i^k ≠ 0; otherwise it
belongs to [-1, 1], with its value determined through p_i^k by the
iterative procedure.