
Exact Recovery of Multichannel Sparse Blind Deconvolution via Gradient Descent

Qing Qu∗, Xiao Li†, Zhihui Zhu⋄

∗ Center for Data Science, New York University; † EE Department, The Chinese University of Hong Kong; ⋄ MINDS, Johns Hopkins University

Basic Task

Given multiple observations $y_i \in \mathbb{R}^n$ of the circulant convolution

$y_i = a \circledast x_i, \quad (1 \le i \le p),$

can we recover both the unknown kernel $a \in \mathbb{R}^n$ and the sparse signals $\{x_i\}_{i=1}^p \subset \mathbb{R}^n$ simultaneously?
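For concreteness, here is a minimal NumPy sketch of this data model; the helper names (circconv, gen_mcs_bd) and the exact Bernoulli-Gaussian sampling are our own illustrative choices, not code from the poster.

```python
import numpy as np

def circconv(a, x):
    # Circulant convolution a ⊛ x, computed in the Fourier domain.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(x)))

def gen_mcs_bd(n, p, theta, seed=0):
    # Draw a unit-norm kernel a and p Bernoulli-Gaussian(theta) sparse
    # signals x_i, and form the observations y_i = a ⊛ x_i.
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n)
    a /= np.linalg.norm(a)                      # fix the scaling ||a|| = 1
    X = rng.standard_normal((n, p)) * (rng.random((n, p)) <= theta)
    Y = np.stack([circconv(a, X[:, i]) for i in range(p)], axis=1)
    return a, X, Y
```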

Our Contribution

With random initialization, a vanilla Riemannian gradient descent (RGD) followed by a subgradient method converges exactly to the target solution at a linear rate.

Motivations in Imaging Science

• Computational Microscopy Imaging.

• Geophysics and Seismic Imaging.

• Neuroscience: Calcium Imaging, Functional MRI.

• Image Deblurring.

Symmetric Solutions in MCS-BD

• Scaled shifts of $(a, x_i)$ are also solutions to MCS-BD:

$y_i = a \circledast x_i = \alpha s_\ell[a] \circledast (1/\alpha) s_{-\ell}[x_i].$

[Figure: a small numerical example of a scaled-and-shifted pair $(a, x_i)$ producing the same observation $y_i$.]

- W.l.o.g., fix the scaling $\|a\| = 1$.
- Hope to recover $a$ up to signed shifts $\{\pm s_\ell[a_0]\}_{\ell = -n+1}^{n-1}$. (A numerical check of this symmetry follows.)
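A quick check of the symmetry, reusing circconv and gen_mcs_bd from the sketch above; the shift ℓ = 7 and scale α = 2.5 below are arbitrary illustrative values.

```python
# Any scaled shift (alpha * s_l[a], (1/alpha) * s_{-l}[x_i]) yields the
# same observation y_i: opposite circular shifts cancel under ⊛.
a, X, Y = gen_mcs_bd(n=64, p=1, theta=0.3)
alpha, l = 2.5, 7
y_sym = circconv(alpha * np.roll(a, l), np.roll(X[:, 0], -l) / alpha)
assert np.allclose(y_sym, Y[:, 0])
```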

Assumptions & Problem Formulation

• Assumptions.
- Sparse signals $x_i$: $x_i \sim_{\text{i.i.d.}} \text{Bernoulli-Gaussian}(\theta)$, $\theta \in (0, 1)$;
- Invertible kernel $a$: its circulant matrix $\underbrace{C_a}_{\text{invertible}} = F^* \operatorname{diag}(\hat{a}) F$, i.e., $|\hat{a}| > 0$ entrywise, where $\hat{a}$ denotes the DFT of $a$.

• Problem Formulation. Denote

$Y = [y_1 \; y_2 \; \cdots \; y_p], \qquad X = [x_1 \; x_2 \; \cdots \; x_p].$

- Let $h$ be the inverse kernel of $a$, $h = a^{\odot -1}$ (i.e., $a \circledast h = e_1$), so that

$C_h Y = \underbrace{C_h C_a}_{= I} X = \underbrace{X}_{\text{sparse}}.$

- Ideally, we want to solve the problem

$\min_q \; \frac{1}{np} \underbrace{\| C_q Y \|_0}_{\text{sparsity}} = \frac{1}{np} \sum_{i=1}^p \| C_{y_i} q \|_0, \quad \text{s.t.} \; \underbrace{q \neq 0}_{\text{prevent trivial solution}},$

to recover $a = s_\ell[\alpha q^{\odot -1}]$ up to the shift-scaling symmetry. (A numerical check of the inverse-kernel identity follows.)
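A small sketch of that inverse-kernel identity, reusing the helpers from the first sketch: with $|\hat{a}| > 0$, the inverse kernel $h$ can be formed by inverting the spectrum of $a$ entrywise, and convolving each channel with $h$ recovers the sparse $X$.

```python
# Inverse kernel h = a^{⊙-1}: invert the DFT of a entrywise (valid since
# |fft(a)| > 0 under the invertibility assumption); then h ⊛ y_i = x_i.
a, X, Y = gen_mcs_bd(n=64, p=8, theta=0.3)
h = np.real(np.fft.ifft(1.0 / np.fft.fft(a)))
X_rec = np.stack([circconv(h, Y[:, i]) for i in range(Y.shape[1])], axis=1)
assert np.allclose(X_rec, X, atol=1e-6)
```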

• Nonconvex Relaxation. We consider

$\min_q \; \varphi(q) := \frac{1}{np} \sum_{i=1}^p \underbrace{H_\mu(C_{y_i} P q)}_{\text{smooth sparsity function}}, \quad \text{s.t.} \; \underbrace{q \in \mathbb{S}^{n-1}}_{\text{sphere constraint}}.$

- $H_\mu(\cdot)$ is a smooth Huber loss for promoting sparsity,

$H_\mu(Z) := \sum_{i=1}^n \sum_{j=1}^p h_\mu(Z_{ij}), \qquad h_\mu(z) := \begin{cases} |z| & |z| \ge \mu, \\ \frac{z^2}{2\mu} + \frac{\mu}{2} & |z| < \mu. \end{cases}$

- $P$ is a preconditioning matrix,

$P = \Big( \frac{1}{\theta np} \sum_{i=1}^p C_{y_i}^\top C_{y_i} \Big)^{-1/2} \approx \big( C_a^\top C_a \big)^{-1/2}.$

- Preconditioning orthogonalizes the kernel $C_a$:

$C_{y_i} P = C_{x_i} \underbrace{C_a P}_{R} \approx C_{x_i} \underbrace{C_a (C_a^\top C_a)^{-1/2}}_{\text{orthogonal } Q}.$

Given $C_{y_i} P q \approx C_{x_i} Q q$ and supposing $Q = I$, the problem reduces to

$\min_q \; f(q) := \frac{1}{np} \sum_{i=1}^p H_\mu(C_{x_i} q), \quad \text{s.t.} \; q \in \mathbb{S}^{n-1}.$

This implies that the standard basis vectors $\{\pm e_i\}_{i=1}^n$ are global solutions. (A sketch of the Huber objective and the preconditioner follows.)
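A minimal sketch of the Huber surrogate $H_\mu$ and the preconditioner $P$, assuming the definitions above. Forming dense circulant matrices is the naive $O(n^2 p)$ route, chosen here for clarity rather than efficiency; all names are illustrative.

```python
def huber(Z, mu):
    # Smooth Huber loss H_mu: |z| outside [-mu, mu], quadratic inside.
    return np.sum(np.where(np.abs(Z) < mu, Z**2 / (2 * mu) + mu / 2, np.abs(Z)))

def circulant(y):
    # Circulant matrix C_y whose k-th column is the k-shift of y.
    return np.column_stack([np.roll(y, k) for k in range(y.shape[0])])

def preconditioner(Y, theta):
    # P = ((1/(theta*n*p)) * sum_i C_{y_i}^T C_{y_i})^(-1/2), computed via
    # an eigendecomposition of the symmetric PSD average.
    n, p = Y.shape
    M = sum(circulant(Y[:, i]).T @ circulant(Y[:, i]) for i in range(p))
    M = M / (theta * n * p)
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** -0.5) @ V.T

def phi(q, Y, P, mu):
    # phi(q) = (1/(np)) * sum_i H_mu(C_{y_i} P q), using circconv from above.
    n, p = Y.shape
    Pq = P @ q
    return sum(huber(circconv(Y[:, i], Pq), mu) for i in range(p)) / (n * p)
```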

[Figure: optimization landscapes over the sphere for (a, d) ℓ1-loss, (b, e) Huber-loss, and (c, f) ℓ4-loss.]

Geometric Property

Study the optimization landscape on the union of sets

$\mathcal{S}^{i\pm}_\xi := \Big\{ q \in \mathbb{S}^{n-1} \;\Big|\; \frac{|q_i|}{\| q_{-i} \|_\infty} \ge \sqrt{1 + \xi}, \; q_i \gtrless 0 \Big\},$

for some $\xi \in (0, +\infty)$, where each set
- contains exactly one solution $\pm e_i$;
- excludes all saddle points;
- for some small $\xi = \frac{1}{5 \log n}$, a random initialization falls in one $\mathcal{S}^{i\pm}_\xi$ with probability $\ge 1/2$ (see the Monte Carlo check below).

[Figure: the regions $\mathcal{S}^{i\pm}_\xi$ on $\mathbb{S}^2$ around $\pm e_1, \pm e_2, \pm e_3$, shown for $\xi = 0$ and $\xi = \frac{1}{5 \log n}$.]
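An illustrative Monte Carlo check of the initialization claim: a uniform $q$ on $\mathbb{S}^{n-1}$ belongs to some $\mathcal{S}^{i\pm}_\xi$ exactly when its largest entry in magnitude beats the second largest by a factor $\sqrt{1 + \xi}$. The dimension and trial count below are arbitrary choices.

```python
# Estimate P(q in union of S^{i±}_xi) for q ~ U(S^{n-1}), xi = 1/(5 log n).
n, trials = 50, 20000
xi = 1.0 / (5 * np.log(n))
rng = np.random.default_rng(2)
Q = rng.standard_normal((trials, n))
Q /= np.linalg.norm(Q, axis=1, keepdims=True)
A = np.abs(Q)
top = A.max(axis=1)
second = np.partition(A, -2, axis=1)[:, -2]      # ||q_{-i}||_inf at the top coordinate
print(np.mean(top / second >= np.sqrt(1 + xi)))  # empirically around 1/2 or above
```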

• Regularity Condition. With $p \ge \Omega(\mathrm{poly}(n))$, w.h.p.

$\langle \operatorname{grad} f(q), \; q_i q - e_i \rangle \ge \alpha \cdot \| q - e_i \|$

holds for each $\mathcal{S}^{i+}_\xi$ ($1 \le i \le n$), with some $\alpha > 0$, for all

$q \in \mathcal{S}^{i+}_\xi \cap \Big\{ q \in \mathbb{S}^{n-1} \;\Big|\; \sqrt{1 - q_i^2} \ge \mu \Big\}.$

• Implicit Regularization. With $p \ge \Omega(\mathrm{poly}(n))$, w.h.p.

$\Big\langle \operatorname{grad} f(q), \; \frac{1}{q_j} e_j - \frac{1}{q_i} e_i \Big\rangle \ge c \, \frac{\theta (1 - \theta)}{n} \, \frac{\xi}{1 + \xi},$

for all $q \in \mathcal{S}^{i+}_\xi$ and any $j$ such that $j \neq i$ and $q_j^2 \ge \frac{1}{3} q_i^2$.

From Geometry to Optimization

• Random Initialization. Draw $q^{(0)} \sim \mathcal{U}(\mathbb{S}^{n-1})$, so that

$\mathbb{P}\Big( q^{(0)} \in \bigcup_{i=1}^n \mathcal{S}^{i\pm}_\xi \Big) \ge 1/2.$

• Phase I: Riemannian Gradient Descent (RGD). Iterate

$q^{(k+1)} = \mathcal{P}_{\mathbb{S}^{n-1}} \big( q^{(k)} - \tau \cdot \operatorname{grad} f(q^{(k)}) \big)$

with a small fixed step size $\tau$; the iterates stay in $\mathcal{S}^{i\pm}_\xi$ thanks to the implicit regularization. RGD produces a solution $q_\star$ with $\| q_\star - q_{\mathrm{tgt}} \| \le O(\mu)$ at a linear rate, thanks to the regularity condition. (A sketch follows.)
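A minimal sketch of Phase I under the preconditioned Huber formulation above, reusing the earlier helpers; the step size, smoothing µ, and iteration budget are illustrative placeholders, not the poster's tuned values.

```python
def circcorr(y, z):
    # Adjoint of q -> y ⊛ q, i.e. C_y^T z (circular cross-correlation).
    return np.real(np.fft.ifft(np.conj(np.fft.fft(y)) * np.fft.fft(z)))

def rgd_phase1(Y, P, mu=1e-2, tau=0.1, iters=500, seed=1):
    n, p = Y.shape
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)                       # q^(0) ~ U(S^{n-1})
    for _ in range(iters):
        # Euclidean gradient: (1/(np)) * sum_i P C_{y_i}^T h_mu'(C_{y_i} P q),
        # using P^T = P (P is symmetric).
        Pq = P @ q
        g = np.zeros(n)
        for i in range(p):
            z = circconv(Y[:, i], Pq)
            g += P @ circcorr(Y[:, i], np.where(np.abs(z) < mu, z / mu, np.sign(z)))
        g /= n * p
        q = q - tau * (g - (q @ g) * q)          # Riemannian step (tangent projection)
        q /= np.linalg.norm(q)                   # retract back to the sphere
    return q
```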

• Phase II: Rounding. With $r = q_\star$, solve

$\min_q \; \zeta(q) := \frac{1}{np} \sum_{i=1}^p \| C_{y_i} P q \|_1, \quad \text{s.t.} \; \langle r, q \rangle = 1,$

via projected subgradient descent

$q^{(k+1)} = q^{(k)} - \tau^{(k)} \cdot \mathcal{P}_{r^\perp} g^{(k)},$

with $\tau^{(k+1)} = \beta \tau^{(k)}$ and $\beta \in (0, 1)$. It converges linearly,

$\| q^{(k)} - q_{\mathrm{tgt}} \| \le \eta^k, \quad \eta \in (0, 1),$

thanks to the local sharpness of $\zeta(q)$. (A sketch follows.)

Comparison with Literature

Experiments

• Algorithmic convergence and recovery with varying θ.

[Figure: (a) comparison of iterate convergence; (b) recovery probability with varying θ.]

• Phase transition on (p, n).

[Figure: phase transitions for (a) ℓ1-loss, (b) Huber-loss, (c) ℓ4-loss.]

• Experiments on STORM imaging.

[Figure: (a) observation, (b) ground truth, (c) Huber-loss, (d) ℓ4-loss recoveries; (e)-(g) recovered kernels: ground truth, Huber-loss, ℓ4-loss.]

References

[1] Q. Qu, X. Li, and Z. Zhu, "A nonconvex approach for exact and efficient multichannel sparse blind deconvolution", NeurIPS, 2019.

[2] Y. Li and Y. Bresler, "Multichannel sparse blind deconvolution on the sphere", NeurIPS, 2018.

[3] L. Wang and Y. Chi, "Blind deconvolution from multiple sparse inputs", IEEE Signal Processing Letters, 2016.
