
Learning Measurement Matrices for Redundant Dictionaries

Richard Baraniuk

Rice University

Chinmay Hegde

MIT

Aswin Sankaranarayanan

CMU

Sparse Recovery

• Sparsity rocks, etc.

• Previous talk focused mainly on signal inference (ex: classification, NN search)

• This talk focuses on signal recovery

Compressive Sensing

• Sensing via randomized dimensionality reduction

y = Φx: the vector y holds M random measurements of a sparse signal x with K nonzero entries, M ≪ N

• Recovery: solve an ill-posed inverse problem

– exploit the geometrical structure of sparse/compressible signals

• Gaussian measurements are incoherent with any fixed orthonormal basis (with high probability)

• Ex: signals sparse in the frequency domain
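A minimal numpy sketch of the sensing step y = Φx (sizes, seed, and sparsity level are illustrative assumptions, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 256, 64, 5                             # ambient dim, measurements, sparsity

x = np.zeros(N)                                  # K-sparse signal
support = rng.choice(N, size=K, replace=False)
x[support] = rng.standard_normal(K)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # Gaussian measurement matrix
y = Phi @ x                                      # M << N random measurements
```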

General Sparsifying Bases

Sparse Modeling: Approach 1

• Step 1: Choose a signal model with structure
– e.g. bandlimited, smooth with r vanishing moments, etc.

• Step 2: Analytically design a sparsifying basis/frame that exploits this structure
– e.g. DCT, wavelets, Gabor, etc.

[Figure: example DCT, wavelet, and Gabor atoms]

Sparse Modeling: Approach 2

• Learn the sparsifying basis/frame from training data

• Problem formulation: given a large number of training signals, design a dictionary D that simultaneously sparsifies the training data

• Called sparse coding / dictionary learning

Dictionaries

• Dictionary: an N×Q matrix whose columns are used as basis functions for the data

• Convention: assume columns are unit-norm
• More columns than rows (Q > N), so the dictionary is redundant / overcomplete
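One common way to build such a dictionary is an oversampled DCT; the construction below is an illustrative sketch, not the talk's own example:

```python
import numpy as np

N, Q = 64, 128                            # more columns than rows: redundant
n = np.arange(N).reshape(-1, 1)
q = np.arange(Q).reshape(1, -1)
D = np.cos(np.pi * (n + 0.5) * q / Q)     # oversampled-DCT atoms
D = D / np.linalg.norm(D, axis=0)         # convention: unit-norm columns
```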

Dictionary Learning

• Rich vein of theoretical and algorithmic work Olshausen and Field [‘97], Lewicki and Sejnowski [’00], Elad [‘06], Sapiro [‘08]

• Typical formulation: given training data X = [x₁, …, x_T], solve

min over D, {αᵢ} of Σᵢ ‖xᵢ − Dαᵢ‖₂² subject to ‖αᵢ‖₀ ≤ K for all i

• Several efficient algorithms, ex: K-SVD
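A compact sketch of the alternating-minimization idea behind such algorithms. This is a MOD-style update (sparse coding via OMP, then a least-squares dictionary update), not K-SVD itself, and `learn_dictionary` is a name of my own:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def learn_dictionary(X, Q, K, n_iter=20, seed=0):
    """X: N x T matrix of training signals; Q: #atoms; K: target sparsity."""
    rng = np.random.default_rng(seed)
    N, _ = X.shape
    D = rng.standard_normal((N, Q))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        A = orthogonal_mp(D, X, n_nonzero_coefs=K)   # sparse coding step (OMP)
        D = X @ np.linalg.pinv(A)                    # dictionary update (MOD-style)
        D /= np.linalg.norm(D, axis=0) + 1e-12       # keep atoms unit-norm
    return D
```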

Dictionary Learning

• Successfully applied to denoising, deblurring, inpainting, demosaicking, super-resolution, …
– State-of-the-art results in many of these problems

Aharon and Elad ‘06

Dictionary Coherence

• Suppose that the learned dictionary is normalized to have unit ℓ₂-norm columns: ‖dᵢ‖₂ = 1

• The mutual coherence of D is defined as μ(D) = max over i ≠ j of |⟨dᵢ, dⱼ⟩|

• Geometrically, μ(D) is the cosine of the minimum angle between the columns of D; smaller is better

• Crucial parameter in analysis as well as practice (line of work starting with Tropp [04])
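The definition translates directly into a few lines of numpy (a small sketch; the function name is my own):

```python
import numpy as np

def mutual_coherence(D):
    """mu(D) = max_{i != j} |<d_i, d_j>| for unit-norm columns."""
    Dn = D / np.linalg.norm(D, axis=0)   # enforce the unit-norm convention
    G = np.abs(Dn.T @ Dn)                # |inner products| between all columns
    np.fill_diagonal(G, 0.0)             # exclude i == j
    return G.max()
```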

Dictionaries and CS

• Can extend CS to work with non-orthonormal, redundant dictionaries

• Coherence of ΦD determines recovery success (Rauhut et al. [08], Candes et al. [10])

• Fortunately, a random Φ guarantees low coherence of ΦD

[Figure: random Φ as a "holographic basis"]

Geometric Intuition

• Columns of D: points on the unit sphere

• Coherence: cosine of the minimum angle between these vectors

• J-L Lemma: Random projections approximately preserve angles between vectors
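A quick numerical check of that intuition (illustrative sizes and seed of my choosing):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 256, 64
u, v = rng.standard_normal(N), rng.standard_normal(N)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)    # random projection

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(u, v), cosine(Phi @ u, Phi @ v))     # close, with high probability
```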

Q: Can we do better than random projections for dictionary-based CS?

Q restated: for a given dictionary D, find the best CS measurement matrix Φ

Optimization Approach

• Assume that a good dictionary D has been provided.

• Goal: learn the best Φ for this particular D

• As before, want the "shortest" matrix Φ (fewest rows M) such that the coherence of ΦD is at most some parameter θ: |⟨Φdᵢ, Φdⱼ⟩| ≤ θ for all i ≠ j

• To avoid degeneracies caused by a simple scaling, also want that Φ does not shrink columns much: ‖Φdᵢ‖₂² ≥ 1 − θ

A NuMax-like Framework

• Convert quadratic constraints in Φ into linear constraints in P = ΦᵀΦ (via the "lifting trick")

• Use a nuclear-norm relaxation of the rank

• Simplified problem:

minimize ‖P‖∗ subject to |dᵢᵀ P dⱼ| ≤ θ for i ≠ j, dᵢᵀ P dᵢ ≥ 1 − θ, P ⪰ 0

• Alternating Direction Method of Multipliers (ADMM):

– solve for P using spectral thresholding
– solve for L using least-squares
– solve for q using "squishing"

• Convergence rate depends on the size of the dictionary (since #constraints = O(Q²))
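A sketch of the spectral-thresholding sub-step, under my reading of the slide: the proximal operator of the nuclear norm restricted to the PSD cone (P = ΦᵀΦ must be PSD); the full P-update in [HSYB12] also incorporates the linearized constraints:

```python
import numpy as np

def spectral_threshold(P, tau):
    """Prox of tau * nuclear norm at a symmetric P, restricted to PSD matrices."""
    w, V = np.linalg.eigh(P)             # real eigen-decomposition (P symmetric)
    w = np.maximum(w - tau, 0.0)         # shrink eigenvalues, drop the rest
    return (V * w) @ V.T                 # low-rank, PSD result
```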

Algorithm: “NuMax-Dict”

[HSYB12]

NuMax vs. NuMax-Dict

• Same intuition, trick, algorithm, etc.

• Key enabler is that coherence is intrinsically a quadratic function of the data

• Key difference: the (linearized) constraints are no longer symmetric

– We have constraints of the form |dᵢᵀ P dⱼ| ≤ θ with i ≠ j, whose constraint matrices dᵢdⱼᵀ are non-symmetric

– This might result in intermediate P estimates having complex eigenvalues, so the notion of spectral thresholding needs to be slightly modified (see the demonstration below)
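A tiny demonstration of why the modification is needed; symmetrizing the iterate first is one simple remedy (an illustration of mine; [HSYB12] describes the actual modification):

```python
import numpy as np

P = np.array([[0.0, 1.0],
              [-1.0, 0.0]])               # non-symmetric intermediate estimate
print(np.linalg.eigvals(P))               # [0.+1.j  0.-1.j]: complex eigenvalues
print(np.linalg.eigvalsh((P + P.T) / 2))  # symmetrized: real eigenvalues
```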

Experimental Results

Expt 1: Synthetic Dictionary

• Generic dictionary: random with unit-norm columns
• Dictionary size: 64×128
• We construct different measurement matrices:
– Random
– NuMax-Dict
– Algorithm by Elad [06]
– Algorithm by Duarte-Carvajalino & Sapiro [08]
• We generate K=3 sparse signals with Gaussian amplitudes, add 30dB measurement noise
• Recovery using OMP
• Measure recovery SNR, plot as a function of M (pipeline sketched below)
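A runnable sketch of this pipeline using the slide's sizes; here a plain Gaussian Φ stands in for the four compared matrices, M is fixed rather than swept, and `orthogonal_mp` is scikit-learn's OMP solver:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
N, Q, K, M, snr_db = 64, 128, 3, 32, 30

D = rng.standard_normal((N, Q)); D /= np.linalg.norm(D, axis=0)
a = np.zeros(Q); a[rng.choice(Q, size=K, replace=False)] = rng.standard_normal(K)
x = D @ a                                          # K-sparse in D

Phi = rng.standard_normal((M, N)) / np.sqrt(M)     # measurement matrix under test
y = Phi @ x
noise = rng.standard_normal(M)
noise *= np.linalg.norm(y) / np.linalg.norm(noise) / 10 ** (snr_db / 20)
y = y + noise                                      # 30 dB measurement noise

a_hat = orthogonal_mp(Phi @ D, y, n_nonzero_coefs=K)   # OMP in the frame Phi*D
x_hat = D @ a_hat
rsnr = 20 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))
print(f"recovery SNR: {rsnr:.1f} dB")
```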

Expt 1: Synthetic Dictionary

[Figure: recovery SNR vs. number of measurements M for the four measurement matrices]

Expt 2: Practical Dictionaries

• 2x overcomplete DCT dictionary, same parameters
• 2x overcomplete dictionary learned on 8x8 patches of a real-world image (Barbara) using K-SVD
• Recovery using OMP

Analysis

• Exact problem seems to be hard to analyze

• But, as in NuMax, can provide analytical bounds in the special case where the measurement matrix is further constrained to be orthonormal

Orthogonal Sensing of Dictionary-Sparse Signals

• Given a dictionary D, find the orthonormal measurement matrix Φ that provides the best possible coherence of ΦD

• From a geometric perspective, ortho-projections cannot improve coherence, so necessarily μ(ΦD) ≥ μ(D)

Semidefinite Relaxation

• The usual trick: Lifting and trace-norm relaxation
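One plausible shape of the lifted, trace-relaxed program, reconstructed from the slide rather than quoted from the paper (θ is the coherence target; orthonormal rows of Φ make P = ΦᵀΦ a rank-M projection, which relaxes to 0 ⪯ P ⪯ I):

```latex
\begin{align*}
\min_{P} \quad & \operatorname{tr}(P) \\
\text{s.t.} \quad & |d_i^{\top} P\, d_j| \le \theta \quad \text{for all } i \ne j, \\
                  & 0 \preceq P \preceq I .
\end{align*}
```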

Theoretical Result

• Theorem: For any given redundant dictionary D, denote its mutual coherence by μ(D).

Denote the optimum of the (nonconvex) problem as μ∗.

Then, there exists a method to produce a rank-2M ortho matrix Φ such that the coherence of ΦD is at most the optimal value μ∗

i.e., we can obtain close-to-optimal coherence, but pay a price of a factor of 2 in the number of measurements

Conclusions

• NuMax-Dict performance comparable to the best existing algorithms

• Principled convex optimization framework

• Efficient ADMM-type algorithm that exploits the rank-1 structure of the problem

• Upshot: possible to incorporate other structure into the measurement matrix, such as positivity, sparsity, etc.

Open Question

• Above framework assumes a two-step approach: first construct a redundant dictionary (analytically or from data) and then construct a measurement matrix

• Given a large number of training signals, how can we efficiently solve jointly for both the dictionary and the sensing matrix? (An approach was introduced in Duarte-Carvajalino & Sapiro [08])