Learning with Structured Sparsity



Authors: Junzhou Huang, Tong Zhang, Dimitris Metaxas
Presented by Zhennan Yan


Introduction

- Fixed set of p basis vectors {x_1, …, x_p}, where x_j ∈ ℝ^n for each j; write X = [x_1, …, x_p] ∈ ℝ^{n×p}.
- Given a random observation y = [y_1, …, y_n]^T ∈ ℝ^n, which depends on an underlying coefficient vector β ∈ ℝ^p via y ≈ Xβ.
- Assume the target coefficient β is sparse.
- Throughout the paper, X is assumed fixed, and randomization is w.r.t. the noise in the observation y.
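To make the setup concrete, here is a minimal numpy sketch of this observation model (the sizes n, p, k and the noise level are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 64, 256, 8                     # illustrative sizes, not from the paper

X = rng.standard_normal((n, p))          # fixed design matrix [x_1, ..., x_p]
beta = np.zeros(p)                       # underlying sparse coefficient vector
support = rng.choice(p, size=k, replace=False)
beta[support] = rng.choice([-1.0, 1.0], size=k)

noise = 0.01 * rng.standard_normal(n)    # randomization is w.r.t. this noise
y = X @ beta + noise                     # observation y depending on beta
```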


Introduction

- Define the support of a vector β ∈ ℝ^p as supp(β) = {j : β_j ≠ 0}, so ‖β‖_0 = |supp(β)|.
- A natural method for sparse learning is L0 regularization for a desired sparsity s:
  β̂_{L0} = arg min_β Q̂(β)  subject to  ‖β‖_0 ≤ s
- Here, only the least squares loss is considered: Q̂(β) = ‖Xβ − y‖_2^2.
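The L0 problem can be written directly as a search over supports; the brute-force sketch below (a hypothetical helper, for illustration only) makes explicit why exhaustive search scales exponentially in p:

```python
import itertools
import numpy as np

def l0_least_squares(X, y, s):
    """Exhaustive L0-constrained least squares:
    minimize Q(beta) = ||X beta - y||_2^2 subject to ||beta||_0 <= s,
    by refitting on every support F with |F| <= s (infeasible beyond tiny p)."""
    n, p = X.shape
    best_q, best_beta = float(np.sum(y ** 2)), np.zeros(p)   # empty support
    for size in range(1, s + 1):
        for F in map(list, itertools.combinations(range(p), size)):
            coef, *_ = np.linalg.lstsq(X[:, F], y, rcond=None)
            q = float(np.sum((X[:, F] @ coef - y) ** 2))     # least squares loss
            if q < best_q:
                best_q, best_beta = q, np.zeros(p)
                best_beta[F] = coef
    return best_beta, best_q
```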


Introduction

- The L0 problem is NP-hard! Standard approaches:
  - Relaxation of L0 to L1 (Lasso)
  - Greedy algorithms, such as OMP (sketch below)
- In practical applications, one often knows a structure on β in addition to sparsity:
  - Group sparsity: variables in the same group tend to be zero or nonzero together.
  - Tonal and transient structures: sparse decomposition of audio signals.
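For reference, a minimal sketch of textbook OMP (not the paper's structured variant); the L1 relaxation can be handled by any Lasso solver, e.g. sklearn.linear_model.Lasso:

```python
import numpy as np

def omp(X, y, s):
    """Orthogonal Matching Pursuit: greedily pick the single column most
    correlated with the current residual, then refit by least squares
    on the selected support. Assumes s >= 1."""
    p = X.shape[1]
    F, beta = [], np.zeros(p)
    residual = y.copy()
    for _ in range(s):
        j = int(np.argmax(np.abs(X.T @ residual)))   # best single feature
        if j in F:
            break                                    # no further progress
        F.append(j)
        coef, *_ = np.linalg.lstsq(X[:, F], y, rcond=None)
        residual = y - X[:, F] @ coef
    beta[F] = coef
    return beta
```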

Structured Sparsity

- Denote the index set of coefficients by I = {1, …, p}.
- For any sparse subset F ⊆ I, assign a coding length cl(F) satisfying Σ_{F⊆I} 2^{−cl(F)} ≤ 1.
- The coding complexity of F is defined as c(F) = |F| + cl(F).

Structured Sparsity

- If a coefficient vector has a small coding complexity, it can be efficiently learned.
- Why? The number of bits to encode F is cl(F), and the number of bits to encode the nonzero coefficients in F is O(|F|).

General Coding Scheme

- Block Coding: consider a small set B of base blocks (each element of B is a subset of I); every subset F ⊆ I can be expressed as a union of blocks in B.
- Define a code length cl_0 on B, where Σ_{b∈B} 2^{−cl_0(b)} ≤ 1.
- The code length of F is then cl(F) = min { g + Σ_{j=1}^g cl_0(b_j) : F = b_1 ∪ … ∪ b_g, b_j ∈ B }, so Σ_F 2^{−cl(F)} ≤ 1.
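A brute-force sketch of this block-coding length and the resulting coding complexity c(F) = |F| + cl(F); the block set, its cl_0 values, and the exhaustive search over combinations are all illustrative assumptions:

```python
import itertools

def block_coding_length(F, blocks, cl0):
    """cl(F) = min { g + sum_j cl0(b_j) : F is the union of b_1..b_g }.
    Brute force over all block combinations -- illustration only."""
    F = frozenset(F)
    best = float("inf")
    for g in range(1, len(blocks) + 1):
        for combo in itertools.combinations(blocks, g):
            if frozenset().union(*combo) == F:
                best = min(best, g + sum(cl0[b] for b in combo))
    return best

def coding_complexity(F, blocks, cl0):
    """c(F) = |F| + cl(F): bits for the nonzero values plus bits for F."""
    return len(F) + block_coding_length(F, blocks, cl0)

# Tiny example: two base blocks, each with an assumed code length of 1 bit.
blocks = [frozenset({0, 1, 2}), frozenset({3, 4})]
cl0 = {b: 1.0 for b in blocks}
print(block_coding_length({0, 1, 2, 3, 4}, blocks, cl0))  # 2 + 1 + 1 = 4.0
print(coding_complexity({0, 1, 2, 3, 4}, blocks, cl0))    # 5 + 4 = 9.0
```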

General Coding Scheme

- A structured greedy algorithm that takes advantage of the block structure is efficient: instead of searching over all subsets of I up to a fixed coding complexity s (exponentially many), we greedily add blocks from B one at a time.
- B is supposed to contain only a manageable number of base blocks.

General Coding Scheme

- Standard Sparsity: B consists only of single-element sets, and each base block has coding length cl_0 = log_2 p. This uses O(k log_2 p) bits to code each subset of cardinality k.
- Group Sparsity and Graph Sparsity are treated on the next slides.

General Coding Scheme

- Group Sparsity: consider a partition of I into m disjoint groups; let B_1 contain the m groups, and B_0 contain the p single-element blocks. Each element of B_0 has cl_0 = ∞, and each element of B_1 has cl_0 = log_2 m, so the scheme only looks for signals composed of the groups.
- The resulting coding length is cl(F) = g log_2(2m) if F can be represented as a union of g disjoint groups (worked example below).
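A small worked example of the group-sparsity coding length; the sizes p, m, and g are illustrative choices, not from the paper:

```python
import math

# Illustrative sizes: p = 512 coefficients in m = 64 disjoint groups of 8;
# F is the union of g = 2 groups, so |F| = 16.
p, m, group_size, g = 512, 64, 8, 2

cl_F = g * math.log2(2 * m)                # g * (1 + log2 m) = g * log2(2m)
c_F = g * group_size + cl_F                # coding complexity c(F) = |F| + cl(F)
standard = g * group_size * math.log2(p)   # standard sparsity: ~|F| * log2(p)

print(cl_F, c_F, standard)                 # 14.0 30.0 144.0
```

With these numbers the group code spends 14 bits on F versus roughly 144 bits for coding each nonzero position independently, which is the point of exploiting the group structure.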

General Coding Scheme

- Graph Sparsity: a generalization of group sparsity. It employs a directed graph structure G on I; each element of I is a node of G, but G may contain additional nodes.
- At each node v, we define a coding length cl_v(S) on the subsets S of the neighborhood N_v of v, as well as cl_v(u) for any other single node u, such that Σ_{S⊆N_v} 2^{−cl_v(S)} + Σ_u 2^{−cl_v(u)} ≤ 1.


General Coding Scheme

- Example of graph sparsity: an image processing problem.
- Each pixel has 4 adjacent pixels, so its neighborhood has 2^4 = 16 subsets, which can be coded with 4 bits. All other pixels are encoded by random jumping, with coding length about log_2 p.
- If a connected region F is composed of g connected sub-regions, the coding length is cl(F) = O(g log_2 p + |F|), while the standard sparse coding length is |F| log_2 p.
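Plugging in illustrative numbers shows the gap between graph-sparse and standard coding lengths (the image size and region size below are assumptions, not the paper's settings):

```python
import math

p = 64 * 64           # pixels (coefficients) in an assumed 64x64 image
F_size, g = 100, 1    # one connected region of 100 pixels

# Graph coding: one random jump per region (~log2 p bits) plus ~4 bits per
# pixel to encode which neighbors belong to the region.
graph_bits = g * math.log2(p) + 4 * F_size      # 12 + 400 = 412 bits

# Standard sparse coding: ~log2 p bits per pixel, independently.
standard_bits = F_size * math.log2(p)           # 100 * 12 = 1200 bits

print(graph_bits, standard_bits)
```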


Algorithms for Structured Sparsity

β̂_{L0} = arg min_β Q̂(β)  subject to  ‖β‖_0 ≤ s


Algorithms for Structured Sparsity

- Extend forward greedy algorithms by using the block structure, which is only used to limit the search space.


Algorithms for Structured Sparsity

- At step k, maximize the gain ratio over base blocks b ∈ B:
  φ(b) = (Q̂(β^{(k−1)}) − Q̂(β^{(k)})) / (c(F^{(k−1)} ∪ b) − c(F^{(k−1)}))
- Using least squares regression, the numerator equals ‖P_{F∪b} y‖_2^2 − ‖P_F y‖_2^2, where P_F is the projection matrix onto the subspace generated by the columns of X_F.
- Select b^{(k)} = arg max_{b∈B} φ(b) and set F^{(k)} = F^{(k−1)} ∪ b^{(k)}.
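A compact sketch of this structured greedy selection, as reconstructed above; here `cost` is a user-supplied coding complexity function c(F), blocks are given as index sets, and the projection norms are computed via QR (assumptions of this sketch, not the paper's code):

```python
import numpy as np

def proj_sq_norm(X, F, y):
    """||P_F y||^2, with P_F the projection onto the span of columns X_F."""
    if not F:
        return 0.0
    Q, _ = np.linalg.qr(X[:, sorted(F)])
    return float(np.sum((Q.T @ y) ** 2))

def struct_omp(X, y, blocks, cost, s):
    """Sketch of the structured greedy algorithm: repeatedly add the base
    block b maximizing the gain ratio
        phi(b) = (||P_{F|b} y||^2 - ||P_F y||^2) / (cost(F|b) - cost(F)),
    stopping once the coding complexity budget s would be exceeded."""
    F = set()
    while True:
        base = proj_sq_norm(X, F, y)
        best_b, best_phi = None, 0.0
        for b in blocks:
            b = set(b)
            if b <= F:
                continue                       # nothing new in this block
            dc = cost(F | b) - cost(F)         # coding complexity increment
            gain = proj_sq_norm(X, F | b, y) - base
            if dc > 0 and gain / dc > best_phi:
                best_b, best_phi = b, gain / dc
        if best_b is None or cost(F | best_b) > s:
            break                              # no progress or budget exceeded
        F |= best_b
    beta = np.zeros(X.shape[1])
    idx = sorted(F)
    if idx:                                    # least squares refit on F
        beta[idx], *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
    return beta
```

With B taken as the single-element blocks and a uniform cl_0, the cost increments are constant and the selection essentially reduces to plain OMP, which is why the block structure alone carries the "structured" part of the method.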


Experiments-1D

- 1D structured sparse signal with values ±1, p = 512, k = 32, g = 2.
- Zero-mean Gaussian noise with standard deviation σ is added to the measurements.
- n = 4k = 128.
- Recovery results by Lasso, OMP and StructOMP (generation sketch below):
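A sketch of how such a 1D test signal and its measurements could be generated; the Gaussian design, the normalization, and σ = 0.01 are assumptions (the slide does not give them), and possible overlap between the two runs is ignored for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, g = 512, 32, 2
n = 4 * k                                     # 128 measurements

# Structured sparse signal: g connected runs of +/-1 values, k nonzeros total.
beta_bar = np.zeros(p)
run = k // g
for s0 in rng.choice(p - run, size=g, replace=False):
    beta_bar[s0:s0 + run] = rng.choice([-1.0, 1.0], size=run)

X = rng.standard_normal((n, p)) / np.sqrt(n)  # assumed Gaussian design
sigma = 0.01                                  # assumed noise level
y = X @ beta_bar + sigma * rng.standard_normal(n)
```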


Experiments-1D


Experiments-2D

- Generate a 2D structured-sparsity image by putting four letters in random locations.
- p = H × W = 48 × 48, k = 160, g = 4, n = 4k = 640.
- For this strongly sparse signal, Lasso is better than OMP!


Experiments-2D


Experiments for sample size


Experiment on Tree-structured Sparsity

- 2D wavelet coefficients
- Weakly sparse signal


Experiments-Background Subtracted Images


Experiments for sample size