
Support Recovery for Orthogonal Matching Pursuit: Upper and Lower Bounds

Raghav Somani, Chirag Gupta, Prateek Jain and Praneeth Netrapalli

September 24, 2018


Sparse Regression

x̄ = arg min_{‖x‖_0 ≤ s∗} f(x)    (1.1)

where x ∈ ℝ^d and s∗ ≪ d. The ℓ_0 norm counts the number of non-zero elements.

Applications

Resource-constrained Machine Learning
High-dimensional Statistics
Bioinformatics


Sparse Linear Regression (SLR)

Sparse Linear Regression is a representative problem; results typically extend easily to the general case. With f(x) = ‖Ax − y‖_2^2, SLR's objective is to find

x̄ = arg min_{‖x‖_0 ≤ s∗} ‖Ax − y‖_2^2    (2.1)

where A ∈ ℝ^{n×d}, x ∈ ℝ^d and y ∈ ℝ^n. Unconditionally, it is NP-hard (by reduction from the 3-set cover problem).
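For intuition on why (2.1) is hard in general, exact ℓ_0-constrained least squares amounts to trying every support of size s∗. Below is a minimal NumPy sketch of that exhaustive search; the helper name best_l0_fit and the toy sizes are mine, not from the slides, and the loop is only feasible for tiny d.

```python
from itertools import combinations
import numpy as np

def best_l0_fit(A, y, s_star):
    """Exhaustive l0-constrained least squares: try every support of size s_star.
    Cost is C(d, s_star) least-squares solves, so this is only feasible for tiny d."""
    n, d = A.shape
    best_err, best_x = np.inf, None
    for support in combinations(range(d), s_star):
        cols = list(support)
        # Least squares restricted to the chosen columns.
        coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        err = np.linalg.norm(A[:, cols] @ coef - y) ** 2
        if err < best_err:
            best_err = err
            best_x = np.zeros(d)
            best_x[cols] = coef
    return best_x, best_err

# Tiny example: d = 8, s* = 2, noiseless measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))
x_bar = np.zeros(8); x_bar[[1, 5]] = [1.0, -2.0]
x_hat, err = best_l0_fit(A, A @ x_bar, s_star=2)
print(np.flatnonzero(x_hat), err)
```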


Assumptions of interest

Despite being NP-hard, SLR is tractable under certain assumptions.

Incoherence - If Σ = A^T A, then max_{i≠j} |Σ_{ij}| ≤ M (see the sketch after this list).
If M ≤ 1/(2s∗ − 1) and y = Ax̄, then x̄ is the unique sparsest solution, and OMP can recover x̄ in s∗ steps.

Restricted Isometry Property (RIP) - ‖A_S^T A_S − I‖_2 ≤ δ_{|S|} (δ_s ≤ M(s − 1) ∀ s ≥ 2)
⟹ (1 − δ_s) ‖v‖_2^2 ≤ ‖Av‖_2^2 ≤ (1 + δ_s) ‖v‖_2^2 ∀ v s.t. ‖v‖_0 ≤ s.

Null space property - ∀ S ⊆ [d] s.t. |S| ≤ s, if v ∈ Null(A) \ {0}, then ‖v_S‖_1 ≤ ‖v_{S^c}‖_1
⟹ {v ∈ ℝ^d | Av = 0} ∩ {v ∈ ℝ^d | ‖v_{S^c}‖_1 ≤ ‖v_S‖_1} = {0}

Restricted Strong Convexity (RSC) - ‖Ax − Az‖_2^2 ≥ ρ^-_s ‖x − z‖_2^2 ∀ x, z ∈ ℝ^d s.t. ‖x − z‖_0 ≤ s

Incoherence ⟹ RIP ⟹ Null space property ⟹ RSC. RSC is the weakest and the most popular assumption.
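For small matrices these quantities can be computed directly, which helps build intuition for how they relate. A brute-force sketch follows; the helper names mutual_incoherence and rip_constant are mine, and the subset enumeration is exponential in s, so this is for illustration only.

```python
from itertools import combinations
import numpy as np

def mutual_incoherence(A):
    """M = max_{i != j} |<A_i, A_j>| for columns normalized to unit l2 norm."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = A.T @ A
    return np.abs(G - np.diag(np.diag(G))).max()

def rip_constant(A, s):
    """Brute-force delta_s = max over |S| = s of ||A_S^T A_S - I||_2 (unit-normalized columns)."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    d = A.shape[1]
    delta = 0.0
    for S in combinations(range(d), s):
        eigs = np.linalg.eigvalsh(A[:, list(S)].T @ A[:, list(S)])
        # Spectral norm deviation from the identity.
        delta = max(delta, max(abs(eigs[0] - 1), abs(eigs[-1] - 1)))
    return delta

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
M = mutual_incoherence(A)
# delta_s <= M (s - 1): check the claimed relation numerically for s = 2.
print("M =", M, " delta_2 =", rip_constant(A, 2), " M*(2-1) =", M)
```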


Goals of SLR

SLR can be modelled as

y = Ax̄ + η    (2.2)

where η ∼ N(0, σ^2 I_{n×n}), supp(x̄) = S∗ and |S∗| = s∗.

⟹ y = A_{S∗} x̄_{S∗} + η    (2.3)

The model with deterministic conditions on η can also be analyzed.

Goals of SLR (a small numerical sketch follows below)
1 Bounding Generalization error - Upper bound G(x) := (1/n) ‖A(x − x̄)‖_2^2, where the rows of A are i.i.d.
2 Support Recovery - Recover the true features of A, i.e., find an S ⊇ S∗.
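To make the model (2.2) and the two goals concrete, here is a small sketch assuming Gaussian design and noise; the names generalization_error and recovers_support and the toy dimensions are mine, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s_star, sigma = 200, 50, 5, 0.1

# Design with i.i.d. rows, true s*-sparse signal, Gaussian noise (model 2.2).
A = rng.standard_normal((n, d))
x_bar = np.zeros(d)
S_star = rng.choice(d, size=s_star, replace=False)
x_bar[S_star] = rng.standard_normal(s_star)
y = A @ x_bar + sigma * rng.standard_normal(n)

def generalization_error(A, x, x_bar):
    """Goal 1: G(x) = (1/n) ||A (x - x_bar)||_2^2."""
    return np.linalg.norm(A @ (x - x_bar)) ** 2 / A.shape[0]

def recovers_support(x, S_star):
    """Goal 2: the estimated support should contain the true support S*."""
    return set(S_star.tolist()) <= set(np.flatnonzero(x).tolist())

x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)   # dense least squares, for comparison only
print(generalization_error(A, x_ls, x_bar), recovers_support(x_ls, S_star))
```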


Algorithms to solve SLR

The literature mainly studies 3 classes of algorithms.

Existing SLR algorithms

ℓ_1 minimization based (LASSO based), e.g. the Dantzig selector
Non-convex penalty based, e.g. IHT, the SCAD penalty, the log-sum penalty
Greedy methods, e.g. Orthogonal Matching Pursuit (OMP)

We study SLR under the RSC assumption for the OMP algorithm.


Orthogonal Matching Pursuit for SLR

Set the initial support set S_0 = ∅ and x_0 = 0, so the residual r_0 = y − Ax_0 = y. At the kth iteration (k ≥ 1):

From the left-over columns of A (those in A_{S^c_{k−1}}), find the column with maximum absolute inner product with r_{k−1}:
[ |⟨A_{i_1}, r_{k−1}⟩|  |⟨A_{i_2}, r_{k−1}⟩|  …  |⟨A_{i_j}, r_{k−1}⟩|  …  |⟨A_{i_{d−k+1}}, r_{k−1}⟩| ]

Include i_j into the set: S_k = S_{k−1} ∪ {i_j}.
Fully optimize on S_k: x_k = arg min_{supp(x) ⊆ S_k} ‖y − Ax‖_2^2 (simple least squares).
Update the residual: r_k = y − Ax_k.


Orthogonal Matching Pursuit for SLR

Result: OMP sparse estimate x̂^OMP_s = x_s
S_0 = ∅, x_0 = 0, r_0 = y
for k = 1, 2, …, s do
    j ← arg max_{i ∉ S_{k−1}} |A_i^T r_{k−1}|    (Greedy selection)
    S_k ← S_{k−1} ∪ {j}
    x_k ← arg min_{supp(x) ⊆ S_k} ‖Ax − y‖_2^2
    r_k ← y − A x_k
end
Algorithm 1: Orthogonal Matching Pursuit (OMP) for SLR

Note that A_i^T r_{k−1} ∝ [∇f(x_{k−1})]_i for f(x) = ‖Ax − y‖_2^2.
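Algorithm 1 maps directly onto a few lines of NumPy. Below is a minimal sketch of OMP for SLR as stated above (the function name omp and the toy example are mine): each iteration selects the column most correlated with the residual, then re-solves least squares on the enlarged support.

```python
import numpy as np

def omp(A, y, s):
    """Orthogonal Matching Pursuit (Algorithm 1): run s greedy iterations."""
    n, d = A.shape
    support = []                      # S_k
    x = np.zeros(d)                   # x_k
    r = y.copy()                      # r_0 = y
    for _ in range(s):
        # Greedy selection: column with largest |A_i^T r_{k-1}| outside the support.
        corr = np.abs(A.T @ r)
        corr[support] = -np.inf
        j = int(np.argmax(corr))
        support.append(j)
        # Fully re-optimize on the current support (simple least squares).
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros(d)
        x[support] = coef
        # Update the residual.
        r = y - A @ x
    return x, support

# Usage: recover a 3-sparse signal from noiseless measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30))
x_bar = np.zeros(30); x_bar[[2, 7, 19]] = [1.5, -2.0, 0.7]
x_hat, S = omp(A, A @ x_bar, s=3)
print(sorted(S), np.allclose(x_hat, x_bar, atol=1e-8))
```

On a well-conditioned random design with noiseless measurements, this typically recovers the true support in exactly s∗ iterations, which is the regime the incoherence result on the earlier slide describes.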


Orthogonal Matching Pursuit

Orthogonal Matching Pursuit for general f(x)

Result: OMP sparse estimate x̂^OMP_s = x_s
S_0 = ∅, x_0 = 0
for k = 1, 2, …, s do
    j := arg max_{i ∉ S_{k−1}} |[∇f(x_{k−1})]_i|
    S_k := S_{k−1} ∪ {j}
    x_k := arg min_{supp(x) ⊆ S_k} f(x)
end
Algorithm 2: OMP for a general function f(x)
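Algorithm 2 only needs gradient access to f plus an inner solver restricted to the current support. A sketch under that reading, using scipy.optimize.minimize for the inner step; the helper name omp_general is mine, and for f(x) = ‖Ax − y‖_2^2 this reduces to Algorithm 1.

```python
import numpy as np
from scipy.optimize import minimize

def omp_general(f, grad_f, d, s):
    """OMP for a general smooth f (Algorithm 2): select by gradient magnitude,
    then minimize f over the coordinates in the current support."""
    support = []
    x = np.zeros(d)
    for _ in range(s):
        g = np.abs(grad_f(x))
        g[support] = -np.inf
        support.append(int(np.argmax(g)))
        # Inner solve: minimize f over x restricted to supp(x) within S_k.
        def restricted(z):
            full = np.zeros(d)
            full[support] = z
            return f(full)
        res = minimize(restricted, x[support], method="L-BFGS-B")
        x = np.zeros(d)
        x[support] = res.x
    return x, support

# With f(x) = ||Ax - y||_2^2 this behaves like Algorithm 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 20)); x_bar = np.zeros(20); x_bar[[3, 11]] = [1.0, -1.0]
y = A @ x_bar
f = lambda x: np.linalg.norm(A @ x - y) ** 2
grad_f = lambda x: 2 * A.T @ (A @ x - y)
print(sorted(omp_general(f, grad_f, d=20, s=2)[1]))
```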


Sparse Linear Regression (SLR) for OMP

Key quantities

Restricted Smoothness (ρ^+) & Restricted Strong Convexity (ρ^-)

ρ^-_s ‖x − z‖_2^2 ≤ ‖Ax − Az‖_2^2 ≤ ρ^+_s ‖x − z‖_2^2    (4.1)

∀ x, z ∈ ℝ^d s.t. ‖x − z‖_0 ≤ s.

Restricted condition number (κ̃_s)

κ̃_s = ρ^+_1 / ρ^-_s    (4.2)

We also define

κ_s = ρ^+_s / ρ^-_s ≥ κ̃_s    (4.3)
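For a small design matrix, ρ^-_s and ρ^+_s in (4.1) are just the extreme eigenvalues of A_S^T A_S over all supports of size s, so κ̃_s and κ_s can be computed by brute force. A sketch follows; the helper restricted_constants is mine and the enumeration is exponential in s, so it is for illustration only.

```python
from itertools import combinations
import numpy as np

def restricted_constants(A, s):
    """rho_minus_s, rho_plus_s: min/max eigenvalues of A_S^T A_S over all |S| = s."""
    d = A.shape[1]
    rho_minus, rho_plus = np.inf, -np.inf
    for S in combinations(range(d), s):
        eigs = np.linalg.eigvalsh(A[:, list(S)].T @ A[:, list(S)])
        rho_minus = min(rho_minus, eigs[0])
        rho_plus = max(rho_plus, eigs[-1])
    return rho_minus, rho_plus

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 12))
s = 4
rho_minus_s, rho_plus_s = restricted_constants(A, s)
rho_plus_1 = restricted_constants(A, 1)[1]          # rho^+_1 = max_i ||A_i||_2^2
kappa_tilde_s = rho_plus_1 / rho_minus_s            # (4.2)
kappa_s = rho_plus_s / rho_minus_s                  # (4.3), always >= kappa_tilde_s
print(kappa_tilde_s, kappa_s)
```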


Lower bounds for Fast rates

If x̂_{ℓ_0} is the best ℓ_0 estimate in the set of s∗-sparse vectors, then one can show

sup_{‖x̄‖_0 ≤ s∗} (1/n) E[ ‖A(x̂_{ℓ_0} − x̄)‖_2^2 ] ≲ σ^2 s∗ / n    (4.4)

This is not tractable, since computing x̂_{ℓ_0} involves searching all (d choose s∗) subsets.

(Y. Zhang, Wainwright & Jordan'15) ∃ A ∈ ℝ^{n×d} s.t. any poly-time algorithm satisfies

sup_{‖x̄‖_0 ≤ s∗} (1/n) E[ ‖A(x̂_poly − x̄)‖_2^2 ] ≳ σ^2 s∗^{1−δ} κ̃_{s∗} / n    ∀ δ > 0    (4.5)

Consequence - Any estimator x̂ achieving the fast rate must either not be poly-time or must return an x̂_poly that is not s∗-sparse.

≲ and ≳ are inequalities up to constant and poly-log d factors.


Sparse Linear Regression (SLR) for OMP: Upper bounds

Upper bounds on Generalization error

The tightest known upper bounds for poly-time algorithms like IHT, OMP and Lasso were at least κ̃ times worse than the known lower bounds (Jain'14, T. Zhang'10, Y. Zhang'17).

(T. Zhang'10) If x̂_s is the output of OMP after s ≳ s∗ κ̃_{s+s∗} log κ_{s+s∗} iterations, then with high probability

(1/n) ‖A(x̂_s − x̄)‖_2^2 ≲ (1/n) σ^2 s∗ κ̃^2_{s+s∗} log κ_{s+s∗}.    (4.6)

With a slight modification to T. Zhang's analysis we get

Generalization error for OMP
If x̂_s is the output of OMP after s ≳ s∗ κ̃_{s+s∗} log κ_{s+s∗} iterations, then with high probability

(1/n) ‖A(x̂_s − x̄)‖_2^2 ≲ (1/n) σ^2 s∗ κ̃_{s+s∗} log κ_{s+s∗}.    (4.7)

This matches the fast-rate lower bound up to log factors.


Support Recovery upper bound

Support recovery results are known for SCAD/MCP penalty based methods under bounded incoherence (Loh'14).
For greedy algorithms like HTP and PHT, known support recovery results require a poor dependence on κ̃ in the condition on |x̄_min| (Shen'17).
If S is the support set of the sth OMP iterate x̂_s, and if S∗ \ S ≠ ∅, then there is a large additive decrease in the objective if |x̄_min| is larger than the appropriate noise level.

Large decrease in objective

If x̂_s is the output of OMP after s ≳ s∗ κ̃_{s+s∗} log κ_{s+s∗} iterations s.t. S∗ \ S ≠ ∅ and |x̄_min| ≳ σ γ √(ρ^+_1) / ρ^-_{s+s∗}, then with high probability

‖Ax̂_s − y‖_2^2 − ‖Ax̂_{s+1} − y‖_2^2 ≳ σ^2    (4.8)

where ‖A_{S∗\S}^T A_S (A_S^T A_S)^{−1}‖_∞ ≤ γ and S = supp(x̂_s).

γ is similar to the standard incoherence condition (see the sketch below).
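The quantity γ can be evaluated directly for a given pair (S, S∗), which is useful for checking the |x̄_min| condition on small instances. A sketch; the helper gamma_bound and the toy index sets are mine.

```python
import numpy as np

def gamma_bound(A, S, S_star):
    """||A_{S* \\ S}^T A_S (A_S^T A_S)^{-1}||_inf, the incoherence-like quantity gamma."""
    missing = sorted(set(S_star) - set(S))
    if not missing:
        return 0.0
    A_S = A[:, sorted(S)]
    A_miss = A[:, missing]
    # Rows correspond to missed true features, columns to selected features.
    B = A_miss.T @ A_S @ np.linalg.inv(A_S.T @ A_S)
    # Matrix infinity norm = maximum absolute row sum.
    return np.abs(B).sum(axis=1).max()

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
print(gamma_bound(A, S=[0, 1, 2, 5], S_star=[0, 1, 3]))
```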


Support Recovery upper bound

Since ‖Ax − y‖_2^2 ≥ 0 ∀ x ∈ ℝ^d, the number of extra iterations cannot be too large.

Support recovery and infinity norm bound

If x̂_s is the output of OMP after s ≳ s∗ κ̃_{s+s∗} log κ_{s+s∗} iterations, s.t. |x̄_min| ≳ σ γ √(ρ^+_1) / ρ^-_{s+s∗}, then with high probability

1 S∗ ⊆ supp(x̂_s)
2 ‖x̂_s − x̄‖_∞ ≲ σ √(log s / ρ^-_s)

where ‖A_{S∗\S}^T A_S (A_S^T A_S)^{−1}‖_∞ ≤ γ and S = supp(x̂_s).

The condition on |x̄_min| scales as 1/√n, since both ρ^-_{s+s∗} and ρ^+_1 carry a factor of n. It is also better by at least a √κ̃ factor than the corresponding conditions in other recent works.

γ is allowed to be very large.


Sparse Linear Regression (SLR) for OMP: Lower bounds

Lower bound instance construction

(Y. Zhang'15)'s lower bounds were for algorithms that output s∗-sparse solutions, which does not apply to OMP when it is run for more than s∗ iterations.
We provide matching lower bounds for support recovery as well as generalization error for OMP.
The idea is to fool OMP into picking incorrect indices; a large support size ⟹ large generalization error.
Construct an evenly distributed x̄:

x̄_i = 1/√s∗ if 1 ≤ i ≤ s∗, and x̄_i = 0 if i > s∗  ⟹  supp(x̄) = {1, 2, …, s∗}

Construct M(ε) ∈ ℝ^{n×d} parameterized by ε (a numerical sketch follows below):

M(ε)_{1:s∗} are s∗ random orthogonal column vectors s.t. ‖M(ε)_i‖_2^2 = n ∀ i ∈ [s∗].

M(ε)_i = √(1−ε) [ (1/√s∗) Σ_{j=1}^{s∗} M(ε)_j ] + √ε g_i  ∀ i ∉ [s∗], where the g_i's are orthogonal to each other and to M(ε)_{1:s∗}, with ‖g_i‖_2^2 = n.
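The construction can be reproduced numerically: take an orthonormal basis of ℝ^n, scale the vectors to norm √n, use the first s∗ as the on-support columns and the next d − s∗ as the g_i, and mix them as specified. A sketch assuming d ≤ n; the function name make_lower_bound_instance is mine.

```python
import numpy as np

def make_lower_bound_instance(n, d, s_star, eps, rng):
    """Build M(eps) in R^{n x d} and the evenly spread s*-sparse x_bar from the construction above."""
    assert s_star <= d <= n
    # Orthonormal columns via QR of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    cols = np.sqrt(n) * Q                      # each column now has squared norm n
    M = np.zeros((n, d))
    M[:, :s_star] = cols[:, :s_star]           # M(eps)_{1:s*}: orthogonal, squared norm n
    mean_dir = cols[:, :s_star].sum(axis=1) / np.sqrt(s_star)
    g = cols[:, s_star:]                       # g_i: orthogonal to each other and to M_{1:s*}
    M[:, s_star:] = np.sqrt(1 - eps) * mean_dir[:, None] + np.sqrt(eps) * g
    x_bar = np.zeros(d)
    x_bar[:s_star] = 1.0 / np.sqrt(s_star)
    return M, x_bar

rng = np.random.default_rng(0)
M, x_bar = make_lower_bound_instance(n=1000, d=100, s_star=10, eps=0.25, rng=rng)
print(np.allclose((M ** 2).sum(axis=0), 1000))   # every column has squared norm n
```

The final check confirms that every column has squared norm n, as the construction requires.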


Lower bound instance construction

[Figure: visualization of the construction in d = 3 with s∗ = 2 and ε = 0.25, showing the on-support columns M(0.25)_1 and M(0.25)_2, their normalized sum (1/√2) Σ_{i=1}^2 M(0.25)_i, the orthogonal direction g_3, and the off-support column M(0.25)_3 against the coordinate axes e_1, e_2, e_3.]

Smaller ε ⟹ more correlation. ∴ ε is a proxy for κ̃_s.


Lower bounds

Noiseless case
For s∗ ≤ d ≤ n, ∃ ε > 0 s.t. when OMP is executed on the SLR problem with y = M(ε)x̄ for s ≤ d − s∗ iterations:

κ̃_s(M(ε)) ≲ s/s∗ and γ ≤ √(2/3)
S∗ ∩ supp(x̂_s) = ∅

Noisy case
For s∗ ≤ s ≤ d^{1−α} where α ∈ (0, 1), ∃ ε > 0 s.t. when OMP is executed on the SLR problem with y = M(ε)x̄ + η, where η ∼ N(0, σ^2 I_{n×n}), then:

κ̃_s(M(ε)) ≲ s/s∗ and γ ≤ 1/2
with high probability, (1/n) ‖Ax̂_s − Ax̄‖_2^2 ≳ (1/n) σ^2 κ̃_{s+s∗} s
with high probability, S∗ ∩ supp(x̂_s) = ∅

⟹ s ≳ κ̃_s s∗ iterations are indeed necessary.

Addition of noise can only help.


Sparse Linear Regression (SLR) for OMP: Simulations

Simulations

We perform simulations on the lower bound instance class, with M(ε) ∈ ℝ^{1000×100} and s∗ = 10. A sketch of the experiment follows the figure caption below.

(a) Varying condition number    (b) Varying noise variance

Figure: Number of iterations required for recovering the full support of x̄ with respect to the restricted condition number (κ̃_{s+s∗}) of the design matrix and the variance of the noise (σ^2).
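A rough, self-contained version of this experiment can be sketched as follows: rebuild the lower bound instance, run OMP on y = M(ε)x̄ + η until S∗ ⊆ S_k, and record the number of iterations as ε varies (smaller ε plays the role of a larger restricted condition number). The function names and parameter values are mine.

```python
import numpy as np

def make_instance(n, d, s_star, eps, rng):
    """Lower bound instance M(eps) and x_bar, as in the construction slides above."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    cols = np.sqrt(n) * Q
    M = np.zeros((n, d))
    M[:, :s_star] = cols[:, :s_star]
    mean_dir = cols[:, :s_star].sum(axis=1) / np.sqrt(s_star)
    M[:, s_star:] = np.sqrt(1 - eps) * mean_dir[:, None] + np.sqrt(eps) * cols[:, s_star:]
    x_bar = np.zeros(d)
    x_bar[:s_star] = 1.0 / np.sqrt(s_star)
    return M, x_bar

def iterations_to_recover(M, x_bar, sigma, rng):
    """Run OMP on y = M x_bar + eta; return the first k with S* contained in S_k."""
    n, d = M.shape
    y = M @ x_bar + sigma * rng.standard_normal(n)
    S_star = set(np.flatnonzero(x_bar).tolist())
    support, r = [], y.copy()
    for k in range(1, d + 1):
        corr = np.abs(M.T @ r)
        corr[support] = -np.inf
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(M[:, support], y, rcond=None)
        r = y - M[:, support] @ coef
        if S_star <= set(support):
            return k
    return d

rng = np.random.default_rng(0)
n, d, s_star = 1000, 100, 10
for eps in (0.9, 0.5, 0.25, 0.1):      # smaller eps: off-support columns are more correlated
    M, x_bar = make_instance(n, d, s_star, eps, rng)
    print(eps, iterations_to_recover(M, x_bar, sigma=0.05, rng=rng))
```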
