Linear Inverse Problems, Sparse Regularization, and Convex Optimization
Ivan Selesnick
March 23, 2017
Under-determined linear equations
Consider an under-determined system of equations

y = Ax (1)

A : M × N (M < N)
y : M × 1
x : N × 1

y = [y(0), …, y(M − 1)]ᵀ,   x = [x(0), …, x(N − 1)]ᵀ

The system has more unknowns than equations.

We assume AAᴴ is invertible; therefore (1) has infinitely many solutions.
Norms
We will use the ℓ₂ and ℓ₁ norms.

‖x‖₂² := ∑ₙ |x(n)|² (2)

‖x‖₁ := ∑ₙ |x(n)| (3)

‖x‖₂², i.e., the sum of squares, is referred to as the ‘energy’ of x.
Least squares
To solve y = Ax, it is common to minimize the energy of x.
x = arg min_x ‖x‖₂² (4a)
such that y = Ax. (4b)

The solution is

x = Aᴴ(AAᴴ)⁻¹y. (5)

When y is noisy, do not solve y = Ax exactly. Instead, find an approximate solution:

x = arg min_x { ‖y − Ax‖₂² + λ‖x‖₂² } (6)

The solution is

x = (AᴴA + λI)⁻¹Aᴴy. (7)

Large-scale systems → fast algorithms needed.
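Both closed-form solutions can be computed directly; a minimal MATLAB sketch, with a random A and y standing in for a real problem:

M = 20; N = 50; lam = 0.1;
A = randn(M, N);                             % example matrix (M < N)
y = randn(M, 1);                             % example data

x_ls  = A' * ((A * A') \ y);                 % minimum-energy solution (5)
x_reg = (A' * A + lam * eye(N)) \ (A' * y);  % regularized solution (7)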
Sparse solutions
Another approach to solving y = Ax:

x = arg min_x ‖x‖₁ (8a)
such that y = Ax (8b)

Problem (8) is the basis pursuit (BP) problem.

When y is noisy, do not solve y = Ax exactly. Instead, find an approximate solution:

x = arg min_x { ‖y − Ax‖₂² + λ‖x‖₁ } (9)

Problem (9) is the basis pursuit denoising (BPD) problem.

The BP/BPD problems cannot be solved in explicit form, only by iterative numerical algorithms.
Least squares & BP/BPD
Least squares and BP/BPD solutions are quite different. Why?
To minimize ‖x‖₂², the largest values of x must be made small, as they count much more than the smallest values.
⇒ Least-squares solutions have many small values, as they are relatively unimportant.
⇒ Least-squares solutions are not sparse.
[Figure: the quadratic penalty x² versus the absolute value penalty |x|]
Therefore, when it is known/expected that x is sparse, use the ℓ₁ norm, not the ℓ₂ norm.
Algorithms for sparse solutions
Objective function
• Non-differentiable
• Convex
• Large-scale
Algorithms
• MM: Majorization-Minimization
• ISTA: Iterative Shrinkage/Thresholding Algorithm (sketched below)
• FISTA: Fast ISTA
• SALSA (ADMM): Split Augmented Lagrangian Shrinkage Algorithm
• ‘Matrix-free’ algorithms
and more...
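As a minimal sketch (not from the original slides), ISTA for the BPD problem minimize ‖y − Ax‖₂² + λ‖x‖₁, assuming A, y, and lam are given:

soft = @(x, T) max(abs(x) - T, 0) .* sign(x);   % soft-threshold operator

alpha = max(eig(A' * A));     % step-size constant: any alpha >= max eig(A'*A)
x = zeros(size(A, 2), 1);     % initialization
for k = 1:100                 % fixed iteration count, for simplicity
    x = soft(x + (A' * (y - A * x)) / alpha, lam / (2 * alpha));
end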
Parseval frames
The columns of A form a Parseval frame if AAᴴ = pI with p > 0.

[Figure: a wide M × N matrix A times its conjugate transpose Aᴴ gives pI]

If AAᴴ = pI, then the solution to

x = arg min_x ‖x‖₂²
such that y = Ax

is

x = Aᴴ(AAᴴ)⁻¹y (10)
  = (1/p) Aᴴy. (11)

No matrix inversion needed.
Parseval frames
If

AAᴴ = pI (12)

then the solution to

x = arg min_x { ‖y − Ax‖₂² + λ‖x‖₂² } (13)

is

x = (AᴴA + λI)⁻¹Aᴴy (14)
  = 1/(λ + p) · Aᴴy (15)

using the matrix inverse lemma,

(λI + AᴴA)⁻¹ = (1/λ)I − (1/λ)Aᴴ(λI + AAᴴ)⁻¹A. (16)

• So, if AAᴴ = pI, then finding least-squares solutions is easy. No matrix inversion needed.

Some algorithms for BP/BPD also become computationally easier.
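A quick numerical check of (14) and (15), as a sketch: the first M rows of the inverse-DFT-type matrix (defined later in (19)) form a Parseval frame with p = N.

M = 8; N = 32; lam = 0.5; p = N;
F = fft(eye(N));
A = conj(F(1:M, :));                        % A(m,n) = exp(j*2*pi*m*n/N), so A*A' = N*I
y = randn(M, 1) + 1j * randn(M, 1);

x1 = (A' * A + lam * eye(N)) \ (A' * y);    % direct formula (14)
x2 = A' * y / (lam + p);                    % simplified formula (15)
disp(norm(x1 - x2))                         % ~0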
Example: Sparse Fourier coefficients using BP
The Fourier transform tells how to write the signal as a sum of sinusoids. But it is not the only way.

Basis pursuit gives a sparse spectrum.

Suppose the M-point signal y is written as

y(m) = ∑_{n=0}^{N−1} c(n) exp(j2πmn/N),   0 ≤ m ≤ M − 1 (17)

where c(n) is a length-N coefficient sequence, with M ≤ N.

y = Ac (18)

A_{m,n} = exp(j2πmn/N),   0 ≤ m ≤ M − 1,  0 ≤ n ≤ N − 1 (19)

c : length N

The coefficients c(n) are frequency-domain (Fourier) coefficients.
Example: Sparse Fourier coefficients using BP
1. If N = M, then A is the inverse N-point DFT matrix.
2. If N > M, then A is the first M rows of the inverse N-point DFT matrix.
⇒ A or Aᴴ can be implemented efficiently using the FFT. For example, in MATLAB, y = Ac is implemented as:

function y = A(c, M, N)
    % y = A*c : scaled inverse FFT, truncated to the first M samples
    v = N * ifft(c);
    y = v(1:M);
end
Similarly, Aᴴy can be obtained by zero-padding and computing the DFT. In MATLAB, c = Aᴴy is implemented as:

function c = AT(y, M, N)
    % c = A'*y : zero-pad y to length N, then take the FFT
    c = fft([y; zeros(N-M, 1)]);
end
⇒ Matrix-free algorithms.
3. Due to the orthogonality properties of complex sinusoids,

AAᴴ = N I_M (20)
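As a sketch, (20) can be verified numerically with the matrix-free operators above:

M = 100; N = 256;
y = randn(M, 1);
z = A(AT(y, M, N), M, N);    % A*(A'*y)
disp(norm(z - N * y))        % ~0, confirming A*A' = N*I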
Example: Sparse Fourier coefficients using BP
When N = M, the coefficients c satisfying y = Ac are uniquely determined.
When N > M, the coefficients c are not unique. Any vector c satisfying y = Ac can be considered a valid set of coefficients. To find a particular solution we can minimize either ‖c‖₂² or ‖c‖₁.

Least squares:

c = arg min_c ‖c‖₂² (21a)
such that y = Ac (21b)

Basis pursuit:

c = arg min_c ‖c‖₁ (22a)
such that y = Ac. (22b)
The two solutions can be quite different...
Example: Sparse Fourier coefficients using BP
[Figure: real and imaginary parts of the length-100 signal vs. time (samples)]
Least-squares solution:

c = Aᴴ(AAᴴ)⁻¹y = (1/N) Aᴴy (using AAᴴ = N I)

which is computed by:
1. zero-padding the length-M signal y to length N
2. computing its DFT

BP solution: computed using the SALSA algorithm.
Example: Sparse Fourier coefficients using BP
[Figure: (A) Fourier coefficients (DFT); (B) Fourier coefficients (least-squares solution); (C) Fourier coefficients (basis pursuit solution); all vs. frequency (index)]
The BP solution does not exhibit the leakage phenomenon.
Example: Sparse Fourier coefficients using BP
[Figure: cost function history, vs. iteration, of the algorithm for the basis pursuit solution]
Example: Denoising using BPD
Digital LTI filters are often used for noise reduction (denoising).
But, if
• the noise and signal overlap in the frequency domain, or
• the respective frequency bands are unknown,
then it is difficult to use LTI filters.

However, if the signal has sparse (or relatively sparse) Fourier coefficients, then BPD can be used for noise reduction.
Example: Denoising using BPD
Noisy speech signal y
y(m) = s(m) + w(m),   0 ≤ m ≤ M − 1,  M = 500 (23)

s : noise-free speech signal
w : noise sequence.
[Figure: noisy signal vs. time (samples); (A) Fourier coefficients (FFT) of the noisy signal vs. frequency (index)]
Example: Denoising using BPD
Assume the noise-free speech signal s has a sparse set of Fourier coefficients:
y = Ac + w

y : noisy speech signal, length M
A : M × N DFT matrix (19)
c : sparse Fourier coefficients, length N
w : noise, length M

As y is noisy, find c by solving the least-squares problem

c = arg min_c { ‖y − Ac‖₂² + λ‖c‖₂² } (24)

or the basis pursuit denoising (BPD) problem

c = arg min_c { ‖y − Ac‖₂² + λ‖c‖₁ }. (25)
Once c is found, an estimate of the speech signal is given by s = Ac.
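As a sketch (the slides use SALSA; ISTA is shown here for simplicity), the BPD problem (25) can be solved with the matrix-free FFT operators; since AAᴴ = N I, α = N bounds the eigenvalues of AᴴA:

alpha = N;                        % max eigenvalue of A'*A, since A*A' = N*I
soft = @(c, T) max(abs(c) - T, 0) .* sign(c);   % works for complex c in MATLAB

c = zeros(N, 1);
for k = 1:100
    c = soft(c + AT(y - A(c, M, N), M, N) / alpha, lam / (2 * alpha));
end
s_est = A(c, M, N);               % estimate of the speech signal, s = A*c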
Example: Denoising using BPD
Least-squares solution:

c = (AᴴA + λI)⁻¹Aᴴy (26)
  = 1/(λ + N) · Aᴴy (using AAᴴ = N I) (27)

using the matrix inverse lemma.

⇒ The least-squares estimate of the speech signal is

s = Ac = N/(λ + N) · y (least-squares solution).

But s is only a scaled version of the noisy signal!

No filtering is achieved.
Example: Denoising using BPD
BPD solution
[Figure: (B) Fourier coefficients (BPD solution) vs. frequency (index); denoised signal vs. time (samples)]

Obtained with the SALSA algorithm. Effective noise reduction, unlike least squares!
Example: Deconvolution using BPD
If the signal of interest x is not only noisy but is also distorted by an LTI system with impulse response h, then the available data y is

y(m) = (h ∗ x)(m) + w(m) (28)

where ‘∗’ denotes (linear) convolution and w is additive noise. Given the observed data y, we aim to estimate the signal x. We will assume that the sequence h is known.
Example: Deconvolution using BPD
y = Hx + w (29)

x : length N
h : length L
y : length M = N + L − 1

H is the M × N convolution (Toeplitz) matrix; for example, with L = 3 and N = 4,

H =
[ h0              ]
[ h1  h0          ]
[ h2  h1  h0      ]
[     h2  h1  h0  ]
[         h2  h1  ]
[             h2  ]   (30)

H is of size M × N with M > N (because M = N + L − 1).
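A sketch of building H in (30) with toeplitz and checking it against conv, for an example h:

h = [1 1 1 1]' / 4;                % example: 4-point moving average filter
N = 100; L = length(h); M = N + L - 1;
H = toeplitz([h; zeros(M - L, 1)], [h(1), zeros(1, N - 1)]);

x = randn(N, 1);
disp(norm(H * x - conv(h, x)))     % ~0: H implements linear convolution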
Example: Deconvolution using BPD
Sparse signal convolved with the 4-point moving average filter

h(n) = { 1/4,  n = 0, 1, 2, 3
       { 0,   otherwise
[Figure: sparse signal and observed (filtered, noisy) signal vs. time (samples)]
Example: Deconvolution using BPD
Due to noise, solve the regularized least-squares problem

x = arg min_x { ‖y − Hx‖₂² + λ‖x‖₂² } (31)

or the basis pursuit denoising (BPD) problem

x = arg min_x { ‖y − Hx‖₂² + λ‖x‖₁ }. (32)

Least-squares solution:

x = (HᴴH + λI)⁻¹Hᴴy. (33)
Example: Deconvolution using BPD
[Figure: deconvolution (least-squares solution) and deconvolution (BPD solution) vs. time (samples)]

The BPD solution, obtained using SALSA, is more faithful to the original signal.
Example: Filling in missing samples using BP
Due to data transmission/acquisition errors, some signal samples may be lost. Fill in the missing values for error concealment.

Part of a signal or image may be intentionally deleted (image editing, etc.). Convincingly fill in the missing values according to the surrounding area to do inpainting.
[Figure: incomplete signal vs. time (samples); 200 missing samples]
Example: Filling in missing samples using BP
We write the incomplete data y as

y = Sx (34)

x : length M
y : length K < M
S : ‘selection’ (or ‘sampling’) matrix of size K × M.

For example, if only the first, second, and last elements of a 5-point signal x are observed, then the matrix S is given by:

S =
[ 1 0 0 0 0 ]
[ 0 1 0 0 0 ]
[ 0 0 0 0 1 ].   (35)

Problem: Given y and S, find x such that y = Sx.

⇒ Under-determined system; infinitely many solutions.

The least-squares and BP solutions are very different...
Example: Filling in missing samples using BP
Properties of S:

1. SSᵀ = I (36)

where I is the K × K identity matrix. For example, with S in (35),

SSᵀ =
[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ].

2. Sᵀy sets the missing samples to zero. For example, with S in (35),

Sᵀy =
[ 1 0 0 ]              [ y(0) ]
[ 0 1 0 ]   [ y(0) ]   [ y(1) ]
[ 0 0 0 ] · [ y(1) ] = [ 0    ]
[ 0 0 0 ]   [ y(2) ]   [ 0    ]
[ 0 0 1 ]              [ y(2) ]   (37)
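In code, S need not be formed explicitly; a sketch using an index vector of the observed positions:

M = 5;
keep = [1 2 5];              % indices of observed samples (S in (35))
x = (1:M)';                  % example signal

y = x(keep);                 % y = S*x  : keep the observed samples
z = zeros(M, 1);
z(keep) = y;                 % z = S'*y : zero-fill the missing samples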
Example: Filling in missing samples using BP
Suppose x has a sparse representation with respect to A,

x = Ac (38)

c : sparse vector, length N, with M ≤ N
A : size M × N.

The incomplete data y can then be written as

y = Sx (39a)
  = SAc. (39b)

Therefore, if c satisfies

y = SAc (40)

then we may estimate x as

x = Ac. (41)

Note that y is shorter than the coefficient vector c, so (40) has infinitely many solutions.
Example: Filling in missing samples using BP
Any vector c satisfying y = SAc can be considered a valid set of coefficients.
To find a particular solution, solve the least-squares (LS) problem

c = arg min_c ‖c‖₂² (42a)
such that y = SAc (42b)

or the basis pursuit (BP) problem

c = arg min_c ‖c‖₁ (43a)
such that y = SAc. (43b)

We will see... the LS and BP solutions are very different.

Let us assume A satisfies

AAᴴ = pI, (44)

for some positive real number p.
Example: Filling in missing samples using BP
The least-squares solution is

c = (SA)ᴴ((SA)(SA)ᴴ)⁻¹y (45)
  = AᴴSᵀ(SAAᴴSᵀ)⁻¹y (46)
  = AᴴSᵀ(p SSᵀ)⁻¹y (using AAᴴ = pI) (47)
  = AᴴSᵀ(p I)⁻¹y (using SSᵀ = I) (48)
  = (1/p) AᴴSᵀy. (49)

Hence, the least-squares estimate x is given by

x = Ac (50)
  = (1/p) AAᴴSᵀy (using (49)) (51)
  = Sᵀy. (using AAᴴ = pI) (52)

This estimate sets all the missing values to zero!

No estimation of the missing values. The least-squares solution is of no use here.
Example: Filling in missing samples using BP
Short segments of speech can be sparsely represented using the DFT; therefore we set A equal to the M × N DFT matrix (19) with N = 1024.
BP solution obtained using 100 iterations of SALSA:
[Figure: estimated coefficients vs. frequency (DFT index); estimated signal vs. time (samples)]
The missing samples have been filled in quite accurately.
Total Variation Denoising (TVD)
The signal x is observed in additive white Gaussian noise (AWGN)
y(n) = x(n) + w(n),   n ∈ {0, 1, …, N − 1}

Total variation denoising is defined by

x = arg min_{x∈ℝᴺ} { F(x) = (1/2) ∑ₙ |y(n) − x(n)|² + λ ∑ₙ |x(n) − x(n − 1)| },

written also as

x = arg min_{x∈ℝᴺ} { F(x) = (1/2)‖y − x‖₂² + λ‖Dx‖₁ }

where

D =
[ −1   1             ]
[     −1   1         ]
[         ⋱    ⋱     ]
[            −1   1  ].
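A sketch of D as a sparse matrix, so that D*x equals diff(x):

N = 6;
e = ones(N, 1);
D = spdiags([-e e], [0 1], N - 1, N);   % (N-1) x N first-order difference matrix

x = randn(N, 1);
disp(norm(D * x - diff(x)))             % ~0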
• L. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, pp. 259–268, 1992.
Total Variation Denoising
[Figure: noise-free signal; noisy signal; L2-filtered signal (λ = 3.00); L2-filtered signal (λ = 8.00); TV-filtered signal (λ = 3.00)]
TV denoising preserves discontinuities more accurately than linear filtering.
Total Variation Denoising - staircase artifacts
[Figure: ECG signal and its TV denoising vs. time (n)]
TVD has staircase artifacts.
Total Variation Denoising - staircase artifacts
[Figure: biosensor data and its TV denoising vs. time (n)]
TVD has staircase artifacts.
Sparse Singularity-Preserving Signal Smoothing (SIPS)

We assume the signal of interest is of the form

s = f + x₁ + x₂,   s, f, xᵢ ∈ ℝᴺ

where
• f is a low-pass signal
• x₁ is approximately piecewise constant
• x₂ is approximately piecewise linear
[Figure: s = f + x₁ + x₂; f (low-pass); x₁ (sparse derivative); x₂ (sparse 2nd derivative)]
Sparse Singularity-Preserving Signal Smoothing (SIPS)
Based on the signal model

s = f + x₁ + x₂,   s, f, xᵢ ∈ ℝᴺ

we minimize the objective function

J(x₁, x₂) = (1/2)‖Hy − H(x₁ + x₂)‖₂² + λ₁ ∑ₙ φ([Dx₁]ₙ) + λ₂ ∑ₙ φ([D²x₂]ₙ)

where H is a high-pass filter.

If φ is the absolute value function, the regularizer is λ₁‖Dx₁‖₁ + λ₂‖D²x₂‖₁.
Penalty Function
The penalty function φ can be taken to be

φ(x) = { (1/a) log(1 + a|x|),  a > 0
       { |x|,                  a = 0.

The parameter a ≥ 0 controls the non-convexity of φ.
[Figure: 1D parametric penalty function φ(x; a) for a = 0, 0.2, 0.5, 1.0]
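A sketch of φ as an elementwise MATLAB function:

function p = phi(x, a)
    % Parametric penalty: phi(x; a) = log(1 + a|x|)/a for a > 0,
    % reducing to |x| when a = 0.
    if a == 0
        p = abs(x);
    else
        p = log(1 + a * abs(x)) / a;
    end
end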
Total Variation Denoising (TVD)
[Figure: biosensor data and its TV denoising vs. time (n)]
TVD has staircase artifacts.
Sparse Singularity-Preserving Signal Smoothing (SIPS)
[Figure: biosensor data and its SIPS denoising vs. time (n)]
SIPS avoids staircase artifacts.
Sparse Singularity-Preserving Signal Smoothing (SIPS)
– Extension to higher-order singularities.
We assume the signal of interest is of the form
s = f + x₂ + x₃,   s, f, xᵢ ∈ ℝᴺ

where
• f is a low-pass signal
• x₂ is approximately piecewise linear
• x₃ is approximately piecewise quadratic

We minimize the objective function

J(x₂, x₃) = (1/2)‖Hy − H(x₂ + x₃)‖₂² + λ₂ ∑ₙ φ([D²x₂]ₙ) + λ₃ ∑ₙ φ([D³x₃]ₙ)
where H is a high-pass filter.
Total Variation Denoising (TVD)
[Figure: ECG signal and its TV denoising vs. time (n)]
TVD has staircase artifacts.
Sparse Singularity-Preserving Signal Smoothing (SIPS)
[Figure: ECG signal and its SIPS denoising vs. time (n)]
SIPS avoids staircase artifacts.
SIPS with L1 norm
[Figure: noisy signal; low-pass filter output (Ly); singularity-preserving smoothing (SPS) with L1 norm penalty; components u₂ and u₃]
SIPS with non-convex penalty
[Figure: noisy signal; low-pass filter output (Ly); singularity-preserving smoothing (SPS) with non-convex penalty; components u₂ and u₃]
Convex or non-convex: which is better for inverse problems?
Benefits of convex optimization
1. Absence of suboptimal local minima
2. Continuity of solution as a function of input data
3. Fewer complications when specifying regularization parameters
4. Availability of algorithms guaranteed to converge to a global optimum
But, non-convex regularization often performs better! Convex regularization under-estimates signal values (a ‘bias toward zero’).
Non-convex regularization induces sparsity more effectively and is a popularalternative to convex functions.
Can we exploit the strong sparsity-inducing properties of non-convex penaltieswithout forgoing the benefits of the convex approach?
Parameterized sparsity-inducing non-convex penalty

Let φ( · ; a) : ℝ → ℝ be a penalty function with parameter a ≥ 0 satisfying

1. φ is continuous on ℝ
2. φ is twice continuously differentiable, increasing, and concave on ℝ₊
3. φ(x; 0) = |x|
4. φ(0; a) = 0
5. φ(−x; a) = φ(x; a)
6. φ′(0⁺; a) = 1
7. φ″(x; a) ≥ −a for all x ≠ 0
[Figure: 1D parametric penalty function φ(x; a) for a = 0, 0.2, 0.5, 1.0]
Non-convex regularization, Convex optimization

Total variation denoising with convex regularization:

x = arg min_{x∈ℝᴺ} { F₀(x) = (1/2)‖y − x‖₂² + λ‖Dx‖₁ }

With non-convex regularization:

x = arg min_{x∈ℝᴺ} { Fₐ(x) = (1/2)‖y − x‖₂² + λ ∑ₙ φ([Dx]ₙ; a) }

Can we constrain φ so that Fₐ is convex?

Proposition
Fₐ is strictly convex if

inf_{x≠0} φ″(x) > −1/(4λ).

When φ satisfies the properties above, Fₐ is strictly convex if

0 ≤ a < 1/(4λ).
• I. W. Selesnick, A. Parekh, and I. Bayram, “Convex 1-D total variation denoising with non-convex regularization,” IEEE Signal Processing Letters, vol. 22, pp. 141–144, Feb. 2015.
Convex TVD with non-convex regularization

Set a to its maximal value to maximally induce sparsity of Dx while ensuring convexity of the objective function:

a = 1/(4λ)
[Figure: noisy data (σ = 0.50); TV denoising with convex penalty (λ = 2.00, RMSE = 0.318); TV denoising with non-convex penalty (atan) (λ = 2.00, RMSE = 0.247); denoising error, convex vs. non-convex]
Convex TVD with non-convex regularization
TVD with non-convex regularization:

x = arg min_{x∈ℝᴺ} { F(x) = (1/2)‖y − x‖₂² + λ ∑ₙ φ([Dx]ₙ; a) }

TVD with non-separable non-convex regularization:

x = arg min_{x∈ℝᴺ} { Fnonsep(x) = (1/2)‖y − x‖₂² + λ ∑ₙ ψ(([Dx]ₙ₋₁, [Dx]ₙ); a) }

where ψ : ℝ² → ℝ.
Convex TVD with non-convex regularization
[Figure: contours of (a) the L1 norm (separable, convex; a₁ = a₂ = 0), (b) a separable non-convex penalty (a₁ = a₂ > 0), and (c) the proposed non-separable non-convex penalty (a₁ > a₂ > 0)]
Convex TVD with non-convex regularization
Define ψ( · ; a) : ℝ² → ℝ as

ψ(x) = { (1 − r)[φ(x₁; α) + φ(x₂; α)] + r φ(x₁ + x₂; α),   x ∈ A₁
       { (1 + r) φ(x₁; a₂) + φ(r x₁ + x₂; α),              x ∈ A₂
       { (1 + r) φ(x₂; a₂) + φ(x₁ + r x₂; α),              x ∈ A₃
(53)

where the Aᵢ are subsets of ℝ² defined as

A₁ = {x ∈ ℝ² | x₁x₂ ≥ 0}, (54)
A₂ = {x ∈ ℝ² | x₁(x₁ + x₂) ≤ 0}, (55)
A₃ = {x ∈ ℝ² | x₂(x₁ + x₂) ≤ 0} (56)

and α and r are given by

α = (a₁ + a₂)/2,   r = { (a₁ − a₂)/(a₁ + a₂),  a₁ + a₂ > 0
                       { 0,                    a₁ = a₂ = 0. (57)

Proposition
Fnonsep is strictly convex if

a₁ ≤ 1/(2λ),   a₂ ≤ 1/(4λ).
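A direct MATLAB transcription of (53)–(57), as a sketch, for scalar inputs (phi is the parametric penalty defined earlier):

function p = psi(x1, x2, a1, a2)
    % Non-separable penalty psi(x; a) of (53), with alpha and r as in (57).
    alpha = (a1 + a2) / 2;
    if a1 + a2 > 0
        r = (a1 - a2) / (a1 + a2);
    else
        r = 0;
    end
    if x1 * x2 >= 0                              % region A1, (54)
        p = (1 - r) * (phi(x1, alpha) + phi(x2, alpha)) + r * phi(x1 + x2, alpha);
    elseif x1 * (x1 + x2) <= 0                   % region A2, (55)
        p = (1 + r) * phi(x1, a2) + phi(r * x1 + x2, alpha);
    else                                         % region A3, (56)
        p = (1 + r) * phi(x2, a2) + phi(x1 + r * x2, alpha);
    end
end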
[Figure: noisy data (σ = 0.50); TV denoising using the non-separable non-convex penalty; denoising error for L1 (RMSE = 0.318), separable non-convex (RMSE = 0.247), and non-separable non-convex (RMSE = 0.221)]
Structured Sparsity with Overlapping Groups
A simple nonlinear thresholding approach to denoising is basis pursuit denoising:

x = arg min_x { (1/2)‖y − x‖₂² + λ‖x‖₁ } = soft(y, λ)
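A sketch of the elementwise soft-threshold operator (in MATLAB, sign(y) = y/|y| for complex y, so this also handles complex data):

soft = @(y, lam) max(abs(y) - lam, 0) .* sign(y);

soft([-3 -0.5 0 2]', 1)     % example: returns [-2 0 0 1]'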
Thresholding does not capture group structure (clustering/grouping behavior).

To account for clustering/grouping, we may use

x = arg min_x { (1/2)‖y − x‖₂² + λ ∑ₙ √(|x(n)|² + |x(n + 1)|² + |x(n + 2)|²) }

for group size 3.
The optimization is more challenging due to coupling among all signal values x(n), but yields superior results for speech enhancement; see the sketch below.
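A sketch of evaluating the overlapping-group regularizer above, for a given signal vector x:

K = 3;                                  % group size
pen = 0;
for n = 1:length(x) - K + 1
    pen = pen + norm(x(n:n+K-1));       % ell-2 norm of each overlapping group
end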
• P.-Y. Chen and I. W. Selesnick, “Translation-invariant shrinkage/thresholding of group sparse signals,” Signal Processing, vol. 94, pp. 476–489, Jan. 2014.
Structured Sparsity with Overlapping Groups: Speech Enhancement
Speech signals exhibit structured sparsity in the time-frequency domain.
[Figure: spectrograms (frequency vs. time in seconds) of the noise-free signal and the noisy signal]
Structured Sparsity with Overlapping Groups: Speech Enhancement
Scalar thresholding produces spurious noise spikes and musical noise.
[Figure: spectrogram after scalar thresholding, with magnified view]
Structured Sparsity with Overlapping Groups: Speech Enhancement
The new overlapping group shrinkage/thresholding (OGS) algorithm reduces musical noise.
[Figure: spectrogram after the OGS algorithm, with magnified view]
Summary
1. Basis pursuit
2. Basis pursuit denoising
3. Sparse Fourier coefficients using BP
4. Denoising using BPD
5. Deconvolution using BPD
6. Filling in missing samples using BP
7. Total variation denoising (TVD)
8. Sparse singularity-preserving signal smoothing (SIPS)
9. Non-convex regularization, convex optimization
10. Group sparsity