
Non-negative Matrix Completion for Bandwidth Extension: A Convex Optimization Approach

Dennis L. Sun
Department of Statistics
Stanford University

Rahul Mazumder
Department of Statistics
Columbia University

Abstract

Bandwidth extension is the problem of recovering missing bandwidth in audio signals that have been band-passed, typically for compression purposes. One approach that has been shown to be successful for bandwidth extension is non-negative matrix factorization (NMF). The disadvantage of NMF is that it is non-convex and intractable to solve in general. However, in bandwidth extension, only the reconstruction is needed and not the explicit factors. We formulate bandwidth extension as a convex optimization problem, propose a simple algorithm, and demonstrate the effectiveness of this approach on practical examples.

1 Introduction

Audio signals are often low-passed before transmission over low bit-rate channels, such as phone lines, radio, and Internet streams. Although this filtering prevents high frequency artifacts and aliasing from undersampling, it also results in the distinctive muffled quality of speech that has been transmitted over phone lines and streaming audio. Since the bandwidth of a signal is a major determinant of its perceived quality [1], it is desirable to extend the bandwidth of a band-passed signal to its full range. The problem of recovering the missing bandwidth is known as bandwidth extension.

Traditional approaches attempted to explicitly model the relationship between the observed and unobserved spectral content [2, 3]. For example, for harmonic signals such as speech and music, knowledge of a few harmonics enables one to impute the remaining harmonics. More recently, data-driven approaches to bandwidth expansion have been proposed [4, 5]. These methods leverage full bandwidth audio that is similar to the audio whose bandwidth is to be recovered. Since they make fewer a priori assumptions, they have the potential to be applicable to a wider range of sounds.

2 Low-Rank Model for Spectrograms

Data-driven approaches are based on modeling the audio spectrogram as a low-rank matrix [6]. Low-rank matrix methods have recently experienced a resurgence of interest in the machine learning community, thanks to the Netflix prize and other collaborative filtering applications. The low-rank assumption is typically motivated in these applications by a factor model; for example, in collaborative filtering, it is assumed that a few latent variables explain most user preferences [7].

In an audio context, the low-rank assumption is justified by the observation that audio signals are typically superpositions of relatively few component signals. For example, the waveform of an orchestra performing a symphony is simply the sum of the waveforms of the individual instruments. This additivity carries over to the complex frequency domain, although it is typical to consider only the magnitudes and not the phases, where the additivity is only approximate. We assume that the magnitude spectrum y_i at time i = 1, ..., n can be expressed as a linear combination of a few latent spectral features w_ℓ, ℓ = 1, ..., k:

$$y_i \approx \sum_{\ell=1}^{k} h_{\ell i} w_\ell$$


Under this assumption, the spectrogram Y = [y_1, ..., y_n], which is simply the matrix comprised of the spectra at each time frame, can be written as Y = WH, where Y is m×n, W is m×k, and H is k×n. Since typically k ≪ m, n, the rank of Y is at most k. Figure 1 shows a spectrogram and its singular values, on a logarithmic scale, demonstrating that the low-rank assumption holds in practice.

Figure 1: A typical spectrogram and its singular values on a logarithmic scale (the kth singular value plotted against the index k).
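As an illustration of this empirical check, the following NumPy/SciPy sketch (ours, not part of the paper) computes a magnitude spectrogram and its singular value profile; the STFT settings mirror those reported in Section 5.

```python
# Sketch: empirically checking the low-rank assumption on a magnitude
# spectrogram. STFT settings follow Section 5 (22050 Hz sampling,
# 1024-point Hann window, hop size 256).
import numpy as np
from scipy.signal import stft

def singular_value_profile(x, fs=22050, n_fft=1024, hop=256):
    """Return the magnitude spectrogram of signal x and its singular values."""
    _, _, S = stft(x, fs=fs, window="hann", nperseg=n_fft, noverlap=n_fft - hop)
    Y = np.abs(S)                            # m x n magnitude spectrogram
    d = np.linalg.svd(Y, compute_uv=False)   # singular values, descending
    return Y, d
```

A rapidly decaying singular value profile, as in Figure 1, supports modeling Y as approximately low-rank.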

In audio applications, this model is typically applied by solving the non-negative matrix factorization problem

$$\underset{W, H \geq 0}{\text{minimize}} \quad D(Y \,\|\, WH) \tag{1}$$

for some measure of divergence D. The matrix W then contains the spectral features and H their activations over time. Since the entries of the matrices are interpreted as magnitudes, the factors are constrained to be nonnegative (i.e., W, H ≥ 0). This approach has proven useful for a variety of audio applications, from denoising to source separation to sound recognition. The drawback is that solving non-negative matrix factorization problems is NP-hard [8], and existing algorithms are only guaranteed to converge to a local optimum. The quality of the solution is highly sensitive to initialization and even the choice of algorithm [9].

However, many audio applications do not require the explicit factors, only the approximating low-rank matrix. Bandwidth extension is one such example. In contrast to the problem of finding nonnegative factors, the problem of finding a nonnegative low-rank approximating matrix can be formulated as a convex program. In the following section, we develop a convex formulation of the problem.

3 Convex Optimization Framework

3.1 Model

The problem of finding a non-negative, low-rank approximating matrix can be formulated as

$$\begin{array}{ll} \text{minimize} & \displaystyle\sum_{(i,j) \in O} d(Y_{ij}, X_{ij}) \\ \text{subject to} & X_{ij} \geq 0, \quad \operatorname{rank}(X) \leq k \end{array} \tag{2}$$

where O is the set of observed entries and d is a smooth divergence function. We make the simplifying assumption that d is a β-divergence, i.e., of the form:

$$d_\beta(x, y) = \begin{cases} \dfrac{1}{\beta(\beta - 1)} \left( x^\beta + (\beta - 1) y^\beta - \beta x y^{\beta - 1} \right) & \beta \neq 0, 1 \\[1ex] x \log \dfrac{x}{y} - x + y & \beta = 1 \\[1ex] \dfrac{x}{y} - \log \dfrac{x}{y} - 1 & \beta = 0 \end{cases}$$


β   d_β(x, y)                Name
2   (1/2)(x − y)²            Euclidean
1   x log(x/y) − x + y       Kullback-Leibler (KL)
0   x/y − log(x/y) − 1       Itakura-Saito (IS)

Table 1: Commonly used members of the β-divergence family.

which is a family of divergences that has previously been considered for audio and NMF [10]. The three commonly used members of this family are listed in Table 1. Our reason for considering this entire family of divergences is that Euclidean distance has been shown experimentally to perform worse on audio tasks than KL and IS divergence [11].
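For concreteness, the β-divergence family can be implemented elementwise; the following sketch is ours and assumes positive entries wherever logarithms and ratios require them.

```python
# Sketch: the beta-divergence family, covering the three named members of
# Table 1 as well as the general case.
import numpy as np

def beta_divergence(x, y, beta):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if beta == 2:                                   # Euclidean
        return 0.5 * (x - y) ** 2
    if beta == 1:                                   # Kullback-Leibler (KL)
        return x * np.log(x / y) - x + y
    if beta == 0:                                   # Itakura-Saito (IS)
        return x / y - np.log(x / y) - 1.0
    return (x ** beta + (beta - 1) * y ** beta      # general beta != 0, 1
            - beta * x * y ** (beta - 1)) / (beta * (beta - 1))
```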

For β ∈ [1, 2], d_β is convex in its second argument, so the non-convexity in (2) arises only from the rank constraint. It has been shown that the nuclear norm is an effective convex surrogate for the rank [12, 13]. Following these lines, we relax the rank constraint to a nuclear norm penalty:

$$\begin{array}{ll} \text{minimize} & \displaystyle\sum_{(i,j) \in O} d_\beta(Y_{ij}, X_{ij}) + \lambda \|X\|_* \\ \text{subject to} & X_{ij} \geq 0 \end{array} \tag{3}$$

Note that for β = 0, 1 the domain of d_β is R₊, so in these cases the non-negativity constraint in (3) is redundant.

3.2 Algorithm

As a starting point for deriving an algorithm for (3), we note that the related problem

$$\text{minimize} \quad \frac{1}{2} \|X - Y\|_F^2 + \lambda \|X\|_* \tag{4}$$

has the simple closed-form solution X* = S_λ(Y) := U diag(s_λ(d_i)) V^T, where Y = UDV^T is the singular value decomposition of Y and s_λ(x) = sgn(x)(|x| − λ)_+ denotes the soft-thresholding operator. See [14] for a proof. There are three reasons that (3) is more complicated than (4):

1. the divergence d_β, for β ≠ 2;

2. the restriction of the loss to the set O, i.e., only the observed entries of Y;

3. the non-negativity constraint X_ij ≥ 0.
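Before addressing these complications, note that the operator S_λ itself is straightforward to implement from its definition; a minimal sketch (ours):

```python
# Sketch: singular value soft-thresholding, the closed-form solution of (4).
import numpy as np

def svt(Y, lam):
    """S_lam(Y) = U diag(s_lam(d_i)) V^T. Since singular values are
    non-negative, soft-thresholding reduces to (d_i - lam)_+."""
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(d - lam, 0.0)) @ Vt
```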

The structure of these three complications suggests a splitting method to decouple the divergence and non-negativity constraint from the nuclear norm penalty. The first step is to rewrite (3) equivalently as:

$$\begin{array}{ll} \text{minimize} & \displaystyle\sum_{(i,j) \in O} d_\beta(Y_{ij}, X_{ij}) + \lambda \|Z\|_* \\ \text{subject to} & X_{ij} \geq 0, \quad X = Z \end{array} \tag{5}$$

The Alternating Direction Method of Multipliers (ADMM) [15] is one method of solving such problems. It operates on the augmented Lagrangian:

$$L_\rho(X, Z, \nu) = \sum_{(i,j) \in O} d_\beta(Y_{ij}, X_{ij}) + \lambda \|Z\|_* + \langle \nu, X - Z \rangle + \frac{\rho}{2} \|X - Z\|_F^2 \tag{6}$$


where the matrix inner product is defined as ⟨A, B⟩ := ∑_{i,j} A_ij B_ij. ADMM alternately optimizes L_ρ over X, then Z, followed by gradient ascent in the dual variable ν. Thus, the updates are:

$$X \leftarrow \underset{X_{ij} \geq 0}{\operatorname{arg\,min}} \; \sum_{(i,j) \in O} d_\beta(Y_{ij}, X_{ij}) + \langle \nu, X \rangle + \frac{\rho}{2} \|X - Z\|_F^2 \tag{7}$$

$$Z \leftarrow \underset{Z}{\operatorname{arg\,min}} \; \lambda \|Z\|_* - \langle \nu, Z \rangle + \frac{\rho}{2} \|X - Z\|_F^2 \tag{8}$$

$$\nu \leftarrow \nu + \rho (X - Z) \tag{9}$$

We now analyze each of these updates in turn.

1. The update (7) for X splits coordinatewise, so it reduces to solving smooth one-dimensional problems. The updates for the unobserved entries exist in closed form, and the updates for the observed entries reduce to a one-dimensional polynomial root-finding problem.

2. The update (8) for Z can also be obtained in closed form. To see this, note that after adding and scaling by appropriate constants, the objective in (8) can be rewritten as:

$$Z \leftarrow \underset{Z}{\operatorname{arg\,min}} \; \frac{1}{2} \left\| Z - \left( X + \frac{1}{\rho} \nu \right) \right\|_F^2 + \frac{\lambda}{\rho} \|Z\|_*$$

which is just a special case of (4).

Algorithm 1 summarizes these simple updates.

Algorithm 1 Non-negative Matrix Completion

inputs: Y, X, Z, ν
repeat
    for (i, j) ∉ O do
        X_ij = max(Z_ij − (1/ρ) ν_ij, 0)
    end for
    for (i, j) ∈ O do
        X_ij = argmin_{X_ij ≥ 0} d(Y_ij, X_ij) + ν_ij X_ij + (ρ/2)(X_ij − Z_ij)²
    end for
    Z = S_{λ/ρ}(X + (1/ρ) ν)
    ν = ν + ρ(X − Z)
until |ΔX_ij| < ε for all (i, j)
return X

The update for X_ij, (i, j) ∈ O, has a closed-form solution in the important cases β = 1, 2:

$$(\beta = 1, \text{ KL divergence}) \qquad X_{ij} = \frac{(\rho Z_{ij} - \nu_{ij} - 1) + \sqrt{(\rho Z_{ij} - \nu_{ij} - 1)^2 + 4 \rho Y_{ij}}}{2 \rho}$$

$$(\beta = 2, \text{ Euclidean divergence}) \qquad X_{ij} = \frac{\max(\rho Z_{ij} - \nu_{ij} + Y_{ij}, \, 0)}{1 + \rho}$$
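Putting the pieces together, a minimal NumPy sketch of Algorithm 1 for the Euclidean case (β = 2) might look as follows; it reuses the svt helper above, and the variable names, boolean observation mask, and convergence tolerance are our own choices rather than the paper's.

```python
# Sketch of Algorithm 1 for beta = 2 (Euclidean loss). O is a boolean mask
# of the observed entries of Y; lam and rho are the tuning parameters.
import numpy as np

def nn_matrix_completion(Y, O, lam, rho=1.0, eps=1e-5, max_iter=500):
    X = np.zeros_like(Y)
    Z = np.zeros_like(Y)
    nu = np.zeros_like(Y)
    for _ in range(max_iter):
        X_old = X.copy()
        # X-update (7), elementwise: closed forms for unobserved entries
        # and for observed entries with beta = 2 (formulas above)
        X = np.where(O,
                     np.maximum(rho * Z - nu + Y, 0.0) / (1.0 + rho),
                     np.maximum(Z - nu / rho, 0.0))
        # Z-update (8): singular value soft-thresholding, a case of (4)
        Z = svt(X + nu / rho, lam / rho)
        # dual ascent (9)
        nu = nu + rho * (X - Z)
        if np.max(np.abs(X - X_old)) < eps:
            break
    return X
```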

One advantage of this framework, which NMF does not afford, is that it is trivial to extend the algorithm to handle arbitrary box constraints, i.e., a ≤ X_ij ≤ b instead of X_ij ≥ 0. The box constraint simply changes the set over which the minimization in (7) takes place. This amounts to replacing max(·, 0) in the formulas by a "clipping" function κ:

$$\kappa(\cdot) = \max(a, \min(b, \cdot)) \tag{10}$$

For example, one might use an upper bound to constrain the power of the signal, or a lower bound of ε to bound entries away from zero for stability reasons.
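In code, this change is a one-line swap of the projection; a hypothetical helper:

```python
import numpy as np

def kappa(t, a=0.0, b=np.inf):
    """Clipping function (10); a = 0, b = inf recovers max(., 0)."""
    return np.clip(t, a, b)
```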


4 Application to Bandwidth Extension

To apply Algorithm 1 to bandwidth extension, we concatenate the full bandwidth training data with the band-limited test data. If the training and test data in fact share a common set of spectral features, then the concatenated matrix should be low-rank. A sample input into Algorithm 1 is shown in Figure 2. Note that this approach of jointly learning from the training and test data contrasts with the two-stage manner by which NMF is typically applied, where one first learns the spectral features W from the training data and then fixes these features in fitting to the test data.

Figure 2: We concatenate the training and test data to form a single matrix Y with missing entries in the top right, corresponding to high frequencies in the test data.
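A sketch of how this concatenated input might be assembled (the helper name and the cutoff argument, marking the first frequency bin removed by the low-pass filter, are ours):

```python
# Sketch: build the matrix of Figure 2. Y_train is the full-bandwidth
# training spectrogram, Y_test the band-limited test spectrogram.
import numpy as np

def concatenate_for_completion(Y_train, Y_test, cutoff):
    Y = np.hstack([Y_train, Y_test])
    O = np.ones(Y.shape, dtype=bool)
    O[cutoff:, Y_train.shape[1]:] = False   # high bins of test columns missing
    return Y, O
```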

Algorithm 1 involves a choice of two tuning parameters, ρ and λ. ρ is associated with the augmented Lagrangian term in the ADMM algorithm. The choice of ρ merely controls properties of the algorithm, not of the solution; we set ρ = 1 in all of our experiments. On the other hand, the choice of λ is more fundamental, as it controls the degree of rank regularization. Fortunately, for bandwidth extension, there is a natural way to perform cross-validation to set λ automatically, described in Algorithm 2.

Algorithm 2 Cross-Validation for Bandwidth Extension

Divide the training data into k time segments.
for each candidate λ do
    for i = 1, ..., k do
        – Apply a band-pass filter to time segment i, similar to that encountered in the test data.
        – Apply Algorithm 1 to recover the bandwidth.
        – Calculate the reconstruction error.
    end for
    Average the reconstruction error over the k folds.
end for
Choose the λ with the minimum average reconstruction error.
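In terms of the helpers sketched earlier, Algorithm 2 might be realized as follows (ours; for simplicity, masking the upper bins of the held-out fold stands in for the band-pass filter, and squared error is used as the reconstruction error, whereas the paper scores with the loss in use):

```python
# Sketch of Algorithm 2: k-fold cross-validation over candidate lambdas.
import numpy as np

def select_lambda(segments, lambdas, cutoff):
    """segments: list of k full-bandwidth training spectrograms (the folds)."""
    avg_err = []
    for lam in lambdas:
        errs = []
        for i, Y_i in enumerate(segments):
            Y_rest = np.hstack([s for j, s in enumerate(segments) if j != i])
            Y, O = concatenate_for_completion(Y_rest, Y_i, cutoff)
            X = nn_matrix_completion(Y, O, lam)
            X_i = X[:, Y_rest.shape[1]:]          # reconstruction of fold i
            errs.append(np.mean((X_i[cutoff:] - Y_i[cutoff:]) ** 2))
        avg_err.append(np.mean(errs))
    return lambdas[int(np.argmin(avg_err))]
```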

5 Evaluation

5.1 Music Example

Following the lead of [16], we attempted to recover the bandwidth of an aggressively low-passed recording of the song "Back in Black" by the band AC/DC, by training on a recording of "Highway to Hell" by the same band. We used the guitar introductions of the two songs, which feature a similar power chord and percussion texture, and applied a low-pass filter to "Back in Black" which eliminated 70% of the high frequency content.


Figure 3: The spectrograms of "Highway to Hell" (top), which was used as training data to recover the missing bandwidth in "Back in Black" (bottom), which was low-passed so that 70% of the high frequencies are missing.

The spectrograms of the training and test data are shown in Figure 3. The sound files were sampled at 22050 Hz, and we used a 1024-point Hann window with a hop size of 256 samples.

We then applied 5-fold cross-validation (see Algorithm 2) to the full-bandwidth training example ("Highway to Hell") and found the cross-validation error to be minimized for λ = 37.4. We then solved (3) for KL divergence (β = 1) with this parameter to recover the missing bandwidth in "Back in Black," applying Algorithm 1 to the concatenated spectrograms.

Figure 4: The cross-validation error plotted against λ on a logarithmic scale. Shown in light gray are the error curves for each of the folds when 5-fold cross-validation was applied on "Highway to Hell". The solid black line represents the average cross-validation error, which is minimized for λ = 37.4.

For comparison, we also applied the method in [16] based on PLCA, a probabilistic interpretation of non-negative matrix factorization. We used code provided to us by the authors. To make the comparison fair, we chose the number of components k by cross-validation and found k = 5 to be optimal. The reconstruction error (on the missing entries) is shown in Table 2, using the KL loss as the error metric. The metrics confirm the subjective perceptual evaluation that the proposed method produces a superior reconstruction.

We also used Euclidean divergence (β = 2), i.e., squared error loss. We compared it against NMF using the same divergence function [10], as well as a similar method discussed in [16] which simply imputes values using the SVD (and thresholding negative entries to zero). Cross-validation was again used to select λ and k.


Figure 5: Reconstructed spectrograms for "Back in Black" from the different methods (Proposed, NMF/PLCA, SVD), as compared to the ground truth.

Error                Proposed   NMF/PLCA   SVD
β = 1 (KL)              52704      55121     —
β = 2 (Euclidean)         209        207    205

Table 2: The reconstruction error (distance between the estimate and the ground truth) using the two different loss functions. The error metric corresponds to the loss function used.

In terms of Euclidean distance, SVD performs the best, followed by NMF and the proposed method. Perceptually, the results are inferior to β = 1, possibly because the Euclidean metric is not appropriate for audio.

The reconstructed spectrograms are shown in Figure 5 for β = 2 to facilitate comparison of the three approaches. The proposed approach oversmooths less than the other two approaches, thus producing a slightly more pleasing reconstruction. As expected, none of the methods are able to recover the hi-hat visible in the upper frequency ranges in the ground truth, simply because this information was not available in the low-passed recording.

5.2 Speech Example

Another important application of bandwidth extension is speech enhancement, since speech is perhaps the most frequently encountered compressed audio signal. We took 10 sentences spoken by a female speaker from the TIMIT database and applied a low-pass filter to 2 of the sentences to eliminate 70% of the upper frequencies. We then applied the proposed method to recover the missing bandwidth from the "compressed" speech. This corresponds to a real-world scenario in which one would like to use high fidelity recordings of a person's voice to enhance low fidelity speech of the same speaker.

The spectrograms of the training and test signals were already shown in Figure 2. Figure 6 shows the output and the ground truth. The low-rank matrix imputation has successfully recovered the harmonics of the speech, although it fares less well at capturing the transients.

6 Discussion

6.1 Scalability

Although we have observed that the proposed convex optimization approach tends to yield better solutions than non-convex approaches, a naive Matlab implementation of the proposed method can be slower than the competitors. The main bottleneck of Algorithm 1 is update (8) for Z, which requires an SVD for every iteration. The other updates are inexpensive, and more importantly, are readily parallelized. On the other hand, SVD algorithms for generic matrices have a complexity of O(min(mn², m²n)). Due to space constraints, scalability concerns must be deferred to future work, but we note that in general we seek a low-rank matrix and so only need a few leading singular vectors/values. Algorithms such as PROPACK compute a rank-k SVD with cost O(nmk), which can be advantageous when k ≪ m, n.
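As an illustration of this point, the full SVD in the Z-update can be swapped for a truncated one; the sketch below (ours) uses SciPy's svds, and agrees with the full computation exactly when the discarded singular values fall below the threshold λ/ρ:

```python
# Sketch: rank-k truncated SVD in place of the full SVD in the Z-update.
import numpy as np
from scipy.sparse.linalg import svds

def svt_truncated(Y, lam, k):
    U, d, Vt = svds(Y, k=k)   # top-k singular triplets (Lanczos-based)
    # exact S_lam(Y) whenever the (k+1)st singular value is <= lam
    return U @ np.diag(np.maximum(d - lam, 0.0)) @ Vt
```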


Figure 6: The spectrograms of the reconstruction (left) and the ground truth (right). See Figure 2 for the spectrograms of the original training and test signals.

6.2 Connection to NMF

We derived the proposed method by regarding NMF as a low-rank approximation, but we have yet to relate the solution obtained by this method to the factors one would obtain using NMF. The nuclear norm of a matrix can be viewed as an ℓ₂-regularization on its latent factors, as long as the rank is sufficiently large. This is formalized in the following result, a proof of which appears in [7, 14]:

Lemma 1. The nuclear norm can be represented as

$$\|X\|_* = \inf_{W \in \mathbb{R}^{m \times k}, \, H \in \mathbb{R}^{k \times n} : \, WH = X} \frac{1}{2} \left( \|W\|_F^2 + \|H\|_F^2 \right) \tag{11}$$

for any k ≥ rank(X), where ‖·‖_F denotes the Frobenius norm (the ℓ₂ norm of the entries).
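Lemma 1 is easy to sanity-check numerically: any factorization upper-bounds the nuclear norm, and the balanced SVD factors W = U√D, H = √D V^T attain it (a sketch, ours):

```python
# Sketch: numerical check of Lemma 1 on a random low-rank matrix.
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 3))
H = rng.random((3, 8))
X = W @ H

nuc = np.linalg.svd(X, compute_uv=False).sum()    # ||X||_*
bound = 0.5 * (np.sum(W**2) + np.sum(H**2))       # any factorization bounds it
assert nuc <= bound + 1e-10

U, d, Vt = np.linalg.svd(X, full_matrices=False)
W_opt = U * np.sqrt(d)                            # U diag(sqrt(d))
H_opt = np.sqrt(d)[:, None] * Vt                  # diag(sqrt(d)) V^T
assert np.isclose(0.5 * (np.sum(W_opt**2) + np.sum(H_opt**2)), nuc)
```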

The following result is a consequence:

Theorem 1. The nuclear norm regularized objective function (3) is equivalent to the following non-convex problem in W ∈ R^{m×k}, H ∈ R^{k×n}:

$$\underset{(WH)_{ij} \geq 0}{\text{minimize}} \quad \sum_{(i,j) \in O} d_\beta(Y_{ij}, (WH)_{ij}) + \frac{\lambda}{2} \left( \|W\|_F^2 + \|H\|_F^2 \right) \tag{12}$$

for any k ≥ rank(X*), where X* is a minimum of the convex optimization problem (3).

It is insightful to compare the above problem (12) to NMF-based models for the same task:

$$\underset{W_{ik} \geq 0, \, H_{kj} \geq 0}{\text{minimize}} \quad \sum_{(i,j) \in O} d_\beta(Y_{ij}, (WH)_{ij}) \tag{13}$$

Although neither (12) nor (13) is a convex optimization problem in W, H, Theorem 1 says that for k large enough and β ∈ [1, 2], problem (12) can be equivalently written as a convex optimization problem in X = WH. This characterization is crucial since it guarantees that the non-convex problem (12) can be solved to global optimality in polynomial time (for sufficiently large k), unlike the NMF problem (13).

Whereas (13) constrains the factors to be non-negative, (12) only constrains their product to be non-negative. In many audio applications, non-negativity of the factors is desirable for interpretability, but in the context of bandwidth extension, only the reconstruction is needed, not the individual factors.

Alternatively, (12) can be thought of as a relaxation of (13), since it replaces the intractable constraints W_ik, H_kj ≥ 0 by (WH)_ij ≥ 0. In practice, one can verify that a solution to (12) also solves (13) (with additional ℓ₂ penalties) by checking that the NMF of the solution perfectly reconstructs the matrix, i.e., that the solution admits non-negative factors.
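A sketch of this check (ours; it uses scikit-learn's NMF, itself a non-convex solver, so a failed check is inconclusive while a successful one certifies that non-negative factors exist):

```python
# Sketch: test whether the completion X_star admits non-negative factors.
import numpy as np
from sklearn.decomposition import NMF

def admits_nonneg_factors(X_star, tol=1e-3):
    k = np.linalg.matrix_rank(X_star)
    model = NMF(n_components=k, init="nndsvda", max_iter=1000)
    W = model.fit_transform(X_star)       # requires X_star >= 0
    H = model.components_
    rel_err = np.linalg.norm(X_star - W @ H) / np.linalg.norm(X_star)
    return rel_err < tol
```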


7 Conclusion

In many audio problems, the goal is simply to estimate a reconstruction rather than to find explicit factors. In this case, it is possible to formulate the problem as a convex program, so that it is feasible to solve the problem exactly, unlike with NMF. We propose an algorithm for this problem based on the Alternating Direction Method of Multipliers [15], which leads to simple updates (see Algorithm 1). We also propose a method of performing cross-validation for bandwidth extension, thus making the procedure entirely automatic and parameter-free from the user's perspective. We demonstrate that our algorithm compares favorably with competing methods in a number of practical experiments. Although the focus of this paper was on non-negativity constraints and bandwidth extension, our framework extends readily to other box constraints, thus potentially opening the door to applications beyond audio.

8 Acknowledgements

We are grateful to Paris Smaragdis for sharing code and helpful discussions.

References

[1] S. Voran, "Listener ratings of speech passbands," in Speech Coding for Telecommunications Proceeding, IEEE Workshop on. IEEE, 1997, pp. 81–82.

[2] Y. M. Cheng, D. O'Shaughnessy, and P. Mermelstein, "Statistical recovery of wideband speech from narrowband speech," Speech and Audio Processing, IEEE Transactions on, 1994.

[3] H. Yasukawa, "Restoration of wide band signal from telephone speech using linear prediction error processing," in IEEE International Conference on Spoken Language Processing (ICSLP), 1996.

[4] D. Bansal, B. Raj, and P. Smaragdis, "Bandwidth expansion of narrowband speech using non-negative matrix factorization," in Proc. Interspeech, 2005.

[5] P. Smaragdis, B. Raj, and M. Shashanka, "Supervised and semi-supervised separation of sounds from single-channel mixtures," Independent Component Analysis and Signal Separation, 2007.

[6] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Applications of Signal Processing to Audio and Acoustics, 2003 IEEE Workshop on. IEEE, 2003.

[7] N. Srebro, J. D. M. Rennie, and T. Jaakkola, "Maximum-margin matrix factorization," Advances in Neural Information Processing Systems, 2005.

[8] S. A. Vavasis, "On the complexity of nonnegative matrix factorization," SIAM Journal on Optimization, 2009.

[9] A. Lefevre, Dictionary Learning Methods for Single-Channel Source Separation, Ph.D. thesis, 2012.

[10] C. Fevotte and J. Idier, "Algorithms for nonnegative matrix factorization with the β-divergence," Neural Computation, vol. 23, no. 9, 2011.

[11] B. King, C. Fevotte, and P. Smaragdis, "Optimal cost function and magnitude power for NMF-based speech separation and music interpolation," in Machine Learning for Signal Processing (MLSP), 2012 IEEE International Workshop on. IEEE, 2012, pp. 1–6.

[12] M. Fazel, Matrix Rank Minimization with Applications, Ph.D. thesis, Stanford University, 2002.

[13] E. J. Candes and T. Tao, "The power of convex relaxation: Near-optimal matrix completion," Information Theory, IEEE Transactions on, vol. 56, no. 5, pp. 2053–2080, 2010.


[14] R. Mazumder, T. Hastie, and R. Tibshirani, "Spectral regularization algorithms for learning large incomplete matrices," Journal of Machine Learning Research, 2010.

[15] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.

[16] P. Smaragdis, B. Raj, and M. Shashanka, "Missing data imputation for spectral audio signals," in Machine Learning for Signal Processing, IEEE International Workshop on, 2009.
