Page 1 of 30
Single-channel source separation using non-negative matrix factorization
Mikkel N. Schmidt
Technical University of Denmark
[email protected]
DTU Informatics, Department of Informatics and Mathematical Modeling
Page 2 of 30
Agenda
Single-channel source separation
Non-negative matrix factorization
NMF 2-D deconvolution (with Morten Mørup)
Speech separation using sparse NMF (with Rasmus K. Olsson)
NMF with Gaussian process priors (with Hans Laurberg)
Page 3 of 30
Single-channel source separation
Additive model
Mixture = Sum of sources + Noise
Under-determined problem: More information required
Example: Two-source noise-free
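The additive model can be written out as below. The symbols y, x_i, n and N are assumed notation, since the slide's own equation did not survive extraction.

```latex
y(t) = \sum_{i=1}^{N} x_i(t) + n(t)
\qquad\text{two-source, noise-free case:}\qquad
y(t) = x_1(t) + x_2(t)
```

In the two-source case every sample gives one equation in two unknowns, which is why more information about the sources is required.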
Page 4 of 30
Approaches to single-channel source separation
Filtering
Decomposition and grouping
Source modelling
Page 5 of 30
Model-based source separation
[Diagram: model-based separation pipeline. Blocks: training data, feature extraction, model, mixture, feature extraction, separation, reconstruction, estimate. Stage labels: data, representation, model, inference.]
Page 6 of 30
Signal representation
Emphasize desired characteristics
Introduce invariances
Allow assumptions of independence or exchangeability
Reduce dimensionality
Allow signal reconstruction
Page 7 of 30
Model
Mixing model
Source model
Model building
Model training
Model adaptation
Goals
Accurately model sources and mixing process
Enable efficient inference
Likelihood
Prior
Page 8 of 30
Inference
Estimate sources:
Maximum a posteriori, posterior mean, marginal MAP, etc.
Solve optimization or integration problem
Posterior ∝ Likelihood × Prior
Page 9 of 30
Non-negative matrix factorization
Data ≈ product of the factorizing matrices
Non-negativity constraints
Page 10 of 30
Non-negative matrix factorization
Non-negative bilinear model
Sum of products of non-negative variables
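A small sketch of the bilinear model, assuming the data matrix is called X and the factors A and B (the factor names follow the later toy-problem slides; the sizes are arbitrary illustration values). Each element of X is a sum of products of non-negative variables.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 6, 8, 3            # assumed sizes: rows, columns, number of components

A = rng.random((I, K))       # non-negative factor (columns act as "parts")
B = rng.random((K, J))       # non-negative factor (rows act as activations)

X = A @ B                    # bilinear model: X = A B

# One element written out as a sum of products of non-negative variables
x_00 = sum(A[0, k] * B[k, 0] for k in range(K))
```

Because every term in the sum is non-negative, components can only add to the data and never cancel each other.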
Page 11 of 30
Why non-negativity?
Many signals are naturally non-negative
Pixel intensities
Amplitude spectra
Occurrence counts
Discrete probabilities
Additive combination of features
No cancellations
Build the whole as a sum of parts
Page 12 of 30
Basic computation
Divergence measure
Constraints
Constrained minimization problem
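Written out, the basic computation could look as follows. The symbols X, A, B and D are assumed notation: X is the data, A and B the factors, and D the chosen divergence measure.

```latex
\min_{A,\,B} \; D\!\left(X \,\|\, AB\right)
\quad \text{subject to} \quad A \ge 0, \;\; B \ge 0
```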
Page 13 of 30
Probabilistic formulation
Likelihood Prior
Maximum likelihood
Maximum a posteriori
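A sketch of how the two estimates relate, in assumed notation (X data, A and B factors). The divergence plays the role of a negative log-likelihood and the prior adds a penalty term; for example, the squared error corresponds to Gaussian noise.

```latex
\text{ML:}\quad (\hat{A}, \hat{B}) = \arg\max_{A,B \ge 0} \; p(X \mid A, B)

\text{MAP:}\quad (\hat{A}, \hat{B}) = \arg\max_{A,B \ge 0} \; p(X \mid A, B)\, p(A, B)
 = \arg\min_{A,B \ge 0} \; \bigl[ -\log p(X \mid A, B) - \log p(A, B) \bigr]
```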
Page 14 of 30
Divergence measures (Likelihoods)
Squared error (Lee and Seung, 1999)
Kullback-Leibler divergence (Lee and Seung, 1999)
Bregman's divergence (Dhillon and Sra, 2005)
Kompass' divergence (Kompass, 2007)
Csiszár's divergence (Cichocki et al., 2006)
Amari's alpha divergence (Cichocki et al., 2006)
Weighted versions of the above (Guillamet et al., 2001)
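As an illustration of the first two entries, here is a minimal sketch of the squared error and the generalized Kullback-Leibler divergence between a data matrix and its reconstruction. The function names and the small constant eps (added to avoid log of zero) are assumptions made for this example.

```python
import numpy as np

def squared_error(X, X_hat):
    """Sum of squared differences between data and reconstruction."""
    return np.sum((X - X_hat) ** 2)

def kl_divergence(X, X_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence for non-negative matrices."""
    return np.sum(X * np.log((X + eps) / (X_hat + eps)) - X + X_hat)

rng = np.random.default_rng(0)
X = rng.random((4, 5))       # toy non-negative data
X_hat = rng.random((4, 5))   # toy non-negative reconstruction

print(squared_error(X, X_hat), kl_divergence(X, X_hat))
```

Both measures are zero when the reconstruction equals the data and positive otherwise.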
Page 15 of 30
Distribution of the factors (Priors)
Sparsity (Hoyer, 2002)
Orthogonality (Ding et al., 2005)
Discriminative (Wang et al., 2004)
Smoothness (Virtanen, 2003)
Gaussian process (Schmidt and Laurberg, 2008)
Transformation invariance (Wersing et al., 2003)
Convolutive (Virtanen, 2004; Smaragdis, 2004)
2-D convolutive (Schmidt and Mørup, 2006)
Page 16 of 30
Optimization strategies
Direct optimization
Alternating optimization
Alternating descent
Can have convex subproblems
Non-convex
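A rough sketch of alternating optimization for the squared error, assuming data X and factors A and B. With one factor held fixed, the fit of the other is a convex least-squares problem; here each subproblem is solved without constraints and then clipped to zero, which is a simple heuristic rather than an exact non-negative least-squares solve. The function name is hypothetical.

```python
import numpy as np

def projected_als(X, K, n_iter=50, seed=0):
    """Alternate between least-squares fits of B and A, clipping negatives to zero."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], K))
    for _ in range(n_iter):
        # Fix A, solve min ||X - A B||^2 for B, then project onto B >= 0
        B = np.clip(np.linalg.lstsq(A, X, rcond=None)[0], 0, None)
        # Fix B, solve min ||X^T - B^T A^T||^2 for A, then project onto A >= 0
        A = np.clip(np.linalg.lstsq(B.T, X.T, rcond=None)[0].T, 0, None)
    return A, B

rng = np.random.default_rng(1)
X = rng.random((10, 3)) @ rng.random((3, 12))   # toy non-negative data
A, B = projected_als(X, K=3)
```

The clipping step means the objective is not guaranteed to decrease at every iteration, which is one reason the other algorithms on the next slide exist.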
Page 17 of 30
Algorithms
Projected least squares (Paatero, 1997)
Multiplicative updates (Lee and Seung, 1999)
Projected gradient descent (Lin, 2007)
Logarithmic barrier method (Lu and Wu, 2005)
Active set (Berry et al., 2006)
Quasi Newton (Kim et al., 2007)
Reparametrization (Cichocki et al., 2006)
SOCP (Heiler and Schnörr, 2006)
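To make one of these concrete, here is a minimal sketch of multiplicative updates for the squared error, in the style attributed to Lee and Seung. The names X, A, B, the random initialization, and the constant eps (guarding against division by zero) are assumptions of this example, not details taken from the slides.

```python
import numpy as np

def nmf_multiplicative(X, K, n_iter=200, eps=1e-9, seed=0):
    """Squared-error NMF, X ~ A B, using multiplicative updates."""
    rng = np.random.default_rng(seed)
    I, J = X.shape
    A = rng.random((I, K)) + eps
    B = rng.random((K, J)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never change
        B *= (A.T @ X) / (A.T @ A @ B + eps)
        A *= (X @ B.T) / (A @ B @ B.T + eps)
    return A, B

rng = np.random.default_rng(1)
X = rng.random((10, 3)) @ rng.random((3, 12))   # toy non-negative data
A, B = nmf_multiplicative(X, K=3)
```

Non-negativity is preserved automatically because the factors start positive and are only ever multiplied by non-negative numbers.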
Page 18 of 30
NMF 2-D deconvolution
NMF
NMF 2-D
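A sketch of the reconstruction step in a two-dimensional convolutive model, under the assumption that the data is approximated by a double sum over time lags tau and frequency shifts phi of shifted factor products. The array layout (A indexed by tau, B indexed by phi), the zero-filled shifts, and all function names are choices made for this illustration.

```python
import numpy as np

def shift_down(M, n):
    """Shift rows of M down by n, filling the top with zeros."""
    out = np.zeros_like(M)
    if n == 0:
        out[:] = M
    else:
        out[n:, :] = M[:-n, :]
    return out

def shift_right(M, n):
    """Shift columns of M right by n, filling the left with zeros."""
    out = np.zeros_like(M)
    if n == 0:
        out[:] = M
    else:
        out[:, n:] = M[:, :-n]
    return out

def nmf2d_reconstruct(A, B):
    """A has shape (T, I, K): one basis matrix per time lag tau.
    B has shape (P, K, J): one activation matrix per frequency shift phi."""
    T, I, K = A.shape
    P, _, J = B.shape
    X_hat = np.zeros((I, J))
    for tau in range(T):
        for phi in range(P):
            X_hat += shift_down(A[tau], phi) @ shift_right(B[phi], tau)
    return X_hat
```

With a single lag and a single shift (T = P = 1) the double sum reduces to the ordinary NMF product of the two factors.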
Page 19 of 30
Example: Flute and guitar
[Figure: time-frequency plot of the flute and guitar example. Axes: Time [seconds], Frequency [kHz].]
Page 20 of 30
Speech separation using sparse NMF
Learn basis for each source
Basis
Training data
Mixture
Decompose mixture
Reconstruct each source
Source estimate
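The three steps above can be sketched as follows. This is a simplified illustration, not the exact algorithm behind the slides: it assumes the cost 0.5 * ||X - A B||^2 + lam * sum(B), multiplicative updates, and unit-norm basis columns. The function names and the parameter lam are hypothetical.

```python
import numpy as np

def sparse_nmf(X, K, lam=0.1, n_iter=200, A=None, eps=1e-9, seed=0):
    """Squared-error NMF with an L1 penalty on B. If A is given, it stays fixed."""
    rng = np.random.default_rng(seed)
    fixed = A is not None
    if not fixed:
        A = rng.random((X.shape[0], K)) + eps
    B = rng.random((A.shape[1], X.shape[1])) + eps
    for _ in range(n_iter):
        # The penalty weight lam enters the denominator and shrinks B
        B *= (A.T @ X) / (A.T @ A @ B + lam + eps)
        if not fixed:
            A *= (X @ B.T) / (A @ B @ B.T + eps)
            # Fix the scale: unit-norm columns in A, compensated in B
            norms = np.linalg.norm(A, axis=0, keepdims=True) + eps
            A /= norms
            B *= norms.T
    return A, B

def separate(X_mix, A1, A2, lam=0.1, n_iter=200):
    """Decompose the mixture on the joint basis and reconstruct each source."""
    A = np.hstack([A1, A2])
    _, B = sparse_nmf(X_mix, A.shape[1], lam=lam, n_iter=n_iter, A=A)
    K1 = A1.shape[1]
    S1 = A1 @ B[:K1]     # estimate of source 1
    S2 = A2 @ B[K1:]     # estimate of source 2
    return S1, S2
```

A typical use would learn `A1` and `A2` by calling `sparse_nmf` on magnitude spectrograms of each speaker's training data, then call `separate` on the mixture spectrogram.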
Page 21 of 30
Example: Two speakers
[Figure: time-frequency plots for the two-speaker example. Panels: Mixture, Bases. Axes: Time [seconds], Frequency [kHz].]
Page 22 of 30
NMF with Gaussian process priors
Data Gaussian processes
Link functions
GP: General distribution over functions
Link function: Non-linear map to non-negative reals
Page 23 of 30
Gaussian process
Distribution over functions
Characterized by
Mean function
Covariance function
Example
Page 24 of 30
Change of variable
Cholesky decomposition
I.I.D. standard normal
Same model, different parametrization
Parameters are a priori uncorrelated
Empirically better optimization properties
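A minimal sketch of the change of variable, assuming a zero-mean Gaussian process with a squared-exponential covariance function and an exponential link function. Both are illustrative choices and may differ from the ones used in the actual work; the jitter term is added only for numerical stability.

```python
import numpy as np

def sq_exp_cov(t, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance matrix for input points t."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

t = np.linspace(0, 10, 50)
K = sq_exp_cov(t, length_scale=1.5) + 1e-6 * np.eye(t.size)  # covariance plus jitter

L = np.linalg.cholesky(K)          # Cholesky decomposition, K = L L^T

rng = np.random.default_rng(0)
z = rng.standard_normal(t.size)    # i.i.d. standard normal parameters
f = L @ z                          # correlated GP sample, f ~ N(0, K)
b = np.exp(f)                      # link function: map to non-negative reals
```

Optimizing over `z` instead of `f` leaves the model unchanged, but the parameters are uncorrelated under the prior.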
Page 25 of 30
Illustration of NMF with GP priors
Page 26 of 30
Example: Toy problem
Noisy data Underlying data
NMF GPP-NMF
Page 27 of 30
Example: Toy problem
Columns of A Rows of B
Underlying data
NMF
GPP-NMF
Page 28 of 30
Example: Chemical shift brain imaging
Data: 369-point spectra at 512 points (8×8×8 grid) in human head
Task: Distinguish between brain and muscle tissue
Spectra prior: Smooth exponential distribution
Activation prior: 3-D smooth, exponential distribution, left-to-right symmetric
Random draw from prior
Page 29 of 30
Example: Chemical shift brain imaging
Non-negative matrix factorization
NMF with Gaussian process priors
Muscle
Brain
Page 30 of 30
Conclusions
Single-channel source separation is a difficult problem that occurs in many areas
Model-based source separation provides a principled approach to solving the problem
NMF and its extensions are useful models for single-channel source separation
Gaussian processes can be used as a general framework for incorporating prior information in NMF