
Page 1 of 30

Single-channel source separation using non-negative matrix factorization

Mikkel N. Schmidt

Technical University of Denmark

DTU Informatics, Department of Informatics and Mathematical Modeling

Page 2 of 30

Agenda

Single-channel source separation

Non-negative matrix factorization

NMF 2-D deconvolution (with Morten Mørup)

Speech separation using sparse NMF (with Rasmus K. Olsson)

NMF with Gaussian process priors (with Hans Laurberg)

Page 3 of 30

Single-channel source separation

Additive model

Mixture = sum of sources + noise

Under-determined problem: More information required

Example: Two-source noise-free
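A small numeric sketch of why the two-source, noise-free case is under-determined. The signals below are made-up toy values, not data from the talk: any signal `d` can be moved from one source to the other without changing the mixture, so the mixture alone cannot identify the sources.

```python
# Toy illustration (hypothetical values): two different source pairs
# that produce exactly the same single-channel mixture.
s1 = [1.0, 2.0, 3.0]
s2 = [4.0, 0.0, -1.0]
mixture = [a + b for a, b in zip(s1, s2)]

# Move an arbitrary signal d from the second source to the first.
d = [0.5, -1.0, 2.0]
s1_alt = [a + c for a, c in zip(s1, d)]
s2_alt = [b - c for b, c in zip(s2, d)]
mixture_alt = [a + b for a, b in zip(s1_alt, s2_alt)]

# Both pairs explain the observation equally well.
print(mixture == mixture_alt)
```

Extra information about the sources, such as a source model, is what makes one of these explanations preferable to the other.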

Page 4 of 30

Approaches to single-channel source separation

Filtering

Decomposition and grouping

Source modelling

Page 5 of 30

Model-based source separation

[Figure: block diagram with blocks labelled training data, mixture, feature extraction (twice), model, separation, reconstruction, and estimate; stages labelled data, representation, model, and inference]

Page 6 of 30

Signal representation

Emphasize desired characteristics

Introduce invariances

Allow assumptions of independence or exchangeability

Reduce dimensionality

Allow signal reconstruction

Page 7 of 30

Model

Mixing model

Source model

Model building

Model training

Model adaptation

Goals

Accurately model sources and mixing process

Enable efficient inference

Likelihood

Prior

Page 8 of 30

Inference

Estimate sources:

Maximum a posteriori, posterior mean, marginal MAP, etc.

Solve optimization or integration problem

Posterior ∝ Likelihood × Prior

Page 9 of 30


Non-negative matrix factorization

Data ≈ product of two factorizing matrices

Non-negativity constraints
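The factorization on this slide can be written compactly as follows. The names A and B match the later slides ("Columns of A", "Rows of B"); the data symbol X and the dimensions I, J, K are notation assumed here for illustration.

```latex
X \approx A B, \qquad
X \in \mathbb{R}_{+}^{I \times J}, \quad
A \in \mathbb{R}_{+}^{I \times K}, \quad
B \in \mathbb{R}_{+}^{K \times J}
```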

Page 10 of 30


Non-negative bilinear model

Sum of products of non-negative variables
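A minimal sketch of the bilinear model, assuming the notation X[i][j] = sum over k of A[i][k] * B[k][j]. The matrices are toy values chosen for illustration; the helper name `bilinear` is hypothetical.

```python
def bilinear(A, B):
    """Return X with X[i][j] = sum_k A[i][k] * B[k][j]."""
    n_comp, n_cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n_comp))
             for j in range(n_cols)] for row in A]

# Toy non-negative factors: A is 3 x 2, B is 2 x 4.
A = [[1.0, 0.0],
     [0.0, 2.0],
     [1.0, 1.0]]
B = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 3.0, 0.0]]

X = bilinear(A, B)
for row in X:
    print(row)
```

Because every factor entry is non-negative, each term in the sum can only add to X; no term can cancel another.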

Page 11 of 30

Why non-negativity?

Many signals are naturally non-negative

Pixel intensities

Amplitude spectra

Occurrence counts

Discrete probabilities

Additive combination of features

No cancellations

Build the whole as a sum of parts

Page 12 of 30

Basic computation

Divergence measure

Constraints

Constrained minimization problem
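One way to write the constrained minimization problem, assuming the notation X for the data, A and B for the factors, and D for a generic divergence. The squared error is given as one concrete choice of D.

```latex
\min_{A,\,B} \; D(X \,\|\, AB)
\quad \text{subject to} \quad
A_{ik} \ge 0, \;\; B_{kj} \ge 0,
\qquad \text{e.g.} \quad
D(X \,\|\, AB) = \sum_{i,j} \bigl( X_{ij} - [AB]_{ij} \bigr)^{2}
```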

Page 13 of 30

Probabilistic formulation

Likelihood Prior

Maximum likelihood

Maximum a posteriori
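In the same assumed notation (X for the data, A and B for the factors), the two estimators can be sketched as:

```latex
\begin{aligned}
\text{Maximum likelihood:} \quad
  & (\hat{A}, \hat{B}) = \arg\max_{A,\,B \ge 0} \; p(X \mid A, B) \\
\text{Maximum a posteriori:} \quad
  & (\hat{A}, \hat{B}) = \arg\max_{A,\,B \ge 0} \; p(X \mid A, B)\, p(A, B)
\end{aligned}
```

Taking the negative logarithm turns the likelihood into a divergence and the prior into a penalty term. For example, i.i.d. Gaussian noise leads to the squared error.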

Page 14 of 30

Divergence measures (Likelihoods)

Squared error (Lee and Seung, 1999)

Kullback-Leibler divergence (Lee and Seung, 1999)

Bregman's divergence (Dhillon and Sra, 2005)

Kompass' divergence (Kompass, 2007)

Csiszár's divergence (Cichocki et al., 2006)

Amari's alpha divergence (Cichocki et al., 2006)

Weighted versions of the above (Guillamet et al., 2001)
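A short sketch of the first two divergences in the list, written for matrices stored as lists of rows. The function names are hypothetical, and the Kullback-Leibler form is the generalized (unnormalized) version commonly used in NMF, sum of x log(x/y) - x + y.

```python
import math

def squared_error(X, Y):
    """Sum of squared differences between X and its approximation Y."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def generalized_kl(X, Y):
    """Generalized KL divergence; assumes X >= 0, Y > 0 and 0 * log(0) = 0."""
    total = 0.0
    for rx, ry in zip(X, Y):
        for x, y in zip(rx, ry):
            if x > 0:
                total += x * math.log(x / y)
            total += y - x
    return total

X = [[1.0, 2.0], [0.0, 4.0]]
Y = [[2.0, 2.0], [1.0, 3.0]]
print(squared_error(X, Y), generalized_kl(X, Y))
```

Both measures are zero when the approximation equals the data and positive otherwise.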

Page 15 of 30

Distribution of the factors (Priors)

Sparsity (Hoyer, 2002)

Orthogonality (Ding et al., 2005)

Discriminative (Wang et al., 2004)

Smoothness (Virtanen, 2003)

Gaussian process (Schmidt and Laurberg, 2008)

Transformation invariance (Wersing et al., 2003)

Convolutive (Virtanen, 2004; Smaragdis, 2004)

2-D convolutive (Schmidt and Mørup, 2006)

Page 16 of 30

Optimization strategies

Direct optimization

Alternating optimization

Alternating descent

Can have convex subproblems

Non-convex

Page 17 of 30

Algorithms

Projected least squares (Paatero, 1997)

Multiplicative updates (Lee and Seung, 1999)

Projected gradient descent (Lin, 2007)

Logarithmic barrier method (Lu and Wu, 2005)

Active set (Berry et al., 2006)

Quasi-Newton (Kim et al., 2007)

Reparametrization (Cichocki et al., 2006)

SOCP (Heiler and Schnörr, 2006)
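As an illustration of one entry in the list, here is a minimal pure-Python sketch of multiplicative updates for the squared error. The function names, the small constant `eps` (added to avoid division by zero), the random initialization, and the toy data are assumptions made for this example, not details from the talk.

```python
import random

def matmul(P, Q):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def squared_error(X, Y):
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def scale(M, num, den, eps):
    """Element-wise M * num / (den + eps)."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]

def nmf_multiplicative(X, n_components, n_iter=200, eps=1e-12, seed=0):
    rng = random.Random(seed)
    # Strictly positive random initialization of both factors.
    A = [[rng.random() + 0.1 for _ in range(n_components)] for _ in X]
    B = [[rng.random() + 0.1 for _ in X[0]] for _ in range(n_components)]
    errors = [squared_error(X, matmul(A, B))]
    for _ in range(n_iter):
        # B <- B * (A^T X) / (A^T A B)
        At = transpose(A)
        B = scale(B, matmul(At, X), matmul(matmul(At, A), B), eps)
        # A <- A * (X B^T) / (A B B^T)
        Bt = transpose(B)
        A = scale(A, matmul(X, Bt), matmul(A, matmul(B, Bt)), eps)
        errors.append(squared_error(X, matmul(A, B)))
    return A, B, errors

# Toy non-negative data of rank 2.
X = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 2.0, 6.0, 0.0],
     [1.0, 3.0, 3.0, 1.0]]
A, B, errors = nmf_multiplicative(X, n_components=2)
print(errors[0], errors[-1])
```

The updates only multiply by non-negative ratios, so factors that start non-negative stay non-negative without an explicit projection step.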

Page 18 of 30

NMF 2-D deconvolution

NMF

NMF 2-D

Page 19 of 30

Example: Flute and guitar

[Figure: spectrogram; axes: Time [seconds] (2 to 10), Frequency [kHz] (ticks at .2, .4, .8, 1.6, 3.2)]

Page 20 of 30

Speech separation using sparse NMF

Learn basis for each source

Basis

Training data

Mixture

Decompose mixture

Reconstruct each source

Source estimate
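The three steps on this slide can be sketched as follows, under simplifying assumptions: the per-source bases `A1` and `A2` are hypothetical and taken as already learned, the sparsity penalty of sparse NMF is left out, and the mixture is decomposed with plain multiplicative updates while the bases are held fixed. All names and values are illustrative.

```python
def matmul(P, Q):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def decompose(X, A, n_iter=200, eps=1e-12):
    """Estimate B >= 0 such that X is approximately A B, with A held fixed."""
    At = transpose(A)
    AtX, AtA = matmul(At, X), matmul(At, A)
    B = [[1.0] * len(X[0]) for _ in range(len(A[0]))]
    for _ in range(n_iter):
        AtAB = matmul(AtA, B)
        B = [[b * n / (d + eps) for b, n, d in zip(rb, rn, rd)]
             for rb, rn, rd in zip(B, AtX, AtAB)]
    return B

# Step 1 (assumed done): one basis vector per source, 3 frequency bins each.
A1 = [[1.0], [0.5], [0.0]]
A2 = [[0.0], [0.5], [1.0]]

# Toy mixture built from known activations so the result can be checked.
S1_true = matmul(A1, [[1.0, 0.5, 2.0]])
S2_true = matmul(A2, [[0.5, 3.0, 1.0]])
X = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(S1_true, S2_true)]

# Step 2: decompose the mixture on the concatenated bases [A1 A2].
A = [r1 + r2 for r1, r2 in zip(A1, A2)]
B = decompose(X, A)

# Step 3: reconstruct each source from its own basis and activations.
S1_est = matmul(A1, B[:1])
S2_est = matmul(A2, B[1:])
print(S1_est)
print(S2_est)
```

In this toy case the two basis vectors are linearly independent, so the activations are identifiable. With real speech the bases of different speakers overlap, which is where a sparsity constraint helps.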

Page 21 of 30

Example: Two speakers

[Figure: panels "Mixture" and "Bases"; axes: Time [seconds] (1 to 4), Frequency [kHz] (.5 to 3.5)]

Page 22 of 30

NMF with Gaussian process priors

Data Gaussian processes

Link functions

GP: General distribution over functions

Link function: Non-linear map to non-negative reals
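A hedged sketch of how the pieces on this slide could fit together. The symbols X, f_A, f_B, k_A, k_B and the latent functions alpha and beta are notation assumed here, not taken from the slides: each factor is obtained by passing a Gaussian process through a link function that maps to the non-negative reals.

```latex
X \approx A B, \qquad
A = f_A(\alpha), \quad \alpha \sim \mathcal{GP}\bigl(0,\, k_A\bigr), \qquad
B = f_B(\beta), \quad \beta \sim \mathcal{GP}\bigl(0,\, k_B\bigr)
```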

Page 23 of 30

Gaussian process

Distribution over functions

Characterized by

Mean function

Covariance function
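As a concrete instance of a covariance function, the sketch below uses the squared exponential kernel, a common choice that yields smooth functions. The choice of kernel, the parameter names, and the input grid are assumptions for illustration; the slide does not specify them.

```python
import math

def squared_exponential(x1, x2, length_scale=1.0, variance=1.0):
    """k(x1, x2) = variance * exp(-(x1 - x2)^2 / (2 * length_scale^2))."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def covariance_matrix(xs, length_scale=1.0, variance=1.0):
    """Covariance between function values at every pair of inputs in xs."""
    return [[squared_exponential(a, b, length_scale, variance) for b in xs]
            for a in xs]

xs = [0.0, 0.5, 1.0, 2.0]
K = covariance_matrix(xs, length_scale=1.0)
for row in K:
    print([round(v, 3) for v in row])
```

Nearby inputs get covariance close to the variance, and distant inputs get covariance close to zero, which is what encodes smoothness.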

Example

Page 24 of 30

Change of variable

Cholesky decomposition

I.I.D. standard normal

Same model, different parametrization

Parameters are a priori uncorrelated

Empirically better optimization properties
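A minimal sketch of the change of variable, assuming a zero-mean Gaussian process with a squared exponential covariance on a small grid. The kernel, the grid, and the jitter term (added to the diagonal for numerical stability) are assumptions for this example. With K = L Lᵀ, the correlated values f = L z are expressed through parameters z that are i.i.d. standard normal a priori.

```python
import math
import random

def squared_exponential(x1, x2, length_scale=1.0):
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(K):
    """Return lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
jitter = 1e-8  # small diagonal term, an assumption for numerical stability
K = [[squared_exponential(a, b) + (jitter if i == j else 0.0)
      for j, b in enumerate(xs)] for i, a in enumerate(xs)]
L = cholesky(K)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in xs]   # i.i.d. standard normal parameters
f = [sum(L[i][k] * z[k] for k in range(len(xs))) for i in range(len(xs))]
print(f)                                 # one draw of f = L z with covariance K
```

Optimizing over z instead of f leaves the model unchanged; only the parametrization differs.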

Page 25 of 30

Illustration of NMF with GP priors

Page 26 of 30

Example: Toy problem

[Figure panels: Noisy data, Underlying data, NMF, GPP-NMF]

Page 27 of 30

Example: Toy problem

[Figure: columns of A and rows of B, shown for Underlying data, NMF, and GPP-NMF]

Page 28 of 30

Example: Chemical shift brain imaging

Data: 369-point spectra at 512 points (8×8×8 grid) in human head

Task: Distinguish between brain and muscle tissue

Spectra prior: Smooth exponential distribution

Activation prior: 3-D smooth, exponential distribution, left-to-right symmetric

Random draw from prior
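One way to obtain an exponentially distributed, non-negative quantity from a Gaussian one is the probability integral transform. The sketch below is an assumption about how such a link function could look, not a statement of the exact function used in the talk; the function name and the `rate` parameter are hypothetical.

```python
import math

def normal_to_exponential(z, rate=1.0):
    """Map a standard normal value z to an exponential variable with the given rate.

    If z ~ N(0, 1), then Phi(z) is uniform on (0, 1), and
    -log(1 - Phi(z)) / rate follows an exponential distribution.
    """
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)
    return -math.log(upper_tail) / rate

zs = [-3.0, -1.0, 0.0, 1.0, 3.0]
values = [normal_to_exponential(z) for z in zs]
print(values)
```

The map is monotone, so the smoothness of the underlying Gaussian process carries over to the transformed values while the marginal distribution becomes exponential.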

Page 29 of 30

Example: Chemical shift brain imaging


[Figure: panel "NMF with Gaussian process priors"; rows labelled Muscle and Brain]

Page 30 of 30

Conclusions

Single-channel source separation is a difficult problem that occurs in many areas

Model-based source separation provides a principled approach to solving the problem

NMF and its extensions are useful models for single-channel source separation

Gaussian processes can be used as a general framework for incorporating prior information in NMF