Date post: 06-Aug-2015
Upload: daichi-kitamura
Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan) Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan) Hirokazu Kameoka (The University of Tokyo, Japan) 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays Oral session 2 – Microphone array processing
Transcript
Page 1: Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation

Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan)

Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan)

Hirokazu Kameoka (The University of Tokyo, Japan)

4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays
Oral session 2 – Microphone array processing

Page 2

Outline
• 1. Research background
• 2. Conventional methods
  – Directional clustering
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Hybrid method
• 3. Analysis of restoration ability
  – Generalized cost function
  – Analysis based on generation model
• 4. Experiments
• 5. Conclusions

Page 4

Research background
• Signal separation has received much attention.
  – Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area.
• Supervised NMF (SNMF) achieves the highest separation performance.
• To improve its performance, an SNMF-based multichannel signal separation method is required.

We have proposed a new SNMF and its hybrid separation method for multichannel signals.

Page 5

Research background
• Our proposed hybrid method

Input stereo signal (L, R) → Spatial separation method (directional clustering) → SNMF-based separation method (SNMF with spectrogram restoration) → Separated signal

Page 6

Research background
• The divergence criterion in SNMF strongly affects separation performance.
  – Euclidean distance (EUC-distance)
  – Kullback-Leibler divergence (KL-divergence)
  – Itakura-Saito divergence (IS-divergence)
• The optimal divergence for SNMF with spectrogram restoration is not yet known.

We extend our new SNMF to a more generalized form.
We give a theoretical analysis for the optimization of the divergence.

Page 7

Outline
• 1. Research background
• 2. Conventional methods
  – Directional clustering
  – NMF
  – Supervised NMF
  – Hybrid method
• 3. Analysis of restoration ability
  – Generalized cost function
  – Analysis based on generation model
• 4. Experiments
• 5. Conclusions

Hybrid method: stereo signal → spatial separation → spectral separation → separated signal

Page 8

Directional clustering [Araki, et al., 2007]
• Directional clustering
  – Unsupervised spatial separation method
• Problems
  – Cannot separate sources in the same direction
  – Artificial distortion arises owing to the binary masking.

[Figure: the input stereo signal (sources at left, center, and right) is turned into a separated signal (center) by binary masking. Each time-frequency bin of the spectrogram (frequency vs. time) is labeled L, C, or R, and the separated signal is the entry-wise product of the spectrogram and a binary mask of 1s and 0s.]
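The binary masking step can be sketched in a few lines. This is a minimal illustration, not the clustering method of [Araki, et al., 2007]: the function names, the level-difference rule, and the 3 dB threshold are all assumptions introduced here to stand in for the clustering step. Only the final entry-wise product with a 0/1 mask corresponds to what the slide shows.

```python
import numpy as np

def direction_labels(left_mag, right_mag, threshold_db=3.0):
    """Label each time-frequency bin as 'L', 'C', or 'R'.

    Hypothetical stand-in for the clustering step: a bin is 'L' if the left
    channel is louder by more than threshold_db, 'R' if the right channel
    is louder by more than threshold_db, and 'C' otherwise.
    """
    eps = 1e-12  # avoids taking the log of zero
    diff_db = 20.0 * np.log10((left_mag + eps) / (right_mag + eps))
    labels = np.full(left_mag.shape, "C", dtype="<U1")
    labels[diff_db > threshold_db] = "L"
    labels[diff_db < -threshold_db] = "R"
    return labels

def binary_mask(labels, target="C"):
    """Return a 0/1 mask that is 1 where the bin belongs to the target direction."""
    return (labels == target).astype(float)

# Toy magnitude spectrograms (frequency bins x time frames).
left = np.array([[1.0, 4.0], [2.0, 0.5]])
right = np.array([[1.0, 1.0], [2.0, 2.0]])

labels = direction_labels(left, right)
mask = binary_mask(labels, target="C")
# Entry-wise product of the mask and the (channel-averaged) spectrogram.
separated = mask * (left + right) / 2.0
```

Bins assigned to other directions become exact zeros in `separated`; these are the holes that the later slides try to restore.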

Page 9

NMF [Lee, et al., 2001]
• NMF can extract significant spectral patterns.
  – The basis matrix $F$ contains the spectral patterns that appear frequently in $Y$.

$Y \simeq FG$

$Y$ ($\Omega \times T$): Observed matrix (spectrogram)
$F$ ($\Omega \times K$): Basis matrix (spectral patterns)
$G$ ($K \times T$): Activation matrix (time-varying gain)
$\Omega$: Number of frequency bins, $T$: Number of time frames, $K$: Number of bases

[Figure: the spectrogram (frequency vs. time) is factorized into bases (frequency vs. amplitude) and activations (amplitude vs. time).]
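The factorization can be sketched with the well-known multiplicative updates for the squared Euclidean cost. This is a minimal sketch, assuming random initialization and a fixed iteration count; the names `Y`, `F`, `G`, and `nmf_euc` are illustrative choices, not taken from the slides.

```python
import numpy as np

def nmf_euc(Y, n_bases, n_iter=200, seed=0):
    """Factorize a nonnegative matrix Y (freq x time) as F @ G.

    Uses multiplicative updates for the squared Euclidean cost.
    F holds spectral patterns (freq x bases), G holds gains (bases x time).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, n_bases)) + 1e-3
    G = rng.random((n_bases, n_frames)) + 1e-3
    eps = 1e-12  # keeps the denominators strictly positive
    for _ in range(n_iter):
        G *= (F.T @ Y) / (F.T @ F @ G + eps)
        F *= (Y @ G.T) / (F @ G @ G.T + eps)
    return F, G

# Toy nonnegative rank-3 "spectrogram".
rng = np.random.default_rng(1)
Y = rng.random((6, 3)) @ rng.random((3, 8))
F, G = nmf_euc(Y, n_bases=3)
error_after = np.linalg.norm(Y - F @ G)
```

Because every update multiplies by a nonnegative ratio, `F` and `G` stay nonnegative without any explicit projection.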

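Page 10

Divergence criterion in NMF
• Cost function in NMF

$\mathcal{J} = \sum_{\omega,t} \mathcal{D}\big(y_{\omega,t} \,\|\, \sum_k f_{\omega,k} g_{k,t}\big)$

  – Euclidean distance (EUC-distance): $\mathcal{D}_{\mathrm{EUC}}(y \| x) = (y - x)^2$
  – Kullback-Leibler divergence (KL-divergence): $\mathcal{D}_{\mathrm{KL}}(y \| x) = y \log\frac{y}{x} - y + x$
  – Itakura-Saito divergence (IS-divergence): $\mathcal{D}_{\mathrm{IS}}(y \| x) = \frac{y}{x} - \log\frac{y}{x} - 1$

$y_{\omega,t}$: Entry of the observed matrix $Y$
$f_{\omega,k}$, $g_{k,t}$: Entries of the variable matrices $F$ and $G$, respectively.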
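The three criteria can be written as element-wise functions and summed over the spectrogram. This is a sketch under the assumption that all inputs are strictly positive (the KL and IS expressions are otherwise undefined); the function names are illustrative.

```python
import numpy as np

def euc_distance(y, x):
    """Squared Euclidean distance between observation y and model x."""
    return (y - x) ** 2

def kl_divergence(y, x):
    """Generalized Kullback-Leibler divergence (y, x > 0)."""
    return y * np.log(y / x) - y + x

def is_divergence(y, x):
    """Itakura-Saito divergence (y, x > 0)."""
    return y / x - np.log(y / x) - 1.0

def nmf_cost(Y, F, G, divergence):
    """Sum of the element-wise divergence between Y and the model F @ G."""
    return float(np.sum(divergence(Y, F @ G)))

F = np.array([[1.0, 2.0], [3.0, 4.0]])
G = np.array([[1.0, 1.0], [1.0, 2.0]])
Y = F @ G  # a perfectly modeled observation
costs = [nmf_cost(Y, F, G, d) for d in (euc_distance, kl_divergence, is_divergence)]
```

All three costs vanish when the model reproduces the observation exactly, and they differ in how strongly they penalize a mismatch at large versus small amplitudes.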

Page 11

Supervised NMF [Smaragdis, et al., 2007]
• SNMF
  – Supervised spectral separation method

Training process: the sample sounds of the target signal are factorized to obtain the supervised basis matrix $F$ (spectral dictionary).

Separation process: the mixed signal is decomposed as $Y \simeq FG + HU$, where $F$ is fixed and $G$, $H$, and $U$ are optimized. $FG$ represents the target signal and $HU$ represents the other signal.
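The separation process can be sketched as an NMF in which the trained bases are held fixed. This is a minimal sketch, assuming the squared Euclidean cost and standard multiplicative updates; the slides do not specify these choices at this point, and the names `snmf_separate`, `n_other`, `G`, `H`, and `U` are illustrative.

```python
import numpy as np

def snmf_separate(Y, F, n_other, n_iter=200, seed=0):
    """Decompose Y as F @ G + H @ U with the supervised bases F held fixed.

    G: activations of the target bases, H: bases for the other sources,
    U: activations of H. Updates assume the squared Euclidean cost.
    """
    rng = np.random.default_rng(seed)
    G = rng.random((F.shape[1], Y.shape[1])) + 1e-3
    H = rng.random((Y.shape[0], n_other)) + 1e-3
    U = rng.random((n_other, Y.shape[1])) + 1e-3
    eps = 1e-12  # keeps the denominators strictly positive
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ Y) / (F.T @ X + eps)
        X = F @ G + H @ U
        H *= (Y @ U.T) / (X @ U.T + eps)
        X = F @ G + H @ U
        U *= (H.T @ Y) / (H.T @ X + eps)
    return G, H, U

# Toy example: F plays the role of a dictionary trained on target samples.
rng = np.random.default_rng(2)
F = rng.random((8, 2))
target = F @ rng.random((2, 10))
other = rng.random((8, 1)) @ rng.random((1, 10))
Y = target + other

G, H, U = snmf_separate(Y, F, n_other=1)
target_estimate = F @ G  # separated target spectrogram
residual = np.linalg.norm(Y - F @ G - H @ U)
```

Only `G`, `H`, and `U` change during the loop, which is what distinguishes the supervised case from plain NMF.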

Page 12

Hybrid method [Kitamura, et al., 2013]
• We have proposed a new SNMF, called SNMF with spectrogram restoration, and a hybrid method based on it.

Hybrid method: stereo input (L, R) → spatial separation (directional clustering) → spectral separation (SNMF with spectrogram restoration)

Page 13

SNMF with spectrogram restoration
• SNMF with spectrogram restoration can separate the target and restore the spectrogram simultaneously.

[Figure: two spectrograms (frequency vs. time). The first is the spectrogram after directional clustering, containing target components, non-target components, and holes. The second is the spectrogram after SNMF with spectrogram restoration, obtained with the supervised bases (dictionary of the target).]

Page 14

Decomposition model and cost function
• The divergence is defined at all grids except for the holes by using the binary mask matrix $I$.

Decomposition model: $Y \simeq FG + HU$ ($F$: supervised bases, fixed)

Cost function: $\mathcal{J} = \sum_{\omega,t} i_{\omega,t}\, \mathcal{D}\big(y_{\omega,t} \,\|\, \sum_k f_{\omega,k} g_{k,t} + \sum_l h_{\omega,l} u_{l,t}\big) + \lambda\,(\text{regularization term}) + \mu\,(\text{penalty term})$

$i_{\omega,t}$, $y_{\omega,t}$, $f_{\omega,k}$, $g_{k,t}$, $h_{\omega,l}$, $u_{l,t}$: Entries of the matrices $I$, $Y$, $F$, $G$, $H$, and $U$, respectively
$\lambda$, $\mu$: Weighting parameters, $\bar{\cdot}$: Binary complement, $\|\cdot\|_{\mathrm{Fr}}$: Frobenius norm
$I$: Binary masking matrix obtained from directional clustering
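The idea of defining the divergence only outside the holes can be sketched as a masked sum. This sketch covers only the data-fitting term and assumes the KL-divergence as the criterion; the regularization and penalty terms are not reproduced, and the names `masked_kl_cost`, `X`, and `I` are illustrative.

```python
import numpy as np

def masked_kl_cost(Y, X, I):
    """KL-divergence between Y and the model X, summed only where I == 1.

    I is a 0/1 mask: 1 marks observed bins, 0 marks holes to be ignored.
    """
    eps = 1e-12  # guards the logarithm where Y is zero
    D = Y * np.log((Y + eps) / (X + eps)) - Y + X
    return float(np.sum(I * D))

Y = np.array([[1.0, 0.0], [2.0, 3.0]])   # the zero is a hole left by masking
I = np.array([[1.0, 0.0], [1.0, 1.0]])   # 0 marks the hole
X = np.array([[1.0, 5.0], [2.0, 3.0]])   # model puts energy into the hole

cost_masked = masked_kl_cost(Y, X, I)
cost_unmasked = masked_kl_cost(Y, X, np.ones_like(I))
```

With the mask, the model is free to place energy in the hole without being penalized; without it, the same model would be pushed toward zero there.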

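Page 16

Generalized divergence: $\beta$-divergence
• $\mathcal{D}_\beta$: $\beta$-divergence [Eguchi, et al., 2001]

$\mathcal{D}_\beta(y \| x) = \begin{cases} \dfrac{y^\beta}{\beta(\beta-1)} + \dfrac{x^\beta}{\beta} - \dfrac{y\,x^{\beta-1}}{\beta-1} & (\beta \in \mathbb{R} \setminus \{0, 1\}) \\ y(\log y - \log x) + x - y & (\beta = 1) \\ \dfrac{y}{x} - \log\dfrac{y}{x} - 1 & (\beta = 0) \end{cases}$

  – EUC-distance: $\beta = 2$ (up to a constant factor of 1/2)
  – KL-divergence: $\beta = 1$
  – IS-divergence: $\beta = 0$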
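The $\beta$-divergence and its special cases can be checked numerically. This is a sketch using the standard textbook definition with the limits at $\beta = 0$ and $\beta = 1$ handled explicitly; it assumes strictly positive inputs, and the function name is illustrative.

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Element-wise beta-divergence between y and x (both strictly positive).

    beta = 0 gives the IS-divergence, beta = 1 the KL-divergence, and
    beta = 2 half of the squared Euclidean distance.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 0:
        return y / x - np.log(y / x) - 1.0
    if beta == 1:
        return y * (np.log(y) - np.log(x)) + x - y
    return (y ** beta / (beta * (beta - 1))
            + x ** beta / beta
            - y * x ** (beta - 1) / (beta - 1))

d_euc = float(beta_divergence(3.0, 1.0, 2))          # 0.5 * (3 - 1) ** 2
d_kl = float(beta_divergence(3.0, 1.0, 1))           # 3 * log(3) + 1 - 3
d_near_kl = float(beta_divergence(3.0, 1.0, 1 + 1e-6))
```

Evaluating just beside $\beta = 1$ gives almost the same value as the KL case, which illustrates that the three named criteria are points on one continuous family.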

Page 17

Decomposition model and cost function
• We introduced the $\beta$-divergence to extend the cost function to a generalized form.

Decomposition model: $Y \simeq FG + HU$ ($F$: supervised bases, fixed)

Cost function: $\mathcal{J} = \sum_{\omega,t} i_{\omega,t}\, \mathcal{D}_\beta\big(y_{\omega,t} \,\|\, \sum_k f_{\omega,k} g_{k,t} + \sum_l h_{\omega,l} u_{l,t}\big) + \lambda\,(\text{regularization term}) + \mu\,(\text{penalty term})$

Page 18

Update rules
• We can obtain the update rules for the optimization of the variable matrices $G$, $H$, and $U$.

Update rules: [equations for $G$, $H$, and $U$]

Page 19

SNMF with spectrogram restoration
• This SNMF has two tasks: source separation and basis extrapolation.
• The optimal divergence for source separation has already been investigated.
  – KL-divergence ($\beta = 1$) is suitable for source separation.
• No one has investigated the optimal divergence for basis extrapolation.
• We analyze the optimal divergence for basis extrapolation based on a generation model in NMF.

Page 20

Analysis of extrapolation ability
• The decomposition in NMF is equivalent to maximum likelihood estimation, which implicitly assumes a generation model for the input data $Y$.

Cost function in NMF: $\mathcal{J} = \sum_{\omega,t} \mathcal{D}\big(y_{\omega,t} \,\|\, \sum_k f_{\omega,k} g_{k,t}\big)$

Divergence        Assumed generation model
IS-divergence     Exponential distribution
KL-divergence     Poisson distribution
EUC-distance      Gaussian distribution

[Figure: the three distributions, each plotted up to the maximum of the data.]
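As one worked instance of this equivalence, consider the Poisson case. The derivation below is a sketch under the assumption that each entry $y$ is drawn independently from a Poisson distribution whose mean is the model value $x$ (the symbols $y$ and $x$ are illustrative). It shows that the negative log-likelihood and the KL-divergence differ only by terms that do not depend on $x$, so minimizing one is the same as minimizing the other.

```latex
\begin{align*}
-\log p(y \mid x) &= -\log \frac{x^{y} e^{-x}}{y!} = x - y \log x + \log y! \\
\mathcal{D}_{\mathrm{KL}}(y \,\|\, x) &= y \log \frac{y}{x} - y + x
  = \big(x - y \log x\big) + \big(y \log y - y\big) \\
\Rightarrow\quad \arg\min_{x} \mathcal{D}_{\mathrm{KL}}(y \,\|\, x) &= \arg\max_{x}\, p(y \mid x)
\end{align*}
```

The exponential and Gaussian cases follow the same pattern with the IS-divergence and the EUC-distance, respectively.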

Page 21

Analysis of extrapolation ability
• To compare the net extrapolation ability, we generate random data $Y$ that obey each generation model.
• We also prepare binary-masked random data and attempt to restore them.

[Figure: a training step, in which 100 bases are created, followed by a restoration step.]
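Generating random data for each model can be sketched with NumPy's random generator. This is an illustration only: the slides do not give the distribution parameters or the data size, so the mean of 10, the Gaussian standard deviation, the clipping at zero, and the independent draws per entry are all assumptions, as is the function name `generate_data`.

```python
import numpy as np

def generate_data(model, shape, mean=10.0, seed=0):
    """Draw a nonnegative random matrix from one of the three models."""
    rng = np.random.default_rng(seed)
    if model == "exponential":   # model paired with the IS-divergence
        return rng.exponential(scale=mean, size=shape)
    if model == "poisson":       # model paired with the KL-divergence
        return rng.poisson(lam=mean, size=shape).astype(float)
    if model == "gaussian":      # model paired with the EUC-distance
        # Clipping at zero is an added assumption that keeps the data nonnegative.
        samples = rng.normal(loc=mean, scale=mean / 4.0, size=shape)
        return np.clip(samples, 0.0, None)
    raise ValueError(f"unknown model: {model}")

Y_exp = generate_data("exponential", (64, 100))
Y_poi = generate_data("poisson", (64, 100))
Y_gau = generate_data("gaussian", (64, 100))
```

Each matrix can then be factorized under the matching divergence, so that the comparison isolates extrapolation ability from any mismatch between data and criterion.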

Page 22

Analysis of extrapolation ability
• The binary mask was randomly generated.
  – We generated two types of binary masks, whose densities of holes are 75% and 98%.
• SAR indicates the accuracy of restoration.

SAR [dB]: [equation, defined with entry-wise squares of the data]

Input random data → (binary masking) → Binary-masked data → (restoration) → Restored data
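The masking step and a restoration score can be sketched as follows. The slide's exact SAR definition is not reproduced here: the function `restoration_ratio_db` is a hypothetical signal-to-error ratio in dB built from entry-wise squares, offered only as an assumption about the general shape of such a measure. The name `random_binary_mask` is likewise illustrative.

```python
import numpy as np

def random_binary_mask(shape, hole_density, seed=0):
    """Return a 0/1 mask in which roughly hole_density of the entries are 0."""
    rng = np.random.default_rng(seed)
    return (rng.random(shape) >= hole_density).astype(float)

def restoration_ratio_db(Y, Y_restored):
    """Hypothetical accuracy score: energy of Y over energy of the error, in dB."""
    signal = np.sum(Y ** 2)
    error = np.sum((Y - Y_restored) ** 2)
    return 10.0 * np.log10(signal / error)

mask_75 = random_binary_mask((200, 200), hole_density=0.75)
mask_98 = random_binary_mask((200, 200), hole_density=0.98)

Y = np.array([[3.0, 4.0]])
Y_restored = np.array([[3.0, 3.0]])
score = restoration_ratio_db(Y, Y_restored)
```

With 98% holes only about one entry in fifty is observed, which makes this a far harder restoration problem than the 75% case.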

                                                                              

                                                                              

                                                                              

Page 23

Results of restoration analysis
• Simulated result of the restoration ability

[Figure: two panels (75%-binary-masked and 98%-binary-masked), each plotting SAR [dB] (0 to 25; higher is better) against $\beta_{\mathrm{NMF}}$ (0 to 4) for $\beta_{\mathrm{reg}} = 0, 1, 2, 3$. The optimal divergence for source separation (KL-divergence) is marked.]

• The optimal divergence for basis extrapolation (restoration) is around $\beta = 3$.

Page 24

Trade-off between separation and restoration
• The optimal divergence for SNMF with spectrogram restoration and its hybrid method is determined by the trade-off between separation and restoration abilities.

[Figure: two spectra, amplitude [dB] (-10 to 0) vs. frequency [kHz] (0 to 5), labeled "Sparseness: strong" and "Sparseness: weak".]
[Figure: performance vs. the divergence parameter (0 to 4), with curves for separation, restoration, and the total performance of the hybrid method.]

Page 26

Experimental condition
• The mixed signal includes four melodies (sources).
• Three compositions of instruments
  – We evaluated the average score of 36 patterns.

Dataset   Melody 1   Melody 2   Midrange      Bass
No. 1     Oboe       Flute      Piano         Trombone
No. 2     Trumpet    Violin     Harpsichord   Fagotto
No. 3     Horn       Clarinet   Piano         Cello

• Supervision signal: 24 notes that cover all the notes in the target melody

[Figure: spatial arrangement of the sources (positions 1 to 3 across left, center, and right), with the target source marked.]

Page 27

Experimental result
• Signal-to-distortion ratio (SDR)
  – Total quality of the separation, which includes the degree of separation and the absence of artificial distortion.

[Figure: SDR [dB] (0 to 14; higher is better) vs. $\beta_{\mathrm{NMF}}$ (0 to 4, with KL-divergence and EUC-distance marked) for directional clustering and multichannel NMF [Sawada] (unsupervised methods) and for conventional SNMF and the proposed hybrid method (supervised methods).]

Multichannel NMF is an integrated method.

Page 28

Experiment for real-recorded signal
• We recorded a binaural signal using a dummy head.
• Reverberation time: 200 ms
• The other conditions are the same as those in the previous experiment with instantaneous mixture signals.

[Figure: recording setup with sources 1 to 4 placed to the left, center, and right of the dummy head; the labeled distances are 1.5 m and 2.5 m, and the target signal is marked.]

Page 29

Experimental result
• Result for real-recorded signals

[Figure: SDR [dB] (0 to 14; higher is better) vs. $\beta_{\mathrm{NMF}}$ (0 to 4, with KL-divergence and EUC-distance marked) for directional clustering and multichannel NMF [Sawada] (unsupervised methods) and for conventional SNMF and the proposed hybrid method (supervised methods).]

Multichannel NMF is an integrated method.

Page 30

Conclusions
• Restoration requires an anti-sparse criterion ($\beta = 3$).
• There is a trade-off between separation and restoration abilities.
• The optimal divergence is EUC-distance for SNMF with spectrogram restoration, whereas KL-divergence is the best for conventional SNMF.

Thank you for your attention!

