
Timbral modeling for music artist recognition using i-vectors

Date post: 14-Apr-2017
Upload: hamid-eghbal-zadeh
TIMBRAL MODELING FOR MUSIC ARTIST RECOGNITION USING I-VECTORS. Hamid Eghbal-zadeh, Markus Schedl, Gerhard Widmer. Johannes Kepler University, Linz, Austria. EUSIPCO 2015.
Transcript
Page 1: Timbral modeling for music artist recognition using i-vectors

TIMBRAL MODELING FOR MUSIC ARTIST RECOGNITION USING I-VECTORS

Hamid Eghbal-zadeh, Markus Schedl, Gerhard Widmer

Johannes Kepler University, Linz, Austria

EUSIPCO 2015

Page 2: Timbral modeling for music artist recognition using i-vectors

Overview

• Introduction
  o Artist recognition
  o I-vector based systems

• I-vector frontend
  o Calculate statistics [GMM supervectors]
  o Factor analysis [estimate hidden factors to extract i-vectors]

• Proposed method
  o Normalization and compensation techniques
  o Backends

• Experiments
  o Setup
  o Evaluation
  o Baselines
  o Results

• Conclusion

Page 3: Timbral modeling for music artist recognition using i-vectors

Introduction – Artist recognition

• Artist recognition: recognizing the artist from an excerpt of a song. "Artist" refers to the singer or the band of a song.

• Difficulties:
– Musical instruments
– Effects of genre and instrumentation
– The singer's voice is mixed with the instruments

[Figure: example track "Major Lazer & DJ Snake - Lean On", with the labels "Singing voice" and "Music"]

Page 4: Timbral modeling for music artist recognition using i-vectors

Introduction – I-vector based systems

• I-vectors:
– Introduced for speaker verification in 2010
– Provide a compact, low-dimensional representation

• Also used for:
– Emotion recognition, language recognition, audio scene detection

• Use factor analysis:
– Estimate hidden factors that help recognize an artist from a song

• Artist and session factors in a song:
– Artist variability: the variability that appears between songs of different artists.
– Session variability: the variability that appears among the songs of a single artist.

[Figure labels: Song, Frame-level features, Song-level features, Estimate hidden factors]

Page 5: Timbral modeling for music artist recognition using i-vectors

I-vector Factor Analysis – Terminology

Spaces, features and factors (dimensionalities are approximate):

• Frame-level feature space [~20]: frame-level features (Step 1: feature extraction)
• GMM space [~20,000]: GMM supervector (Step 2: statistics calculation)
• Total Variability Space (TVS) [~400]: i-vector, made up of the hidden total factors (Step 3: factor analysis)

Artist variability: the variability that appears between different artists.

Session variability: the variability that appears among the songs of a single artist.

Total variability: artist variability + session variability.

Page 6: Timbral modeling for music artist recognition using i-vectors

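I-vectors – Statistics calculation

Step 2: extract GMM supervectors

• The UBM is trained, unsupervised, on the {MFCCs} of the development db.
• For each of the {Songs} in the train/test db (Song 1, Song 2, ...), the UBM is used to compute the Baum-Welch (BW) statistics that form the GMM supervector*:

0th BW: $N_c(s) = \sum_t \gamma_t(c)$

1st BW: $F_c(s) = \sum_t \gamma_t(c)\, X_t$

$\gamma_t(c)$: posterior probability of frame $X_t$ for UBM component $c$. BW: Baum-Welch.

* Similar to: Charbuillet et al., GMM-Supervector for Content based Music Similarity, DAFx 2011.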
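The statistics calculation above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes a diagonal-covariance UBM, and the function names (`log_gaussian_diag`, `baum_welch_stats`) and the toy sizes are hypothetical.

```python
import numpy as np

def log_gaussian_diag(X, means, variances):
    """Log density of every frame under every diagonal-covariance Gaussian.

    X: (T, D) frames; means, variances: (C, D). Returns a (T, C) array.
    """
    diff = X[:, None, :] - means[None, :, :]                    # (T, C, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, C)
    log_det = np.sum(np.log(variances), axis=1)                 # (C,)
    D = X.shape[1]
    return -0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + mahal)

def baum_welch_stats(X, weights, means, variances):
    """0th and 1st order Baum-Welch statistics of one song against a UBM."""
    log_p = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)   # numerical stabilisation
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)   # posteriors gamma_t(c), (T, C)
    N = gamma.sum(axis=0)                       # 0th BW: sum_t gamma_t(c)
    F = gamma.T @ X                             # 1st BW: sum_t gamma_t(c) X_t, (C, D)
    return N, F

# Toy UBM and toy "song" (sizes are assumptions, far smaller than 1024 x 20).
rng = np.random.default_rng(0)
C, D, T = 4, 3, 50
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, D))
variances = np.ones((C, D))
X = rng.normal(size=(T, D))
N, F = baum_welch_stats(X, weights, means, variances)
```

Here `F` is stored as components x dimensions; transposing it gives the dimensions x components layout.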

Page 7: Timbral modeling for music artist recognition using i-vectors

I-vectors - Factor analysis

Step 3: estimate hidden factors

Goal:
• Reduce the dimensionality
• Separate desired factors from undesired factors in feature space
• Estimate hidden variables related to desired factors

Assumption:

$M(s) = m + O_s$

where $M(s)$ is the GMM supervector for song $s$, $m$ is the UBM mean vector and $O_s$ is the offset vector.

Page 8: Timbral modeling for music artist recognition using i-vectors

I-vectors - Factor analysis

Step 3: estimate hidden factors - previous methods

Joint Factor Analysis (JFA):

$M(s) = m + V y + U x + D z$

where $M(s)$ is the GMM supervector for song $s$, $m$ is the mean vector of the UBM, $V$ is the artist subspace matrix, $U$ is the session subspace matrix, $D$ is the residual matrix and $D z$ is the residual term.

• JFA assumes that $O_s$ consists of separate artist and session factors.
• JFA showed better performance than previous FA methods.

Page 9: Timbral modeling for music artist recognition using i-vectors

I-vectors - Factor analysis

Step 3: estimate hidden factors - current method

I-vector extraction:

$M(s) = m + T y$

where $M(s)$ is the GMM supervector for song $s$, $m$ is the mean vector of the UBM, $T$ is the low-rank TVS matrix and $y \sim N(0, I)$ is the i-vector.

• TVS: contains both artist and session factors.
• $T$ is initialized randomly and learned from the training data with the EM algorithm.

Page 10: Timbral modeling for music artist recognition using i-vectors

I-vectors – Learning T

Step 3: estimate hidden factors - expectation maximization

• E step: for each song $s$, use the current estimate of $T$ to find the i-vector that maximizes the likelihood of the GMM supervector $M(s)$:

$y(s) = \arg\max_y P(M(s) \mid m + T y, \Sigma)$

• M step: update $T$ by maximizing

$P(M(s) \mid m + T y, \Sigma)$

where $\Sigma$ is the covariance matrix and $m$ is the UBM mean vector.
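The slide does not spell out the update equations. For orientation, the updates commonly used in the i-vector literature can be written as below. This is an assumption about the standard formulation, not something stated in the slides: $N(s)$ and $F(s)$ denote the 0th and 1st order Baum-Welch statistics of song $s$, with $F(s)$ taken as centered on the UBM means, and $T_c$ is the block of rows of $T$ belonging to UBM component $c$.

```latex
% E step: posterior moments of the i-vector of song s
L(s) = I + T^{t}\, \Sigma^{-1} N(s)\, T
E[y(s)] = L(s)^{-1}\, T^{t}\, \Sigma^{-1} F(s)
E[y(s)\, y(s)^{t}] = L(s)^{-1} + E[y(s)]\, E[y(s)]^{t}

% M step: update of T, one UBM component c at a time
T_c = \Big( \sum_s F_c(s)\, E[y(s)]^{t} \Big)
      \Big( \sum_s N_c(s)\, E[y(s)\, y(s)^{t}] \Big)^{-1}
```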

Page 11: Timbral modeling for music artist recognition using i-vectors

I-vectors – Proposed system

1. I-vectors are centered by removing the mean

2. I-vectors are length normalized

3. LDA is used for compensation and dimensionality reduction

$y_n = \frac{y}{\|y\|}$

where $y$ is the i-vector and $y_n$ is the length-normalized i-vector.

[Figure: system pipeline. Song → front end: extract features {MFCC} → extract GMM supervectors → {I-vector extraction} → compensation/normalization {LDA/Length norm} → backend {DA, 3NN, NB, PLDA}]
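Centering and length normalization are simple enough to sketch directly. The snippet below is a minimal illustration that assumes the mean is estimated on the training i-vectors; the function name `center_and_length_normalize` and the toy data are hypothetical.

```python
import numpy as np

def center_and_length_normalize(train_ivecs, ivecs):
    """Subtract the training mean, then scale every i-vector to unit length."""
    mean = train_ivecs.mean(axis=0)                    # mean of training i-vectors
    centered = ivecs - mean
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.maximum(norms, 1e-12)         # guard against zero length

# Toy i-vectors (10 songs, 5 factors); real i-vectors would have ~400 factors.
rng = np.random.default_rng(0)
train = rng.normal(loc=2.0, size=(10, 5))
normed = center_and_length_normalize(train, train)
```

After this step every i-vector lies on the unit sphere, so cosine distance and Euclidean distance rank neighbours identically.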

Page 12: Timbral modeling for music artist recognition using i-vectors

Backends

• Discriminant Analysis classifier

• Nearest neighbor classifier with cosine distance (k=3)

• Naïve Bayes classifier

• Probabilistic Linear Discriminant Analysis (PLDA):

$y = m + \Phi\, l + e$

where $y$ is the i-vector, $m$ is the mean of the training i-vectors, $\Phi$ is the latent matrix, $l$ is the latent factor and $e$ is the residual term.
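Of the four backends, the nearest neighbor classifier with cosine distance (k=3) is the easiest to illustrate. The following is a minimal sketch under the assumption of simple majority voting; the function name `cosine_knn_predict`, the artist labels and the 2-dimensional toy vectors are hypothetical.

```python
import numpy as np
from collections import Counter

def cosine_knn_predict(train_X, train_labels, query, k=3):
    """Label a query vector by majority vote among its k cosine-nearest neighbours."""
    train_unit = train_X / np.linalg.norm(train_X, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    distances = 1.0 - train_unit @ query_unit          # cosine distance
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: three vectors per artist.
train_X = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
                    [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]])
train_labels = ["artist_a"] * 3 + ["artist_b"] * 3
prediction = cosine_knn_predict(train_X, train_labels, np.array([2.0, 0.3]))
```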

Page 13: Timbral modeling for music artist recognition using i-vectors

Experiments – Setup

• A 30-second excerpt is randomly selected from the middle part of each song

• 13- and 20-dimensional MFCCs are used as frame-level features

• A GMM with 1024 components is trained as the UBM

• The TVS matrix is trained with 400 factors

• LDA is applied for compensation and dimensionality reduction

• Development db = Train set
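The LDA step can be sketched as follows. This is a textbook Fisher LDA written with NumPy, not the authors' code; the function name `fit_lda`, the regularization constant `reg` and the toy data are assumptions. Note that LDA yields at most (number of classes - 1) useful dimensions, so with 20 artists the projected i-vectors would have at most 19 dimensions.

```python
import numpy as np

def fit_lda(X, labels, n_dims, reg=1e-6):
    """Return a projection matrix (d, n_dims) that maximizes between-class
    scatter relative to within-class scatter."""
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))   # within-class scatter (session variability)
    S_b = np.zeros((d, d))   # between-class scatter (artist variability)
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # Eigenvectors of S_w^{-1} S_b; reg keeps S_w invertible.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w + reg * np.eye(d), S_b))
    order = np.argsort(-eigvals.real)[:n_dims]
    return eigvecs[:, order].real

# Toy data: two "artists" separated along the first axis.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0, 0], size=(30, 3)),
               rng.normal(loc=[5, 0, 0], size=(30, 3))])
labels = np.array([0] * 30 + [1] * 30)
W = fit_lda(X, labels, n_dims=1)
projected = X @ W
```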

Page 14: Timbral modeling for music artist recognition using i-vectors

Experiments – Evaluation

• “Artist20” dataset: 1413 tracks, mostly rock and pop, made up of six albums from each of 20 artists

• The 6-fold cross-validation provided with the Artist20 dataset is used

• In each iteration, one of the six albums of each artist is held out for testing.
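The album-wise protocol can be sketched as below. This is an illustration only: the actual fold definition ships with the Artist20 dataset, and the `(track_id, artist, album_index)` tuples, the function name `album_folds` and the toy track list are hypothetical.

```python
def album_folds(tracks, n_folds=6):
    """Build folds in which one album index per artist is held out for testing.

    tracks: list of (track_id, artist, album_index), album_index in 0..n_folds-1.
    Returns a list of (train_ids, test_ids) pairs, one per fold.
    """
    folds = []
    for held_out in range(n_folds):
        train = [t for t, _, album in tracks if album != held_out]
        test = [t for t, _, album in tracks if album == held_out]
        folds.append((train, test))
    return folds

# Toy collection: 2 artists x 6 albums x 2 tracks.
tracks = [(f"{artist}-{album}-{n}", artist, album)
          for artist in ("artist_a", "artist_b")
          for album in range(6)
          for n in range(2)]
folds = album_folds(tracks)
```

Holding out whole albums keeps tracks from the same album out of both train and test sets within a fold.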

Page 15: Timbral modeling for music artist recognition using i-vectors

Experiments – Baselines

Best artist recognition performance found on Artist20 db:

1. Single GMM : [D. PW Ellis, 2007] – Provided with the dataset

2. Signature-based approach: [S. Shirali, 2009]

– Generates compact signatures and compares them using graph matching

3. Sparse modelling: [L. Su, 2013]

– Sparse feature learning method with a ‘bag of features’ using the magnitude and phase parts of the spectrum

4. Multivariate kernels: [P. Kuksa, 2014]

– Uses multivariate kernels with the direct uniform quantization

5. Alternative:

– Uses the same structure as the proposed method; only the i-vector extraction block is replaced with PCA

[Figure: alternative baseline pipeline. Song → front end: extract features {MFCC} → GMM supervectors → {PCA} → compensation/normalization {LDA/Length norm} → backend {DA}]
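The PCA block of the alternative baseline can be sketched with an SVD. This is a generic PCA, assumed here for illustration; the function names `fit_pca` and `apply_pca` and the toy supervector sizes are hypothetical.

```python
import numpy as np

def fit_pca(supervectors, n_components):
    """Learn the mean and the top principal directions of the supervectors."""
    mean = supervectors.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(supervectors - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(supervectors, mean, components):
    """Project supervectors onto the learned principal directions."""
    return (supervectors - mean) @ components.T

# Toy supervectors (12 songs, 40 dimensions instead of ~20,000).
rng = np.random.default_rng(0)
supervectors = rng.normal(size=(12, 40))
mean, components = fit_pca(supervectors, n_components=5)
reduced = apply_pca(supervectors, mean, components)
```

Unlike the TVS matrix, this projection ignores the 0th order statistics, i.e. how many frames support each UBM component.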

Page 16: Timbral modeling for music artist recognition using i-vectors

I-vectors – Results

[Figure: results plot with series labelled Best 13, Alt. 13, Best 20 and Alt. 20]

Page 17: Timbral modeling for music artist recognition using i-vectors

I-vectors – Results

• Results for different numbers of Gaussian components with the proposed method and the DA classifier

[Figure: results plot with series labelled Best 13 and Best 20]

Page 18: Timbral modeling for music artist recognition using i-vectors

Conclusion


• Total factors can model an artist

• Compact representation, low dimensionality

• Song-level features

• Robust to multiple backends

Page 19: Timbral modeling for music artist recognition using i-vectors

Acknowledgement

• We would like to acknowledge the tremendous help of Dan Ellis of Columbia University, who provided tools and resources for feature extraction and shared the details of his work, which enabled us to reproduce his experimental results.

• Thanks also to Pavel Kuksa from the University of Pennsylvania for sharing the details of his work with us.

• We appreciate the helpful suggestions of Marko Tkalcic from Johannes Kepler University Linz.

• This work was supported by the EU-FP7 project no. 601166, “Performances as Highly Enriched aNd Interactive Concert eXperiences (PHENICX)”.

Page 20: Timbral modeling for music artist recognition using i-vectors

Questions


Thank you for your time!

Page 21: Timbral modeling for music artist recognition using i-vectors

I-vectors - GMM supervector

Step 1

0th BW: $N_c(s) = \sum_t \gamma_t(c)$

1st BW: $F_c(s) = \sum_t \gamma_t(c)\, X_t$

$\gamma_t(c)$: posterior probability of frame $X_t$ for UBM component $c$. BW: Baum-Welch.

Example: UBM: 1024 components. Feature: 20 dim. 0th BW = 1024 x 1, 1st BW = 20 x 1024.
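As a quick check of the sizes in the example, the dimensions can be worked out as below. The symbols $C$ (number of UBM components) and $D$ (feature dimension) are introduced here only for illustration.

```latex
C = 1024, \qquad D = 20
\dim N(s) = C = 1024
\dim F(s) = D \times C = 20 \times 1024 = 20480
```

Stacking the 1st order statistics into a single vector therefore gives a supervector of about 20,000 dimensions, in line with the size quoted for the GMM space on the terminology slide.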

Page 22: Timbral modeling for music artist recognition using i-vectors

I-vectors - Factor analysis

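Step 2: Closed form

I-vector of song s:

$y = (I + T^{t}\, \Sigma^{-1} N(s)\, T)^{-1}\, T^{t}\, \Sigma^{-1} F(s)$

where $I$ is the identity matrix, $T$ is the TVS matrix, $\Sigma$ is the covariance matrix of the UBM, and $N(s)$ and $F(s)$ are the 0th and 1st BW statistics of song $s$ (the GMM supervectors):

$N_c(s) = \sum_t \gamma_t(c)$

$F_c(s) = \sum_t \gamma_t(c)\, X_t$

$\gamma_t(c)$: posterior probability of frame $X_t$ for UBM component $c$. BW: Baum-Welch.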
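The closed form can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes a diagonal UBM covariance, first order statistics that have already been centered on the UBM means (as is usual in the i-vector literature), and a component-major ordering of the rows of $T$. The function name `extract_ivector` and the toy sizes are hypothetical.

```python
import numpy as np

def extract_ivector(T_mat, sigma_diag, N, F_centered):
    """Closed-form i-vector: (I + T' S^-1 N T)^-1 T' S^-1 F.

    T_mat:      (C*D, R) total variability matrix
    sigma_diag: (C*D,)   diagonal of the UBM covariance
    N:          (C,)     0th order statistics
    F_centered: (C, D)   1st order statistics, centered on the UBM means
    """
    C, D = F_centered.shape
    R = T_mat.shape[1]
    N_expanded = np.repeat(N, D)                  # N_c repeated for each dimension
    T_scaled = T_mat / sigma_diag[:, None]        # Sigma^{-1} T
    precision = np.eye(R) + T_mat.T @ (N_expanded[:, None] * T_scaled)
    rhs = T_scaled.T @ F_centered.reshape(-1)     # T' Sigma^{-1} F
    return np.linalg.solve(precision, rhs)

# Toy sizes (the slides use C=1024, D=13 or 20, R=400).
rng = np.random.default_rng(0)
C, D, R = 3, 2, 4
T_mat = rng.normal(size=(C * D, R))
sigma_diag = rng.uniform(0.5, 2.0, size=C * D)
N = rng.uniform(1.0, 10.0, size=C)
F_centered = rng.normal(size=(C, D))
y = extract_ivector(T_mat, sigma_diag, N, F_centered)
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly.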

Page 23: Timbral modeling for music artist recognition using i-vectors

I-vector Extraction Routine

– Step 1: Feature extraction

– Step 2: Statistics calculation
• Extract GMM supervectors from frame-level features (MFCCs)

– Step 3: Factor analysis
• Apply factor analysis to estimate hidden variables in GMM space

[Figure: pipeline. Frames → front end: extract features {MFCC} → extract GMM supervectors → {I-vector extraction} → compensation/normalization {LDA/Length norm} → backend {PLDA, …}]

