Identifying Repeated Patterns in Music Using Sparse Convolutive Non-Negative Matrix Factorization

ISMIR 2010

Ron Weiss, Juan Bello
{ronw,jpbello}@nyu.edu

Music and Audio Research Lab, New York University

August 10, 2010

Repetitive patterns in music

Repetition is ubiquitous in music

long-term verse-chorus structure

repeated motifs

Can we identify this structure directly from audio?

What about the repeated units?


Proposed approach

Treat song as concatenation of short, repeated template patterns

Inspired by source separation / text topic modeling

Convolutive Non-negative Matrix Factorization (NMF)


Beat-synchronous chroma features [Ellis and Poliner, 2007]

[Figure: beat-synchronous chromagram of "Day Tripper". Horizontal axis: time (beats), 0 to 250; vertical axis: pitch class (A to G); color scale 0.0 to 1.0.]

Summarize energy at each pitch class during each beat

Normalize frame energy to ignore dynamics
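
The two steps above can be sketched in NumPy. This is a minimal illustration, not the feature extractor used in the talk: it assumes a frame-level chromagram and beat positions (as frame indices) are already available, and the choices of averaging within a beat and of max-normalizing each beat are assumptions. The names `beat_sync_chroma`, `chroma` and `beat_frames` are hypothetical.

```python
import numpy as np

def beat_sync_chroma(chroma, beat_frames, eps=1e-12):
    """Summarize a 12 x T frame-level chromagram as one column per beat.

    chroma      : array of shape (12, T), non-negative energy per pitch class
    beat_frames : frame indices where beats start (hypothetical input)
    """
    chroma = np.asarray(chroma, dtype=float)
    # Segment boundaries: start of signal, every beat, end of signal.
    bounds = np.unique(np.concatenate(([0], beat_frames, [chroma.shape[1]])))
    # Assumption: summarize each beat by the mean energy per pitch class.
    cols = [chroma[:, a:b].mean(axis=1) for a, b in zip(bounds[:-1], bounds[1:])]
    sync = np.stack(cols, axis=1)
    # Assumption: scale each beat so its largest entry is 1, which removes
    # differences in overall loudness between beats.
    return sync / (sync.max(axis=0, keepdims=True) + eps)
```

Because every beat column is rescaled independently, multiplying the input by a constant gain leaves the output essentially unchanged, which is the sense in which dynamics are ignored.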


SI-PLCA [Smaragdis and Raj, 2007]

Shift-invariant Probabilistic Latent Component Analysis

i.e. probabilistic convolutive NMF

$$V \approx \sum_k W_k * h_k \, z_k$$

Decompose matrix V into weighted (by Z) sum of latent components

each component is convolution of basis W with activations H

Short-term structure in W, long-term structure in H

Must specify number, length of patterns

Iterative EM learning algorithm
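
A compact NumPy sketch of the decomposition and one way to write its EM updates is given below. It is an illustration under stated assumptions, not the released implementation: V is normalized to sum to 1, each pattern W_k is normalized over pitch and lag, each activation row h_k over time, and no priors are applied. The function names (`reconstruct`, `em_step`, `fit`) and the array layout (W as K x F x L, H as K x T) are hypothetical.

```python
import numpy as np

def reconstruct(W, Z, H):
    """R[f, t] = sum_k Z[k] * sum_tau W[k, f, tau] * H[k, t - tau]."""
    K, F, L = W.shape
    T = H.shape[1]
    R = np.zeros((F, T))
    for tau in range(L):
        # Column tau of each pattern, triggered at time t - tau, lands at time t.
        R[:, tau:] += np.einsum('k,kf,kt->ft', Z, W[:, :, tau], H[:, :T - tau])
    return R

def em_step(V, W, Z, H, eps=1e-12):
    """One EM iteration: accumulate expected counts, then renormalize."""
    K, F, L = W.shape
    T = H.shape[1]
    Q = V / (reconstruct(W, Z, H) + eps)      # data / model ratio
    newW = np.zeros_like(W)
    newH = np.zeros_like(H)
    for tau in range(L):
        Qs, Hs, Wt = Q[:, tau:], H[:, :T - tau], W[:, :, tau]
        # Expected counts for W[k, f, tau], summed over time.
        newW[:, :, tau] = Z[:, None] * Wt * np.einsum('ft,kt->kf', Qs, Hs)
        # Expected counts for H[k, t], summed over pitch (accumulated over lags).
        newH[:, :T - tau] += Z[:, None] * Hs * np.einsum('kf,ft->kt', Wt, Qs)
    newZ = newW.sum(axis=(1, 2))              # total expected count per pattern
    newW /= newZ[:, None, None] + eps
    newH /= newH.sum(axis=1, keepdims=True) + eps
    return newW, newZ / newZ.sum(), newH

def fit(V, K, L, n_iter=50, seed=0):
    """Random initialization followed by n_iter EM steps."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    V = V / V.sum()
    W = rng.random((K, F, L)); W /= W.sum(axis=(1, 2), keepdims=True)
    H = rng.random((K, T)); H /= H.sum(axis=1, keepdims=True)
    Z = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        W, Z, H = em_step(V, W, Z, H)
    return W, Z, H
```

Both K (number of patterns) and L (pattern length in beats) are arguments to `fit`, which mirrors the point above that they must be specified in advance.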


Learning algorithm example – Initialization


Learning algorithm example – Converged

[Figure: decomposition after convergence. Panels: V (Iteration 199), Reconstruction, per-basis reconstructions (Basis 0 onward), mixing weights Z (components 0 to 3), patterns W0 to W3 (horizontal axis 0 to 30), activations H0 to H3 (horizontal axis 0 to 700). Each Wk is convolved (∗) with Hk.]


Sparsity

Encourage sparse (mostly zero) parameters using prior distributions

Use entropic prior over activations H [Smaragdis et al., 2008]

low entropy ⟹ less uniform

Leads to more meaningful patterns

but reduces temporal information in activations

sparse H ⟹ dense W
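
To make "low entropy ⟹ less uniform" concrete, the sketch below compares the Shannon entropy of a uniform activation vector with that of a peaked one. It only illustrates the quantity that an entropic prior penalizes; it does not implement the prior from [Smaragdis et al., 2008], and the example vectors are made up.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a non-negative vector, normalized to sum to 1."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(-np.sum(p * np.log(p + eps)))

# Hypothetical activation vectors over 8 time positions.
uniform = np.ones(8) / 8                                   # spread evenly
peaked = np.array([0.93, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01])

print(entropy(uniform), entropy(peaked))
```

The uniform vector attains the maximum entropy, log(8); the peaked vector scores lower, so a prior that favors low entropy pushes activations toward a few large values.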


Automatic relevance determination [Tan and Fevotte, 2009]

Avoid having to specify number of patterns in advance

Initialize decomposition with large number of patterns

Sparse Dirichlet distribution over mixing weights Z

Discard unused patterns

[Figure: effective rank (K) versus iteration. Horizontal axis: Iteration, 0 to 200; vertical axis: Effective rank (K), 0 to 16.]
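
One way the three steps could look in code is sketched below. It assumes a MAP update under a symmetric Dirichlet prior with parameter alpha < 1 applied to unnormalized expected counts, with negative values clipped to zero; the clipping, the value of alpha, the tolerance, and the function names are all assumptions rather than details given on the slide.

```python
import numpy as np

def update_mixing_weights(counts, alpha=0.9, eps=1e-12):
    """MAP estimate of Z under a symmetric Dirichlet(alpha) prior.

    With alpha < 1 the prior subtracts (1 - alpha) from every expected count,
    so weakly used patterns are driven to exactly zero (assumed clipping).
    """
    z = np.maximum(np.asarray(counts, dtype=float) + alpha - 1.0, 0.0)
    return z / (z.sum() + eps)

def prune(W, Z, H, tol=1e-8):
    """Discard patterns whose mixing weight has collapsed to (near) zero."""
    keep = Z > tol
    Z = Z[keep]
    return W[keep], Z / Z.sum(), H[keep]

# Hypothetical expected counts for 4 patterns: two used, two barely used.
Z = update_mixing_weights([50.0, 30.0, 0.01, 0.005])
```

After such an update the effective rank is simply the number of patterns that survive `prune`, which is the quantity plotted against iteration in the figure above.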


Sparse learning example – Initialization

[Figure: sparse decomposition at initialization with 16 candidate patterns. Panels: V (Iteration 0), Reconstruction, per-basis reconstructions (Basis 0 onward), mixing weights Z (components 0 to 15, values 0.00 to 0.07), patterns W0 to W15 (horizontal axis 0 to 30), activations H0 to H15 (horizontal axis 0 to 700). Each Wk is convolved (∗) with Hk.]


Sparse learning example – Converged

[Figure: sparse decomposition after convergence; four patterns remain. Panels: V (Iteration 199), Reconstruction, per-basis reconstructions, mixing weights Z (components 0 to 3, values 0.00 to 0.45), patterns W0 to W3 (horizontal axis 0 to 30), activations H0 to H3 (horizontal axis 0 to 700). Each Wk is convolved (∗) with Hk.]


Applications: Riff identification / Thumbnailing

Reconstruct song using a single pattern

Sparse activations

Riff length known in advance (for now)

Thumbnail corresponds to largest activation in H

[Figure: two thumbnailing examples. Each shows a learned pattern as a chromagram (pitch class A to G versus time in beats, 0 to 14) above a plot over the full song (time in beats, 0 to 800 and 0 to 1000 respectively).]
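
Picking the thumbnail from a single-pattern decomposition can be sketched as follows. The sketch assumes the activation vector h for the one pattern has already been learned and that the riff length L is given; restricting the search to start positions that keep the excerpt inside the song is an added assumption, and the function name is hypothetical.

```python
import numpy as np

def thumbnail(V, h, L):
    """Return (start_beat, excerpt) for the L-beat window at the largest activation.

    V : chromagram of shape (12, T)
    h : activations of the single pattern, length T
    L : riff length in beats, assumed known in advance
    """
    h = np.asarray(h)
    # Only consider start positions where a full L-beat excerpt fits.
    start = int(np.argmax(h[: V.shape[1] - L + 1]))
    return start, V[:, start:start + L]
```

The returned start index is in beats, so converting it to seconds requires the beat times from the beat tracker.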


Applications: Structure segmentation

Identify long-term song structure (verse, chorus, bridge, etc.)

Assume one-to-one mapping between chroma patterns and segments

Use SI-PLCA decomposition with longer patterns

no prior on activations
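
Under the one-to-one assumption, a segmentation can be read off a decomposition by asking which pattern contributes most at each beat. The sketch below does this with a plain per-beat argmax followed by merging runs of equal labels; the slides do not say how labels are assigned or smoothed, so this procedure and the function names are assumptions. W is laid out as K x F x L and H as K x T.

```python
import numpy as np

def component_energy(W, Z, H):
    """Energy each pattern contributes at each beat; returns a K x T array."""
    K, F, L = W.shape
    T = H.shape[1]
    E = np.zeros((K, T))
    for tau in range(L):
        # Column tau of pattern k, triggered at beat t - tau, adds energy at beat t.
        col_energy = W[:, :, tau].sum(axis=1)[:, None]
        E[:, tau:] += Z[:, None] * col_energy * H[:, :T - tau]
    return E

def segments_from_labels(labels):
    """Merge runs of equal labels into (start, end, label); end is exclusive."""
    labels = np.asarray(labels)
    change = np.flatnonzero(np.diff(labels)) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(labels)]))
    return [(int(s), int(e), int(labels[s])) for s, e in zip(starts, ends)]

def segment(W, Z, H):
    """Label each beat with its dominant pattern, then merge into segments."""
    return segments_from_labels(np.argmax(component_energy(W, Z, H), axis=0))
```

A raw argmax can flicker between patterns from beat to beat, so some temporal smoothing of the label sequence would likely be needed in practice.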


Structure segmentation example

Estimated: intro refrain verse refrain verse refrain verse refrain refrain outro

Ground truth: intro refrain verse refrain vs/break refrain verse refrain refrain outro


Structure segmentation example 2

segments tend to be broken into multiple motifs

Est: verse1 verse2 verse1 verse2 refrain verse1 verse2 refrain verse1 outro verse1 refrain verse1 outro

GT: verse verse refrain verse refrain 1/2 verse inst. 1/2 verse refrain outro


Experiments

Evaluate on 180 songs from The Beatles catalog

System                     f-meas  prec   recall  over-seg  under-seg
[Mauch et al., 2009]        0.66   0.61    0.77     0.76      0.64
SI-PLCA (sparse Z)          0.60   0.58    0.68     0.61      0.56
SI-PLCA (rank=4)            0.58   0.60    0.59     0.56      0.59
[Levy and Sandler, 2008]    0.54   0.58    0.53     0.50      0.57
Random                      0.30   0.36    0.26     0.07      0.24

Compare to systems based on self-similarity and HMM clustering

middle-of-the-pack performance

sparse Z gives ∼10% improvement in recall over fixed rank (0.68 vs. 0.59)

Needs better post-processing?
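
The slides do not state how f-meas, prec and recall are computed. One common convention for structure segmentation is pairwise agreement between beats, sketched below as an assumption: a pair of beats counts as a hit when both the estimate and the reference give the two beats the same label. The function name and example labels are hypothetical.

```python
import numpy as np

def pairwise_scores(est, ref):
    """Pairwise precision, recall and F-measure between two per-beat labelings."""
    est, ref = np.asarray(est), np.asarray(ref)
    iu = np.triu_indices(len(est), k=1)              # all beat pairs with i < j
    est_same = (est[:, None] == est[None, :])[iu]    # pairs co-labeled in estimate
    ref_same = (ref[:, None] == ref[None, :])[iu]    # pairs co-labeled in reference
    both = np.sum(est_same & ref_same)
    prec = both / max(est_same.sum(), 1)
    rec = both / max(ref_same.sum(), 1)
    f = 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)
    return float(prec), float(rec), float(f)

# Hypothetical 4-beat example: the estimate splits a reference segment in two.
p, r, f = pairwise_scores([0, 0, 1, 1], ["v", "v", "v", "c"])
```

Because only co-membership of beat pairs matters, the estimated labels do not need to use the same names as the reference labels.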


Summary

Novel algorithm for identifying repeated harmonic patterns in music

Use sparsity to minimize number of fixed parameters, control structure

Applications to thumbnailing and structure segmentation

Future work

Adaptive model of pattern length, better downbeat alignment

2D convolution to compensate for key changes

Time-warp invariance (beat-tracking errors, fixed hop size)

Open source Python/Matlab implementation available: http://ronw.github.com/siplca-segmentation



References

Ellis, D. and Poliner, G. (2007). Identifying ‘cover songs’ with chroma features and dynamic programming beat tracking. In Proc. ICASSP, pages IV–1429–1432.

Levy, M. and Sandler, M. (2008). Structural Segmentation of Musical Audio by Constrained Clustering. IEEE Trans. Audio, Speech, and Language Processing, 16(2).

Mauch, M., Noland, K. C., and Dixon, S. (2009). Using musical structure to enhance automatic chord transcription. In Proc. ISMIR, pages 231–236.

Smaragdis, P. and Raj, B. (2007). Shift-Invariant Probabilistic Latent Component Analysis. Technical Report TR2007-009, MERL.

Smaragdis, P., Raj, B., and Shashanka, M. (2008). Sparse and shift-invariant feature extraction from non-negative data. In Proc. ICASSP, pages 2069–2072.

Tan, V. and Fevotte, C. (2009). Automatic Relevance Determination in Nonnegative Matrix Factorization. In Proc. SPARS.

