Identifying Repeated Patterns in Music Using Sparse Convolutive Non-Negative Matrix Factorization
ISMIR 2010
Ron Weiss, Juan Bello
{ronw,jpbello}@nyu.edu
Music and Audio Research Lab, New York University
August 10, 2010
Ron Weiss, Juan Bello (MARL, NYU) | Identifying Repeated Patterns in Music Using Sparse Convolutive Non-Negative Matrix Factorization | August 10, 2010 | 1 / 17
Repetitive patterns in music
Repetition is ubiquitous in music
long-term verse-chorus structure
repeated motifs
Can we identify this structure directly from audio?
What about the repeated units?
Proposed approach
Treat song as concatenation of short, repeated template patterns
Inspired by source separation / text topic modeling
Convolutive Non-negative Matrix Factorization (NMF)
Beat-synchronous chroma features [Ellis and Poliner, 2007]
[Figure: beat-synchronous chromagram of "Day Tripper"; x-axis: time (beats), y-axis: pitch class (A to G), color scale 0.0 to 1.0]
Summarize energy at each pitch class during each beat
Normalize frame energy to ignore dynamics
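The two steps above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the feature extractor of [Ellis and Poliner, 2007]: the function name `beat_sync_chroma`, the `beat_starts` input (frame indices from some beat tracker), the use of the mean as the per-beat summary, and normalization by the per-beat maximum are all assumptions made for the example.

```python
def beat_sync_chroma(frames, beat_starts):
    """Average chroma frames within each beat, then normalize each beat.

    frames: list of equal-length lists (frame-level chroma energies).
    beat_starts: sorted frame indices where each beat begins
                 (hypothetical input, e.g. from a beat tracker).
    """
    n_bins = len(frames[0])
    bounds = list(beat_starts) + [len(frames)]
    out = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment = frames[start:end]
        if not segment:
            continue  # skip beats that contain no frames
        # Summarize energy at each pitch class over the beat (mean over frames).
        mean = [sum(f[b] for f in segment) / len(segment) for b in range(n_bins)]
        # Normalize by the peak so loud and quiet beats look alike
        # (assumed normalization; the slide only says dynamics are ignored).
        peak = max(mean)
        out.append([v / peak if peak > 0 else 0.0 for v in mean])
    return out
```

Scaling every input frame by a constant leaves the output unchanged, which is the sense in which the features ignore dynamics.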
SI-PLCA [Smaragdis and Raj, 2007]
Shift-invariant Probabilistic Latent Component Analysis
i.e. probabilistic convolutive NMF
$$V \approx \sum_k W_k \ast h_k \, z_k$$
Decompose matrix V into weighted (by Z) sum of latent components
each component is convolution of basis W with activations H
Short-term structure in W, long-term structure in H
Must specify number, length of patterns
Iterative EM learning algorithm
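To make the decomposition concrete, the sketch below rebuilds an approximation of V from given W, Z and H by pasting a scaled copy of each pattern wherever its activation is non-zero. It covers only the forward model, not the EM learning step. The function name `siplca_reconstruct`, the nested-list layout, and the truncation of patterns that run past the end of the song are assumptions made for the example.

```python
def siplca_reconstruct(W, Z, H):
    """Return V_hat[f][t] = sum_k Z[k] * sum_tau W[k][f][tau] * H[k][t - tau].

    W: K bases, each F x L (pitch class x time within the pattern).
    Z: K mixing weights.
    H: K activation sequences, each of length T.
    """
    F = len(W[0])
    L = len(W[0][0])
    T = len(H[0])
    V_hat = [[0.0] * T for _ in range(F)]
    for k in range(len(Z)):
        for t0, h in enumerate(H[k]):
            if h == 0.0:
                continue  # nothing to add where the pattern is inactive
            # Paste a scaled copy of pattern k starting at frame t0,
            # truncated at the end of the song (assumed boundary handling).
            for tau in range(min(L, T - t0)):
                for f in range(F):
                    V_hat[f][t0 + tau] += Z[k] * h * W[k][f][tau]
    return V_hat
```

With a single two-frame pattern activated at frames 0 and 2, the reconstruction is simply that pattern repeated twice, which is the "concatenation of templates" view of a song.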
Learning algorithm example – Initialization
Learning algorithm example – Converged
[Figure: converged decomposition at iteration 199, with panels for V, its reconstruction, the per-basis reconstructions (bases 0 to 3), the mixing weights Z, the bases W0 to W3, and the activations H0 to H3]
Sparsity
Encourage sparse (mostly zero) parameters using prior distributions
Use entropic prior over activations H [Smaragdis et al., 2008]
low entropy ⟹ less uniform
Leads to more meaningful patterns
but reduces temporal information in activations
sparse H ⟹ dense W
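The link between entropy and sparsity can be checked numerically. The snippet below only computes Shannon entropy for two hand-picked activation vectors; it does not implement the entropic prior or its update rule from [Smaragdis et al., 2008]. The helper name `entropy` and the example vectors are illustrative assumptions.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a non-negative vector, normalized to sum to 1."""
    total = sum(p)
    return -sum((x / total) * math.log(x / total) for x in p if x > 0)


uniform_h = [0.25, 0.25, 0.25, 0.25]  # energy spread evenly over time
sparse_h = [0.97, 0.01, 0.01, 0.01]   # nearly all energy at one time step

# The sparse vector has much lower entropy than the uniform one,
# so a prior that favors low entropy pushes H toward sparsity.
print(entropy(uniform_h), entropy(sparse_h))
```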
Automatic relevance determination [Tan and Fevotte, 2009]
Avoid having to specify number of patterns in advance
Initialize decomposition with large number of patterns
Sparse Dirichlet distribution over mixing weights Z
Discard unused patterns
[Figure: effective rank (K) vs. iteration]
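The "discard unused patterns" step could look like the sketch below: drop every component whose mixing weight has collapsed to (nearly) zero and renormalize the rest. The function name `prune_components` and the `threshold` value are hypothetical; the slides do not say how "unused" is decided.

```python
def prune_components(W, Z, H, threshold=1e-3):
    """Keep only components whose mixing weight exceeds `threshold`.

    threshold is an assumed cutoff, chosen here for illustration only.
    Returns the surviving bases, renormalized weights, and activations.
    """
    keep = [k for k, z in enumerate(Z) if z > threshold]
    total = sum(Z[k] for k in keep)
    W_kept = [W[k] for k in keep]
    Z_kept = [Z[k] / total for k in keep]  # weights sum to 1 again
    H_kept = [H[k] for k in keep]
    return W_kept, Z_kept, H_kept
```

The number of surviving components is the effective rank of the decomposition.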
Sparse learning example – Initialization
[Figure: sparse decomposition at iteration 0, with panels for V, its reconstruction, the per-basis reconstructions (bases 0 to 15), the mixing weights Z, the bases W0 to W15, and the activations H0 to H15]
Sparse learning example – Converged
[Figure: sparse decomposition at iteration 199, with panels for V, its reconstruction, the per-basis reconstructions (bases 0 to 3), the mixing weights Z, the bases W0 to W3, and the activations H0 to H3]
Applications: Riff identification / Thumbnailing
Reconstruct song using a single pattern
Sparse activations
Riff length known in advance (for now)
Thumbnail corresponds to largest activation in H
[Figure: two riff identification examples, each showing the learned chroma pattern (pitch class A to G vs. time in beats) and its activations H over the whole song (time in beats)]
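Picking the thumbnail from a single-pattern decomposition reduces to an argmax over the activations. The sketch below assumes the activation row `h` and the riff length in beats are already available; the function name `thumbnail_span` and the clipping at the end of the song are assumptions made for the example.

```python
def thumbnail_span(h, riff_length):
    """Return (start, end) beat indices of the thumbnail.

    h: activations of the single pattern, one value per beat.
    riff_length: pattern length in beats (known in advance).
    """
    # The thumbnail starts where the pattern is most strongly activated.
    start = max(range(len(h)), key=lambda t: h[t])
    # Clip so the span never runs past the end of the song.
    end = min(start + riff_length, len(h))
    return start, end
```

For example, `thumbnail_span([0.0, 0.1, 0.0, 0.9, 0.0, 0.2, 0.0, 0.0], 4)` selects beats 3 through 6.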
Applications: Structure segmentation
Identify long-term song structure (verse, chorus, bridge, etc.)
Assume one-to-one mapping between chroma patterns and segments
Use SI-PLCA decomposition with longer patterns
no prior on activations
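Under the one-to-one assumption above, a segmentation can be read off a decomposition by labeling each beat with the pattern that contributes most to it and merging runs of equal labels. The sketch below shows that idea only; the per-beat argmax rule, the function name `label_segments`, and the `energy` input (each pattern's reconstruction summed over pitch classes) are assumptions, and the slides do not say whether the authors' system labels beats this way or applies further smoothing.

```python
def label_segments(energy):
    """Turn per-pattern contributions into labeled segments.

    energy[k][t]: contribution of pattern k to beat t
                  (hypothetical input, e.g. pattern k's reconstruction
                  summed over pitch classes).
    Returns a list of (start_beat, end_beat, pattern_index) tuples.
    """
    n_patterns, n_beats = len(energy), len(energy[0])
    # Label every beat with its dominant pattern.
    labels = [max(range(n_patterns), key=lambda k: energy[k][t])
              for t in range(n_beats)]
    # Merge consecutive beats that share a label into one segment.
    segments = []
    for t, lab in enumerate(labels):
        if segments and segments[-1][2] == lab:
            segments[-1] = (segments[-1][0], t + 1, lab)
        else:
            segments.append((t, t + 1, lab))
    return segments
```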
Structure segmentation example
Estimated: intro, refrain, verse, refrain, verse, refrain, verse, refrain, refrain, outro
Ground truth: intro, refrain, verse, refrain, vs/break, refrain, verse, refrain, refrain, outro
Structure segmentation example 2
segments tend to be broken into multiple motifs
Est: verse1, verse2, verse1, verse2, refrain, verse1, verse2, refrain, verse1, outro, verse1, refrain, verse1, outro
GT: verse, verse, refrain, verse, refrain, 12 verse, inst., 12 verse, refrain, outro
Experiments
Evaluate on 180 songs from The Beatles catalog
System                     f-meas  prec  recall  over-seg  under-seg
[Mauch et al., 2009]       0.66    0.61  0.77    0.76      0.64
SI-PLCA (sparse Z)         0.60    0.58  0.68    0.61      0.56
SI-PLCA (rank=4)           0.58    0.60  0.59    0.56      0.59
[Levy and Sandler, 2008]   0.54    0.58  0.53    0.50      0.57
Random                     0.30    0.36  0.26    0.07      0.24
Compare to systems based on self-similarity and HMM clustering
middle-of-the-pack performance
sparse Z gives ∼10% improvement in recall over fixed rank (0.68 vs. 0.59)
Needs better post-processing?
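The slides do not define the f-meas, prec and recall columns. One common choice for structure segmentation is pairwise frame clustering, where a pair of beats counts as correct if both the estimate and the reference put the two beats in the same segment type. The sketch below implements that definition as an assumption; the function name `pairwise_scores` is hypothetical, and the over-seg and under-seg scores are not covered.

```python
from itertools import combinations


def pairwise_scores(estimated, reference):
    """Pairwise (f-measure, precision, recall) between two per-beat labelings.

    estimated, reference: equal-length sequences of segment labels.
    Label names need not match; only co-membership of beat pairs matters.
    """
    est_pairs = ref_pairs = both = 0
    for i, j in combinations(range(len(reference)), 2):
        same_est = estimated[i] == estimated[j]
        same_ref = reference[i] == reference[j]
        est_pairs += same_est
        ref_pairs += same_ref
        both += same_est and same_ref
    precision = both / est_pairs if est_pairs else 0.0
    recall = both / ref_pairs if ref_pairs else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return f_measure, precision, recall
```

Under this definition, merging distinct segment types lowers precision, while splitting one segment type into several labels lowers recall.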
Summary
Novel algorithm for identifying repeated harmonic patterns in music
Use sparsity to minimize number of fixed parameters, control structure
Applications to thumbnailing and structure segmentation
Future work
Adaptive model of pattern length, better downbeat alignment
2D convolution to compensate for key changes
Time-warp invariance (beat-tracking errors, fixed hop size)
Open source Python/Matlab implementation available: http://ronw.github.com/siplca-segmentation
References
Ellis, D. and Poliner, G. (2007).
Identifying ‘cover songs’ with chroma features and dynamic programming beat tracking. In Proc. ICASSP, pages IV-1429–1432.
Levy, M. and Sandler, M. (2008).
Structural Segmentation of Musical Audio by Constrained Clustering. IEEE Trans. Audio, Speech, and Language Processing, 16(2).
Mauch, M., Noland, K. C., and Dixon, S. (2009).
Using musical structure to enhance automatic chord transcription. In Proc. ISMIR, pages 231–236.
Smaragdis, P. and Raj, B. (2007).
Shift-Invariant Probabilistic Latent Component Analysis. Technical Report TR2007-009, MERL.
Smaragdis, P., Raj, B., and Shashanka, M. (2008).
Sparse and shift-invariant feature extraction from non-negative data. In Proc. ICASSP, pages 2069–2072.
Tan, V. and Fevotte, C. (2009).
Automatic Relevance Determination in Nonnegative Matrix Factorization. In Proc. SPARS.