Computational Approaches to Melodic Analysis of Indian Art Music
Indian Institute of Science, Bengaluru, India, 2016
Sankalp Gulati Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
Tonic Identification
[Figure: spectrogram of an excerpt (0–8 s, 0–5000 Hz) and the corresponding multipitch salience histogram (10-cent bins, reference 55 Hz), with harmonic peaks f2–f6 and the tonic peak marked.]
Approaches: signal processing and learning
• Tanpura/drone background sound
• Extent of gamakas on Sa and Pa svaras
• Vādi and samvādi svaras of the rāga
S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H. A. Murthy, and X. Serra. Automatic tonic identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(1):55–73, 2014.
Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Conf. on Music Information Retrieval (ISMIR) (pp. 499–504), Porto, Portugal.
Bellur, A., Ishwar, V., Serra, X., & Murthy, H. (2012). A knowledge based signal processing approach to tonic identification in Indian classical music. In 2nd CompMusic Workshop (pp. 113–118) Istanbul, Turkey.
Ranjani, H. G., Arthi, S., & Sreenivas, T. V. (2011). Carnatic music analysis: Shadja, swara identification and raga verification in Alapana using stochastic models. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 29–32), New Paltz, NY.
Accuracy: ~90%!
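The histogram-peak idea behind these signal-processing approaches can be sketched as follows. This is a minimal illustration, not any of the published systems: the peak picking is naive, and the function names and the toy salience input are assumptions.

```python
import math

def hz_to_bin(f_hz, f_ref=55.0, cents_per_bin=10):
    """Map a frequency to a histogram bin index (10-cent bins, ref 55 Hz)."""
    cents = 1200.0 * math.log2(f_hz / f_ref)
    return int(round(cents / cents_per_bin))

def tonic_candidate(frame_peaks, f_ref=55.0):
    """Accumulate (frequency, salience) peaks into a pitch histogram and
    return the frequency of the most salient bin as the tonic candidate."""
    hist = {}
    for f_hz, salience in frame_peaks:
        b = hz_to_bin(f_hz, f_ref)
        hist[b] = hist.get(b, 0.0) + salience
    best_bin = max(hist, key=hist.get)
    return f_ref * 2 ** (best_bin * 10 / 1200.0)
```

Real systems additionally exploit the drone harmonics and multipitch salience rather than a single winning bin.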
• Pitch (fundamental frequency, F0) of the lead artist
• Pitch estimation
  – Melodic contour characteristics
  – Dual melodic lines in Indian art music
Approaches: signal processing and learning
Salamon, J., & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Rao, V., & Rao, P. (2010). Vocal melody extraction in the presence of pitched accompaniment in polyphonic music. IEEE Transactions on Audio, Speech, and Language Processing, 18(8), 2145–2154.
De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111, 1917.
Melody Histogram Computation
[Figure: melody histogram (10-cent bins, reference 55 Hz) with peaks marked at the lower Sa, middle Sa (tonic), and higher Sa; below, the melody contour in cents with the tonic frequency as reference.]
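A melody histogram of the kind shown above can be computed from a pitch contour roughly as follows. This is a sketch with assumed names; real systems weight frames by salience and smooth the histogram.

```python
import math

def melody_histogram(pitch_hz, f_ref=55.0, n_bins=360):
    """Accumulate a pitch contour (Hz) into 10-cent bins referenced to
    55 Hz and normalize so the most populated bin has salience 1."""
    hist = [0.0] * n_bins
    for f in pitch_hz:
        if f <= 0:          # skip unvoiced frames (F0 = 0)
            continue
        b = int(round(1200.0 * math.log2(f / f_ref) / 10.0))
        if 0 <= b < n_bins:
            hist[b] += 1.0
    peak = max(hist)
    return [h / peak for h in hist] if peak else hist
```

The octave-separated Sa peaks in the figure fall 120 bins apart in this representation.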
Intonation Analysis
[Figure: intonation histograms of svara G in rāgas Mohana and Begada.]
• Koduri, Gopala Krishna, et al. "Intonation Analysis of Rāgas in Carnatic Music." Journal of New Music Research 43.1 (2014): 72-93.
• Koduri, G. K., Serrà, J., & Serra, X. (2012). Characterization of intonation in Carnatic music by parametrizing pitch histograms. In Proc. of Int. Conf. on Music Information Retrieval (ISMIR) (pp. 199–204), Porto, Portugal.
Approaches for Discovery of Motifs
Melodic Motives
• Discovery: induction, extraction
• Matching: retrieval
Image taken from (Mueen & Keogh, 2009).
Melodic Pattern Discovery: Challenges
• Pitch variation
• Timing variation
• Added ornamentation
[Figure: melodic pattern pitch contours (cents vs. time, 0–15 s).]
Melodic Pattern Discovery
Data processing:
• Predominant pitch estimation
• Downsampling, Hz-to-cents conversion, tonic normalization
• Brute-force segmentation
• Segment filtering (flat vs. non-flat)
• Uniform time-scaling
Intra-recording discovery → inter-recording search → rank refinement
• S. Gulati, J. Serrà, V. Ishwar, and X. Serra, “Mining melodic patterns in large audio collections of Indian art music,” in Int. Conf. on Signal Image Technology & Internet Based Systems - MIRA, Marrakesh, Morocco, 2014, pp. 264–271.
Data Preprocessing
• Predominant pitch estimation
• Downsampling, Hz-to-cents conversion, tonic normalization
• Brute-force segmentation
• Segment filtering (flat vs. non-flat)
• Uniform time-scaling
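Two of the preprocessing steps above, tonic normalization (Hz to cents) and flat-segment filtering, can be sketched like this. The 150-cent flatness threshold is an assumption for illustration, not the published setting.

```python
import math

def hz_to_cents(pitch_hz, tonic_hz):
    """Tonic-normalized pitch in cents; unvoiced frames (0 Hz) map to None."""
    return [1200.0 * math.log2(f / tonic_hz) if f > 0 else None
            for f in pitch_hz]

def is_flat(segment_cents, max_range=150.0):
    """Flag a segment as 'flat' when its pitch stays within a narrow band
    (assumed threshold of 150 cents), so it can be filtered out before
    pattern discovery."""
    voiced = [c for c in segment_cents if c is not None]
    if not voiced:
        return True
    return max(voiced) - min(voiced) <= max_range
```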
Melodic Similarity
• S. Gulati, J. Serrà and X. Serra, "An Evaluation of Methodologies for Melodic Similarity in Audio Recordings of Indian Art Music," in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Brisbane, Australia, 2015.
Melodic Similarity: Distance Measure
• Dynamic time warping (DTW) based distance
Discovery and search → rank refinement
Computational Complexity
• Lower-bounding techniques for DTW
Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., ... & Keogh, E. (2012, August). Searching and mining trillions of time series subsequences under dynamic time warping. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 262-270). ACM.
Image taken from: http://www.cs.ucr.edu/~eamonn/LB_Keogh.htm
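The lower-bounding idea can be sketched as follows: LB_Keogh compares a candidate against the envelope of the query inside a Sakoe-Chiba band, and is cheap to compute yet never exceeds the true DTW distance, so candidates whose bound already exceeds the best-so-far distance can be discarded without running DTW. A compact sketch (squared-difference local cost, no band constraint inside the DTW itself):

```python
def lb_keogh(query, candidate, r):
    """LB_Keogh lower bound: sum of squared distances from the candidate
    to the upper/lower envelope of the query within a band of radius r."""
    lb = 0.0
    for i, c in enumerate(candidate):
        window = query[max(0, i - r): i + r + 1]
        lo, hi = min(window), max(window)
        if c > hi:
            lb += (c - hi) ** 2
        elif c < lo:
            lb += (lo - c) ** 2
    return lb

def dtw(a, b):
    """Plain O(len(a)*len(b)) DTW with squared-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

The guarantee lb_keogh(q, c, r) ≤ dtw(q, c) is what makes pruning safe.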
Melodic Similarity Improvements
• S. Gulati, J. Serrà and X. Serra, "Improving Melodic Similarity in Indian Art Music Using Culture-specific Melodic Characteristics," in Int. Society for Music Information Retrieval Conf. (ISMIR), pp. 680–686, Málaga, Spain, 2015.
Melodic Pattern Network
• M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
Undirected
Similarity Threshold Estimation
• M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
• S. Maslov and K. Sneppen, “Specificity and stability in topology of protein networks,” Science, vol. 296, no. 5569, pp. 910–913, 2002.
Ts* (estimated similarity threshold)
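Given pairwise pattern distances and an estimated threshold Ts*, the undirected pattern network can be built as in this sketch (the dictionary input format is an assumption):

```python
def build_pattern_network(distances, ts):
    """Build an undirected pattern network: nodes are pattern ids, and an
    edge connects two patterns whose melodic distance is below the
    estimated threshold ts. `distances` maps (i, j) pairs to distances."""
    edges = set()
    for (i, j), d in distances.items():
        if i != j and d < ts:
            edges.add((min(i, j), max(i, j)))  # canonical undirected edge
    return edges
```

Threshold selection is the interesting part: the cited approach compares the network's topology against degree-preserving randomizations rather than fixing ts by hand.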
Melodic Pattern Characterization
V. D. Blondel, J. L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, pp. P10008, 2008.
Melodic Pattern Characterization
Community types:
• Gamaka
• Rāga phrase
• Composition phrase
• S. Gulati, J. Serrà, V. Ishwar, S. Şentürk and X. Serra, "Discovering Rāga Motifs by Characterizing Communities in Networks of Melodic Patterns," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 286–290, Shanghai, China, 2016.
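The community characterization can be illustrated with a toy heuristic in the spirit of the paper; the exact rules and the metadata fields used here are assumptions, not the published criteria:

```python
def characterize_community(members):
    """Label a community of pattern occurrences. Assumed heuristic:
    patterns spanning several ragas behave like generic ornaments
    (gamakas); patterns confined to one raga but several compositions are
    raga phrases; patterns from one composition are composition phrases."""
    ragas = {m["raga"] for m in members}
    compositions = {m["composition"] for m in members}
    if len(ragas) > 1:
        return "gamaka"
    if len(compositions) > 1:
        return "raga phrase"
    return "composition phrase"
```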
Automatic Rāga Recognition
Training corpus → rāga recognition system → rāga label
Yaman, Shankarabharanam, Todī, Darbari, Kalyan, Bageśrī, Kambhojī, Hamsadhwani, Des, Harikambhoji, Kirvani, Atana, Behag, Kapi, Begada
Rāga Characterization: Svaras
[Figure: melody histogram (10-cent bins, reference 55 Hz) with peaks at the lower, middle (tonic), and higher Sa; pitch contour in cents with the tonic frequency as reference.]
• P. Chordia and S. Şentürk, “Joint recognition of raag and tonic in North Indian music,” Computer Music Journal, vol. 37, no. 3, pp. 82–98, 2013.
• G. K. Koduri, S. Gulati, P. Rao, and X. Serra, “Rāga recognition based on pitch distribution methods,” Journal of New Music Research, vol. 41, no. 4, pp. 337–350, 2012.
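A basic octave-folded pitch-class distribution of the kind used in these methods can be sketched as follows (tonic-normalized cents as input; 120 bins of 10 cents; no smoothing, which real systems typically add):

```python
def pitch_class_distribution(cents, n_bins=120):
    """Octave-folded pitch-class distribution over 10-cent bins, with the
    tonic at bin 0; normalized to sum to 1."""
    hist = [0.0] * n_bins
    for c in cents:
        b = int(round(c / 10.0)) % n_bins  # fold all octaves onto one
        hist[b] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```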
Rāga Characterization: Intonation
[Figure: melody histogram and tonic-normalized pitch contour, as used for intonation analysis of individual svaras.]
• G. K. Koduri, V. Ishwar, J. Serrà, and X. Serra, “Intonation analysis of rāgas in Carnatic music,” Journal of New Music Research, vol. 43, no. 1, pp. 72–93, 2014.
• H. G. Ranjani, S. Arthi, and T. V. Sreenivas, “Carnatic music analysis: Shadja, swara identification and raga verification in alapana using stochastic models,” in IEEE WASPAA, 2011, pp. 29–32.
Rāga Characterization: Ārōh-Avrōh
• Ascending–descending svara patterns; melodic progression
• S. Shetty and K. K. Achary, “Raga mining of Indian music by extracting arohana-avarohana pattern,” Int. Journal of Recent Trends in Engineering, vol. 1, no. 1, pp. 362–366, 2009.
• V. Kumar, H. Pandya, and C. V. Jawahar, “Identifying ragas in Indian music,” in 22nd Int. Conf. on Pattern Recognition (ICPR), 2014, pp. 767–772.
• P. V. Rajkumar, K. P. Saishankar, and M. John, “Identification of Carnatic raagas using hidden Markov models,” in IEEE 9th Int. Symposium on Applied Machine Intelligence and Informatics (SAMI), 2011, pp. 107–110.
Approaches: melodic progression templates, n-gram distributions, hidden Markov models
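The n-gram view of ārōh-avrōh can be sketched as follows, counting svara transitions in a transcribed sequence (the svara transcription itself is assumed as input; real systems estimate it from the pitch contour):

```python
from collections import Counter

def svara_ngrams(svara_seq, n=2):
    """n-gram counts over a transcribed svara sequence; transition
    statistics like these are one cue to a raga's characteristic
    melodic progression (aroha-avaroha)."""
    return Counter(tuple(svara_seq[i:i + n])
                   for i in range(len(svara_seq) - n + 1))
```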
Rāga Characterization: Melodic motifs
• R. Sridhar and T. V. Geetha, “Raga identification of Carnatic music for music information retrieval,” Int. Journal of Recent Trends in Engineering, vol. 1, no. 1, pp. 571–574, 2009.
• S. Dutta, S. P. V. Krishnaraj, and H. A. Murthy, “Raga verification in Carnatic music using longest common segment set,” in Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 605–611, 2015.
Rāga A Rāga B Rāga C
Goal
• Automatic rāga recognition: training corpus → rāga recognition system → rāga label
Classification methodology
• Experimental setup
  – Stratified 12-fold cross-validation (balanced)
  – Experiment repeated 20 times
  – Evaluation measure: mean classification accuracy
• Classifiers
  – Multinomial, Gaussian, and Bernoulli naive Bayes (NBM, NBG, NBB)
  – SVM with a linear kernel, an RBF kernel, and with SGD learning (SVML, SVMR, SGD)
  – Logistic regression (LR) and random forest (RF)
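The fold generation in the setup above can be sketched in plain Python; the published experiments use scikit-learn's implementations, so this is only an illustration of stratification (assumes each class has a multiple of k instances for perfectly balanced folds):

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=12, seed=0):
    """Generate stratified cross-validation folds: each fold receives an
    equal share of the instances of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)               # randomize within each class
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)    # deal out round-robin
    return folds
```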
Results

where f(p, r) denotes the raw frequency of occurrence of phrase p in recording r. F1 only considers the presence or absence of a phrase in a recording. In order to investigate if the frequency of occurrence of melodic phrases is relevant for characterizing ragas, we take F2(p, r) = f(p, r). As mentioned, the melodic phrases that occur across different ragas and in several recordings are futile for raga recognition. Therefore, to reduce their effect in the feature vector we employ a weighting scheme, similar to the inverse document frequency (idf) weighting in text retrieval:

F3(p, r) = f(p, r) × irf(p, R)    (2)

irf(p, R) = log( N / |{r ∈ R : p ∈ r}| )    (3)

where |{r ∈ R : p ∈ r}| is the number of recordings in which the melodic phrase p is present, that is, f(p, r) ≠ 0 for these recordings. We denote our proposed method by M.
3. EVALUATION
3.1. Music Collection
The music collection used in this study is compiled as a part of the CompMusic project [26–28]. The collection comprises 124 hours of commercially available audio recordings of Carnatic music belonging to 40 ragas. For each raga, there are 12 music pieces, which amounts to a total of 480 recordings. All the editorial metadata for each audio recording is publicly available in MusicBrainz³, an open-source metadata repository. The music collection primarily consists of vocal performances of 62 different artists. There are a total of 310 different compositions belonging to diverse forms in Carnatic music (for example kirtana, varnam, virtuttam). The chosen ragas contain a diverse set of svaras (notes), both in terms of the number of svaras and their pitch-classes (svarasthanas). To facilitate comparative studies and promote reproducible research we make this music collection publicly available online⁴.
From this music collection we build two datasets, which we denote by DB40raga and DB10raga. DB40raga comprises the entire music collection and DB10raga comprises a subset of 10 ragas. We use DB10raga to make our results more comparable to studies where the evaluations are performed on a similar number of ragas.
3.2. Classification and Evaluation Methodology
The features obtained above are used to train a classifier. In order to assess the relevance of these features for raga recognition, we experiment with different algorithms exploiting diverse classification strategies [29]: Multinomial, Gaussian and Bernoulli naive Bayes (NBM, NBG and NBB, respectively), support vector machines with a linear and a radial basis function kernel, and with a stochastic gradient descent learning (SVML, SVMR and SGD, respectively), logistic regression (LR) and random forest (RF). We use the implementation of these classifiers available in the scikit-learn toolkit [30], version 0.15.1. Since in this study our focus is to extract a musically relevant set of features based on melodic phrases, we use the default parameter settings for the classifiers available in scikit-learn.
We use a stratified 12-fold cross-validation methodology for evaluations. The folds are generated such that every fold comprises an equal number of feature instances per raga. We repeat the entire experiment 20 times, and report the mean classification accuracy as
³ https://musicbrainz.org
⁴ http://compmusic.upf.edu/node/278
db        Mtd  Ftr      NBM    NBB    LR     SVML   1NND
DB10raga  M    F1       90.6   74.0   84.1   81.2   -
               F2       91.7   73.8   84.8   81.2   -
               F3       90.5   74.5   84.3   80.7   -
          S1   PCD120   -      -      -      -      82.2
               PCDfull  -      -      -      -      89.5
          S2   PDparam  37.9   11.2   70.1   65.7   -
DB40raga  M    F1       69.6   61.3   55.9   54.6   -
               F2       69.6   61.7   55.7   54.3   -
               F3       69.5   61.5   55.9   54.5   -
          S1   PCD120   -      -      -      -      66.4
               PCDfull  -      -      -      -      74.1
          S2   PDparam  20.8   2.6    51.4   44.2   -

Table 1. Accuracy (in percentage) of different methods (Mtd) for two datasets (db) using different classifiers and features (Ftr).
the evaluation measure. In order to assess if the difference in the performance of any two methods is statistically significant, we use the Mann-Whitney U test [31] with p = 0.01. In addition, to compensate for multiple comparisons, we apply the Holm-Bonferroni method [32].
3.3. Comparison with the state of the art
We compare our results with two state of the art methods proposed in [7] and [12]. As an input to these methods, we use the same predominant melody and tonic as used in our method. The method in [7] uses a smoothened pitch-class distribution (PCD) as the tonal feature and employs a 1-nearest neighbor classifier (1NN) using Bhattacharyya distance for predicting the raga label. We denote this method by S1. The authors in [7] report a window size of 120 s as an optimal duration for computing PCDs (denoted here by PCD120). However, we also experiment with PCDs computed over the entire audio recording (denoted here by PCDfull). Note that in [7] the authors do not experiment with a window size larger than 120 s.
The method proposed in [12] also uses features based on pitch distribution. However, unlike in [7], the authors use parameterized pitch distributions of individual svaras as features (denoted here by PDparam). We denote this method by S2. The authors of both these papers courteously ran the experiments on our dataset using the original implementations of the methods.
4. RESULTS AND DISCUSSION
In Table 1, we present the results of our proposed method M and the two state of the art methods S1 and S2 for the two datasets DB10raga and DB40raga. The highest accuracy for every method is highlighted in bold for both the datasets. Due to lack of space we present results only for the best performing classifiers.
We start by analyzing the results of the variants of M. From Table 1, we see that the highest accuracy obtained by M for DB10raga is 91.7%. Compared to DB10raga, there is a significant drop in the performance of every variant of M for DB40raga. The best performing variant in the latter achieves 69.6% accuracy. We also see that for both the datasets, the accuracy obtained by M across the feature sets is nearly the same for each classifier, with no statistically significant difference. This suggests that considering just the presence or the absence of a melodic phrase, irrespective of its frequency of occurrence, is sufficient for raga recognition. Interestingly, this finding is consistent with the fact that characteristic melodic phrases are
S. Gulati, J. Serrà, V. Ishwar and X. Serra, "Phrase-based Rāga Recognition Using Vector Space Modelling," in IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pp. 66–70, Shanghai, China, 2016.
Resources
• Tonic dataset: http://compmusic.upf.edu/iam-tonic-dataset
• Rāga dataset: http://compmusic.upf.edu/node/278
• Demos: http://dunya.compmusic.upf.edu/motifdiscovery/ and http://dunya.compmusic.upf.edu/pattern_network/
• CompMusic project: http://compmusic.upf.edu/
• Related datasets: http://compmusic.upf.edu/datasets