Singer Identification
Bertrand SCHERRER
McGill University
March 15, 2007
Bertrand SCHERRER (McGill University) Singer Identification March 15, 2007 1 / 27
Outline
1 Introduction
    Applications
    Challenges
2 Feature Extraction
3 Vocal/NonVocal Region Segmentation
    GMM-based methods
4 Classification
    GMM
5 Results
6 Conclusion
Introduction
Introduction Applications
Singer identification is meant for, and has so far mainly been applied to, pop music.
Introduction Applications
- Automatically label data for which little or no information is available ⇒ recognize the singer
- Distinguish between the original version of a song and cover versions
- Copyright enforcement: recording companies could scan bootleg sites on the Internet to check for unauthorized recordings of a concert [Kim and Whitman, 2002; Tsai and Wang, 2006]
- Music recommendation systems could use singer identification to group singers with similar voice characteristics
Introduction Challenges
- Singing voice = hybrid between speech and musical instrument ⇒ specific methods of analysis must be created.
- In pop music, the voice is never heard alone: accompaniment is always present.
Feature Extraction
As seen in the previous diagrams: need to extract some features from the sounds.
Features used:
- MFCC (Mel-Frequency Cepstral Coefficients)
- MDCT (Modified Discrete Cosine Transform)
- LPCC (Linear Predictive Coding Coefficients)
- WLPCC (Warped ...)
- Cepstral Coefficients of the LPC spectrum
- LPMFCC (MFCC of the LPC spectrum)
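To make one entry of this list concrete, here is a minimal sketch of how cepstral coefficients of the LPC spectrum could be computed for a single frame: autocorrelation, Levinson-Durbin recursion, then the standard LPC-to-cepstrum recursion. It is not the front end of any cited paper. The function names, the default order of 12, and the absence of pre-emphasis and windowing are all illustrative assumptions.

```python
def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns predictor coefficients a[1..order] (as a 0-indexed list) such
    that x[n] is approximated by sum_k a[k] * x[n-k], plus the final
    prediction error.
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1 / A(z),
    with A(z) = 1 - sum_k a[k] z^-k (gain term c[0] is omitted)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame, order=12, n_ceps=12):
    """LPC-derived cepstral coefficients for one frame (hypothetical helper)."""
    r = autocorrelation(frame, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)


# Usage on a synthetic decaying frame (a one-pole signal, not real audio).
example_frame = [0.9 ** n for n in range(200)]
example_features = lpcc(example_frame, order=2, n_ceps=3)
```

In practice each frame would first be pre-emphasized and windowed, and the resulting vectors would feed the segmentation and classification stages.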
Vocal/NonVocal Region Segmentation
Principle
The spectrum of voiced regions differs from that of accompaniment-only regions: harmonicity of the voice.
Voice/Accompaniment Spectra
Fig.1 [Tsai and Wang, 2006]
Vocal/NonVocal Region Segmentation GMM-based methods
Tsai’s Approach
Fig. 1 [Tsai and Wang, 2004]
This method reportedly yields 82.3% accuracy [Tsai and Wang, 2006].
Fujihara’s Approach
From Fig. 1 [Fujihara et al., 2005]
The GMM classification between Vocal and Non-Vocal is done on the resynthesized signal.
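A GMM-based vocal/non-vocal decision can be sketched as a frame-level log-likelihood comparison between two models. This is a generic illustration, not the exact procedure of Tsai or Fujihara. The function names, the diagonal covariances, the zero threshold and the hand-set toy parameters (standing in for models trained with EM) are all assumptions.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """log p(x | gmm), where gmm is a list of (weight, mean, var) components."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def label_frames(frames, vocal_gmm, nonvocal_gmm, threshold=0.0):
    """Label each frame by the log-likelihood ratio of the two models."""
    labels = []
    for x in frames:
        llr = gmm_log_likelihood(x, vocal_gmm) - gmm_log_likelihood(x, nonvocal_gmm)
        labels.append("vocal" if llr > threshold else "nonvocal")
    return labels


# Toy 2-D models with hand-set parameters (illustrative only, not trained).
toy_vocal_gmm = [(0.5, [1.0, 1.0], [0.5, 0.5]), (0.5, [2.0, 0.0], [0.5, 0.5])]
toy_nonvocal_gmm = [(1.0, [-1.0, -1.0], [0.5, 0.5])]
toy_labels = label_frames([[1.2, 0.8], [-0.9, -1.1]], toy_vocal_gmm, toy_nonvocal_gmm)
```

Real systems usually smooth these frame labels over time before passing the vocal segments on to the classifier.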
Classification
3 main strategies
- GMM
- SVM
- k-NN
Classification GMM
GMM Method with Solo Voice Modeling
Fig.3 [Tsai and Wang, 2006]
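As a rough sketch of GMM-based classification, the standard maximum-likelihood rule assigns the test features to the singer whose model explains them best. This is the textbook formulation and not necessarily the exact rule used with solo voice modeling in [Tsai and Wang, 2006]. The symbols are assumptions: $\lambda_s$ is the GMM of singer $s$ with $K$ components, $\mathbf{x}_1, \dots, \mathbf{x}_T$ are the feature vectors of the vocal frames, and $S$ is the number of candidate singers.

```latex
p(\mathbf{x} \mid \lambda_s) = \sum_{k=1}^{K} w_{s,k}\,
  \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_{s,k}, \boldsymbol{\Sigma}_{s,k}\right),
\qquad
\hat{s} = \arg\max_{1 \le s \le S} \sum_{t=1}^{T} \log p(\mathbf{x}_t \mid \lambda_s)
```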
Results
Performance
- Kim and Whitman, 2002 ⇒ 45%
- Liu and Huang, 2002 ⇒ 80%
- Tsai and Wang, 2006; Fujihara et al., 2005 ⇒ 95%
Conclusion
Good
Singer identification yields satisfactory results.
But ...
- Only one article tackles Target Singer Detection or Target Singer Tracking: [Tsai and Wang, 2006] ⇒ results are not perfect for duets, but are better than using GMMs without solo modeling.
- Specific to pop music ⇒ what happens with a cappella singers?
- Specific to one geographical area (Asia) ⇒ important because of voice mix
Bibliography I
Fujihara, H., T. Kitahara, M. Goto, K. Komatani, T. Ogata, and H. G. Okuno, 2005. Singer identification based on accompaniment sound reduction and reliable frame selection. In Proceedings of the International Conference on Music Information Retrieval.
Kim, Y. E. and B. Whitman, 2002. Singer identification in popular music recordings using voice coding features. In Proceedings of the International Conference on Music Information Retrieval.
Liu, C.-C. and C.-S. Huang, 2002. A singer identification technique for content-based classification of MP3 music objects. In Proceedings of the Eleventh International Conference on Information and Knowledge Management.
Bibliography II
Tsai, W.-H. and H.-M. Wang, 2004. Automatic detection and tracking of target singer in multi-singer music recordings. In Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 4, pp. 221–224.
Tsai, W.-H. and H.-M. Wang, 2006. Automatic singer recognition of popular music recordings via estimation and modeling of solo vocal signals. IEEE Transactions on Audio, Speech and Language Processing, vol. 14, pp. 330–341.
Zhang, T., 2003. Automatic singer identification. In Proceedings of the 2003 International Conference on Multimedia and Expo, vol. 1, pp. 33–36.
Questions ?