Research Seminar at National Institute of Informatics, Japan
Multi-modal Music Mood Classification Using Audio, Lyrics and Social Tags Xiao Hu National Institute of Informatics July 5, 2011
Transcript
1. Multi-modal Music Mood Classification Using Audio, Lyrics and Social Tags. Xiao Hu, National Institute of Informatics, July 5, 2011.
2. Outline: Multimodal Music Mood Classification; Research Questions; Methodology; Findings and Contributions; Future Research.
3. Music Mood Classification Exercise: What do you feel about ... How do people categorize music mood? How well can a computer do it? (Lyrics shown: "Here comes the sun, here comes the sun, and I say it's all right / Little darling, it's been a long cold lonely winter / Little darling, it feels like years since it's been here / Here comes the sun, here comes the sun ...")
4. Why Mood?
5. State of the Art: Mood categories are directly adopted from music psychological models and lack the social context of music listening (Juslin & Laukka, 2004); can social tags help? Evaluation datasets are small, with low consistency across assessors (Skowronek et al., 2006; Hu et al., 2008). Automatic music mood classification systems perform suboptimally and are mostly audio-based; can lyrics help?
6. Research Questions. Q1: Can social tags help develop a mood taxonomy? Q2: Which lyric features are the most useful for music mood classification? Q3: Are lyrics better than audio in music mood classification? Q4: Can combining lyrics and audio improve the effectiveness of mood classification? Q5: Can combining lyrics and audio improve the efficiency of mood classification (number of training examples; length of audio data)? Q2-Q5 concern improving classification performance by combining lyrics and audio.
7. Q1: Mood Categories. A new topic in information science. Influential models in music psychology: categorical (Hevner, 1936) and dimensional (Russell, 1980), often used in previous research on music mood classification.
8. Russell's 2D Model
9. Can Social Tags Help? Last.fm is one of the largest tagging sites for Western popular music.
10. Social Tags. Pros: users' perspectives; large quantity. Cons: noisy ("I aaaaam lovin it"), ambiguous ("love"), synonyms ("calm", "serene"), and a long tail. Cleaning them draws on linguistic resources (WordNet-Affect) and human expertise (two music retrieval experts, native English speakers).
12. Distances between Categories: calculated from song co-occurrences; categories associated with the same songs are similar. Plotted in 2-D space using multidimensional scaling.
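The co-occurrence-then-MDS procedure on this slide can be sketched as follows. The data is a toy example, not the talk's real tag set, and Jaccard distance on song sets is one plausible reading of "distance by song co-occurrence":

```python
# Sketch: distance between mood categories from song co-occurrence,
# then a 2-D layout via classical multidimensional scaling (MDS).
import numpy as np

# Toy data: which songs carry which mood category (from social tags).
category_songs = {
    "aggressive": {"s5", "s6"},
    "calm": {"s1", "s2", "s3"},
    "serene": {"s2", "s3", "s4"},
}
cats = sorted(category_songs)
n = len(cats)

# Jaccard distance on song sets: categories tagged on the same songs are close.
dist = np.zeros((n, n))
for i, a in enumerate(cats):
    for j, b in enumerate(cats):
        sa, sb = category_songs[a], category_songs[b]
        dist[i, j] = 1.0 - len(sa & sb) / len(sa | sb)

def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: double-center the squared distances,
    then embed with the top-k eigenvectors."""
    m = d.shape[0]
    j = np.eye(m) - np.ones((m, m)) / m
    b = -0.5 * j @ (d ** 2) @ j
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

coords = classical_mds(dist)  # one 2-D point per category, ready for plotting
```

In the resulting layout, "calm" and "serene" (which share songs) land close together, while "aggressive" sits far from both.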
13. Identified Categories
14. Research Questions. Q1: Can social tags help identify mood categories that are more realistic? Q2: Which lyric features are the most useful for music mood classification? Q3: Are lyrics better than audio in music mood classification? Q4: Can combining lyrics and audio improve the effectiveness of mood classification? Q5: Can combining lyrics and audio improve the efficiency of mood classification (number of training examples; length of audio data)?
15. What do they feel about ...
16. Multi-modal framework (diagram): social tags lead to mood categories and ground truth; music provides audio and lyrics for automatic classification. Q2-Q5: improving classification performance by combining lyrics and audio.
18. Ground Truth Dataset: built from social tags; has audio, lyrics and social tags; 5,296 unique songs; 18 mood categories; equal numbers of positive and negative examples; 12,980 examples in total. (Chart: number of positive examples per category.)
19. Baseline System (audio-based): the AMC task in MIREX (MIREX: Music Information Retrieval Evaluation eXchange; AMC: Audio Mood Classification). A leading system in AMC 2007 and 2008: Marsyas (Music Analysis, Retrieval and Synthesis for Audio Signals), led by Prof. G. Tzanetakis. Uses audio spectral features.
20. Lyric-based System: very little existing work, which used only basic text features (bag-of-words, part-of-speech) and performed worse than audio-based approaches. This research extracted and compared a range of novel lyric features.
21. Best Lyric Features. Basic features: content words, part-of-speech, function words. Psycholinguistic features: psychological categories in GI (General Inquirer); scores in ANEW (Affective Norms for English Words). Stylistic features: punctuation marks; interjection words. Statistics: e.g., how many words per minute. Combinations: 255 of them! The most comprehensive study on lyric classification so far.
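A few of the feature types above can be sketched concretely. This is not the author's extraction code: the tiny affect lexicon stands in for the real GI/ANEW resources, and the interjection list is illustrative.

```python
# Illustrative extraction of several lyric feature types: bag-of-words
# content features, stylistic features (punctuation, interjections),
# a text statistic (words per minute), and an ANEW-like valence score.
import re
from collections import Counter

AFFECT_LEXICON = {"sun": 0.8, "lonely": -0.7, "winter": -0.2}  # toy valence scores

def lyric_features(lyrics: str, duration_sec: float) -> dict:
    words = re.findall(r"[a-z']+", lyrics.lower())
    feats = dict(Counter(words))                      # bag-of-words counts
    feats["n_exclaim"] = lyrics.count("!")            # stylistic: punctuation
    feats["n_interj"] = sum(w in {"oh", "ah", "hey", "yeah"} for w in words)
    feats["words_per_min"] = 60.0 * len(words) / duration_sec  # text statistic
    scored = [AFFECT_LEXICON[w] for w in words if w in AFFECT_LEXICON]
    feats["mean_valence"] = sum(scored) / len(scored) if scored else 0.0
    return feats

f = lyric_features("Here comes the sun! Little darling, the long cold lonely winter", 30.0)
print(f["words_per_min"])  # 22.0: 11 words over a 30-second excerpt
```

The study's 255 combinations would then be formed by concatenating subsets of such feature groups into one vector per song.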
22. Lyric Feature Example
23. No significant difference between the top combinations.
27. Research Questions. Q1: Can social tags help identify mood categories that are more realistic? Q2: Which lyric features are the most useful for music mood classification? Q3: Are lyrics better than audio in music mood classification? Q4: Can combining lyrics and audio improve the effectiveness of mood classification? Q5: Can combining lyrics and audio improve the efficiency of mood classification (number of training examples; length of audio data)?
28. Combining Lyrics and Audio: two hybrid methods. Late fusion: a lyric classifier and an audio classifier each make a prediction, and the two predictions are merged into a final prediction. Feature concatenation: lyric and audio features are concatenated and fed to a single classifier, which makes the prediction.
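The two hybrid methods can be sketched minimally. This assumes each per-modality classifier outputs a probability of the positive mood class; the fusion weight `alpha` is a hypothetical parameter, not a value from the talk.

```python
# Sketch of the two hybrid methods: late fusion vs. feature concatenation.

def late_fusion(p_lyrics: float, p_audio: float, alpha: float = 0.5) -> int:
    """Merge two per-modality probabilities into one final 0/1 prediction."""
    p = alpha * p_lyrics + (1.0 - alpha) * p_audio
    return int(p >= 0.5)

def feature_concatenation(lyric_feats: list, audio_feats: list) -> list:
    """Build one joint feature vector for a single downstream classifier."""
    return list(lyric_feats) + list(audio_feats)

print(late_fusion(0.9, 0.2))                      # lyrics confident -> 1
print(feature_concatenation([1, 0], [0.3, 0.7]))  # [1, 0, 0.3, 0.7]
```

The design trade-off: late fusion lets each modality use its own best classifier and feature set, while feature concatenation lets one classifier learn interactions between lyric and audio features.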
29. System Performances (chart: audio + lyrics vs. lyrics alone vs. audio alone).
30. Effectiveness
31. Research Questions. Q1: Can social tags help identify mood categories that are more realistic? Q2: Which lyric features are the most useful for music mood classification? Q3: Are lyrics better than audio in music mood classification? Q4: Can combining lyrics and audio improve the effectiveness of mood classification? Q5: Can combining lyrics and audio improve the efficiency of mood classification (number of training examples; length of audio data)?
32. Automatic Classification (supervised learning). Diagram: training examples for "Happy" ("Here comes the sun" → Y; "I will be back" → N; "Down with the sickness" → N) train a classifier for "Happy", which then labels new examples (Song A → Y; Song B → N).
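The train-then-predict loop in the diagram can be made concrete with a toy "Happy" classifier. A simple Naive-Bayes-style word-count model stands in here for the real audio and lyric classifiers; the training snippets echo the slide's examples.

```python
# Toy supervised binary classifier: train on labeled songs, predict new ones.
import math
from collections import Counter

train = [
    ("here comes the sun and i say its all right", 1),  # Happy
    ("long cold lonely winter", 0),
    ("down with the sickness", 0),
]

def fit(examples):
    """Collect per-class word counts from the training examples."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Score each class by smoothed log word likelihood; return the best."""
    vocab = len(set(counts[0]) | set(counts[1]))
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores[label] = sum(math.log((c[w] + 1) / (total + vocab))
                            for w in text.split())
    return max(scores, key=scores.get)

model = fit(train)
print(predict(model, "here comes the sun"))    # -> 1 (Happy)
print(predict(model, "cold lonely sickness"))  # -> 0
```

In the actual study one such binary classifier is trained per mood category, with equal numbers of positive and negative examples, as on slide 18.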
33. Learning Curves
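A learning curve of the kind shown here (and behind Q5's "number of training examples" question) is produced by training on growing subsets of the data and recording test accuracy at each size. The sketch below uses a trivial majority-class stand-in for the real classifiers; only the curve-building loop is the point.

```python
# Sketch of producing a learning curve: accuracy vs. training-set size.
from collections import Counter

def fit(examples):
    """Majority-class model: remember the most frequent training label."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def predict(model, _features):
    return model

def learning_curve(train, test, sizes):
    curve = []
    for n in sizes:
        model = fit(train[:n])
        correct = sum(predict(model, x) == y for x, y in test)
        curve.append((n, correct / len(test)))
    return curve

train = [("song%d" % i, i % 2) for i in range(100)]
test = [("t1", 0), ("t2", 0), ("t3", 1)]
curve = learning_curve(train, test, [10, 50, 100])
print(curve)
```

If a hybrid system's curve rises faster than a single-modality system's, it reaches the same accuracy with fewer labeled examples, which is the efficiency gain Q5 asks about.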
34. Conclusions. Q1: Can social tags help identify mood categories that are more realistic? Q2: The most useful lyric features are combinations of words, linguistic features and text stylistic features. Q3: Are lyrics better than audio in music mood classification? Q4: Can combining lyrics and audio improve the effectiveness of mood classification? Q5: Can combining lyrics and audio improve the efficiency of mood classification?
35. What does the computer feel about ...
36. Contributions. Methodology: mood categories identified from social tags complement psychological models; established an example of using empirical data to refine/adapt theoretical models; improved lyric affect analysis and multi-modal mood classification. Evaluation: proposed an efficient method for building ground truth datasets; the largest dataset with ternary information sources to date, made available to the MIR community via MIREX 2009 (http://www.music-ir.org/mirex/2009/index.php/Audio_Tag_Classification). Application: provided a practical reference for MIR systems (Moodydb.com).
37. Application
38. Feature Analysis
39. Audio vs. Lyrics
40. Top Lyric Features
41. Top Lyric Features in Calm
42. Top Affective Words vs.
43. Future Research Directions
44. Affect Analysis for Information Studies. Affect is an important factor in information behavior and information access. NLP techniques have been applied to attitude, sentiment and opinion analysis; I am interested in their applications to human cognition and learning, in English and Chinese, for text and music. Paper accepted to ISMIR: "Exploring the Relationship Between Mood and Creativity in Rock Lyrics".
45. Future Research Directions: multimedia, multimodal (audio-visual-textual).
46. Summary. Multimodal music mood classification: combining lyrics and audio helps improve both effectiveness and efficiency. Contributions; feature analysis. Future research: the affect factor in informatics; multimodal, multimedia. (Photo mining seminar on Thursday! Prof. Winston Hsu from Taiwan.)
48. References
Hu, X. and Downie, J. S. (2010). When Lyrics Outperform Audio for Music Mood Classification: A Feature Analysis. In Proceedings of the International Conference on Music Information Retrieval (ISMIR 2010), Aug. 2010, Utrecht, Netherlands.
Hu, X. and Downie, J. S. (2010). Improving Mood Classification in Music Digital Libraries by Combining Lyrics and Audio. In Proceedings of the Joint Conference on Digital Libraries (JCDL 2010), June 2010, Surfers Paradise, Australia. (Best Student Paper Award)
Hu, X. (2010). Music and Mood: Where Theory and Reality Meet. In Proceedings of the 5th iConference, University of Illinois at Urbana-Champaign, Feb. 2010, Champaign, IL. (Best Student Paper Award)
Hu, X., Downie, J. S. and Ehmann, A. (2009). Lyric Text Mining in Music Mood Classification. In Proceedings of ISMIR 2009.
Hu, X. (2009). Combining Text and Audio for Music Mood Classification in Music Digital Libraries. IEEE Bulletin of the Technical Committee on Digital Libraries (TCDL), 5(3).
Hu, X. (2010). Multi-modal Music Mood Classification. Presented in the Jean Tague-Sutcliffe Doctoral Research Poster session at the ALISE Annual Conference, Jan. 2010, Boston, MA. (3rd Place Award)
Hu, X. (2009). Categorizing Music Mood in Social Context. In Proceedings of the Annual Meeting of ASIS&T (CD-ROM), Nov. 2009, Vancouver, Canada.
49. References (2)
Hu, X., Downie, J. S., Laurier, C., Bay, M. and Ehmann, A. (2008). The 2007 MIREX Audio Mood Classification Task: Lessons Learned. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR 2008), Sept. 2008, Philadelphia, USA.
Juslin, P. N. and Laukka, P. (2004). Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening. Journal of New Music Research, 33(3): 217-238.
Juslin, P. N. and Sloboda, J. A. (2001). Music and Emotion: Introduction. In P. N. Juslin and J. A. Sloboda (Eds.), Music and Emotion: Theory and Research. New York: Oxford University Press.
Skowronek, J., McKinney, M. F. and van de Par, S. (2006). Ground Truth for Automatic Music Mood Classification. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR 2006), Oct. 2006, Victoria, Canada.