Date post: | 05-Dec-2014 |
Category: | Technology |
Upload: | ju-chiang-wang |
1
Exploring the Relationship Between Multi-Modal Emotion Semantics of Music
Ju-Chiang Wang, Yi-Hsuan Yang, Kaichun Chang, Hsin-Min Wang, and Shyh-Kang Jeng
Academia Sinica, National Taiwan University, Taipei, Taiwan
2
Outline
• Introduction and Potentiality
• Methodology
– The ATB and AEG models
– Framework to combine the two models
• Evaluation and Result
• Conclusion
• In this presentation, "mood" and "emotion" are used interchangeably
3
Introduction – Tag and Valence-Arousal (VA)
• Music emotion modeling, two approaches: categorical (mood tags) and dimensional (valence-arousal)
• Both share a unified goal of understanding the emotion semantics of music
• (Arbitrary) mood tags can be mapped into the VA space in an unsupervised and content-based manner, without any training ground truth for the semantic mapping
• Automatically generate a semantically structured tag cloud in the VA space
[Figure: the VA plane, with valence (negative to positive) on the horizontal axis and arousal (low to high) on the vertical axis; quadrants are numbered 1 to 4 counterclockwise from the upper right]
4
Visualization of Music Mood (Laurier et al.)
Generated by a self-organizing map (SOM)
5
Potentiality (Clarifying the Debate)
• A novice user may be unfamiliar with the VA model, so displaying mood tags in the VA space would be helpful
• Facilitate applications such as tag-based music search and browsing interfaces
• Dimension reduction for tag visualization may result in dimensions that do not conform to valence and arousal
• The VA values of some affective terms are available, but they were not elicited from music
• Affective terms are not cross-lingual and do not always have exact translations in different languages
• Culture-dependent and corpus-dependent
6
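Taxonomy of Music Mood (Xiao Hu, et al.)
Each English mood term is followed by its Chinese dictionary glosses and, in parentheses, a literal back-translation of those glosses.
Aggressive 侵略的;好鬥 (invasive; combative)
Amiable 和藹可親的;厚道的 (affable; kind-hearted)
Autumnal 秋的;像秋天的 (of autumn; autumn-like)
Bittersweet 苦樂參半的 (half bitter, half joyful)
Boisterous 喧鬧的;狂暴的 (noisy; violent)
Brooding 徘徊不去的;沈思的 (lingering; contemplative)
Calm 冷靜;鎮定 (cool-headed; composed)
Campy 裝模作樣 (affected, putting on airs)
Cheerful 興高采烈的;情緒好的 (in high spirits; in a good mood)
Confident 有信心的,自負的 (confident; conceited)
Dreamy 夢幻般的;愛作白日夢的 (dreamlike; given to daydreaming)
Fiery (感情)激烈的,熱烈的 ((of feelings) intense; fervent)
Fun 有趣的 (interesting, amusing)
Humorous 幽默的;滑稽的 (humorous; comical)
Intense 強烈的;熱情的 (strong; passionate)
Literate 有文化修養的 (cultured)
Nostalgic 鄉愁的 (homesick)
Passionate 熱情的;熱烈的;易怒的 (passionate; fervent; irritable)
Poignant 深刻的;辛酸的 (profound; bitter)
Quirky 詭詐的;多變的;古怪的 (crafty; changeable; eccentric)
Relaxed 鬆懈的;放鬆的 (slack; relaxed)
Rollicking 嬉耍的;愉快的 (playful; cheerful)
Rousing 使覺醒的;使奮起的 (awakening; stirring to action)
Rowdy 粗暴的;喧鬧的 (rough; noisy)
Silly 愚蠢的;糊塗的;無聊的 (foolish; muddle-headed; boring)
Soothing 慰藉的;使人寬心的 (consoling; reassuring)
Sweet 甜的;悅耳的 (sweet-tasting; pleasant to the ear)
Tense 緊張的;引起緊張的 (nervous; causing tension)
Visceral 出自內心深處的 (from deep within the heart)
Volatile 易發作的;輕浮的;飛逝的 (prone to outbursts; frivolous; fleeting)
Whimsical 想入非非的,怪誕的,古怪的 (fanciful; bizarre; eccentric)
Wistful 渴望的;想往的;留戀的 (longing; yearning; reluctant to part)
Witty 機智的;說話風趣的 (quick-witted; amusing in speech)
Wry 歪斜的;曲解的;堅持錯誤的 (crooked; distorted; persisting in error)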
GAP GAP
7
Potentiality (Clarifying the Debate)
Machine Learning is necessary for such a task
8
Methodology of the Framework
• A probabilistic framework with two component models, Acoustic Tag Bernoullis (ATB) and Acoustic Emotion Gaussians (AEG)
– Computationally model the generative processes from acoustic features to a mood tag and a VA value, respectively
• Based on the same acoustic feature space, the ATB and AEG models can share and transfer semantic information to each other
• Bridged by the acoustic feature space, we can align one emotion modality to the other
• The first attempt to establish a joint model for exploring the relationship between discrete mood categories and the continuous emotion space
9
Construct Feature Reference Model
[Figure: frame-based features are extracted from the audio signals of the music tracks in a universal music database; a global set of frame vectors, randomly selected from each track, is used for EM training of the acoustic GMM (components A1, A2, ..., AK), which serves as the global GMM for acoustic feature encoding]
10
Represent a Song in a Probabilistic Space
[Figure: the feature vectors of a song are mapped to posterior probabilities over the acoustic GMM components A1, A2, ..., AK, giving a K-bin histogram (the acoustic GMM posterior) with bins 1, 2, ..., K]
Each dimension corresponds to a specific acoustic pattern
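The encoding of a clip as an acoustic GMM posterior can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal covariances and assumes that frames are aggregated by a plain average of the per-frame posteriors. All parameter values and the names `frame_posterior` and `clip_posterior` are made up for the example.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def frame_posterior(x, weights, means, variances):
    """Posterior probability of each GMM component given one frame."""
    logs = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max before exponentiating, for stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def clip_posterior(frames, weights, means, variances):
    """Average the frame posteriors into one K-dim vector per clip (assumed)."""
    acc = [0.0] * len(weights)
    for x in frames:
        for i, p in enumerate(frame_posterior(x, weights, means, variances)):
            acc[i] += p
    return [a / len(frames) for a in acc]

# Toy acoustic GMM with K = 2 components over 2-dim frames (made-up values).
WEIGHTS = [0.5, 0.5]
MEANS = [[0.0, 0.0], [4.0, 4.0]]
VARIANCES = [[1.0, 1.0], [1.0, 1.0]]
FRAMES = [[0.1, -0.2], [0.3, 0.1], [3.9, 4.2]]

theta = clip_posterior(FRAMES, WEIGHTS, MEANS, VARIANCES)
```

Two of the three toy frames lie near the first component, so `theta` puts roughly two thirds of its mass on dimension 1. The vector always sums to 1, which is what lets it act as a set of mixture weights in later steps.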
11
Acoustic Tag Bernoullis (ATB)
• Given a mood-tagged music dataset with a binary label for each mood tag
• Learn an ATB model that describes the generative process of each song in the dataset from acoustic features to a mood tag
• Ranked first on the AUC-clip metric in mood tag classification (MIREX 2009, 2010)
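The slides do not give the ATB equations, so the sketch below is only one plausible reading of the name: each acoustic GMM component carries a Bernoulli parameter for the tag, and a clip's tag probability is the posterior-weighted mixture of those parameters. The function name `tag_probability`, the parameters `b_tag`, and all numbers are assumptions for illustration.

```python
def tag_probability(theta, bernoulli_params):
    """P(tag = 1 | clip) under a mixture-of-Bernoullis reading of ATB.

    theta: clip-level acoustic GMM posterior (non-negative, sums to 1).
    bernoulli_params: hypothetical per-component tag probabilities b_k.
    """
    if len(theta) != len(bernoulli_params):
        raise ValueError("theta and bernoulli_params must have the same length")
    return sum(t * b for t, b in zip(theta, bernoulli_params))

# Made-up numbers for a K = 3 acoustic GMM and one mood tag.
theta = [0.7, 0.2, 0.1]
b_tag = [0.9, 0.5, 0.1]
p = tag_probability(theta, b_tag)
```

Under this reading, a clip dominated by components with high `b_k` receives a high tag probability, which is what ties a mood tag to specific acoustic patterns.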
12
Acoustic Emotion Gaussians (AEG)
• Given a VA-annotated music dataset
• Learn an AEG model that describes the generative process of each song in the dataset from acoustic features to the VA space
• Presented in OS2; outperforms its rivals, SVR and MLR
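As a rough sketch of the generative view, the VA density of a clip can be written as a mixture of 2-D Gaussians whose weights are the clip's acoustic GMM posterior. The code assumes that form; the function names and every parameter value are illustrative and not taken from the slides.

```python
import math

def gaussian2d_pdf(y, mean, cov):
    """Density of a 2-D Gaussian with full covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dv, da = y[0] - mean[0], y[1] - mean[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (c * dv * dv - 2.0 * b * dv * da + a * da * da) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def aeg_density(y, theta, means, covs):
    """Mixture density over the VA plane, weighted by the clip posterior."""
    return sum(
        t * gaussian2d_pdf(y, m, c) for t, m, c in zip(theta, means, covs)
    )

# Made-up VA Gaussians: one in the positive/high quadrant, one opposite.
THETA = [0.8, 0.2]
VA_MEANS = [[0.5, 0.5], [-0.5, -0.5]]
VA_COVS = [[[0.04, 0.0], [0.0, 0.04]], [[0.04, 0.0], [0.0, 0.04]]]

density_pos = aeg_density([0.5, 0.5], THETA, VA_MEANS, VA_COVS)
density_neg = aeg_density([-0.5, -0.5], THETA, VA_MEANS, VA_COVS)
```

Because the toy clip puts 0.8 of its posterior on the first component, the density is higher around (0.5, 0.5) than around (-0.5, -0.5).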
13
The Learning of VA GMM on MER60
14
Multi-Modal Emotion Semantic Mapping
• Three models are aligned: ATB, the acoustic GMM, and AEG
• Transfer the weights from a mood tag to the VA GMM
• The semantic mapping process is transparent and easy to observe and interpret
Mapping a tag into a VA Gaussian distribution
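One way to picture the mapping: treat the tag's per-component weights as mixture weights over the VA Gaussians, then summarize the resulting mixture by a single Gaussian through moment matching. The sketch assumes exactly that; the normalization step, the function names, and all numbers are hypothetical rather than the authors' procedure.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def collapse_to_gaussian(weights, means, covs):
    """Moment-match a weighted mixture of 2-D Gaussians to one Gaussian."""
    mean = [sum(w * m[d] for w, m in zip(weights, means)) for d in range(2)]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for w, m, c in zip(weights, means, covs):
        diff = [m[0] - mean[0], m[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                # Within-component spread plus between-component spread.
                cov[i][j] += w * (c[i][j] + diff[i] * diff[j])
    return mean, cov

# Hypothetical per-component weights of one tag (e.g. from its ATB model).
tag_weights = normalize([0.6, 0.2])
VA_MEANS = [[0.5, 0.5], [-0.5, 0.5]]
VA_COVS = [[[0.01, 0.0], [0.0, 0.01]], [[0.01, 0.0], [0.0, 0.01]]]

tag_mean, tag_cov = collapse_to_gaussian(tag_weights, VA_MEANS, VA_COVS)
```

The resulting mean gives the tag's position in the VA plane and the covariance gives its spread, which is the information a tag cloud layout would need.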
15
Evaluation – Corpora and Settings
• Two corpora used: MER60 and AMG1644
• MER60: jointly annotated corpus (MER60-alone setting)
– 60 music clips, each 30 seconds long
– 99 subjects in total, each clip annotated by 40 subjects
– The VA values are entered by clicking on the emotion space on a computer display
– Query Last.fm and keep the top 50 mood tags for the 60 songs
• AMG1644: used for the separately annotated corpora (AMG1644-MER60 setting)
– Crawl the audio of the “top songs” for 33 mood tags (AMG); most of the tags are used in the MIREX mood classification task
– Leading to 1,644 clips, each about 30 seconds long
16
Acoustic Features
• Adopt the bag-of-frames representation
• Extract frame-based musical features from audio using MIRToolbox 1.3
• All frames of a clip are aggregated into the acoustic GMM posterior, so emotion is analyzed at the clip level instead of the frame level
• Frame-based features
– Dynamic, spectral, timbre, and tonal
– 70-dim concatenated feature vector for a frame
17
Result for the MER60-Alone Setting
• Graphviz is used for visualization, with a Voronoi diagram-based heuristic to avoid tag overlap
18
Result for the AMG-MER Setting
• Graphviz is used for visualization, with a Voronoi diagram-based heuristic to avoid tag overlap
19
Comparison with Psychologists
• Quantitative comparison
– Refer to the VA values of 30 affective terms proposed by Whissell and Plutchik (WP) and by the Affective Norms for English Words (ANEW)
– For a tag, measure the Euclidean distance between the generated VA value and the psychologists' value
• Baseline
– Set the generated VA value of each tag to the origin
– Represents a non-effective tag-VA mapping
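The comparison can be sketched as a mean Euclidean distance over the tags shared by the generated mapping and the reference norms, with the origin baseline computed the same way. The tag names and VA values below are invented for illustration and are not the WP or ANEW figures.

```python
import math

def va_distance(generated, reference):
    """Euclidean distance between two (valence, arousal) points."""
    return math.hypot(generated[0] - reference[0], generated[1] - reference[1])

def mean_distance(generated_by_tag, reference_by_tag):
    """Average distance over the tags present in both mappings."""
    shared = sorted(set(generated_by_tag) & set(reference_by_tag))
    return sum(
        va_distance(generated_by_tag[t], reference_by_tag[t]) for t in shared
    ) / len(shared)

# Invented reference norms and generated values for two tags.
reference = {"happy": (0.6, 0.8), "sad": (-0.3, -0.4)}
generated = {"happy": (0.6, 0.5), "sad": (-0.3, 0.0)}

# Baseline: every tag is placed at the origin of the VA plane.
baseline = {tag: (0.0, 0.0) for tag in reference}

model_error = mean_distance(generated, reference)
baseline_error = mean_distance(baseline, reference)
```

A mapping is informative only if its mean distance falls below that of the origin baseline, as it does for these toy values.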
20
Discussion
• The result is not sensitive to K, the number of GMM components
• Such a learning-based framework is scalable and can do better if more annotated data are available
• Automatic discovery
– For instance, construct a balanced music audio corpus and let Chinese speakers label it with Chinese mood tags
– Generate a Chinese mood tag cloud
• Inverse correlation between the VA intensity and the covariance of a tag
– Tags lying on the outer circle would have larger font sizes
21
Result for the MER60-Alone Setting
22
Conclusion
• A novel framework that unifies the categorical and dimensional emotion semantics of music
• Demonstrated how to map a mood tag to a 2-D VA Gaussian and generate the corresponding tag cloud; this can be further extended to arbitrary tags
• Verify whether an arbitrary tag is mood-related or not
• We will conduct user studies to evaluate the results
• Further investigate acoustic feature representations for better generalization of the emotion modeling
23
Arbitrary Tag - MajorMiner: Not Mood-related
24
Arbitrary Tag - MajorMiner: Mood-related