Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

1

Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Ju-Chiang Wang, Yi-Hsuan Yang, Kaichun Chang, Hsin-Min Wang, and Shyh-Kang Jeng

Academia Sinica, National Taiwan University, Taipei, Taiwan

2

Outline

• Introduction and Potentiality

• Methodology

– The ATB and AEG models

– Framework to combine the two models

• Evaluation and Result

• Conclusion

• In this presentation, "mood" and "emotion" are used interchangeably

3

Introduction – Tag and Valence-Arousal (VA)

• Music emotion modeling has two approaches: categorical (mood tags) and dimensional (the VA space)

• Both share a unified goal of understanding the emotion semantics of music

• (Arbitrary) mood tags can be mapped into the VA space in an unsupervised and content-based manner, without any training ground truth for the semantic mapping

• Automatically generate a semantically structured tag cloud in the VA space

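[Figure: the categorical and dimensional approaches. The dimensional panel shows the VA plane, with arousal (low to high) on the vertical axis, valence (negative to positive) on the horizontal axis, and quadrants numbered 1 to 4.]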

4

Visualization of Music Mood (Laurier et al.)

Generated by a self-organizing map (SOM)

5

Potentiality (Clarifying the Debate)

• A novice user may be unfamiliar with the VA model, so it would be helpful to display mood tags in the VA space

• Facilitate applications such as tag-based music search and browsing interfaces

• Dimension reduction for tag visualization may result in dimensions that do not conform to valence and arousal

• The VA values of some affective terms can be found, but they were not elicited from music

• Affective terms are not cross-lingual and do not always have exact translations in different languages

• Culture-dependent, corpus-dependent

6

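Taxonomy of Music Mood (Xiao Hu et al.)

Each English mood term is followed by its Chinese translation and, in parentheses, an English gloss of that translation.

Aggressive 侵略的;好鬥 (invasive; combative)
Amiable 和藹可親的;厚道的 (kindly and approachable; honest and kind)
Autumnal 秋的;像秋天的 (of autumn; autumn-like)
Bittersweet 苦樂參半的 (half bitter, half joyful)
Boisterous 喧鬧的;狂暴的 (noisy; violent)
Brooding 徘徊不去的;沈思的 (lingering; contemplative)
Calm 冷靜;鎮定 (cool-headed; composed)
Campy 裝模作樣 (affected, putting on airs)
Cheerful 興高采烈的;情緒好的 (in high spirits; in a good mood)
Confident 有信心的,自負的 (confident, conceited)
Dreamy 夢幻般的;愛作白日夢的 (dreamlike; given to daydreaming)
Fiery (感情)激烈的,熱烈的 (intense or fervent, of feelings)
Fun 有趣的 (interesting, amusing)
Humorous 幽默的;滑稽的 (humorous; comical)
Intense 強烈的;熱情的 (strong; passionate)
Literate 有文化修養的 (cultured)
Nostalgic 鄉愁的 (homesick)
Passionate 熱情的;熱烈的;易怒的 (passionate; fervent; irritable)
Poignant 深刻的;辛酸的 (profound; bitter)
Quirky 詭詐的;多變的;古怪的 (crafty; changeable; eccentric)
Relaxed 鬆懈的;放鬆的 (slack; relaxed)
Rollicking 嬉耍的;愉快的 (frolicking; cheerful)
Rousing 使覺醒的;使奮起的 (awakening; stirring)
Rowdy 粗暴的;喧鬧的 (rough; noisy)
Silly 愚蠢的;糊塗的;無聊的 (foolish; muddled; boring)
Soothing 慰藉的;使人寬心的 (consoling; reassuring)
Sweet 甜的;悅耳的 (sweet-tasting; pleasant to the ear)
Tense 緊張的;引起緊張的 (tense; tension-inducing)
Visceral 出自內心深處的 (from deep within the heart)
Volatile 易發作的;輕浮的;飛逝的 (prone to outbursts; frivolous; fleeting)
Whimsical 想入非非的,怪誕的,古怪的 (fanciful, bizarre, eccentric)
Wistful 渴望的;想往的;留戀的 (longing; yearning; reluctant to part)
Witty 機智的;說話風趣的 (quick-witted; amusing in speech)
Wry 歪斜的;曲解的;堅持錯誤的 (crooked; distorted; persisting in error)

[Slide annotation: GAP]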

7

Potentiality (Clarifying the Debate)

Machine Learning is necessary for such a task

8

Methodology of the Framework

• A probabilistic framework with two component models, Acoustic Tag Bernoullis (ATB) and Acoustic Emotion Gaussians (AEG)

– Computationally model the generative processes from acoustic features to a mood tag and a VA value, respectively

• Based on the same acoustic feature space, the ATB and AEG models can share and transfer semantic information to each other

• Bridged by the acoustic feature space, we can align one emotion modality to the other

• The first attempt to establish a joint model for exploring the relationship between discrete mood categories and the continuous emotion space

9

Construct Feature Reference Model

• Global GMM for acoustic feature encoding

[Figure: music tracks and audio signals from a universal music database → frame-based features → global set of frame vectors randomly selected from each track → EM training → acoustic GMM with components A1, A2, ..., AK-1, AK]
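The EM training step on this slide can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a 1-D GMM to a handful of made-up scalar "frames", whereas the slides describe a multi-dimensional acoustic feature space. The function name `fit_gmm_1d`, the initial means, and the variance floor are all assumptions chosen for illustration.

```python
import math

def fit_gmm_1d(data, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances

# Made-up scalar "frame features" forming two well-separated groups.
frames = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
weights, means, variances = fit_gmm_1d(frames, init_means=[1.0, 4.0])
```

In this toy run each fitted component settles on one of the two groups; with real audio, the components would play the role of the acoustic patterns A1 to AK.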


10

Represent a Song into Probabilistic Space

[Figure: the feature vectors of a song are scored against the acoustic GMM (components A1, A2, ..., AK-1, AK); the posterior probabilities over the components form a histogram with bins 1, 2, ..., K-1, K]

• Each dimension corresponds to a specific acoustic pattern
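As a rough sketch of this encoding, the code below computes the posterior of each GMM component for every frame and averages the frame posteriors into one K-dimensional vector per clip. Averaging is an assumption here (the slide only says the frames are aggregated into a posterior histogram), as are the diagonal covariances, the function names, and the toy numbers.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def frame_posterior(x, weights, means, variances):
    """Posterior probability of each GMM component given one frame vector."""
    logs = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max before exponentiating, for stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def clip_posterior(frames, weights, means, variances):
    """Average the frame posteriors into one K-dimensional clip vector."""
    acc = [0.0] * len(weights)
    for x in frames:
        for i, p in enumerate(frame_posterior(x, weights, means, variances)):
            acc[i] += p
    return [a / len(frames) for a in acc]

# Toy acoustic GMM with K = 2 components over 2-D frame features (made-up numbers).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [0.3, 0.1], [3.9, 4.2]]

theta = clip_posterior(frames, weights, means, variances)
```

Two of the three toy frames sit near the first component, so `theta` puts roughly two thirds of its mass on the first dimension.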

11

Acoustic Tag Bernoullis (ATB)

• Given a mood-tagged music dataset with a binary label for each mood tag

• Learn an ATB model that describes the generative process of each song in the dataset from acoustic features to a mood tag

• Ranked first (clip-level AUC) in Mood Tag Classification (MIREX 2009, 2010)
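The slides do not spell out the ATB equations, so the following is only one plausible reading, stated as an assumption: each acoustic GMM component k carries a Bernoulli parameter b_k for a given tag, and a clip's tag probability is the mixture of those parameters weighted by the clip's acoustic posterior theta. The function names and numbers are hypothetical.

```python
import math

def tag_probability(theta, b):
    """P(tag = 1 | clip) as a theta-weighted mixture of Bernoulli parameters."""
    return sum(t * bk for t, bk in zip(theta, b))

def atb_log_likelihood(thetas, labels, b):
    """Log-likelihood of binary tag labels over a set of clips."""
    total = 0.0
    for theta, y in zip(thetas, labels):
        p = tag_probability(theta, b)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Made-up example: K = 2 components, the tag fires mostly with component 1.
b = [0.9, 0.1]
p_tag = tag_probability([0.7, 0.3], b)
ll = atb_log_likelihood([[0.7, 0.3], [0.2, 0.8]], [1, 0], b)
```

Under this reading, learning ATB amounts to choosing the b_k values that maximize this log-likelihood over the tagged dataset.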

12

Acoustic Emotion Gaussians (AEG)

• Given a VA-annotated music dataset

• Learn an AEG model that describes the generative process of each song in the dataset from acoustic features to the VA space

• Presented in OS2; superior to its rivals, support vector regression (SVR) and multiple linear regression (MLR)
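A minimal sketch of how such a model could score a point in the VA plane, assuming each acoustic component k is paired with a 2-D VA Gaussian and the clip's acoustic posterior theta supplies the mixture weights. This pairing, the function names, and the toy parameters are assumptions for illustration, not details taken from the slides.

```python
import math

def gaussian2d_pdf(e, mean, cov):
    """Density of a 2-D Gaussian with full covariance at VA point e."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = e[0] - mean[0], e[1] - mean[1]
    # Quadratic form (e - mean)^T cov^{-1} (e - mean), using the 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def aeg_density(e, theta, means, covs):
    """Mixture density at VA point e, weighted by the clip posterior theta."""
    return sum(t * gaussian2d_pdf(e, m, c) for t, m, c in zip(theta, means, covs))

def aeg_mean(theta, means):
    """Mean of the mixture, usable as a point estimate of the clip's VA value."""
    return [sum(t * m[i] for t, m in zip(theta, means)) for i in range(2)]

# Toy VA GMM with K = 2 components (made-up numbers).
va_means = [[0.4, 0.6], [-0.2, -0.4]]
va_covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
theta = [0.5, 0.5]

predicted = aeg_mean(theta, va_means)
density_at_origin = aeg_density([0.0, 0.0], theta, va_means, va_covs)
```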

13

The Learning of VA GMM on MER60

14

Multi-Modal Emotion Semantic Mapping

• Three models are aligned: ATB, the acoustic GMM, and AEG

• Transfer the weights from a mood tag to the VA GMM

• The semantic mapping processes are transparent and easy to observe and interpret

Mapping a tag into a VA Gaussian distribution
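One way to picture the mapping is sketched below. It assumes a tag's per-component Bernoulli parameters b_k are normalized into mixture weights over the K VA Gaussians, and that the weighted mixture is then collapsed into a single Gaussian by moment matching. The normalization step and the function name `tag_to_va_gaussian` are assumptions; moment matching itself is the standard way to summarize a Gaussian mixture by one Gaussian.

```python
def tag_to_va_gaussian(b, va_means, va_covs):
    """Collapse a tag-weighted VA mixture into one 2-D Gaussian (mean, cov)."""
    total = sum(b)
    w = [bk / total for bk in b]  # assumed: normalize tag weights to sum to 1
    mean = [sum(wk * m[i] for wk, m in zip(w, va_means)) for i in range(2)]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for wk, m, c in zip(w, va_means, va_covs):
        d = [m[0] - mean[0], m[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                # Within-component covariance plus spread of the component means.
                cov[i][j] += wk * (c[i][j] + d[i] * d[j])
    return mean, cov

# Made-up example: two VA Gaussians on opposite sides of the valence axis.
b = [0.8, 0.8]
va_means = [[1.0, 0.0], [-1.0, 0.0]]
va_covs = [[[0.1, 0.0], [0.0, 0.1]], [[0.1, 0.0], [0.0, 0.1]]]

tag_mean, tag_cov = tag_to_va_gaussian(b, va_means, va_covs)
```

A tag whose weight is spread over distant VA Gaussians ends up near the origin with a large covariance, as in this example, while a tag concentrated on one component keeps that component's location and tight covariance.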

15

Evaluation – Corpora and Settings

• Two corpora used: MER60 and AMG1644

• MER60: jointly annotated corpus (MER60-alone setting)

– 60 music clips, each 30 seconds long

– 99 subjects in total, each clip annotated by 40 subjects

– The VA values are entered by clicking on the emotion space on a computer display

– Query Last.fm and keep the 50 top mood tags for the 60 songs

• AMG1644: used for the separately annotated corpora (AMG1644-MER60 setting)

– Crawl the audio of the “top songs” for 33 mood tags (AMG); most of the tags are used in the MIREX mood classification task

– Leading to 1,644 clips, each about 30 seconds long

16

Acoustic Features

• Adopt the bag-of-frames representation

• Extract frame-based musical features from audio using MIRToolbox 1.3

• All the frames of a clip are aggregated into the acoustic GMM posterior, and emotion analysis is performed at the clip level instead of the frame level

• Frame-based features

– Dynamic, spectral, timbre, and tonal

– 70-dim concatenated feature vector for a frame

17

Result for the MER60-Alone Setting

• Graphviz is used for visualization, with a Voronoi diagram-based heuristic to avoid tag overlap

18

Result for the AMG-MER Setting

• Graphviz is used for visualization, with a Voronoi diagram-based heuristic to avoid tag overlap

19

Comparison with Psychologists

• Quantitative comparison

– Refer to the VA values of 30 affective terms proposed by Whissell and Plutchik (WP) and by the Affective Norms for English Words (ANEW)

– For a tag, measure the Euclidean distance between the generated VA value and the psychologists’ value

• Baseline

– Set the generated VA value of each tag to the origin

– Represents a non-effective tag-VA mapping
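The comparison described above can be sketched as follows. The VA coordinates are made-up placeholders, not values from WP or ANEW, and `mean_distance` is a hypothetical helper name; the sketch only shows the distance measure and the origin baseline.

```python
import math

def mean_distance(generated, reference):
    """Average Euclidean distance over the tags present in both mappings."""
    shared = sorted(set(generated) & set(reference))
    return sum(math.dist(generated[t], reference[t]) for t in shared) / len(shared)

# Placeholder VA values (valence, arousal); not taken from any published norms.
reference = {"happy": (0.6, 0.8), "sad": (-0.6, -0.8)}
generated = {"happy": (0.3, 0.4), "sad": (-0.3, -0.4)}

# Baseline: every tag is placed at the origin of the VA plane.
baseline = {tag: (0.0, 0.0) for tag in reference}

model_error = mean_distance(generated, reference)
baseline_error = mean_distance(baseline, reference)
```

A mapping is informative only if its average distance is smaller than the baseline's, which is the case for these placeholder values.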

20

Discussion

• The result is not sensitive to K (the number of acoustic GMM components)

• Such a learning-based framework is scalable and can do better if more annotated data are available

• Automatic discovery

– For instance, construct a balanced audio music corpus and let Chinese speakers label it with Chinese mood tags

– Generate a Chinese mood tag cloud

• Inverse correlation between the VA intensity and the covariance of a tag

– Tags lying on the outer circle would have larger font sizes

21

Result for the MER60-Alone Setting

22

Conclusion

• A novel framework that unifies the categorical and dimensional emotion semantics of music

• Demonstrated how to map a mood tag to a 2-D VA Gaussian and generate the corresponding tag cloud; this can be further extended to arbitrary tags

• Verify whether an arbitrary tag is mood-related or not

• We will conduct user studies on the results

• Further investigate acoustic feature representations for better generalization of the emotion modeling

23

Arbitrary Tag - MajorMiner Not Mood-related

24

Arbitrary Tag - MajorMiner Mood-related

Date post:	05-Dec-2014
Upload:	ju-chiang-wang