
Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Description:
Computational modeling of music emotion has been addressed primarily by two approaches: the categorical approach that categorizes emotions into mood classes and the dimensional approach that regards emotions as numerical values over a few dimensions such as valence and activation. Being two extreme scenarios (discrete/continuous), the two approaches actually share a unified goal of understanding the emotion semantics of music. This paper presents the first computational model that unifies the two semantic modalities under a probabilistic framework, which makes it possible to explore the relationship between them in a computational way. With the proposed framework, mood labels can be mapped into the emotion space in an unsupervised and content-based manner, without any training ground truth annotations for the semantic mapping. Such a function can be applied to automatically generate a semantically structured tag cloud in the emotion space. To demonstrate the effectiveness of the proposed framework, we qualitatively evaluate the mood tag clouds generated from two emotion-annotated corpora, and quantitatively evaluate the accuracy of the categorical-dimensional mapping by comparing the results with those created by psychologists, including the one proposed by Whissell & Plutchik and the one defined in the Affective Norms for English Words (ANEW).
Transcript
Page 1: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Ju-Chiang Wang, Yi-Hsuan Yang, Kaichun Chang, Hsin-Min Wang, and Shyh-Kang Jeng

Academia Sinica, National Taiwan University, Taipei, Taiwan

Page 2: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Outline

• Introduction and Potentiality
• Methodology
  – The ATB and AEG models
  – Framework to combine the two models
• Evaluation and Results
• Conclusion
• In this presentation, "mood" and "emotion" are used interchangeably

Page 3: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Introduction – Tags and Valence-Arousal (VA)

• Music emotion modeling has two approaches: categorical and dimensional
• The two approaches share a unified goal: understanding the emotion semantics of music
• (Arbitrary) mood tags can be mapped into the VA space in an unsupervised and content-based manner, without any training ground truth for the semantic mapping
• Automatically generate a semantically structured tag cloud in the VA space

[Figure: the categorical view (mood classes) alongside the dimensional view, a VA plane with arousal (high/low) on the vertical axis, valence (negative/positive) on the horizontal axis, and quadrants numbered 1–4]

Page 4: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Visualization of Music Mood (Laurier et al.)

Generated by a self-organizing map (SOM)

Page 5: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Potentiality (Clarifying the Debate)

• A novice user may be unfamiliar with the VA model; it would be helpful to display mood tags in the VA space
• Facilitates applications such as tag-based music search and browsing interfaces
• Dimension reduction for tag visualization may yield dimensions that do not conform to valence and arousal
• The VA values of some affective terms can be found in the literature, but they are not elicited from music
• Affective terms are not cross-lingual and do not always have exact translations in different languages
• Culture-dependent, corpus-dependent

Page 6: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Taxonomy of Music Mood (Xiao Hu et al.)

34 mood tags (each shown with a Chinese gloss on the original slide): Aggressive, Amiable, Autumnal, Bittersweet, Boisterous, Brooding, Calm, Campy, Cheerful, Confident, Dreamy, Fiery, Fun, Humorous, Intense, Literate, Nostalgic, Passionate, Poignant, Quirky, Relaxed, Rollicking, Rousing, Rowdy, Silly, Soothing, Sweet, Tense, Visceral, Volatile, Whimsical, Wistful, Witty, Wry


Page 7: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music


Potentiality (Clarifying the Debate)

Machine Learning is necessary for such a task

Page 8: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Methodology of the Framework

• A probabilistic framework with two component models: Acoustic Tag Bernoullis (ATB) and Acoustic Emotion Gaussians (AEG)
  – They computationally model the generative processes from acoustic features to a mood tag and to a VA value, respectively
• Because they are built on the same acoustic feature space, the ATB and AEG models can share and transfer semantic information to each other
• Bridged by the acoustic feature space, we can align one emotion modality with the other
• The first attempt to establish a joint model for exploring the relationship between discrete mood categories and the continuous emotion space

Page 9: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Construct Feature Reference Model

• Collect music tracks and audio signals from a universal music database
• Extract frame-based features and build a global set of frame vectors randomly selected from each track
• Train a global acoustic GMM (components A1, A2, ..., AK) with EM for acoustic feature encoding (a minimal sketch follows below)
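The global acoustic GMM can be reproduced with standard tooling. A minimal sketch, assuming scikit-learn's GaussianMixture as a stand-in for the authors' own EM implementation; `frame_vectors` (frame features pooled from many tracks) and the component count K are illustrative choices:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_acoustic_gmm(frame_vectors: np.ndarray, K: int = 128) -> GaussianMixture:
    """Fit a global (universal) acoustic GMM on frame vectors pooled from a music database."""
    # EM training; diagonal covariances keep the K-component model lightweight
    gmm = GaussianMixture(n_components=K, covariance_type="diag",
                          max_iter=200, random_state=0)
    gmm.fit(frame_vectors)
    return gmm
```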

Page 10: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Represent a Song in the Probabilistic Space

• Compute the posterior probabilities of each frame's feature vector over the K components (A1, ..., AK) of the acoustic GMM
• Aggregate them into a K-dimensional histogram, the acoustic GMM posterior (see the sketch below)
• Each dimension corresponds to a specific acoustic pattern
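A sketch of this clip-level encoding, reusing the GMM from the previous sketch; averaging the per-frame posteriors is an assumption consistent with the slide, not necessarily the paper's exact aggregation:

```python
import numpy as np

def acoustic_posterior(gmm, clip_frames: np.ndarray) -> np.ndarray:
    """Return the K-dim acoustic GMM posterior histogram for one clip."""
    frame_posteriors = gmm.predict_proba(clip_frames)  # (T, K): per-frame posteriors
    theta = frame_posteriors.mean(axis=0)               # average over the T frames
    return theta / theta.sum()                           # each dim = one acoustic pattern
```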

Page 11: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Acoustic Tag Bernoullis (ATB)

• Given a mood-tagged music dataset with a binary label for each mood tag
• Learn the ATB model that describes the generative process of each song in the dataset from acoustic features to a mood tag (a simplified sketch follows below)
• Won (AUC Clip) in the Mood Tag Classification task (MIREX 2009, 2010)
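The ATB idea can be sketched as one Bernoulli parameter per GMM component and per tag, with a song's tag probability given by the posterior-weighted sum. The closed-form weighted estimate below is a hypothetical simplification of the learning step, not the authors' EM procedure:

```python
import numpy as np

def fit_atb(thetas: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """thetas: (N, K) posterior histograms; labels: (N,) binary labels for one mood tag."""
    mass = thetas.sum(axis=0)                               # total responsibility per component
    beta = (thetas * labels[:, None]).sum(axis=0) / np.maximum(mass, 1e-12)
    return beta                                             # (K,) Bernoulli parameters

def tag_probability(theta: np.ndarray, beta: np.ndarray) -> float:
    return float(theta @ beta)                              # p(tag | song)
```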

Page 12: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Acoustic Emotion Gaussians (AEG)

• Given a VA-annotated music dataset
• Learn the AEG model that describes the generative process of each song in the dataset from acoustic features to the VA space (a simplified sketch follows below)
• Presented in OS2; superior to its rivals, SVR and MLR
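Correspondingly, AEG ties each acoustic GMM component to a 2-D Gaussian in the VA plane, and a song's predicted emotion distribution is the theta-weighted mixture of those Gaussians. The responsibility-weighted moment estimates below are an illustrative simplification of the learning step:

```python
import numpy as np

def fit_aeg(thetas: np.ndarray, va: np.ndarray):
    """thetas: (N, K) posteriors; va: (N, 2) valence-arousal annotations."""
    K = thetas.shape[1]
    mu, sigma = np.zeros((K, 2)), np.zeros((K, 2, 2))
    for k in range(K):
        w = thetas[:, k] / max(thetas[:, k].sum(), 1e-12)   # normalized responsibilities
        mu[k] = w @ va                                       # weighted VA mean
        diff = va - mu[k]
        sigma[k] = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(2)  # weighted, regularized covariance
    return mu, sigma

def predict_va(theta: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Point prediction: the mixture mean in the VA plane."""
    return theta @ mu
```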

Page 13: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music


The Learning of VA GMM on MER60

Page 14: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Multi-Modal Emotion Semantic Mapping

• Three models are aligned: ATB, the acoustic GMM, and AEG
• Transfer the weights from a mood tag to the VA GMM
• The semantic mapping process is transparent and easy to observe and interpret

Mapping a tag into a VA Gaussian distribution (sketched below)
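A sketch of this transfer step under the same notation as the previous sketches: the tag's Bernoulli parameters are renormalized into weights over the shared GMM components, and the VA GMM is collapsed into a single 2-D Gaussian for that tag (mixture mean plus law-of-total-covariance spread). The exact weighting scheme in the paper may differ:

```python
import numpy as np

def map_tag_to_va(beta: np.ndarray, mu: np.ndarray, sigma: np.ndarray):
    """beta: (K,) ATB parameters for one tag; mu: (K, 2) and sigma: (K, 2, 2) from AEG."""
    w = beta / beta.sum()                                  # transfer tag weights to the VA GMM
    tag_mu = w @ mu                                        # (2,) VA mean for the tag
    tag_cov = sum(w[k] * (sigma[k] + np.outer(mu[k] - tag_mu, mu[k] - tag_mu))
                  for k in range(len(w)))                  # total covariance of the mixture
    return tag_mu, tag_cov                                 # one VA Gaussian per mood tag
```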

Page 15: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Evaluation – Corpora and Settings

• Two corpora used: MER60 and AMG1644
• MER60: jointly annotated corpus (MER60-alone setting)
  – 60 music clips, each 30 seconds long
  – 99 subjects in total; each clip annotated by 40 subjects
  – The VA values are entered by clicking on the emotion space shown on a computer display
  – Query Last.fm and keep the top 50 mood tags for the 60 songs
• AMG1644: used for the separately annotated corpora (AMG1644-MER60 setting)
  – Crawl the audio of the "top songs" for 33 mood tags (AMG); most of the tags are used in the MIREX mood classification task
  – This leads to 1,644 clips, each about 30 seconds long

Page 16: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Acoustic Features

• Adopt the bag-of-frames representation
• Extract frame-based musical features from audio using MIRToolbox 1.3
• All the frames of a clip are aggregated into the acoustic GMM posterior, and the emotion analysis is performed at the clip level instead of the frame level
• Frame-based features (an illustrative extraction sketch follows below)
  – Dynamic, spectral, timbre, and tonal
  – 70-dimensional concatenated feature vector per frame
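The original features come from MIRToolbox 1.3 (MATLAB). Purely as an illustrative substitute, the sketch below stacks a few dynamic, spectral, timbre, and tonal descriptors per frame using librosa; the feature set and dimensionality do not match the paper's 70-dim vector:

```python
import numpy as np
import librosa

def frame_features(path: str, sr: int = 22050) -> np.ndarray:
    """Return a (T, D) matrix of frame-based features for one audio clip."""
    y, sr = librosa.load(path, sr=sr)
    rms = librosa.feature.rms(y=y)                             # dynamics
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # spectral
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)         # timbre
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)           # tonal
    return np.vstack([rms, centroid, mfcc, chroma]).T          # one row per frame
```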

Page 17: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Result for the MER60-Alone Setting

• Graphviz is used for visualization, with a Voronoi diagram-based heuristic to avoid tag overlapping

Page 18: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Result for the AMG1644-MER60 Setting

• Graphviz is used for visualization, with a Voronoi diagram-based heuristic to avoid tag overlapping

Page 19: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Comparison with Psychologists

• Quantitative comparison
  – Refer to the VA values of 30 affective terms proposed by Whissell and Plutchik (WP) and those defined in the Affective Norms for English Words (ANEW)
  – For each tag, measure the Euclidean distance between the generated VA value and the psychologists' value (see the sketch below)
• Baseline
  – Set the generated VA value of each tag to the origin
  – Represents a non-effective tag-to-VA mapping
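A minimal sketch of this evaluation, assuming hypothetical dictionaries that map tag names to (valence, arousal) pairs for the generated cloud and for a reference list such as WP or ANEW:

```python
import numpy as np

def mean_va_distance(generated: dict, reference: dict) -> float:
    """Average Euclidean distance between generated and reference VA values over shared terms."""
    shared = [t for t in generated if t in reference]
    dists = [np.linalg.norm(np.asarray(generated[t]) - np.asarray(reference[t])) for t in shared]
    return float(np.mean(dists))

def baseline_distance(reference: dict, tags) -> float:
    """Baseline: every generated VA value is placed at the origin (a non-effective mapping)."""
    return float(np.mean([np.linalg.norm(np.asarray(reference[t])) for t in tags if t in reference]))
```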

Page 20: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

Discussion

• The result is not sensitive to K
• Such a learning-based framework is scalable and can do better if more annotated data is available
• Automatic discovery
  – For instance, construct a balanced audio music corpus and have Chinese-speaking listeners label Chinese mood tags
  – Generate a Chinese mood tag cloud
• Inverse correlation between the VA intensity and the covariance of a tag
  – Tags lying on the outer circle would have larger font sizes

Page 21: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music


Result for the MER60-Alone Setting

Page 22: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music


Conclusion

• A novel framework that unifies the categorical and dimensional emotion semantics of music

• Demonstrated how to map a mood tag to a 2-D VA Gaussian and generate the corresponding tag cloud, and this can be further extended to arbitrary tags

• Verify whether an arbitrary tag is mood-related or not

• We will conduct user studies on the results

• More investigation into acoustic feature representations for better generalization of emotion modeling

Page 23: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music

23

Arbitrary Tag - MajorMiner Not Mood-related

Page 24: Exploring the Relationship Between Multi-Modal Emotion Semantics of Music


Arbitrary Tag - MajorMiner Mood-related

