Emotions in Engineering: Methods for the Interpretation of Ambiguous Emotional Content
Emily Mower April 29, 2011
Motivation
• Increasing prevalence of interactive technology
• Importance of emotion understanding
• Engineering research is starting to overlap with human behavioral research:
  • Autism
  • Depression
  • Marital therapy
  • General interaction dynamics
  • Psychiatric disorders
Motivating Example
Emotional Computer Assistant
• Provides interaction assistance
• Describes the emotions of others
• Allows the user to understand stimuli and respond properly
[Figure: example exchange between "User", "Other", and "Assistant". Recoverable text: “???”; Assistant: “The other person is frustrated… this is a mix of anger and sadness”; “I am sorry”]
Focus of this Presentation
Emotion Profiles: A novel mid-level representation for quantifying emotion
• Overview:
  • Alleviates limitations of current frameworks
  • Captures shades of emotion
  • Represents ambiguous utterances
• Component of classification
• Stand-alone representation
• Interpretable and informative
• Can be used in a user-personalization framework
Key finding: EPs can be used to track the emotional trajectory of audio-visual utterances
[Figure: Emotion Profile Representation. Title: “Frustration”. Categories: Angry, Happy, Neutral, Sad]
Data Overview: USC IEMOCAP
• Data:
  • 5 male-female pairs of actors
  • Audio, video, motion capture (x, y, z)
• Elicitation strategy:
  • Scripted sessions
  • Improvisation scenarios
• Emotional descriptors:
  • Categorical
  • Dimensional
*Data collection led by Carlos Busso, UT Dallas
Feature Extraction and Selection
• Extraction:
  • Utterance-length
  • Mean, variance, range, upper quantile, lower quantile, quantile range
• Final feature set:
  • Principal Feature Analysis
  • Top 30 features
• Audio features:
  • Prosodic: pitch and energy
  • Spectral: Mel filterbank coefficients
• Video features:
  • Motion-capture relative distances
  • Mouth, eyebrows, cheek, forehead
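The utterance-level statistics listed above can be sketched in a few lines. This is a minimal illustration, not the original (Matlab) pipeline: the function name `utterance_statistics`, the 25%/75% quantile levels, the use of population variance, and the sample pitch values are all assumptions, since the slides do not specify them.

```python
import statistics


def utterance_statistics(frames, lower_q=0.25, upper_q=0.75):
    """Summarize one frame-level feature track (e.g. pitch) over an utterance.

    The quantile levels are assumed; the slides only name "upper" and "lower".
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    ordered = sorted(frames)

    def quantile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    low, high = quantile(lower_q), quantile(upper_q)
    return {
        "mean": statistics.fmean(frames),
        "variance": statistics.pvariance(frames),  # population variance (assumed)
        "range": ordered[-1] - ordered[0],
        "upper_quantile": high,
        "lower_quantile": low,
        "quantile_range": high - low,
    }


# Made-up pitch values in Hz, for illustration only.
pitch_track = [100.0, 110.0, 120.0, 130.0, 140.0]
stats = utterance_statistics(pitch_track)
```

Applying the same function to every audio and motion-capture track would give one fixed-length vector per utterance, from which a selection step such as Principal Feature Analysis could keep the top 30 entries.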
Emotion Profiles
Describe the presence or absence of multiple emotion classes in a single clip using an estimate of classifier confidence
• Binary Support Vector Machine classifications
  • Self vs. other
  • Matlab implementation
• Output:
  • Binary yes/no for class membership
  • Distance from hyperplane
• Interpretation:
  • Weight the binary output by the distance from the hyperplane (“confidence”)
• Classification:
  • Angry vs. Not Angry
  • Happy vs. Not Happy
  • Sad vs. Not Sad
  • Neutral vs. Not Neutral
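As a rough sketch of the interpretation step, each profile component can be computed as the binary decision (+1 or -1) multiplied by the distance from the separating hyperplane. The code below assumes linear classifiers given as a weight vector and an offset; the kernel actually used is not stated on the slides, and the classifier parameters and feature vector are hypothetical values chosen for illustration.

```python
import math

EMOTIONS = ("angry", "happy", "neutral", "sad")


def profile_component(weights, offset, features):
    """Binary self-vs-other decision weighted by distance from the hyperplane."""
    decision = sum(w * x for w, x in zip(weights, features)) + offset
    label = 1 if decision >= 0 else -1  # binary yes/no for class membership
    distance = abs(decision) / math.sqrt(sum(w * w for w in weights))
    return label * distance  # signed "confidence"


def emotion_profile(classifiers, features):
    """Build a 4-dimensional profile from four binary classifiers."""
    return {
        emotion: profile_component(*classifiers[emotion], features)
        for emotion in EMOTIONS
    }


# Hypothetical (weights, offset) pairs; real ones would come from SVM training.
classifiers = {
    "angry": ([3.0, 4.0], -5.0),
    "happy": ([0.0, 1.0], -2.0),
    "neutral": ([1.0, 0.0], -1.0),
    "sad": ([-3.0, -4.0], 0.0),
}
profile = emotion_profile(classifiers, [2.0, 1.0])
```

A positive component means the clip falls on the "self" side of that classifier; its magnitude indicates how far from the boundary it lies, so several emotions can be present at once with different strengths.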
Emotion Profile Construction
[Figure: emotion space with valence (Val.) and activation (Act.) axes]
• Form semantic clusters using a disjoint set of speakers
• Train self vs. other binary classifiers on each semantic cluster (4 binary classifiers)
• Use the trained binary classifiers to create an estimate of the emotion content of the utterances from the test speaker (4-dimensional profiles for the test speaker)
Distance-Based Profile Measures
[Equation legend: target value, Lagrange multiplier, weight vector, offset]
[Figure legend: Angry, Happy]
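The equation on this slide did not survive extraction; only its legend did (target value, Lagrange multiplier, weight vector, offset). For orientation, the standard linear SVM quantities those terms usually refer to are written out below. This is the textbook form, not a recovered copy of the slide: the symbols are assumptions, and a kernelized classifier would replace the inner product with a kernel evaluation.

```latex
% Textbook linear SVM (assumed notation):
%   y_i       target value of training sample x_i (+1 or -1)
%   \alpha_i  Lagrange multiplier of sample x_i
%   w         weight vector
%   b         offset
\begin{align}
  \mathbf{w} &= \sum_{i} \alpha_i \, y_i \, \mathbf{x}_i \\
  f(\mathbf{x}) &= \mathbf{w}^{\top} \mathbf{x} + b \\
  d(\mathbf{x}) &= \frac{f(\mathbf{x})}{\lVert \mathbf{w} \rVert}
\end{align}
```

Here the sign of f(x) gives the binary class decision and d(x) is the signed distance from the hyperplane, which is the quantity a distance-based profile component would carry.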
Emotograms: Dynamic Emotion Profiles
[Figure: Emotogram for an utterance labeled “Happy”]
[Figure: Emotion Profile for an utterance labeled “Happy”; components A (angry), H (happy), N (neutral), S (sad)]
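One way to picture an emotogram is as a sequence of emotion profiles, one per short window of the utterance. The sketch below is an assumption-laden illustration: the names `window_bounds` and `build_emotogram`, the overlapping step size, the truncation of the final window, and the placeholder `dummy_profile` callback are all hypothetical and not taken from the slides.

```python
def window_bounds(duration, window, step):
    """Return (start, end) times in seconds of windows covering an utterance."""
    if duration <= 0 or window <= 0 or step <= 0:
        raise ValueError("duration, window, and step must be positive")
    bounds = []
    index = 0
    while index * step < duration:
        start = index * step
        # The last windows are truncated at the end of the utterance (assumed).
        bounds.append((start, min(start + window, duration)))
        index += 1
    return bounds


def build_emotogram(duration, window, step, profile_for_window):
    """Collect one emotion profile per window, in temporal order."""
    return [
        profile_for_window(start, end)
        for start, end in window_bounds(duration, window, step)
    ]


def dummy_profile(start, end):
    # Placeholder: a real system would extract features from [start, end)
    # and pass them through the four binary classifiers.
    return (0.0, end - start, 0.0, 0.0)  # (angry, happy, neutral, sad)


bounds = window_bounds(1.0, 0.5, 0.25)
emotogram = build_emotogram(1.0, 0.5, 0.25, dummy_profile)
```

Each element of `emotogram` is a static profile; read in order, the elements trace how the estimated emotional content changes over the utterance.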
Problem Setup
Goal: Classify the affective state of clips at the utterance level using Emotograms
• Features extracted over 10 (5 male / 5 female) IEMOCAP speakers:
  • Motion capture: relative distances
  • Audio: prosodic, spectral
  • Feature selection: Principal Feature Analysis (30 features)
• Extract EPs over window lengths of 0.25 to 2 seconds
  • Train binary angry, happy, neutral, and sad SVMs on a disjoint set of speakers (9)
• Model the trajectory of the EPs
  • Train angry, happy, neutral, and sad HMMs on a disjoint set of speakers (9)
• Validation:
  • Leave-one-subject-out cross-validation (over each test speaker, merged results)
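The dynamic stage of this setup can be sketched as follows: score a sequence of emotion profiles under one HMM per emotion and pick the most likely model, with speakers split leave-one-subject-out. This is a toy illustration under stated assumptions: diagonal-Gaussian emissions, two states per model, and the hand-set parameters in `toy_hmm` are all hypothetical, and no training is shown.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (assumed emission model)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def hmm_log_likelihood(sequence, hmm):
    """Forward algorithm in the log domain: log P(sequence | hmm)."""
    n = len(hmm["start"])
    alpha = [
        math.log(hmm["start"][s]) + log_gaussian(sequence[0], *hmm["emit"][s])
        for s in range(n)
    ]
    for obs in sequence[1:]:
        alpha = [
            logsumexp([alpha[p] + math.log(hmm["trans"][p][s]) for p in range(n)])
            + log_gaussian(obs, *hmm["emit"][s])
            for s in range(n)
        ]
    return logsumexp(alpha)


def classify(sequence, models):
    """Pick the emotion whose HMM gives the sequence the highest likelihood."""
    return max(models, key=lambda label: hmm_log_likelihood(sequence, models[label]))


def leave_one_subject_out(speakers):
    """Yield (training speakers, held-out test speaker) for every speaker."""
    for held_out in speakers:
        yield [s for s in speakers if s != held_out], held_out


def toy_hmm(peak):
    """Hypothetical 2-state model whose states favor one profile dimension."""
    strong = [1.0 if d == peak else -1.0 for d in range(4)]
    weak = [0.5 if d == peak else -0.5 for d in range(4)]
    return {
        "start": [0.5, 0.5],
        "trans": [[0.8, 0.2], [0.2, 0.8]],
        "emit": [(strong, [1.0] * 4), (weak, [1.0] * 4)],
    }


labels = ("angry", "happy", "neutral", "sad")
models = {label: toy_hmm(i) for i, label in enumerate(labels)}

# Made-up emotogram: three windowed profiles ordered (angry, happy, neutral, sad).
emotogram = [[-0.8, 0.9, -0.4, -1.1], [-0.6, 0.7, -0.2, -0.9], [-0.9, 1.2, -0.5, -1.0]]
predicted = classify(emotogram, models)

speakers = [f"speaker_{i}" for i in range(10)]  # hypothetical speaker IDs
folds = list(leave_one_subject_out(speakers))
```

In each fold, the SVMs and HMMs would be trained on the nine training speakers and evaluated on the held-out one, with the per-speaker results merged afterwards.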
Emotogram Construction
Results
Conclusions and Future Directions
• Hierarchical system improves classification performance over all sentence lengths when compared to static only (absolute / relative):
  • 6+: 7.84% / 11.75%
  • 3-6: 3.55% / 5.48%
  • 1.5-3: 0.54% / 0.87%
• Largest improvement with the longest sentences:
  • Implies that there exists a recognized pattern of emotion fluctuation
• Human ability:
  • We can tell when emotions sound “wrong”
  • Flat affect is a diagnostic tool
• Implication:
  • Emotion modulations can be modeled by people
  • This modulation may be modeled using a grammar
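The absolute / relative improvement pairs reported above are linked by the static-only baseline. Assuming the usual definition, relative gain = absolute gain / baseline, the baseline implied by each pair can be backed out as below. The derived baselines are an inference under that assumption, not figures stated in the slides.

```python
# Reported improvements per sentence-length bin: (absolute %, relative %).
reported = {
    "6+": (7.84, 11.75),
    "3-6": (3.55, 5.48),
    "1.5-3": (0.54, 0.87),
}


def implied_baseline(absolute_gain, relative_gain):
    """Baseline (in %) implied by relative = absolute / baseline (assumed)."""
    return 100.0 * absolute_gain / relative_gain


baselines = {
    length: round(implied_baseline(absolute, relative), 2)
    for length, (absolute, relative) in reported.items()
}
```

Under this reading, the static-only system would sit in the low-to-mid 60% range for every bin, with the hierarchical system adding the most for the longest sentences.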
Published Work in Emotion Profiles
1. Emily Mower and Shrikanth Narayanan. “A Hierarchical Static-Dynamic Framework for Emotion Classification.” International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Prague, Czech Republic. May 2011.
2. Emily Mower, Maja J Matarić, Shrikanth Narayanan. “Framework for Automatic Human Emotion Classification Using Emotional Profiles.” IEEE Transactions on Audio, Speech, and Language Processing, 2010.
3. Emily Mower, Maja J Matarić, Shrikanth Narayanan. “Robust Representations for Out-of-Domain Emotions Using Emotion Profiles.” Spoken Language Technology (SLT). Berkeley, CA, December 2010.
4. Emily Mower, Kyu J. Han, Sungbok Lee and Shrikanth S. Narayanan. “A Cluster-Profile Representation of Emotion Using Agglomerative Hierarchical Clustering.” InterSpeech. Makuhari, Japan, September 2010.
5. Emily Mower, Angeliki Metallinou, Chi-Chun Lee, Abe Kazemzadeh, Carlos Busso, Sungbok Lee, Shrikanth Narayanan. “Interpreting Ambiguous Emotional Expressions.” ACII Special Session: Recognition of Non-Prototypical Emotion from Speech- The Final Frontier? (Invited paper). Amsterdam, The Netherlands, September 2009.
Thanks!
Questions?