7/31/2019 Production of Emotional Speech
Gabriel Schubiner
Generation of Affect in Synthesized Speech
Corpus-Based Approach to Expressive Speech Synthesis
Expressive Visual Speech Using a Talking Head
Demos
Affect Editor Quiz/Demo
Synface Demo
Affect in Speech: Goals
Addition of emotion to synthetic speech
Acoustic model: typology of parameters of emotional speech
Quantification
Addresses problem of expressiveness
What benefit is gained from expressive
speech?
Emotion Theory: Assumptions
Emotion -> Nervous System -> Speech Output
Binary distinction: parasympathetic vs. sympathetic
based on physical changes
universal emotions
Approaches to Affect
Generative: Emotion -> Physical -> Acoustic
Descriptive: observed acoustic parameters imposed
Descriptive Framework
4 parameter groups:
Pitch
Timing
Voice Quality
Articulation
Assumption of independence
How could this affect design and
results?
Pitch & Timing
Accent Shape
Average Pitch
Contour Slope
Final Lowering
Pitch Range
Reference Line
Exaggeration (not used)
Voice Quality & Articulation
Breathiness
Brilliance
Loudness
Pause Discontinuity
Pitch Discontinuity
Tremor
Laryngealization
Implementation
Each parameter has its own scale
Scales are independent of one another and range between negative and positive values
Implementation
Settings grouped into preset conditions for each emotion
based on prior studies
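A minimal sketch of how Affect Editor-style presets might be stored and applied, assuming the parameters sit on a numeric scale with 0 as neutral. The parameter names follow the four groups above; the specific values are invented for illustration, not Cahn's published settings.

```python
# Illustrative sketch of Affect Editor-style emotion presets.
# Values on an assumed -10..+10 scale are made up for illustration.
NEUTRAL = 0

PRESETS = {
    "sadness": {"average_pitch": -5, "pitch_range": -8,
                "speech_rate": -6, "breathiness": 6},
    "anger":   {"average_pitch": 3, "pitch_range": 8,
                "speech_rate": 4, "loudness": 8},
}

def parameters_for(emotion, all_params):
    """Return a full parameter vector: preset values where given,
    neutral (0) elsewhere -- each scale is set independently."""
    preset = PRESETS.get(emotion, {})
    return {p: preset.get(p, NEUTRAL) for p in all_params}

params = parameters_for("sadness",
                        ["average_pitch", "pitch_range",
                         "speech_rate", "breathiness", "loudness"])
```

Because the scales are assumed independent, any parameter not named by a preset simply stays at neutral.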
Program Flow: Input
Emotion -> parameter representation
Utterance -> clauses
Agent, Action, Object, Locative
Clause and lexeme annotations
Finds all possible locations of affect and chooses whether or not to use each
Program Flow
Utterance -> tree structure -> linear phonology
Compiled for a specific synthesizer, with software to simulate effects not available in hardware
Perception
30 Utterances
5 sentences * 6 affects
Forced choice of one of six affects
magnitude and comments
Elicitation Sentences
I'm almost finished
I'm going to the city
I saw your name in the paper X
I thought you really meant it
Look at that picture
Pop Quiz!!!
Pop Quiz Solutions
I'm almost finished
Disgust : Surprise : Sadness : Gladness : Anger : Fear
I'm going to the city
Surprise : Gladness : Anger : Disgust : Sadness : Fear
I thought you really meant it
Anger : Disgust : Gladness : Sadness : Fear : Surprise
Look at that picture
Anger : Fear : Disgust : Sadness : Gladness : Surprise
Results
Approx. 50% recognition rate overall
91% for sadness
Conclusions
Effective?
Thoughts?
Corpus-Based Approach to Expressive Speech Synthesis
Corpus
Collect utterances in each emotion
Emotion-dependent semantics
One speaker
Good news, Bad news, Question
Model: Feature Vector
Features:
Lexical stress
Phrase-level stress
Distance from beginning of phrase
Distance from end of phrase
POS
Phrase type
End of syllable pitch
Model: Classification
Predicts F0 over a 5-syllable window
Uses feature vector to predict an observation vector
Observation vector: (log p, p)
p = end-of-syllable pitch
Decision Tree
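A hedged sketch of the decision-tree idea above: a hand-built toy tree routes a feature vector to a leaf whose stored observation vector (log p, p) gives the predicted end-of-syllable pitch. The feature names, split thresholds, and pitch values are invented for illustration, not taken from the paper's trained trees.

```python
import math

# Toy decision tree: internal nodes test one feature against a
# threshold; leaves hold an observation vector (log p, p) for the
# predicted end-of-syllable pitch. All splits/values illustrative.
TREE = {
    "feature": "lexical_stress", "threshold": 0.5,
    "left":  {"leaf": (math.log(120.0), 120.0)},   # unstressed
    "right": {"feature": "dist_from_phrase_end", "threshold": 2,
              "left":  {"leaf": (math.log(100.0), 100.0)},   # phrase-final fall
              "right": {"leaf": (math.log(180.0), 180.0)}},  # stressed, mid-phrase
}

def predict_pitch(tree, features):
    """Walk from the root to a leaf and return the predicted pitch p."""
    node = tree
    while "leaf" not in node:
        branch = "left" if features[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    log_p, p = node["leaf"]
    return p

f0 = predict_pitch(TREE, {"lexical_stress": 1, "dist_from_phrase_end": 5})
```

The real model predicts over a syllable window from a much richer feature vector; this only shows the route-to-leaf mechanics.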
Model: Target Duration
Similar to predicting F0
Build tree with the goal of producing a Gaussian at the leaves
Use mean of the leaf class as target duration
Discretization
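The "Gaussian at the leaves" step above can be sketched as: group training durations by the leaf they reach, fit a (mean, variance) pair per leaf, and use the leaf mean as the synthesis target. The leaf ids and durations below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Toy training data: (leaf_id, observed syllable duration in ms).
# In the real model the leaf is reached by a decision tree over
# features; here leaf ids and durations are illustrative.
observations = [
    ("stressed_final", 210), ("stressed_final", 190),
    ("stressed_final", 200), ("unstressed_mid", 80),
    ("unstressed_mid", 100),
]

by_leaf = defaultdict(list)
for leaf, dur in observations:
    by_leaf[leaf].append(dur)

# Fit a Gaussian (mean, variance) per leaf; the mean becomes the
# target duration for syllables routed to that leaf.
gaussians = {leaf: (mean(durs), pvariance(durs))
             for leaf, durs in by_leaf.items()}

target = gaussians["stressed_final"][0]   # mean duration of that leaf
```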
Models
Uses acoustic analogue of n-grams
Captures a sense of context, compared to describing the full emotion as a sequence
Compare to Affect Editor: uses only F0 and length
Includes information about which utterance the features are derived from
Intentional bias; justified?
Model: Synthesis
Data tagged with original expression and emotion
Expression-cost matrix
Noted trade-off: emotional intensity vs. smoothness
Paralinguistic events
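The expression-cost matrix can be sketched as a lookup penalizing units whose recorded expression differs from the requested one, with total unit cost trading expression match against join smoothness. The cost values and weights below are invented for illustration, not the paper's numbers.

```python
# Illustrative expression-cost matrix: cost of using a unit recorded
# in expression `src` when synthesizing expression `tgt`.
# Lower = better match; all values made up.
EXPR_COST = {
    ("neutral", "neutral"): 0.0, ("neutral", "good_news"): 0.5,
    ("neutral", "bad_news"): 0.5, ("good_news", "good_news"): 0.0,
    ("good_news", "neutral"): 0.4, ("good_news", "bad_news"): 1.0,
    ("bad_news", "bad_news"): 0.0, ("bad_news", "neutral"): 0.4,
    ("bad_news", "good_news"): 1.0,
}

def unit_cost(unit_expr, target_expr, join_cost, w_expr=1.0, w_join=1.0):
    """Total cost of a candidate unit: expression mismatch plus join
    discontinuity -- the intensity-vs-smoothness trade-off."""
    return w_expr * EXPR_COST[(unit_expr, target_expr)] + w_join * join_cost

# A neutral unit with a smooth join can beat an emotional unit
# with a discontinuous join:
emotional = unit_cost("good_news", "good_news", join_cost=0.8)
neutral = unit_cost("neutral", "good_news", join_cost=0.1)
```

Shifting the weights w_expr and w_join moves the output along the trade-off noted on the slide.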
SSML
Compare to Cahn's typology
Abstraction layers
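For comparison with Cahn's typology, SSML exposes prosody at a more abstract layer. A minimal sketch of an SSML 1.0 fragment built in Python; the standard `<prosody>` attributes pitch/rate/volume roughly parallel the pitch, timing, and voice-quality groups, and the attribute values here are illustrative.

```python
# Build a minimal SSML fragment using the standard <prosody>
# element (SSML 1.0). Attribute values are illustrative: a lowered,
# slowed, softened rendering, e.g. for a sad reading.
text = "I'm almost finished"
ssml = (
    '<speak version="1.0">'
    '<prosody pitch="-15%" rate="slow" volume="soft">'
    f"{text}"
    "</prosody>"
    "</speak>"
)
```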
Perception
Experiment
Distinguish same utterance spoken with neutral and affected prosody
Semantic content problematic?
Results
Binary decision
Reasonable gainover baseline?
Conclusion
Major contributions?
Paths forward?
Synthesis of Expressive Visual Speech on a Talking Head
Not these Talking Heads...
Synthesis Background
Manipulation of video images
Virtual model with deformation parameters
Synchronized with time-aligned transcription
Articulatory Control Model: Cohen & Massaro (1993)
Data
Single actor
Given specific emotion as instruction
6 emotions + neutral
Facial Animation Parameters
Face-independent
FAP matrix * scaling factor + position0
Weighted deformations of distance between vertices and feature points
Modeling
Phonetic segments assigned target parameter vectors
Temporal blending over dominance functions
Principal components
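The temporal blending over dominance functions (Cohen & Massaro, 1993) weights each segment's target by a dominance that decays with distance from the segment center; this sketch uses the commonly cited exponential form D(t) = a * exp(-theta * |t - center|^c) with invented constants and targets.

```python
import math

def dominance(t, center, alpha=1.0, theta=1.0, c=1.0):
    """Exponential dominance, peaking at the segment's center time."""
    return alpha * math.exp(-theta * abs(t - center) ** c)

def blended_target(t, segments):
    """Coarticulation blend: dominance-weighted average of segment
    targets. segments: list of (center_time, target_value)."""
    num = sum(dominance(t, c0) * target for c0, target in segments)
    den = sum(dominance(t, c0) for c0, _ in segments)
    return num / den

# Halfway between two segments with equal dominance parameters,
# the blended parameter is the average of the two targets:
mid = blended_target(0.5, [(0.0, 10.0), (1.0, 20.0)])
```

Nearer a segment's center its dominance, and hence its target, wins out, which is what produces smooth coarticulated parameter trajectories.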
ML
Separate models for each
emotion
6:1 training:testing ratio
Models -> PC trajectories -> FAP trajectories * emotion parameter matrix
Results
More extreme emotions easier to perceive
73% sad, 60% angry, 40% sad
Synface Demo
Discussion
Changes in approach from Cahn to Eide
Production compared to
Detection