
High Level Prosody features: through the construction of a model for emotional speech Loic Kessous...

Date post: 16-Dec-2015

High Level Prosody Features: Through the Construction of a Model for Emotional Speech

Loic Kessous
Tel Aviv University
Speech, Language and Hearing
[email protected]

Speech, Music and Emotion

Prosody can be defined as ‘the rhythmic and intonational aspect of language’. Lexical stress can also be added to rhythm and intonation. One definition of music is ‘the art or science of combining vocal or instrumental sounds (or both) to produce beauty of form, harmony, and expression of emotion’. By these definitions, prosody in speech and music are closely related: both shape sound through rhythm and pitch, and music is explicitly tied to the expression of emotion.

Speech, Music and Emotion (continued)

If one then considers emotional speech, it seems natural to look for a relationship between emotional speech and music.

Defining a common framework for music theory and linguistics, and a genuinely shared research approach, is not easy.

The basis of music theory is the concept of the pitch interval.
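A pitch interval is a frequency ratio, usually expressed in semitones (12 per octave). The following minimal sketch computes the signed interval between two pitches in Hz. The function name is illustrative, not taken from the slides.

```python
import math


def interval_semitones(f1_hz: float, f2_hz: float) -> float:
    """Signed pitch interval from f1 to f2 in semitones (12 per octave).

    Positive values are rises, negative values are falls.
    """
    if f1_hz <= 0 or f2_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f2_hz / f1_hz)


# A rise from 150 Hz to 300 Hz doubles the frequency: one octave.
print(interval_semitones(150.0, 300.0))  # 12.0
```

Because the scale is logarithmic, the same interval is perceived at any register. This is what makes intervals, rather than raw Hz differences, a natural unit for comparing speakers.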

Representation of Prosody

It needs to be related to perception:
  in order to arrive at a better understanding and modeling of prosody in emotional speech;
  in order to extract pertinent features.
It should be as automatic as possible (for recognition, and for the extraction of patterns and ‘hidden structure’).
It should be ‘reversible’, for expressive speech synthesis.

Example: Prosogram (P. Mertens)

Music-like (‘piano roll’-like) visual representation
Pitch stylization based on glissando perception and nuclei segmentation
Uses manual or automatic segmentation
Allows pitch-corrected files as input
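A piano-roll display places each stylized pitch on a musical scale. The sketch below shows two such mappings for a pitch in Hz. The first is semitones relative to 1 Hz, which is assumed here as the Prosogram-style axis; on that scale 150 Hz is about 86.7 ST. The second is the nearest equal-tempered note using the MIDI convention (A4 = 440 Hz = note 69). The function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_st_re_1hz(f_hz: float) -> float:
    """Pitch in semitones relative to 1 Hz (assumed piano-roll axis)."""
    return 12.0 * math.log2(f_hz)


def hz_to_midi(f_hz: float) -> float:
    """Fractional MIDI note number, with A4 = 440 Hz = 69."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def nearest_note(f_hz: float) -> str:
    """Name of the nearest equal-tempered note, e.g. 'A4'."""
    midi = round(hz_to_midi(f_hz))
    octave = midi // 12 - 1  # MIDI note 60 is C4
    return f"{NOTE_NAMES[midi % 12]}{octave}"


for f in (110.0, 150.0, 220.0):
    print(f, round(hz_to_st_re_1hz(f), 1), nearest_note(f))
```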

[Figure: Prosogram v1.4.3 output in four panels, 0 to 12 s. Pitch axis ticks 60 to 90 with a 150 Hz reference line; glissando threshold G = 0.32/T²; vowel-nuclei segmentation; phonetic and word tiers. Utterance: "Marcel Achard écrivait : elle est très jolie, elle est même belle, elle est élégante, on s'en aperçoit tout de suite, mais on l'oublie dès qu'on a regardé ses yeux" ("Marcel Achard wrote: she is very pretty, she is even beautiful, she is elegant; one notices it at once, but one forgets it as soon as one has looked at her eyes").]

Types of prosodic features

Pitch:
  perceived pitch intervals between syllables;
  ‘glissando’ presence, type and properties.
Duration:
  syllable length, length ratio/difference between syllables, word length;
  distance between syllables (pause length).
Energy:
  word energy;
  ratio/difference of syllable energies.
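The duration and energy features listed above can be computed from a syllable segmentation. The following sketch does this for a two-syllable word. It assumes each syllable is given by hypothetical start/end times in seconds and a precomputed mean energy in dB (power). The `Syllable` class and the feature names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    start: float      # seconds
    end: float        # seconds
    energy_db: float  # mean syllable energy in dB (assumed precomputed)

    @property
    def duration(self) -> float:
        return self.end - self.start


def duration_energy_features(s1: Syllable, s2: Syllable) -> dict:
    """Duration and energy features for two successive syllables."""
    energy_diff = s2.energy_db - s1.energy_db
    return {
        "dur_1": s1.duration,
        "dur_2": s2.duration,
        "dur_ratio": s2.duration / s1.duration,
        "dur_diff": s2.duration - s1.duration,
        # Gap between syllables; 0 if they touch or overlap.
        "pause": max(0.0, s2.start - s1.end),
        "word_length": s2.end - s1.start,
        "energy_diff_db": energy_diff,
        # Linear power ratio equivalent to the dB difference.
        "energy_ratio": 10.0 ** (energy_diff / 10.0),
    }


feats = duration_energy_features(Syllable(0.10, 0.30, 62.0),
                                 Syllable(0.35, 0.75, 58.0))
print({k: round(v, 3) for k, v in feats.items()})
```

Expressing energy as a dB difference makes the ratio and the difference the same feature on different scales. Which scale the model should use is left open here.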

Analysis method for pitch features

Syllable segmentation
Glissando presence decision for each syllable:
  no glissando: a perceived pitch value is calculated;
  glissando: the pitch at the end of the syllable is considered.
Other features:
  minima of the stylized pitch, direction of the glissando, range of the glissando, etc.
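The per-syllable decision above can be sketched as follows. The sketch uses the glissando threshold G = 0.32/T² (semitones per second, T = nucleus duration in seconds) that appears on the Prosogram output. As a simplification, the movement rate is taken between the first and last pitch samples of the nucleus. The "perceived pitch" of a level nucleus is approximated by the mean; that choice is an assumption, since the slides do not say how it is computed. `analyse_nucleus` is a hypothetical name.

```python
from statistics import mean


def analyse_nucleus(times, pitch_st, g_coeff=0.32):
    """Glissando decision and pitch features for one syllable nucleus.

    times:    sample times in seconds (increasing)
    pitch_st: stylized pitch at those times, in semitones
    """
    if len(times) < 2 or len(times) != len(pitch_st):
        raise ValueError("need at least two matching time/pitch samples")
    T = times[-1] - times[0]
    if T <= 0:
        raise ValueError("nucleus duration must be positive")
    delta = pitch_st[-1] - pitch_st[0]
    rate = abs(delta) / T             # semitones per second
    threshold = g_coeff / (T * T)     # G = 0.32 / T^2
    is_glissando = rate > threshold
    if is_glissando:
        value = pitch_st[-1]          # end pitch is retained
        direction = "rise" if delta > 0 else "fall"
    else:
        value = mean(pitch_st)        # assumed stand-in for perceived pitch
        direction = "level"
    return {"glissando": is_glissando, "pitch_st": value,
            "direction": direction, "delta_st": delta,
            "min_st": min(pitch_st)}


# 0.2 s nucleus: threshold = 0.32 / 0.04 = 8 ST/s.
print(analyse_nucleus([0.0, 0.1, 0.2], [84.0, 86.0, 88.0]))  # 20 ST/s: rise
print(analyse_nucleus([0.0, 0.1, 0.2], [85.0, 85.3, 85.5]))  # 2.5 ST/s: level
```

Because the threshold scales with 1/T², short nuclei need much faster movements to be heard as glides than long ones.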

Example

Two-syllable word: ‘Aibo’ (CEICES database)
‘Expressiveness’: the user challenging the robot

Why this word?

It has no strong meaning that can be etymologically related to a specific emotion, and no grammatical role that would make the results reflect linguistic function rather than expressiveness.

Because it is used to challenge the robot, it can be considered on its own as a complete, finite prosodic unit that does not sound ‘non-ended’ (unfinished).

Calling the robot by its name before addressing it can also be considered a specificity of human-robot interaction. It could eventually be imposed on the user as a constant of an application or operating system.

Label                 Number of words   Number of ‘Aibo’   % of ‘Aibo’ within label
Emphatic                        2528                245                   9.69
Motherese                       1260                 89                   7.06
Reprimanding                     310                237                  76.45
Irritated (touchy)               225                 48                  21.33
Joyful                           101                  2            (not given)
Angry                             84                 32                  38.09
Neutral                        39169               5359                  13.68
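The percentage column is the number of ‘Aibo’ tokens divided by the number of words under each label. The short sketch below recomputes it from the counts in the table. Joyful is left out because its row is incomplete.

```python
# label -> (number of words, number of 'Aibo'), copied from the table.
COUNTS = {
    "Emphatic": (2528, 245),
    "Motherese": (1260, 89),
    "Reprimanding": (310, 237),
    "Irritated (touchy)": (225, 48),
    "Angry": (84, 32),
    "Neutral": (39169, 5359),
}


def aibo_share(n_aibo: int, n_words: int) -> float:
    """Percentage of the words under a label that are 'Aibo'."""
    return 100.0 * n_aibo / n_words


for label, (n_words, n_aibo) in COUNTS.items():
    print(f"{label:20s} {aibo_share(n_aibo, n_words):6.2f}")
```

Rounding 32/84 gives 38.10 for Angry, whereas the table shows 38.09. The other rows agree with either rounding or truncation, so the table most likely truncates to two decimals.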

Example: Pitch intervals (CEICES database)

[Figure panels: Motherese, Emphatic, Neutral, Angry]

From 2-syllable words to sentences

Most important couples of successive pitch nuclei
Analysis: discovering ‘hidden harmonic structure’ and patterns
Synthesis: rules for completion and global pattern
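Extending the two-syllable analysis to sentences means looking at the intervals between successive pitch nuclei. The sketch below is one simple, hypothetical way to start searching for musical structure. It quantizes each interval to the nearest semitone and gives it its conventional music-theory name. The input is assumed to be one perceived pitch (in Hz) per nucleus. This is an illustration, not the method used in the slides.

```python
import math

INTERVAL_NAMES = ["unison", "minor 2nd", "major 2nd", "minor 3rd",
                  "major 3rd", "perfect 4th", "tritone", "perfect 5th",
                  "minor 6th", "major 6th", "minor 7th", "major 7th",
                  "octave"]


def successive_intervals(nuclei_hz):
    """(semitones, interval name, direction) for each pair of nuclei."""
    result = []
    for f1, f2 in zip(nuclei_hz, nuclei_hz[1:]):
        st = 12.0 * math.log2(f2 / f1)
        n = round(abs(st))  # nearest whole number of semitones
        name = INTERVAL_NAMES[n] if n <= 12 else f"{n} semitones"
        direction = "up" if st > 0 else "down" if st < 0 else "same"
        result.append((round(st, 2), name, direction))
    return result


print(successive_intervals([150.0, 200.0, 150.0, 300.0]))
```

Quantizing to whole semitones is a deliberate simplification. Speech intervals are rarely exact, so any ‘harmonic’ reading of the output should be checked against perception.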

Expressive synthesis: examples using Mbrola

Diphone concatenation
PSOLA pitch transposition and time stretching (formant preservation)
Possible: definition of phoneme durations, definition of pitch points and linear interpolation between them
Not possible: changing the energy of each diphone, changing voice quality
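MBROLA is driven by a `.pho` text file. Each line holds a phoneme, its duration in ms, and optional pitch points written as (position in % of the phoneme, F0 in Hz); the synthesizer interpolates linearly between those points. Lines starting with `;` are comments, and `_` is silence. This matches the "possible" controls listed above: durations and pitch points only, with no per-diphone energy. The following sketch writes such a file for ‘Aibo’. The phoneme symbols, durations and pitch values are hypothetical; real symbols depend on the MBROLA voice used.

```python
def pho_line(phoneme, duration_ms, pitch_points=()):
    """One .pho line: phoneme, duration (ms), then (position %, F0 Hz) pairs."""
    parts = [phoneme, str(int(duration_ms))]
    for pos, hz in pitch_points:
        if not 0 <= pos <= 100:
            raise ValueError("pitch point position must be within 0..100 %")
        parts += [str(int(pos)), str(int(round(hz)))]
    return " ".join(parts)


def make_pho(segments, comment="hypothetical rendering of 'Aibo'"):
    """Build .pho text from (phoneme, duration_ms[, pitch_points]) tuples."""
    lines = [f"; {comment}"]
    lines += [pho_line(*seg) for seg in segments]
    return "\n".join(lines) + "\n"


# Hypothetical SAMPA-like symbols; a rise on the first vowel, then a
# high-to-low fall on the last vowel.
AIBO = [
    ("_", 100),
    ("a", 120, [(0, 180), (100, 220)]),
    ("I", 60),
    ("b", 70),
    ("o:", 200, [(50, 240), (100, 160)]),
    ("_", 100),
]

print(make_pho(AIBO), end="")
```

Changing only the pitch points and durations in `AIBO` is enough to produce different expressive variants of the same word. Energy and voice quality stay fixed by the diphone database.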

