
1 WP3 speech and emotion (analysis & recognition) human language technologies.

Date post: 31-Mar-2015
Upload: owen-hambly
Page 1: 1 WP3 speech and emotion (analysis & recognition) human language technologies.

WP3 speech and emotion (analysis & recognition)

human language technologies

Databases and Annotations

UERLN: SYMPAFLY

Fully automatic speech dialogue telephone system for flight reservation and booking, different system stages; 270 Dialogues.

• Annotations: word-based emotional user states, prosodic and conversational peculiarities; dialogue (step) success; emotional user states distribution follows nested Pareto (80/20) principle

UERLN: AIBO

Children's interaction (age 10-12, 51 children, 9.2 hours of speech) with SONY’s AIBO robot, Wizard-of-Oz-scenario; cf. WP5 (plus English and read speech)

• Annotations: word-based emotional user states (holistic, 5 labellers) and prosodic peculiarities; alignment of children's utterances with AIBO's actions; manual correction of F0, labelling of voice quality. Emotional user states for the English data.

AIBO disobedient: from motherese to angry

g'radeaus Aibolein ja M fein M gut M machst M du M *da M | *tz l"aufst du mal bitte nach links | stopp E Aibo stopp | nach links E umdrehen | nein M <*ne> nein M <*ne> nein M <*ne> so M weit M *simma M noch M nicht M aufstehen M Schlafm"utze M komm M hoch M | ja M so M ist M es M <*is> guter M Hund M lauf mal jetzt nach links | nach links Aibo | Aibolein M aufstehen M *son M sonst M werd' M ich M b"ose M hoch E | nach A links A | Aibo A nach A links A | Aibolein A ganz A b"oser A Hund A jetzt A stehst A du A auf A | hoch A | dreh dich ein bisschen | ja M so ist es <*is> gut stopp Aibo stopp | *tz lauf g'radeaus |
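The word-level labels in a transcript like the one above can be read off mechanically. The following is a minimal sketch only: it assumes that a single capital letter after a word labels that word (M = motherese, E = emphatic, A = angry), that unlabelled words count as neutral, that "|" marks a chunk boundary, and that tokens in angle brackets are pronunciation notes. None of these conventions are spelled out on the slide, and the function name is hypothetical.

```python
from collections import Counter

# Assumed tag meanings; the slide itself does not define them.
TAGS = {"A": "angry", "M": "motherese", "E": "emphatic"}


def parse_labelled_words(line):
    """Return (word, label) pairs; untagged words default to 'neutral'."""
    pairs = []
    for token in line.split():
        if token == "|":
            continue  # assumed chunk boundary, carries no label
        if token.startswith("<") and token.endswith(">"):
            continue  # assumed pronunciation note, not a spoken word
        if token in TAGS and pairs:
            # A tag relabels the word that precedes it.
            word, _ = pairs[-1]
            pairs[-1] = (word, TAGS[token])
        else:
            pairs.append((token, "neutral"))
    return pairs


sample = "nach links E umdrehen | nein M <*ne> nein M <*ne> | Aibo A nach A links A"
pairs = parse_labelled_words(sample)
counts = Counter(label for _, label in pairs)
print(pairs)
print(counts)
```

Counting labels per child or per chunk in this way gives the kind of class distribution that the later classification slides work with.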

UERLN: Different Conceptualizations

Remote control tool:
Aibo straight on stop Aibo stop turn round to the left Aibo get up turn round to the left Aibo get up turn round, to the left Aibo get up get up Aibo now go left now straight on Aibo st' straight on

Pet dog:
Straight on little Aibo ok great You're doing fine now please to the left stop Aibo stop turn to the left no no no we aren't that far yet get up sleepyhead get up yes that's a good dog now go left left Aibo little Aibo get up else I'm getting angry get up Aibo left little Aibo bad boy now get up turn a little ok that's fine stop Aibo stop straight on

ITC: Targhe

Fully automatic speech dialogue telephone system
• 15.6 hours of Italian natural speech
• 9444 files (turns) -> 450 emotionally rich

Word-level
• Orthographic transcription and word segmentation
• Prosodic peculiarities annotated

Turn-level
• Holistic emotion labels

Sympafly (cf. UERLN) for comparison and benchmarking

UKA: LDC2002S28

Elicited emotional speech database; native American English

• labels: 1 of 15 holistic speaker states per utterance; used in algorithm and feature set development

UKA: ISL Meeting Corpus

18 recordings of multi-party (mean 5.1 participants) meetings; mean 35 minute duration; American English

• Annotations: orthographic transcription; Verbmobil II, and discourse-level annotations.

Assessment of Data Collection:

• focus on
  • spontaneous, realistic data
  • important/new types of dialogues/interaction
  • evaluation of annotations

• considerable percentage of realistic (processed and available) databases world-wide

Features & Classification

UERLN: Features

• large feature vector for a context of 2 words:
  • 95 prosodic (duration, energy, F0, pauses)
  • 80 spectral (HNR, formant based frequencies and energy)
  • 24 MFCC
  • 30 POS

• Language Models & dialogue based features

ITC: Features

Baseline feature set
• 96 features
• Based on energy, duration, and pitch

Final feature set
• 273 features (many redundant)
• Based on energy, duration, pitch, and pauses
• Different pitch extractors tried: Normalized Cross Correlation, Weighted Auto Correlation, UERLN PDA
• Different subsets compared
• Different tests to reduce the feature space: Principal component analysis

UKA: 133 Acoustic Features

• pitch, voiced/unvoiced energy, quartiles (15)
• voice quality, Praat metrics (11)
• harmonicity, quartiles (5) and Praat metrics (3)
• zero-crossing rate vs energy, histogram (20)
• correlation/regression, coefficients (36)
• vocal tract volume, quartiles (25)
• duration/timing, Verbmobil features (18)
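Several of the feature groups above are described as quartiles of a contour. As an illustration only, the sketch below summarises a frame-level F0 contour with minimum, first quartile, median, third quartile, and maximum. The exact UKA feature definitions are not given on the slide; the function name, the sample contour, and the convention that 0 marks an unvoiced frame are assumptions.

```python
import statistics


def quartile_features(values):
    """Summarise a contour by min, Q1, median, Q3, and max of its voiced frames."""
    # Assumption: a value of 0 marks an unvoiced frame and is excluded.
    voiced = [v for v in values if v > 0]
    q1, q2, q3 = statistics.quantiles(voiced, n=4, method="inclusive")
    return {
        "min": min(voiced),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(voiced),
    }


# Hypothetical F0 contour in Hz, one value per frame.
f0 = [0, 100, 110, 0, 120, 130, 140, 0]
feats = quartile_features(f0)
print(feats)
```

Quartile-style summaries are less sensitive to isolated pitch-tracking errors than mean and variance, which is one plausible reason to prefer them for utterance-level features.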

Classifiers

UERLN: Linear Discriminant Analysis (LDA), Decision Trees (CARTs), Neural Networks (NN), Support Vector Machines (SVM), Gaussian Mixtures (GM), Language Models (LM)

ITC: Decision Trees (CARTs), Neural Networks (NN)

UKA: Linear, Neural Networks (NN), Support Vector Machines (SVM)

UERLN classification I: SympaFly

GM/NN, 2 classes, neutral vs. problem, l≠t

dialogue step success, 2 classes, SVM: CL 82.5
dialogue success, 2 classes, CART: CL 85.4

combination   CL     RR
Pros.+MFCC    74.4   74.2
HNR+Pros.     74.8   76.0
HNR+MFCC      70.4   69.8

RR: overall rec. rate
CL: class-wise averaged rec. rate

LDA, 4 classes

SVM/CART, 2 classes, loo
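The two figures of merit used on these slides differ when the classes are unbalanced. The sketch below computes both from reference and predicted labels: RR as the share of all items classified correctly, CL as the unweighted mean of the per-class recognition rates. The function name and the toy labels are illustrative assumptions, not data from the project.

```python
from collections import defaultdict


def recognition_rates(reference, predicted):
    """Return (RR, CL) in percent for two equally long label sequences."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ref, hyp in zip(reference, predicted):
        total[ref] += 1
        if ref == hyp:
            correct[ref] += 1
    # RR: correct items over all items.
    rr = 100.0 * sum(correct.values()) / len(reference)
    # CL: per-class recognition rate, averaged with equal weight per class.
    cl = 100.0 * sum(correct[c] / total[c] for c in total) / len(total)
    return rr, cl


# Toy example: 8 neutral and 2 problem items, one problem item missed.
reference = ["neutral"] * 8 + ["problem"] * 2
predicted = ["neutral"] * 8 + ["neutral", "problem"]
rr, cl = recognition_rates(reference, predicted)
print(f"RR = {rr:.1f}, CL = {cl:.1f}")
```

In the toy example RR is 90.0 while CL is 75.0: a classifier that favours the frequent class scores well on RR, and CL exposes the weaker performance on the rare class.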

UERLN classification II: AIBO

features           CL
pros/POS           59.7
pros./POS, opt.    63.2
MFCC, frames       45.4
MFCC, words        58.3
pros/POS + MFCC    65.3

4 classes "AMEN", NN

Labels: joyful; surprised; motherese; neutral (default); rest (non-neutral); bored; helpless, hesitant; emphatic; touchy (=irritated); angry; reprimanding

ITC Classification II:

Final feature set
• 273 (acoustic/temporal) features
• 2 class problem (neutral and non neutral)

Classifier   CART                 Neural Networks
Database     Targhe   Sympafly    Targhe   Sympafly
RR           73.2%    73.9%       74.2%    73.5%
CL           70.7%    72.1%       69.4%    74.1%

RR = overall rec. rate; CL = class-wise averaged rec. rate
N = neutral turns; NN = Non neutral turns

UKA Classification II:

133 utterance-level prosodic features, 15 classes, acted speech, 8 speakers:

Task        Classifier   Feat Selection   CL
spk-indep   linear       none             19.0%
spk-indep   linear       spk-indep        21.3%
spk-indep   linear       spk-dep          31.3%
spk-dep     linear       none             38.7%
spk-dep     SVM          none             53.0%
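The difference between the speaker-independent and speaker-dependent tasks comes down to how the data are split. The slide does not say how the splits were drawn, so the following is a sketch under assumptions: leave-one-speaker-out folds for the speaker-independent case, and a simple alternating split within one speaker for the speaker-dependent case. The function names and toy samples are hypothetical.

```python
def leave_one_speaker_out(samples):
    """Yield (held_out, train, test) so that no speaker is in both sets."""
    speakers = sorted({s["speaker"] for s in samples})
    for held_out in speakers:
        train = [s for s in samples if s["speaker"] != held_out]
        test = [s for s in samples if s["speaker"] == held_out]
        yield held_out, train, test


def within_speaker_split(samples, speaker, test_every=2):
    """Speaker-dependent split: train and test both come from one speaker."""
    own = [s for s in samples if s["speaker"] == speaker]
    test = own[::test_every]
    train = [s for i, s in enumerate(own) if i % test_every != 0]
    return train, test


# Toy data: 3 speakers with 4 utterances each.
samples = [
    {"speaker": f"spk{k}", "utt": i, "label": "neutral"}
    for k in range(3)
    for i in range(4)
]
folds = list(leave_one_speaker_out(samples))
dep_train, dep_test = within_speaker_split(samples, "spk0")
print(len(folds), len(dep_train), len(dep_test))
```

In the speaker-independent folds the classifier never sees the test speaker during training, which is the harder condition and is consistent with the lower CL values in the table.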

Assessment of Features

• a pool of many different features/feature groups implemented/compared
• prosodic features better (more consistent) than "spectral" features in realistic speech
• combination of knowledge sources improves performance
• relevance of single features (feature classes)?

Assessment of Classifications

• not much difference between different classifiers in classification performance (linear classifiers highly competitive in speaker-independent classification)
• large differences between speaker-dependent and speaker-independent classification

Categories & Dimensions

cf. also tomorrow

UKA: Meeting Annotation

Meeting audio appears to be rich in non-neutral speech.

[Figure: chart over meeting types (project, work, game, discuss, chat) with one series per labeller (Labeler 1, Labeler 2, Labeler 3); value axis from 0 to 70]

Open-set holistic labeling of 5 meetings by 3 labellers

UKA: towards new Dimensions for Social Interaction in Meetings denoting conflict, building community, or skepticism etc.

[Figure: two-dimensional map of interaction labels.
Axis captions: IMAGE PROMOTION; self / support / group; power (weak to strong).
Horizontal scale: self at expense of group, self more than group, no bias, group more than self, group at expense of self.
Labels on the map: resolve/strength, grateful, doubt/weakness, insecure, ego-building, conflict-diffusing, giving up, skeptical, demanding, encouraging/comforting, advocating, directing/leading, ignoring/interrupting, collegial-conflict, hostile-conflict, acceding, community-building.]

Assessment of Categories & Dimensions

New categories, new dimensions, new consistency measure

• prototypical "full-blown" emotions are rare
• labels depend on type of data (call center, human-robot, different types of multi-party meeting)
• new dimensions that do not model emotions but interaction between participants in communication
• new entropy based consistency measure
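The slides do not define the entropy based consistency measure. As one possible reading, and only as an assumption, the sketch below scores each labelled item by the Shannon entropy of the labels its labellers assigned: 0 bits when all labellers agree, higher values the more the labels are spread. The function names and the five-labeller toy items are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy in bits of the labels given to one item."""
    counts = Counter(labels)
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def mean_entropy(items):
    """Average per-item entropy over a set of labelled items."""
    return sum(label_entropy(labels) for labels in items) / len(items)


# Toy items, each labelled by several labellers with classes A, M, E, N.
full_agreement = ["A", "A", "A", "A", "A"]
even_split = ["A", "A", "M", "M"]
no_agreement = ["A", "M", "E", "N"]

print(label_entropy(full_agreement))
print(label_entropy(even_split))
print(label_entropy(no_agreement))
print(mean_entropy([full_agreement, even_split, no_agreement]))
```

Unlike a majority-vote agreement rate, an entropy score distinguishes a 3:2 split from a 3:1:1 split, which matters when labels are holistic and drawn from an open set.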

Thank you for your attention

