May 2006 CLINT-CS Verbmobil 1

CLINT-CS

Dialogue II

Verbmobil

Verbmobil

• Verbmobil is a spoken dialogue system that provides phone users with simultaneous dialogue interpretation services for restricted topics.

• Recognises spoken input, translates it, and then utters the translation.

• Three languages: German, English and Japanese

Challenges for Speech and Language Technology

Four dimensions, each listed in order of increasing difficulty:

• Input Conditions: close speaking, PTT; telephone, pause based segmentation; open microphone, GSM quality

• Naturalness: isolated words; read continuous speech; spontaneous speech

• Adaptability: speaker dependent; speaker independent; speaker adaptive

• Dialogue Capabilities: monologue dictation; information seeking dialogue; multiparty negotiation

Grand Challenges

• Not a push-to-talk system. Has to decide for itself when user input is complete.

• Spontaneous speech including disfluencies and repair phenomena.

• Speaker adaptive.

• Mixed initiative dialogue

• Three different domains of discourse
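The slides do not say how Verbmobil decides that user input is complete. As a rough illustration of the problem only, the sketch below implements the simplest baseline, pause-based endpointing over per-frame energies. The function name, the threshold, and the frame counts are all assumptions made for this example; a system handling spontaneous speech would presumably need richer cues, since speakers pause in the middle of utterances.

```python
from typing import Optional, Sequence


def find_endpoint(frame_energies: Sequence[float],
                  silence_threshold: float = 0.1,
                  min_trailing_silence: int = 5) -> Optional[int]:
    """Return the index of the first silent frame after the utterance,
    or None if the utterance has not ended yet.

    All parameter values are hypothetical defaults, not Verbmobil settings.
    """
    speech_seen = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            # Speech frame: any pause counted so far was utterance-internal.
            speech_seen = True
            silent_run = 0
        elif speech_seen:
            silent_run += 1
            if silent_run >= min_trailing_silence:
                # The utterance ended where this silent run began.
                return i - silent_run + 1
    return None


# A short pause at frame 4 is ignored; the longer pause from frame 6 ends the input.
energies = [0.0, 0.0, 0.5, 0.6, 0.0, 0.7, 0.0, 0.0, 0.0]
endpoint = find_endpoint(energies, min_trailing_silence=3)
```

Leading silence is skipped because the counter only starts once speech has been seen, which is what an open-microphone setting requires.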

Domains

• Scenario 1: Appointment Scheduling
  – When?
  – Focus on temporal expressions
  – Vocabulary 2.5-6K

• Scenario 2: Travel Planning
  – When? Where? How?
  – Focus on temporal and spatial expressions
  – Vocabulary 7-10K

• Scenario 3: Remote PC Maintenance
  – What? When? Where? How?
  – Focus on integration of special sublanguage lexica
  – Vocabulary 15-30K

Data Collection

A significant programme of data collection was performed to extract statistical properties of different kinds of data:

• Transliterated speech data

• Segmented speech with prosodic labels

• Dialogues annotated with dialogue acts

• Treebanks & predicate argument structures

• Aligned bilingual corpora

Speech Data

• Multi channel recording
  – close-speaking microphone
  – room microphone
  – various telephones

• Speech recognisers trained on data sets of different audio quality

Multi Level Data Annotation

• Speech Data
  – Transliteration
  – Orthography
  – Pronunciation
  – Phonological Segmentation
  – Word Segmentation
  – Prosodic Segmentation

• Non Speech
  – Dialogue Acts
  – Treebanks

Statistical Models

• Data used to train different statistical models using Machine Learning.

• Models include
  – Neural Networks
  – Probabilistic Automata (HMMs for speech)
  – Probabilistic CFGs (robust parsing)
  – Probabilistic Transfer Rules

Architecture

• Different input devices (microphone, telephone, mobile, internet)

• Multilingual speech recognition (EN, DE, JP) including prosodic analysis

• Parsing

• Multi-level translation

• Multi-lingual generation

Multi Engine Parsing Architecture

• Three different parsing models are employed
  – Probabilistic LR Parser
  – Robust Chunk Parsing
  – HPSG Chart Parser

• All parsing models produce trees that are transformed into the same multistratal representation called VIT (Verbmobil Interface Terms)

• This facilitates integration of partial results from the different parsing models

Translation Models

• Substring Based

• Template Based

• Dialogue Act Based

Substring Based Translation

• Starts with the best sentence hypothesis of the speech recogniser

• Uses prosodic information to determine phrase boundaries and sentence mood

• Machine Learning methods applied to a sentence-aligned bilingual corpus

• The output of this module is a sequence of words in the target language together with a confidence measure that is used for selecting the best translation.

Template Based Translation

• Based on 30K translation templates learned from a sentence-aligned corpus

$T_i = (T_i^s, T_i^t)\{x_1, \ldots, x_n\}$

• 3 phases:
  – SL Template matching
  – Subphrase Translation
  – TL utterance generation
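The three phases can be sketched in a few lines. This is an illustration only, assuming that a template pairs a source-language pattern with a target-language pattern over shared variables x1..xn. The two templates, the subphrase lexicon, and the function names are invented for the example; the slides state only that the real system learned about 30K templates from a sentence-aligned corpus.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical templates: (source pattern, target pattern) with shared variables.
TEMPLATES: List[Tuple[str, str]] = [
    ("wir treffen uns {x1}", "we will meet {x1}"),
    ("ich fahre {x1} nach {x2}", "I am travelling to {x2} {x1}"),
]

# Hypothetical subphrase lexicon used to translate the variable fillers.
SUBPHRASES: Dict[str, str] = {
    "am Montag": "on Monday",
    "morgen": "tomorrow",
    "Berlin": "Berlin",
}


def match_template(source_pattern: str, utterance: str) -> Optional[Dict[str, str]]:
    """Phase 1: match the utterance against a source-language template."""
    # re.split with a capturing group alternates literal text and variable names.
    parts = re.split(r"\{(\w+)\}", source_pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, utterance)
    return match.groupdict() if match else None


def translate(utterance: str) -> Optional[str]:
    for source_pattern, target_pattern in TEMPLATES:
        bindings = match_template(source_pattern, utterance)
        if bindings is None:
            continue
        if any(filler not in SUBPHRASES for filler in bindings.values()):
            continue  # a subphrase cannot be translated; try the next template
        # Phase 2: translate each subphrase bound to a variable.
        translated = {name: SUBPHRASES[filler] for name, filler in bindings.items()}
        # Phase 3: generate the target-language utterance from the target pattern.
        return target_pattern.format(**translated)
    return None  # corresponds to a "No Translation" outcome


result = translate("ich fahre morgen nach Berlin")
```

Because variables are bound by name rather than by position, the target pattern is free to reorder them, as the second template does.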

Template Translation Results

                      WL Best Hypothesis   All Word Lattice
Perfect Translation   47%                  67%
Approx. Correct       16%                  6%
Bad Translation       15%                  5%
No Translation        22%                  22%

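Multi Engine Translation

• Input segments
  – Segment 1: "If you prefer another hotel"
  – Segment 2: "please let me know"

• Translation engines feeding the selection module
  – case based translation
  – substring based translation
  – statistical translation
  – dialogue based translation
  – semantic transfer

• Output of the selection module
  – Segment 1: semantic transfer
  – Segment 2: case based translation (CBT)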
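A minimal sketch of per-segment selection, assuming every engine attaches a confidence value to its output and that these values are comparable across engines. The slides do not describe the actual selection criterion beyond the use of a confidence measure, and the candidate strings and scores below are invented for illustration.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    engine: str
    text: str
    confidence: float  # hypothetical score in [0, 1]; higher is better


def select_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the most confident candidate, or None if no engine produced output."""
    return max(candidates, key=lambda c: c.confidence, default=None)


def select_per_segment(segments: List[List[Candidate]]) -> List[Optional[Candidate]]:
    """Run the selection independently for each input segment."""
    return [select_best(candidates) for candidates in segments]


# Illustrative candidates for the two segments of the example utterance.
segments = [
    [
        Candidate("semantic transfer", "Wenn Sie ein anderes Hotel bevorzugen", 0.82),
        Candidate("statistical translation", "Wenn Sie anderes Hotel vorziehen", 0.64),
    ],
    [
        Candidate("case based translation", "sagen Sie mir bitte Bescheid", 0.91),
        Candidate("substring based translation", "bitte lassen Sie mich wissen", 0.70),
    ],
]
chosen = select_per_segment(segments)
chosen_engines = [c.engine for c in chosen if c is not None]
```

Selecting per segment rather than per utterance lets a deep engine win where it succeeds while a shallow engine covers the rest.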

Dialogue Act Based Translation

• Meaning based translation

• Statistical classification of 19 dialogue acts.

• Extraction of propositional content using finite state transducers.

• Content built from an ontology covering appointment scheduling and travel planning tasks.

• Template based approach to generation of target language from content.
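The pipeline described in these bullets can be sketched end to end. Every component here is a deliberately crude stand-in: keyword cues replace the statistical classifier, a regular expression replaces the finite state transducers, and the two generation templates and the day lexicon are invented. Only the overall shape (classify the act, extract content, generate from a template) follows the slide.

```python
import re
from typing import Dict, Optional

# Stand-in for the finite state transducers: find a weekday in the input.
DAY_PATTERN = re.compile(r"\b(monday|tuesday|wednesday|thursday|friday)\b", re.IGNORECASE)

# Hypothetical English-to-German lexicon for the extracted content.
DAYS_DE: Dict[str, str] = {
    "monday": "Montag", "tuesday": "Dienstag", "wednesday": "Mittwoch",
    "thursday": "Donnerstag", "friday": "Freitag",
}

# Hypothetical target-language generation templates, one per dialogue act.
GENERATION_TEMPLATES: Dict[str, str] = {
    "suggest": "Wie wäre es am {day}?",
    "thank": "Vielen Dank.",
}


def classify_act(utterance: str) -> Optional[str]:
    """Stand-in for statistical dialogue act classification (keyword cues)."""
    text = utterance.lower()
    if text.startswith(("how about", "what about")):
        return "suggest"
    if "thank" in text:
        return "thank"
    return None


def extract_content(utterance: str) -> Dict[str, str]:
    """Extract propositional content as slot-value pairs."""
    match = DAY_PATTERN.search(utterance)
    return {"day": DAYS_DE[match.group(1).lower()]} if match else {}


def translate(utterance: str) -> Optional[str]:
    act = classify_act(utterance)
    template = GENERATION_TEMPLATES.get(act) if act else None
    if template is None:
        return None
    try:
        return template.format(**extract_content(utterance))
    except KeyError:
        return None  # the template needs content that was not extracted


result = translate("How about Monday")
```

The translation never looks at the source wording directly, only at the act and the content, which is why this approach can tolerate recognition errors elsewhere in the utterance.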

Part of Ontology for Propositional Content

[Figure: part of the ontology tree, rooted at "top". Node labels: object, situation, quality; agent, location; event, action; abstract, concrete; journey, move, stay, show, meeting; move-by-rail, move-by-plane, move by public transport.]

Dialogue Act Hierarchy

• DialogueAct
  – control dialogue: deliberate, thank, introduce, bye, greet
  – promote task: request, suggest, inform, feedback, commit, offer
  – manage task: init, defer, close

• Subtypes of request: request suggest, request clarify, request comment, request commit

• Subtypes of inform: digress, exclude, clarify, justify

Dialogue-Based Translation: Transfer Component

[Figure labels: transfer rules; Semantic Representation (Source Language VIT); Semantic Representation (Target Language VIT); dialogue and context evaluation; generation.]

Prosody

• Input
  – Speech signal
  – Word Hypothesis Graph (WHG)

• Output
  – annotated WHG including, per word: duration, pitch, energy, pause info

• Used to classify phrase and clause boundaries, accented words, and sentence mood.

Prosody – Sentence Mood

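[Figure: pitch plotted against time for "You are coming tomorrow?" and "You are coming tomorrow."; the pitch rises over the final word in the question and falls in the statement.]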
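As a toy illustration of how sentence mood might be read off a pitch contour, the sketch below fits a straight line to the last few voiced frames and calls a clearly rising ending a question. This hand-written rule, the frame window, and the slope threshold are assumptions made for the example; the slides indicate that the real classifiers were trained statistically and used duration, energy, and pause information as well.

```python
from typing import Sequence


def final_pitch_slope(pitch_hz: Sequence[float], tail: int = 5) -> float:
    """Least-squares slope (Hz per frame) over the last `tail` voiced frames.

    Frames with pitch 0 are treated as unvoiced and ignored.
    """
    voiced = [p for p in pitch_hz if p > 0][-tail:]
    n = len(voiced)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(voiced) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(voiced))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def sentence_mood(pitch_hz: Sequence[float], tail: int = 5,
                  rise_threshold: float = 1.0) -> str:
    """Label a contour as 'question' if it ends on a clear rise, else 'statement'."""
    slope = final_pitch_slope(pitch_hz, tail)
    return "question" if slope > rise_threshold else "statement"


# Invented contours for "You are coming tomorrow?" and "You are coming tomorrow."
rising = [120, 118, 117, 0, 119, 125, 135, 150, 170]
falling = [130, 128, 125, 120, 112, 105, 98]
moods = (sentence_mood(rising), sentence_mood(falling))
```

The word sequence is identical in both cases, so only the prosody can tell the translation component whether to produce a question.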

Use of Prosodic Information

• Prosodic information is used systematically at all processing stages

• A prosodic difference can lead to a different translation, e.g. "wir haben noch" ("we still have" vs. "we have another")

Multi Blackboard Architecture

• Final system comprises 69 highly interactive modules.

• No direct communication between modules.

• Communication is handled by 198 blackboards.

• Shared representation structures

• A module typically subscribes to several blackboards.
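The communication pattern can be sketched as a small publish/subscribe hub: modules never call each other, they only subscribe to blackboards and publish to them. The class, its methods, and the module behaviour are hypothetical. "WHG with prosodic labels" and "VIT discourse representation" are blackboard names that appear in the deck, while the plain "WHG" blackboard is assumed for the example.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class BlackboardHub:
    """Routes items from publishers to the subscribers of each blackboard."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, blackboard: str, handler: Handler) -> None:
        self._subscribers[blackboard].append(handler)

    def publish(self, blackboard: str, item: Any) -> None:
        # Copy the list so handlers may subscribe further handlers while running.
        for handler in list(self._subscribers[blackboard]):
            handler(item)


hub = BlackboardHub()
results: List[Any] = []


def prosodic_analysis(whg: dict) -> None:
    # Hypothetical module: annotate the word hypothesis graph and republish it.
    hub.publish("WHG with prosodic labels", {**whg, "prosody": "labelled"})


def chunk_parser(whg: dict) -> None:
    # Hypothetical module: produce a (fake) analysis from the annotated graph.
    hub.publish("VIT discourse representation",
                {"source": "chunk parser", "words": whg["words"]})


hub.subscribe("WHG", prosodic_analysis)
hub.subscribe("WHG with prosodic labels", chunk_parser)
hub.subscribe("VIT discourse representation", results.append)

hub.publish("WHG", {"words": ["please", "let", "me", "know"]})
```

Adding a second parser is then a matter of one more `subscribe` call on the same blackboard, with no change to the modules already in place.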

Blackboards & Modules

• Modules
  – command recogniser
  – spontaneous speech recogniser
  – speaker adaptation
  – prosodic analysis
  – statistical parser
  – chunk parser
  – HPSG parser
  – dialogue act recognition
  – semantic construction
  – robust dialogue semantics
  – semantic transfer
  – generation

• Blackboards
  – Audio Data
  – WHG with prosodic labels
  – VIT discourse representation

Multi Engine Approach

• Input: augmented WHG

• Parsers: statistical parser, chunk parser, HPSG parser

• Intermediate result: chart containing partial VITs

• Robust dialogue semantics: knowledge-based reconstruction

• Output: complete and spanning VIT
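One generic way to get from a chart of partial analyses to a result that spans the whole input is a dynamic programme over word positions. The sketch below is that generic technique, not the knowledge-based reconstruction used in Verbmobil, which the slides do not describe. The `Edge` type, the scores, and the example chart are invented.

```python
from typing import List, NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    start: int     # index of the first word covered
    end: int       # index one past the last word covered
    engine: str    # which parser produced this partial analysis
    score: float   # hypothetical log-probability-like score; higher is better


def best_spanning_path(edges: List[Edge], length: int) -> Optional[List[Edge]]:
    """Return the best-scoring sequence of adjacent edges covering words 0..length."""
    # best[i] holds (total score, path) for the best cover of words 0..i.
    best: List[Optional[Tuple[float, List[Edge]]]] = [None] * (length + 1)
    best[0] = (0.0, [])
    for pos in range(length):
        if best[pos] is None:
            continue  # no combination of edges reaches this position
        score_so_far, path = best[pos]
        for edge in edges:
            if edge.start != pos or edge.end <= pos or edge.end > length:
                continue
            candidate = (score_so_far + edge.score, path + [edge])
            if best[edge.end] is None or candidate[0] > best[edge.end][0]:
                best[edge.end] = candidate
    return best[length][1] if best[length] is not None else None


# A five-word input with partial analyses from three parsers.
chart = [
    Edge(0, 3, "HPSG parser", -2.0),
    Edge(0, 2, "chunk parser", -1.0),
    Edge(2, 5, "chunk parser", -3.0),
    Edge(3, 5, "statistical parser", -1.5),
    Edge(0, 5, "statistical parser", -5.0),
]
path = best_spanning_path(chart, 5)
path_engines = [edge.engine for edge in path] if path else []
```

In this example the best cover mixes results from two different parsers, which is only possible because all of them write into one shared representation.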

Achievements

• 3 language pairs, 3 domains and a vocabulary size of over 100K word forms

• Average processing time 4x original signal duration

• Word recognition rate of 75% for spontaneous speech

• 80% approximately correct translations

• 90% success rate for dialogue tasks in end-to-end evaluation

Conclusion

• Speech to speech translation of spontaneous dialogues can only be cracked by combining deep and shallow processing

• The final architecture maximises the necessary interaction between processing modules

• Software engineering considerations must be taken seriously in such a project.
