Speech Air

8/6/2019 Speech Air


Speech recognition, understanding and conversational interfaces

Alexander Rudnicky
School of Computer Science
http://www.cs.cmu.edu/~air


Outline

Speech
Types of speech interfaces
Speech systems and their structure
Designing speech interfaces
Some applications
  SpeechWear
  Communicator


Speech as a signal

The difference between speech and sound
CD quality vs. intelligible quality
  high-quality is 44.1 / 48 kHz
  desirable speech bandwidth: 0-8 kHz, 16 bits
  at 16 bits/sample: 256 kbps (tethered mic)
  telephone: 64 kbps (and lower)
Compression:
  MPEG: 64 kbps/channel and up (but not speech-optimal)
  CELP: 16 kbps to 2.4 kbps (optimized for speech)
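The bit rates on this slide follow from sample rate times bits per sample. The sketch below works through that arithmetic. Two things are assumptions rather than statements from the slide: that the 0-8 kHz bandwidth is sampled at 16 kHz (twice the bandwidth, per the Nyquist criterion), and that the telephone figure corresponds to 8 kHz sampling at 8 bits per sample. The function name is illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Uncompressed PCM bit rate in kbps (1 kbps = 1000 bits per second)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Assumption: 0-8 kHz bandwidth sampled at 2 x 8 kHz = 16 kHz (Nyquist).
wideband = pcm_bitrate_kbps(16_000, 16)

# Assumption: telephone speech as 8 kHz sampling, 8 bits per sample.
telephone = pcm_bitrate_kbps(8_000, 8)

print(f"wideband speech: {wideband:.0f} kbps")
print(f"telephone speech: {telephone:.0f} kbps")
```

Under these assumptions the two results match the 256 kbps and 64 kbps figures quoted on the slide.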


Speech for communication

The difference between speech and language
Speech recognition and speech understanding


Computers and speech

Transcription: dictation, information retrieval
Command and control: data entry, device control, navigation
Information access: airline schedules, stock quotes
Problem solving: travel planning, logistics


Speech system architecture

SIGNAL PROCESSING
DECODING
UNDERSTANDING
DISCOURSE
ACTION


Varieties of speech systems

[Table comparing four system types: Transcription, Command & control, Information access, Problem solving. Row labels and X marks are not recoverable from the extraction.]


A generic speech system

Input: speech
Signal processing
Decoder
Parser
Post parser
Dialog manager
Domain agents (three shown)
Language Generator
Speech synthesizer
Output: speech, display, effector


Creating models for recognition

Acoustic models: Speech data, Transcribe*, Train
Language models: Text data, Train
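As a rough illustration of the "text data, train, language model" path, the sketch below estimates a bigram model from a toy corpus. This is an assumption about model type: the slide does not say which kind of language model is trained, and a practical system would add smoothing and use far more text. The function name and the two-sentence corpus are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Maximum-likelihood bigram estimates P(word | previous word), no smoothing."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark sentence start and end.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1

    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {word: n / total for word, n in following.items()}
    return model


# Toy text data (hypothetical); real training text would be much larger.
corpus = [
    "i would like to fly to boston",
    "i would like to go to boston on friday",
]
model = train_bigram_model(corpus)

# "to" is followed by fly (1), go (1) and boston (2) in this corpus.
print(model["to"])
```

A decoder can use such probabilities to prefer word sequences that resemble the training text.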


Understanding speech

Parser
  Extract semantic content from utterance
  Grammar
Post parser
  Introduce context and world knowledge into interpretation
  Context
  Domain Agents

Grounding, knowledge engineering
Ontology design, language acquisition


Interacting with the user

Dialog manager
  Guide interaction through task
  Map user inputs and system state into actions
  Task schemas
  Context
Domain agents (three shown)
  Interact with back-end(s)
  Interpret information using domain knowledge
  Database
  Live data (e.g. Web)
  Domain expert

Task analysis
Knowledge engineering


Communicating with the user

Language Generator
Speech synthesizer
Display Generator
Action Generator

Decide what to say to user (and how to phrase it)


Speech recognition and understanding

Sphinx system
  speaker-independent
  continuous speech
  large vocabulary
ATIS system
  air travel information retrieval
  context management

[film clip]


Command and control systems

Small vocabularies, fixed syntax
  OPEN WINDOW
  MOVE OBJECT to
  Applications: data entry (e.g., zip codes), process control (e.g., electron microscope, darkroom equipment)
Large vocabulary, fixed syntax
  Web browsing (?)


SpeechWear

Vehicle inspection task
USMC mechanics, fixed inspection form
Wearable computer (COTS components)
html-based task representation

[film clip]


Information access

Moderate to very large vocabulary
IVR and frame-based systems
Commercial systems:
  Nuance: http://www.nuance.com/demo/index.html
  SpeechWorks: http://www.speechworks.com/demos/demos.htm
  lots of others...


IVR and frame-based systems

Interactive voice response (IVR)
  interactions specified by a graph (typically a tree)
Frame systems
  ergodic graphs
  states defined by multi-item forms


Graph-based systems

Welcome to Bank ABC!
Please say one of the following: Balance, Hours, Loan, ...

  What type of loan are you interested in?
  Please say one of the following: Mortgage, Car, Personal, ...

    . . . .
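A graph-based (IVR) interaction like the Bank ABC menu can be sketched as a table of states with keyword-labeled edges. The prompts below come from the slide; the state names, the `MENU` layout, and the `step` and `run` helpers are hypothetical, and unrecognized input simply keeps the caller in the same state (a real system would typically re-prompt).

```python
# Hypothetical menu tree: each state has a prompt and keyword-labeled edges.
MENU = {
    "root": {
        "prompt": "Welcome to Bank ABC! Please say one of the following: "
                  "Balance, Hours, Loan",
        "next": {"balance": "balance", "hours": "hours", "loan": "loan"},
    },
    "loan": {
        "prompt": "What type of loan are you interested in? Please say one of "
                  "the following: Mortgage, Car, Personal",
        "next": {"mortgage": "mortgage", "car": "car", "personal": "personal"},
    },
}


def step(state, utterance):
    """Follow the edge labeled by the recognized keyword, else stay in place."""
    node = MENU.get(state)
    if node is None:
        return state  # leaf state: nothing further is modeled in this sketch
    return node["next"].get(utterance.strip().lower(), state)


def run(utterances, state="root"):
    """Return the sequence of states visited for a list of recognized words."""
    path = [state]
    for utterance in utterances:
        state = step(state, utterance)
        path.append(state)
    return path


print(run(["Loan", "Car"]))
```

Because every legal move is an explicit edge, the designer controls the whole interaction, which is also why such systems suit only well-structured tasks.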


Frame-based systems

I would like to fly to Boston
I'd like to go to Boston on Friday,

When would you like to fly?

Destination_City: Boston
Departure_Date: ______
Departure_Time: ______
Preferred_Airline: ______
...
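The frame on this slide can be sketched as a form whose empty slots drive the next system prompt. The slot names and the prompt "When would you like to fly?" are taken from the slide. Everything else is an assumption for illustration: the regular expressions stand in for a real parser, the city list is invented, and the other prompts are made up.

```python
import re

SLOTS = ["Destination_City", "Departure_Date", "Departure_Time", "Preferred_Airline"]

# Toy patterns standing in for a real parser (assumption).
PATTERNS = {
    "Destination_City": re.compile(r"\bto (Boston|Pittsburgh)\b", re.I),
    "Departure_Date": re.compile(
        r"\bon (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b", re.I
    ),
}

# Only the Departure_Date prompt appears on the slide; the rest are hypothetical.
PROMPTS = {
    "Destination_City": "Where would you like to fly?",
    "Departure_Date": "When would you like to fly?",
    "Departure_Time": "What time would you like to leave?",
    "Preferred_Airline": "Do you have a preferred airline?",
}


def new_frame():
    return {slot: None for slot in SLOTS}


def update(frame, utterance):
    """Fill every slot the utterance mentions, in any order."""
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            frame[slot] = match.group(1).capitalize()
    return frame


def next_prompt(frame):
    """Ask about the first slot that is still empty, or None if the form is full."""
    for slot in SLOTS:
        if frame[slot] is None:
            return PROMPTS[slot]
    return None


frame = update(new_frame(), "I would like to fly to Boston")
print(frame["Destination_City"], "|", next_prompt(frame))
```

Unlike the menu tree, the user may supply several slots in one utterance ("to Boston on Friday"), and the system only asks about what is still missing.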


Frame-based systems

[Figure: two frames, each a form of unfilled slots with placeholder labels, linked by a transition on keyword or phrase]


Some problems

IVR systems work great, but only for well-structured (shallow) tasks
Frame systems are good for tasks that correspond to a single form leading to an action
Neither approach does well with more complex problem-solving activities


Dialog Systems

Problem solving activity; complex task
  Order of progression through task depends on user goals (which can change) and system state (a back-end retrieval) and is not predictable.
  Track progress and help task along
  mixed-initiative dialog
Discourse phenomena
  Users expect to converse with the system


Carnegie Mellon Communicator

A dialog system that supports complex problem solving in a travel planning domain
  create an itinerary using air schedule, hotel and car information
  186 U.S. airports (>140k enplanements/yr)
  currently: >500 world airports
Web-based data resources
  Live and cached flight information
  Airport, airline, etc. information


Value schema/handlers

value
transform
receptors
Domain Agent


Compound schema

Value_1 + Value_2 + Value_3
value transform (e.g. SQL query)
Domain Agent


Schema ordering

Value i, Value j, Value k
Schema i, Schema j, Schema k

Destination airport, Date, Time
Flight Leg
Value transform: Database lookup
Available flights
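One way to read the schema-ordering diagram is that a schema's value transform can run only once the values it depends on are available, so the order of execution falls out of the data dependencies. The sketch below illustrates that reading and is not the Communicator implementation. The schema table, the `propagate` function, and the in-memory flight list (standing in for the database lookup) are all hypothetical.

```python
# Hypothetical rows standing in for the back-end flight database.
FLIGHTS = [
    {"flight": "XX100", "dest": "BOS", "date": "Friday", "time": "morning"},
    {"flight": "XX200", "dest": "BOS", "date": "Friday", "time": "evening"},
    {"flight": "XX300", "dest": "PIT", "date": "Friday", "time": "morning"},
]


def make_leg(values):
    """Combine destination airport, date and time into a flight leg."""
    return (values["destination_airport"], values["date"], values["time"])


def lookup_flights(values):
    """Value transform: a list filter standing in for the database lookup."""
    leg = values["flight_leg"]
    return [f["flight"] for f in FLIGHTS if (f["dest"], f["date"], f["time"]) == leg]


# Listed out of order on purpose: dependencies, not position, decide what runs.
SCHEMAS = [
    {"output": "available_flights", "inputs": ["flight_leg"],
     "transform": lookup_flights},
    {"output": "flight_leg", "inputs": ["destination_airport", "date", "time"],
     "transform": make_leg},
]


def propagate(values):
    """Run every schema whose inputs are present until nothing new can fire."""
    values = dict(values)
    fired = True
    while fired:
        fired = False
        for schema in SCHEMAS:
            ready = all(name in values for name in schema["inputs"])
            if ready and schema["output"] not in values:
                values[schema["output"]] = schema["transform"](values)
                fired = True
    return values


result = propagate({"destination_airport": "BOS", "date": "Friday", "time": "morning"})
print(result["available_flights"])
```

If only some values are known (say, just the destination), no transform fires and the dialog manager still has something to ask the user about.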


User-aware speech interfaces

Predictable behavior on the system's part
Users communicate at different levels
http://www.speech.cs.cmu.edu/air/papers/InterfaceChars.html


User-aware speech interfaces

Content: task-centric utterances
Possibility: What can I do?
Orientation: Where are we?
Navigation: moving through the task space
Control: verbose/terse, listen!
Customization: define this word


Speech interface guidelines

Speech recognition is errorful
System state is often opaque to the user
http://www.speech.cs.cmu.edu/air/papers/SpInGuidelines/SpInGuidelines.html


Interface guidelines

State transparency
Input control
Error recovery
Error detection
Error correction
Log performance
Application integration


Summary

Speech and language communication
Dialog structure
Interface design

Date post:	07-Apr-2018
Upload:	mudit-misra