Date post: | 07-Apr-2018 |
Category: |
Documents |
Upload: | mudit-misra |
8/6/2019 Speech Air
1/33

Speech recognition, understanding and conversational interfaces
Alexander Rudnicky
School of Computer Science
http://www.cs.cmu.edu/~air

Outline
Speech
Types of speech interfaces
Speech systems and their structure
Designing speech interfaces
Some applications
  SpeechWear
  Communicator

Speech as a signal
The difference between speech and sound
  CD quality vs. intelligible quality
  high quality is 44.1 / 48 kHz
  desirable speech bandwidth: 0-8 kHz, 16 bits
  at 16 bits/sample: 256 kbps (tethered mic)
  telephone: 64 kbps (and lower)
Compression:
  MPEG: 64 kbps/channel and up (but not speech-optimal)
  CELP: 16 kbps to 2.4 kbps (optimized for speech)
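
The bit rates quoted on this slide follow from sampling rate times sample size. A minimal sketch of that arithmetic, assuming a 16 kHz sampling rate for the 0-8 kHz band (the Nyquist minimum) and 8 kHz sampling at 8 bits/sample for telephone speech. Neither rate is stated on the slide, and `pcm_bitrate_kbps` is a name chosen here for illustration.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Uncompressed PCM bit rate in kbps (1 kbps = 1000 bits per second)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Assumption: a 0-8 kHz bandwidth sampled at 16 kHz, 16 bits per sample.
wideband = pcm_bitrate_kbps(16_000, 16)

# Assumption: telephone speech as 8 kHz sampling at 8 bits per sample.
telephone = pcm_bitrate_kbps(8_000, 8)

# CD audio for comparison: 44.1 kHz, 16 bits, two channels.
cd_stereo = pcm_bitrate_kbps(44_100, 16, channels=2)

print(wideband, telephone, cd_stereo)
```

Under these assumptions the first two values reproduce the 256 kbps and 64 kbps figures, which is what makes a 16 kbps or 2.4 kbps speech coder a large saving.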

Speech for communication
The difference between speech and language
Speech recognition and speech understanding

Computers and speech
Transcription
  dictation, information retrieval
Command and control
  data entry, device control, navigation
Information access
  airline schedules, stock quotes
Problem solving
  travel planning, logistics

Speech system architecture
SIGNAL PROCESSING
DECODING
UNDERSTANDING
DISCOURSE
ACTION

Varieties of speech systems
[Table comparing Transcription, Command & Control, Information Access and Problem Solving systems; row labels not recoverable]

A generic speech system
Input: speech
Components:
  Signal processing
  Decoder
  Parser
  Post parser
  Dialog manager
  Domain agents (three shown)
  Language Generator
  Speech synthesizer
Outputs: speech, display, effector
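
One way to picture how these components hand data to each other is a chain of stages that each enrich a shared record. The sketch below is only an illustration: every function body is a stub, the `Turn` record and the prompt wording are hypothetical, and the post parser and domain agents are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Turn:
    """Everything known about one user utterance as it moves through the stages."""
    audio: List[float]
    features: List[float] = field(default_factory=list)
    words: str = ""
    meaning: Dict[str, str] = field(default_factory=dict)
    action: str = ""
    reply: str = ""


def signal_processing(turn: Turn) -> Turn:
    # Stub: scale samples to a peak of 1.0 in place of real feature extraction.
    peak = max((abs(x) for x in turn.audio), default=0.0) or 1.0
    turn.features = [x / peak for x in turn.audio]
    return turn


def decoder(turn: Turn) -> Turn:
    # Stub: always "recognizes" the same words; a real decoder searches its models.
    turn.words = "i would like to fly to boston"
    return turn


def parser(turn: Turn) -> Turn:
    # Stub: treat the word after the last "to" as the destination.
    tokens = turn.words.split()
    for i, token in enumerate(tokens[:-1]):
        if token == "to":
            turn.meaning["destination"] = tokens[i + 1]
    return turn


def dialog_manager(turn: Turn) -> Turn:
    # Choose the next action from what is still unknown.
    if "destination" not in turn.meaning:
        turn.action = "ask_destination"
    elif "date" not in turn.meaning:
        turn.action = "ask_date"
    else:
        turn.action = "confirm"
    return turn


def language_generator(turn: Turn) -> Turn:
    # Hypothetical wording for each action.
    phrases = {
        "ask_destination": "Where would you like to go?",
        "ask_date": "When would you like to fly?",
        "confirm": "Let me look that up.",
    }
    turn.reply = phrases[turn.action]
    return turn


STAGES: List[Callable[[Turn], Turn]] = [
    signal_processing, decoder, parser, dialog_manager, language_generator,
]


def run(turn: Turn) -> Turn:
    """Pass the turn through every stage in order."""
    for stage in STAGES:
        turn = stage(turn)
    return turn


result = run(Turn(audio=[0.1, -0.5, 0.25]))
print(result.words, result.meaning, result.reply)
```

Keeping each stage behind the same signature means a stub can be swapped for a real component without touching the others.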

Creating models for recognition
Speech data -> Transcribe* -> Train -> Acoustic models
Text data -> Train -> Language models
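
The slide does not say what kind of language model is trained. As one common possibility, the sketch below estimates a bigram model from text by counting word pairs. The three-sentence corpus is invented for illustration, and no smoothing is applied, so unseen word pairs get no probability at all.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram_model(sentences: List[str]) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood estimate of P(word | previous word) from raw text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {word: n / total for word, n in following.items()}
    return model


# Invented toy corpus standing in for the "text data" box.
corpus = [
    "show flights to boston",
    "show flights to denver",
    "show fares to boston",
]
model = train_bigram_model(corpus)
print(model["to"])
```

With this corpus, "boston" follows "to" in two of three sentences, so the model prefers it over "denver" when the decoder has to choose between similar-sounding hypotheses.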

Understanding speech
Parser
  Extract semantic content from utterance
Post parser
  Introduce context and world knowledge into interpretation
Grammar
Context
Domain Agents
Grounding, knowledge engineering
Ontology design, language acquisition

Interacting with the user
Dialog manager
  Guide interaction through task
  Map user inputs and system state into actions
Domain agents (three shown)
  Interact with back-end(s)
  Interpret information using domain knowledge
Task schemas
Database
Live data (e.g. Web)
Domain expert
Context
Task analysis
Knowledge engineering

Communicating with the user
Language Generator
Speech synthesizer
Display Generator
Action Generator
Decide what to say to user (and how to phrase it)

Speech recognition and understanding
Sphinx system
  speaker-independent
  continuous speech
  large vocabulary
ATIS system
  air travel information retrieval
  context management
film clip

Command and control systems
Small vocabularies, fixed syntax
  OPEN WINDOW
  MOVE OBJECT to
  Applications: data entry (e.g., zip codes), process control (e.g., electron microscope, darkroom equipment)
Large vocabulary, fixed syntax
  Web browsing (?)

SpeechWear
Vehicle inspection task
  USMC mechanics, fixed inspection form
Wearable computer (COTS components)
html-based task representation
film clip

Information access
Moderate to very large vocabulary
IVR and frame-based systems
Commercial systems:
  Nuance: http://www.nuance.com/demo/index.html
  SpeechWorks: http://www.speechworks.com/demos/demos.htm
  lots of others..

IVR and frame-based systems
Interactive voice response (IVR)
  interactions specified by a graph (typically a tree)
Frame systems
  ergodic graphs
  states defined by multi-item forms

Graph-based systems
Welcome to Bank ABC!
Please say one of the following:
Balance, Hours, Loan, ...
What type of loan are you interested in? Please say one of the following:
Mortgage, Car, Personal, ...
. . . .
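
A graph-based dialog like the Bank ABC example can be sketched as a tree of prompts, where each recognized option names the child node to move to. The tree below covers only the prompts on the slide; the re-prompt wording and the `run_ivr` helper are hypothetical.

```python
from typing import Dict, List

# Menu tree modeled on the Bank ABC example. Only the "loan" branch is filled in.
MENU: Dict = {
    "prompt": "Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan",
    "options": {
        "loan": {
            "prompt": ("What type of loan are you interested in? "
                       "Please say one of the following: Mortgage, Car, Personal"),
            "options": {},
        },
    },
}


def run_ivr(tree: Dict, utterances: List[str]) -> List[str]:
    """Walk the tree one utterance at a time and return every system prompt."""
    node = tree
    transcript = [node["prompt"]]
    for said in utterances:
        choice = said.strip().lower()
        child = node["options"].get(choice)
        if child is None:
            # Hypothetical recovery: stay at this node and repeat its prompt.
            transcript.append("Sorry, I did not understand. " + node["prompt"])
        else:
            node = child
            transcript.append(node["prompt"])
    return transcript


for line in run_ivr(MENU, ["Loan"]):
    print(line)
```

The user can only ever say what the current node offers, which is why this style suits shallow, well-structured tasks.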

Frame-based systems
I would like to fly to Boston
I'd like to go to Boston on Friday, ...
When would you like to fly?
Destination_City: Boston
Departure_Date: ______
Departure_Time: ______
Preferred_Airline: ______
...
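
The form on this slide can be read as a set of slots that each utterance may fill in any order, with the system asking about the first slot that is still empty. A minimal sketch under that reading: the slot names come from the slide, while the regular expressions, the helper names, and all prompts except "When would you like to fly?" are assumptions made for illustration.

```python
import re
from typing import Dict, Optional

SLOTS = ["Destination_City", "Departure_Date", "Departure_Time", "Preferred_Airline"]

# Hypothetical prompts; only the Departure_Date wording appears on the slide.
PROMPTS = {
    "Destination_City": "Where would you like to fly to?",
    "Departure_Date": "When would you like to fly?",
    "Departure_Time": "What time would you like to leave?",
    "Preferred_Airline": "Do you have a preferred airline?",
}

DAY = r"(monday|tuesday|wednesday|thursday|friday|saturday|sunday)"


def new_frame() -> Dict[str, Optional[str]]:
    """An empty form: every slot is unfilled."""
    return {slot: None for slot in SLOTS}


def update_frame(frame: Dict[str, Optional[str]], utterance: str) -> None:
    """Fill whichever slots this utterance mentions (toy patterns only)."""
    text = utterance.lower()
    city = re.search(r"\b(?:fly|go) to ([a-z]+)", text)
    if city:
        frame["Destination_City"] = city.group(1).capitalize()
    day = re.search(r"\bon " + DAY + r"\b", text)
    if day:
        frame["Departure_Date"] = day.group(1).capitalize()


def next_prompt(frame: Dict[str, Optional[str]]) -> Optional[str]:
    """Ask about the first empty slot, or return None when the form is complete."""
    for slot in SLOTS:
        if frame[slot] is None:
            return PROMPTS[slot]
    return None


first = new_frame()
update_frame(first, "I would like to fly to Boston")
print(next_prompt(first))

second = new_frame()
update_frame(second, "I'd like to go to Boston on Friday")
print(next_prompt(second))
```

Because the second utterance already supplies the date, the system skips that question, which is the flexibility a fixed menu tree lacks.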

Frame-based systems
[Diagram: five forms with placeholder slot names, linked by transitions]
Transition on keyword or phrase
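
Moving between forms on a keyword or phrase can be sketched as a lookup over every form's trigger words. Since any form can be reached from any other, the check ignores where the dialog currently is. The three forms, their slots and their keywords below are hypothetical, chosen to match a travel planning domain.

```python
import re
from typing import Dict, List

# Hypothetical forms and trigger keywords.
FRAMES: Dict[str, Dict[str, List[str]]] = {
    "flight": {"keywords": ["fly", "flight"], "slots": ["Destination_City", "Departure_Date"]},
    "hotel": {"keywords": ["hotel", "room"], "slots": ["City", "Check_In_Date"]},
    "car": {"keywords": ["car", "rental"], "slots": ["Pickup_City", "Pickup_Date"]},
}


def next_frame(current: str, utterance: str) -> str:
    """Switch to another form if the utterance contains one of its keywords."""
    words = re.findall(r"[a-z]+", utterance.lower())
    for name, frame in FRAMES.items():
        if name != current and any(keyword in words for keyword in frame["keywords"]):
            return name
    # No other form was triggered, so keep filling the current one.
    return current


print(next_frame("flight", "I also need a hotel."))
print(next_frame("hotel", "on Friday"))
```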

Some problems
IVR systems work great, but only for well-structured ("shallow") tasks
Frame systems are good for tasks that correspond to a single form leading to an action
Neither approach does well with more complex problem-solving activities

Dialog Systems
Problem solving activity; complex task
  Order of progression through task depends on user goals (which can change) and system state (a back-end retrieval) and is not predictable.
  Track progress and help task along
  mixed-initiative dialog
Discourse phenomena
  Users expect to converse with the system

Carnegie Mellon Communicator
A dialog system that supports complex problem solving in a travel planning domain
  create an itinerary using air schedule, hotel and car information
  186 U.S. airports (>140k enplanements/yr)
  currently: >500 world airports
Web-based data resources
  Live and cached flight information
  Airport, airline, etc. information

Value schema/handlers
value transform
receptors
Domain Agent

Compound schema
Value_1, Value_2, Value_3
value transform
Domain Agent (e.g. SQL query)

Schema ordering
Value i, Value j, Value k
Schema i, Schema j, Schema k
Flight Leg
  Destination airport
  Date
  Time
Value transform
  Database lookup
Available flights
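
One reading of the Flight Leg example is that a compound schema collects several values and, once all of them are present, hands them to a value transform that asks a domain agent for the result. The sketch below follows that reading only; the `CompoundSchema` class, the `lookup_flights` stand-in for a database lookup, and the flight records are all invented for illustration and are not the Communicator implementation.

```python
from typing import Callable, Dict, List, Optional

# Invented sample records standing in for a flight database.
FLIGHTS = [
    {"destination": "BOS", "date": "Friday", "time": "09:00", "flight": "XX100"},
    {"destination": "BOS", "date": "Friday", "time": "17:30", "flight": "XX200"},
    {"destination": "BOS", "date": "Saturday", "time": "09:00", "flight": "XX300"},
    {"destination": "DEN", "date": "Friday", "time": "10:00", "flight": "XX400"},
]


def lookup_flights(destination: str, date: str, time: str) -> List[str]:
    """Stand-in domain agent: flights on that date leaving at or after `time`.

    Times are zero-padded HH:MM strings, so string comparison orders them correctly.
    """
    return [
        f["flight"] for f in FLIGHTS
        if f["destination"] == destination and f["date"] == date and f["time"] >= time
    ]


class CompoundSchema:
    """Collects named values and applies a transform once none are missing."""

    def __init__(self, value_names: List[str], transform: Callable[..., List[str]]):
        self.values: Dict[str, Optional[str]] = {name: None for name in value_names}
        self.transform = transform

    def receive(self, name: str, value: str) -> None:
        if name not in self.values:
            raise KeyError(name)
        self.values[name] = value

    def missing(self) -> List[str]:
        return [name for name, value in self.values.items() if value is None]

    def result(self) -> Optional[List[str]]:
        # The transform only runs when every value has been supplied.
        if self.missing():
            return None
        return self.transform(**self.values)


flight_leg = CompoundSchema(["destination", "date", "time"], lookup_flights)
flight_leg.receive("destination", "BOS")
still_missing = flight_leg.missing()
flight_leg.receive("date", "Friday")
flight_leg.receive("time", "08:00")
available = flight_leg.result()
print(still_missing, available)
```

The list returned by `missing()` gives a dialog manager something to ask about next, whatever order the user chooses to supply the values in.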

User-aware speech interfaces
Predictable behavior on the system's part
Users communicate at different levels
http://www.speech.cs.cmu.edu/air/papers/InterfaceChars.html

User-aware speech interfaces
Content: task-centric utterances
Possibility: "What can I do?"
Orientation: "Where are we?"
Navigation: moving through the task space
Control: verbose/terse, "listen!"
Customization: "define this word"

Speech interface guidelines
Speech recognition is errorful
System state is often opaque to the user
http://www.speech.cs.cmu.edu/air/papers/SpInGuidelines/SpInGuidelines.html

Interface guidelines
State transparency
Input control
Error recovery
Error detection
Error correction
Log performance
Application integration

Summary
Speech and language communication
Dialog structure
Interface design