Speech Air

Date post: 07-Apr-2018
Upload: mudit-misra

  • 8/6/2019 Speech Air

    1/33

    Speech recognition, understanding and conversational interfaces

    Alexander Rudnicky

    School of Computer Science
    http://www.cs.cmu.edu/~air

    Outline

    - Speech
    - Types of speech interfaces
    - Speech systems and their structure
    - Designing speech interfaces
    - Some applications: SpeechWear, Communicator

    Speech as a signal

    - The difference between speech and sound: CD quality vs. intelligible quality
    - High quality: 44.1 / 48 kHz sampling
    - Desirable speech bandwidth: 0-8 kHz, i.e. 16 kHz sampling at 16 bits/sample = 256 kbps (tethered mic)
    - Telephone: 64 kbps (and lower)
    - Compression:
      - MPEG: 64 kbps/channel and up (but not speech-optimal)
      - CELP: 16 kbps down to 2.4 kbps (optimized for speech)
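
    The uncompressed figures above follow from one multiplication: PCM bitrate = sampling rate × bits per sample × channels. A minimal Python sketch that checks the slide's numbers; the 8-bit telephone sample size assumes standard G.711-style companded telephony, which the slide does not state.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bits/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# A 0-8 kHz speech band needs at least 16 kHz sampling (twice the highest frequency).
print(pcm_bitrate_kbps(16_000, 16))     # tethered mic: 256.0
print(pcm_bitrate_kbps(8_000, 8))       # telephone, 8-bit companded samples: 64.0
print(pcm_bitrate_kbps(44_100, 16, 2))  # CD audio, 16-bit stereo: 1411.2

# Compression ratio of a 2.4 kbps CELP-family coder relative to 256 kbps wideband PCM
print(round(256 / 2.4))                 # 107
```

    The same arithmetic shows why speech-optimized coders matter: going from 256 kbps to 2.4 kbps is roughly a hundredfold reduction.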

    Speech for communication

    - The difference between speech and language
    - Speech recognition and speech understanding

    Computers and speech

    - Transcription: dictation, information retrieval
    - Command and control: data entry, device control, navigation
    - Information access: airline schedules, stock quotes
    - Problem solving: travel planning, logistics

    Speech system architecture

    [Diagram: processing stages SIGNAL PROCESSING, DECODING, UNDERSTANDING, DISCOURSE, ACTION]
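
    Read as a pipeline, this architecture is function composition. A minimal Python sketch, assuming the stages run in the order listed; every stage body here is a hypothetical stub (the "recognizer" returns a fixed string), meant only to show how each stage's output becomes the next stage's input.

```python
from functools import reduce
from typing import Any, Callable

Stage = Callable[[Any], Any]


def run_pipeline(stages: list[Stage], data: Any) -> Any:
    """Feed each stage's output into the next stage, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


# Hypothetical stand-ins for the five stages; real components are far richer.
def signal_processing(samples: list[float]) -> list[float]:
    # Placeholder "feature extraction": scale samples to a peak of 1.0
    peak = max((abs(s) for s in samples), default=0.0) or 1.0
    return [s / peak for s in samples]


def decoding(features: list[float]) -> str:
    # Stub recognizer: ignores its input and returns a fixed word string
    return "show flights to boston"


def understanding(words: str) -> dict[str, str]:
    # Turn the word string into a tiny semantic frame
    tokens = words.split()
    frame = {"intent": tokens[0]}
    if "to" in tokens[:-1]:
        frame["destination"] = tokens[tokens.index("to") + 1]
    return frame


DIALOG_CONTEXT = {"origin": "pittsburgh"}  # hypothetical state from earlier turns


def discourse(frame: dict[str, str]) -> dict[str, str]:
    # Merge the new frame with context; new values override old ones
    return {**DIALOG_CONTEXT, **frame}


def action(frame: dict[str, str]) -> str:
    return f"LOOKUP flights {frame['origin']} -> {frame.get('destination', '?')}"


result = run_pipeline(
    [signal_processing, decoding, understanding, discourse, action], [0.0, 0.5, -0.25]
)
print(result)  # LOOKUP flights pittsburgh -> boston
```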

    Varieties of speech systems

    [Table comparing Transcription, Command & Control, Information Access and Problem Solving systems across several features (marked X)]

    A generic speech system

    [Diagram. Input: speech. Components: signal processing, decoder, parser, post-parser, dialog manager, domain agents, language generator, speech synthesizer. Outputs: speech, display, effector.]

    Creating models for recognition

    - Acoustic models: trained from speech data (which must first be transcribed*)
    - Language models: trained from text data
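
    The slide does not say what kind of language model is trained; an n-gram model is a common choice. A minimal sketch, assuming a bigram model with maximum-likelihood estimates and no smoothing (a real recognizer would smooth, so unseen word pairs do not get zero probability); the toy corpus is made up.

```python
from collections import Counter


def train_bigram_lm(sentences: list[str]) -> dict[tuple[str, str], float]:
    """Maximum-likelihood bigram estimates P(word | previous word) from text data."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]  # sentence boundary markers
        history_counts.update(words[:-1])                       # every word that has a successor
        bigram_counts.update(zip(words[:-1], words[1:]))
    return {
        (prev, word): count / history_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }


corpus = ["show flights to boston", "show fares to boston", "flights to denver"]
lm = train_bigram_lm(corpus)
print(lm[("to", "boston")])     # 2 of the 3 bigrams starting with "to"
print(lm[("show", "flights")])  # 0.5
```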

    Understanding speech

    - Parser: extract semantic content from the utterance (uses a grammar)
    - Post-parser: introduce context and world knowledge into the interpretation (uses context and domain agents)
    - Issues: grounding, knowledge engineering; ontology design, language acquisition
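
    The parser/post-parser split can be illustrated with a toy example. A minimal sketch, assuming a regular-expression "grammar" with two hypothetical slots (this is not the semantic grammar of any particular CMU system): the parser pulls slot values out of the word string, and the post-parser applies context, here today's date, to turn "Friday" into a calendar date.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# Hypothetical pattern "grammar": one regular expression per semantic slot.
SLOT_PATTERNS = {
    "destination": re.compile(r"\bto (boston|denver|pittsburgh)\b"),
    "day": re.compile(r"\bon (" + "|".join(WEEKDAYS) + r")\b"),
}


def parse(utterance: str) -> dict[str, str]:
    """Parser: extract semantic content (slot values) from the recognized words."""
    text = utterance.lower()
    frame = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            frame[slot] = match.group(1)
    return frame


def post_parse(frame: dict[str, str], today: date) -> dict[str, object]:
    """Post-parser: use context (today's date) to resolve a weekday to a calendar date."""
    resolved: dict[str, object] = dict(frame)
    if "day" in frame:
        target = WEEKDAYS.index(frame["day"])              # date.weekday(): Monday == 0
        days_ahead = (target - today.weekday()) % 7 or 7   # same weekday means next week
        resolved["date"] = today + timedelta(days=days_ahead)
    return resolved


frame = parse("I'd like to go to Boston on Friday")
print(frame)                                # {'destination': 'boston', 'day': 'friday'}
print(post_parse(frame, date(2024, 1, 1)))  # 2024-01-01 is a Monday, so 'date' becomes 2024-01-05
```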

    Interacting with the user

    - Dialog manager: guide interaction through the task; map user inputs and system state into actions
    - Domain agents: interact with back-end(s); interpret information using domain knowledge
    - Resources: task schemas, context, databases, live data (e.g. the Web), domain experts
    - Design activities: task analysis, knowledge engineering
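
    A minimal Python sketch of this division of labor, assuming a simple dispatch rule (the first agent that accepts the current state handles it); the agent classes, the `can_handle`/`handle` interface and the toy flight table are all hypothetical. The second turn shows the dialog manager's state carrying the destination forward, so the hotel agent can use it without the user repeating it.

```python
from typing import Protocol


class DomainAgent(Protocol):
    def can_handle(self, state: dict) -> bool: ...
    def handle(self, state: dict) -> str: ...


class FlightAgent:
    """Hypothetical agent over a toy in-memory flight table."""
    FLIGHTS = {"boston": ["08:15", "17:40"]}

    def can_handle(self, state: dict) -> bool:
        return state.get("topic") == "flight"

    def handle(self, state: dict) -> str:
        times = self.FLIGHTS.get(state.get("destination", ""), [])
        return f"{len(times)} flights to {state.get('destination', 'an unknown city')}"


class HotelAgent:
    """Hypothetical agent; reuses the destination remembered from earlier turns."""

    def can_handle(self, state: dict) -> bool:
        return state.get("topic") == "hotel"

    def handle(self, state: dict) -> str:
        return f"Looking for hotels in {state.get('destination', 'an unknown city')}"


class DialogManager:
    def __init__(self, agents: list[DomainAgent]) -> None:
        self.agents = agents
        self.state: dict = {}  # system state carried across turns

    def step(self, user_input: dict) -> str:
        """Merge the new input into the state, then hand it to the first agent that accepts it."""
        self.state.update(user_input)
        for agent in self.agents:
            if agent.can_handle(self.state):
                return agent.handle(self.state)
        return "Sorry, I can't help with that."


dm = DialogManager([FlightAgent(), HotelAgent()])
print(dm.step({"topic": "flight", "destination": "boston"}))  # 2 flights to boston
print(dm.step({"topic": "hotel"}))                            # Looking for hotels in boston
```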

    Communicating with the user

    - Components: language generator, speech synthesizer, display generator, action generator
    - Decide what to say to the user (and how to phrase it)
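
    The generator's two decisions, what to say and how to phrase it, can be kept separate. A minimal sketch, assuming template-based generation (one common approach, not necessarily the one used in the systems described); the dialog acts, templates and the verbose/terse switch are hypothetical.

```python
# Hypothetical templates keyed by (dialog act, verbosity style).
TEMPLATES = {
    ("ask_destination", "verbose"): "Where would you like to fly?",
    ("ask_destination", "terse"): "Destination?",
    ("ask_date", "verbose"): "When would you like to fly to {destination}?",
    ("ask_date", "terse"): "Date?",
    ("confirm", "verbose"): "You want to fly to {destination} on {date}. Is that right?",
    ("confirm", "terse"): "{destination}, {date}?",
}


def choose_act(frame: dict[str, str]) -> str:
    """Decide what to say: ask for the first missing item, otherwise confirm."""
    if "destination" not in frame:
        return "ask_destination"
    if "date" not in frame:
        return "ask_date"
    return "confirm"


def generate(frame: dict[str, str], style: str = "verbose") -> str:
    """Decide how to phrase it: fill the template for the chosen act and style."""
    return TEMPLATES[(choose_act(frame), style)].format(**frame)


print(generate({"destination": "Boston"}))                             # When would you like to fly to Boston?
print(generate({"destination": "Boston", "date": "Friday"}, "terse"))  # Boston, Friday?
```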

    Speech recognition and understanding

    - Sphinx system: speaker-independent, continuous speech, large vocabulary
    - ATIS system: air travel information retrieval, context management

    [film clip]

    Command and control systems

    - Small vocabularies, fixed syntax: OPEN WINDOW, MOVE OBJECT to
      Applications: data entry (e.g., zip codes), process control (e.g., electron microscope, darkroom equipment)
    - Large vocabulary, fixed syntax: Web browsing (?)
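
    A fixed syntax means only a small, known set of word sequences is legal, which lets the system discard recognizer output that falls outside it. A minimal sketch, assuming a hypothetical two-word VERB OBJECT grammar and an N-best list of recognizer hypotheses (best first); it is not the grammar format of any particular recognizer.

```python
from typing import Optional

# Hypothetical fixed-syntax grammar: VERB followed by one of its allowed OBJECTs.
GRAMMAR = {
    "OPEN": {"WINDOW", "FILE"},
    "CLOSE": {"WINDOW", "FILE"},
    "MOVE": {"OBJECT"},
}


def parse_command(utterance: str) -> Optional[tuple[str, str]]:
    """Return (verb, object) if the utterance is exactly VERB OBJECT, else None."""
    words = utterance.upper().split()
    if len(words) == 2 and words[1] in GRAMMAR.get(words[0], set()):
        return words[0], words[1]
    return None


def best_in_grammar(nbest: list[str]) -> Optional[tuple[str, str]]:
    """Pick the highest-ranked recognizer hypothesis that the grammar accepts."""
    for hypothesis in nbest:
        command = parse_command(hypothesis)
        if command is not None:
            return command
    return None


print(best_in_grammar(["open wind oh", "open window", "close window"]))  # ('OPEN', 'WINDOW')
print(best_in_grammar(["move window"]))                                   # None
```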

    SpeechWear

    - Vehicle inspection task: USMC mechanics, fixed inspection form
    - Wearable computer (COTS components)
    - HTML-based task representation

    [film clip]

    Information access

    - Moderate to very large vocabulary
    - IVR and frame-based systems
    - Commercial systems:
      - Nuance: http://www.nuance.com/demo/index.html
      - SpeechWorks: http://www.speechworks.com/demos/demos.htm
      - lots of others

    IVR and frame-based systems

    - Interactive voice response (IVR): interactions specified by a graph (typically a tree)
    - Frame systems: ergodic graphs; states defined by multi-item forms

    Graph-based systems

    "Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan, ..."
    "What type of loan are you interested in? Please say one of the following: Mortgage, Car, Personal, ..."
    . . .
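
    The bank dialog is a walk down a menu tree. A minimal sketch, assuming each node holds a prompt and a map from spoken option to child node; the options marked "..." on the slide are omitted, and the leaf prompts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MenuNode:
    prompt: str
    options: dict[str, "MenuNode"] = field(default_factory=dict)


# Tree from the slide; leaf prompts are hypothetical and the "..." options are omitted.
ROOT = MenuNode(
    "Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan.",
    {
        "balance": MenuNode("Please say your account number."),
        "hours": MenuNode("Here are our opening hours."),
        "loan": MenuNode(
            "What type of loan are you interested in? "
            "Please say one of the following: Mortgage, Car, Personal.",
            {
                "mortgage": MenuNode("Transferring you to the mortgage department."),
                "car": MenuNode("Transferring you to the car loan department."),
                "personal": MenuNode("Transferring you to the personal loan department."),
            },
        ),
    },
)


def step(node: MenuNode, answer: str) -> MenuNode:
    """Follow the branch the caller named; stay put (and re-prompt) on anything else."""
    return node.options.get(answer.strip().lower(), node)


node = ROOT
for answer in ["Loan", "pizza", "Car"]:
    node = step(node, answer)
    print(node.prompt)
```

    The rigidity the later slides criticize is visible here: the caller can only answer the current question, one level at a time.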

    Frame-based systems

    "I would like to fly to Boston"
    "I'd like to go to Boston on Friday"
    "When would you like to fly?"

    Destination_City: Boston
    Departure_Date: ______
    Departure_Time: ______
    Preferred_Airline: ______
    ...
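
    A frame system's control loop can be quite simple: fill whatever slots the user's utterance supplies, then ask about the first slot still empty. A minimal sketch using the slot names on the slide; the fixed asking order and all prompts except "When would you like to fly?" are assumptions, and the parsed slot values are supplied directly rather than produced by a recognizer.

```python
from typing import Optional

# Slot names from the slide, in the order the system asks about them.
SLOTS = ["Destination_City", "Departure_Date", "Departure_Time", "Preferred_Airline"]

PROMPTS = {  # hypothetical wording, apart from the departure-date question on the slide
    "Destination_City": "Where would you like to fly?",
    "Departure_Date": "When would you like to fly?",
    "Departure_Time": "What time would you like to leave?",
    "Preferred_Airline": "Do you have a preferred airline?",
}


def new_frame() -> dict[str, Optional[str]]:
    return dict.fromkeys(SLOTS)  # every slot starts empty (None)


def fill(frame: dict[str, Optional[str]], values: dict[str, str]) -> None:
    """Fill any number of slots, in any order, from one parsed utterance."""
    for slot, value in values.items():
        if slot not in frame:
            raise KeyError(f"unknown slot: {slot}")
        frame[slot] = value


def next_prompt(frame: dict[str, Optional[str]]) -> Optional[str]:
    """Ask about the first empty slot; None means the form is complete."""
    for slot in SLOTS:
        if frame[slot] is None:
            return PROMPTS[slot]
    return None


frame = new_frame()
fill(frame, {"Destination_City": "Boston"})  # "I would like to fly to Boston"
print(next_prompt(frame))                    # When would you like to fly?
fill(frame, {"Departure_Date": "Friday", "Departure_Time": "morning"})
print(next_prompt(frame))                    # Do you have a preferred airline?
```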

    Frame-based systems

    [Diagram: several multi-slot forms; the system transitions between forms on a keyword or phrase]
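
    One way to read this diagram: several forms are active, and a keyword or phrase decides which form the next input fills. A minimal sketch, assuming single-word triggers and three hypothetical frames (flight, hotel, car); a real system would match phrases and work from the parser's output rather than raw words.

```python
# Hypothetical forms; each frame is a multi-item form with its own slots.
FRAMES = {
    "flight": dict.fromkeys(["Destination_City", "Departure_Date"]),
    "hotel": dict.fromkeys(["Hotel_City", "Check_In_Date"]),
    "car": dict.fromkeys(["Pickup_City", "Pickup_Date"]),
}

# Hypothetical trigger words; any frame can be reached from any other (an ergodic graph).
KEYWORDS = {
    "fly": "flight", "flight": "flight",
    "hotel": "hotel", "room": "hotel",
    "car": "car", "rental": "car",
}


def select_frame(utterance: str, current: str) -> str:
    """Switch to the frame named by the first trigger word, else stay in the current one."""
    for word in utterance.lower().split():
        frame = KEYWORDS.get(word.strip(".,?!"))
        if frame is not None:
            return frame
    return current


current = "flight"
for utterance in ["I also need a hotel room.", "Arriving on Friday", "And a rental car?"]:
    current = select_frame(utterance, current)
    print(current, list(FRAMES[current]))
```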

    Some problems

    - IVR systems work great, but only for well-structured (shallow) tasks
    - Frame systems are good for tasks that correspond to a single form leading to an action
    - Neither approach does well with more complex problem-solving activities

    Dialog systems

    - Problem-solving activity; complex task
    - The order of progression through the task depends on user goals (which can change) and on system state (e.g., the result of a back-end retrieval), and is not predictable
    - Track progress and help the task along
    - Mixed-initiative dialog
    - Discourse phenomena: users expect to converse with the system

    Carnegie Mellon Communicator

    - A dialog system that supports complex problem solving in a travel-planning domain
    - Create an itinerary using air schedule, hotel and car information
    - 186 U.S. airports (>140k enplanements/yr); currently >500 world airports
    - Web-based data resources: live and cached flight information; airport, airline, etc. information

    Value schema/handlers

    [Diagram: a value with its transform and receptors, connected to a domain agent]

    Compound schema

    [Diagram: Value_1 + Value_2 + Value_3 combined by a value transform (e.g. an SQL query) via a domain agent]

    Schema ordering

    [Diagram: values i, j, k feed schemas i, j, k. Example: Destination airport, Date and Time feed the Flight Leg schema, whose value transform is a database lookup yielding Available flights.]
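
    The Flight Leg example can be sketched as a value transform that only fires once all of its input values are known. A minimal Python sketch using the standard-library `sqlite3` module, assuming a hypothetical `flights` table with made-up rows; times are stored as zero-padded `HH:MM` strings so that string comparison orders them correctly. Returning `None` for an incomplete schema is one plausible reading of schema ordering: a schema cannot be evaluated before the values it depends on.

```python
import sqlite3
from typing import Optional

REQUIRED = ("destination", "date", "time")  # values the Flight Leg schema combines


def flight_leg(conn: sqlite3.Connection, values: dict[str, str]) -> Optional[list[tuple]]:
    """Compound schema: once all values are known, transform them into one database lookup."""
    if not all(values.get(key) for key in REQUIRED):
        return None  # schema not yet satisfied; the dialog must collect the missing values
    rows = conn.execute(
        "SELECT flight, depart FROM flights "
        "WHERE dest = ? AND day = ? AND depart >= ? ORDER BY depart",
        (values["destination"], values["date"], values["time"]),
    )
    return rows.fetchall()


# Toy in-memory table; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight TEXT, dest TEXT, day TEXT, depart TEXT)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [("F1", "BOS", "2024-01-05", "08:15"),
     ("F2", "BOS", "2024-01-05", "17:40"),
     ("F3", "DEN", "2024-01-05", "09:00")],
)

print(flight_leg(conn, {"destination": "BOS", "date": "2024-01-05"}))                   # None
print(flight_leg(conn, {"destination": "BOS", "date": "2024-01-05", "time": "12:00"}))  # [('F2', '17:40')]
```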

    User-aware speech interfaces

    - Predictable behavior on the system's part
    - Users communicate at different levels
    - http://www.speech.cs.cmu.edu/air/papers/InterfaceChars.html

    User-aware speech interfaces

    - Content: task-centric utterances
    - Possibility: What can I do?
    - Orientation: Where are we?
    - Navigation: moving through the task space
    - Control: verbose/terse, listen!
    - Customization: define this word

    Speech interface guidelines

    - Speech recognition is errorful
    - System state is often opaque to the user
    - http://www.speech.cs.cmu.edu/air/papers/SpInGuidelines/SpInGuidelines.html

    Interface guidelines

    - State transparency
    - Input control
    - Error recovery
    - Error detection
    - Error correction
    - Log performance
    - Application integration
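
    Because recognition is errorful, error detection often starts from a recognizer confidence score. A minimal sketch of one common strategy, assuming the recognizer reports a confidence in [0, 1]; the accept/confirm/reject thresholds and the prompts are illustrative, not values from these guidelines.

```python
def recovery_action(hypothesis: str, confidence: float,
                    accept_at: float = 0.8, reject_below: float = 0.4) -> tuple[str, str]:
    """Map a recognizer confidence score in [0, 1] to (action, system prompt)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= accept_at:
        return "accept", ""                              # act on the hypothesis directly
    if confidence >= reject_below:
        return "confirm", f"Did you say {hypothesis}?"   # explicit confirmation
    return "reject", "Sorry, I didn't catch that. Please say it again."


log = []  # record every decision, in the spirit of "log performance"
for hyp, score in [("Boston", 0.93), ("Austin", 0.55), ("Houston", 0.12)]:
    action, prompt = recovery_action(hyp, score)
    log.append((hyp, score, action))
    print(action, prompt)
```

    Logging each decision alongside its score makes it possible to tune the thresholds later against how often users had to correct the system.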

    Summary

    - Speech and language communication
    - Dialog structure
    - Interface design

