© Macquarie University 2004 1
Special Lecture (406)
Spoken Language Dialog Systems
Review
Rolf Schwitter
{schwitt}@ics.mq.edu.au
Why Voice?
• Wireless devices have small screens and limited input capabilities.
• Telephone keypad can give users only a limited number of choices.
• Speech technology is evolving.
• The exchange of information between a person and a computer is becoming more like a real conversation.
• Users want hands-free or eyes-free use.
• Allows interaction with display-based Web content in cases where the mouse and keyboard may be missing or inconvenient.
What is a Spoken Language Dialog System?
• An SLDS is a computer system that you can talk to in order to carry out some task.
• SLDSs are typically of two kinds:
– Information-provision systems provide information in response to a query, such as a request for timetable information or weather information.
– Transaction-based systems allow you to undertake some transaction, such as buying or selling stocks, or reserving a seat on a plane.
The Architecture of an SLDS
[Figure: components of an SLDS]
• Speech Recognition
• Language Understanding
• Dialog Management
• Database
• Language Generation
• Speech Synthesis
What's Involved in Building an SLDS?
• Dialog Design
– The process of working out how the interaction between human and machine will move from stage to stage.
– Also referred to as script writing or call flow layout.
• Prompt Design
– What the system says to get the caller to say something we are able to handle.
• Grammar Writing
– Specifying what the caller is permitted to say at any given state.
CSLU Speech Toolkit
• Developed at the Centre for Spoken Language Understanding at the Oregon Graduate Institute in the USA.
• The CSLU toolkit has many features that are similar to commercial tools for building SLDSs.
• The toolkit includes a rapid application development environment.
• A tutorial is available at:
http://www.cslu.ogi.edu/toolkit/docs/2.0/apps/rad/index.html
Food Ordering Service
Building a Pizza Ordering Service
• Specification
– Pizzas are three sizes: small, medium, and large.
– Pizzas may have one or more of these toppings:
cheese, pepperoni, sausage, peppers, pineapple, and tomatoes
– Drinks are three sizes: small, medium, and large.
– Drinks may be Coke, Pepsi, Diet Pepsi, lemonade, or water.
– An order may contain zero or more pizzas and zero or more drinks.
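The specification above can be written down as a small data model before any dialog is designed. The following is only a sketch in Python: the class names (Pizza, Drink, Order) and the choice to reject invalid values with ValueError are assumptions made for illustration and are not part of the original slides.

```python
from dataclasses import dataclass, field

# Value sets taken directly from the specification.
SIZES = {"small", "medium", "large"}
TOPPINGS = {"cheese", "pepperoni", "sausage", "peppers", "pineapple", "tomatoes"}
DRINKS = {"Coke", "Pepsi", "Diet Pepsi", "lemonade", "water"}


@dataclass
class Pizza:  # hypothetical class name
    size: str
    toppings: list

    def __post_init__(self):
        if self.size not in SIZES:
            raise ValueError(f"unknown pizza size: {self.size}")
        # "one or more of these toppings"
        if not self.toppings:
            raise ValueError("a pizza needs at least one topping")
        unknown = set(self.toppings) - TOPPINGS
        if unknown:
            raise ValueError(f"unknown toppings: {sorted(unknown)}")


@dataclass
class Drink:  # hypothetical class name
    size: str
    kind: str

    def __post_init__(self):
        if self.size not in SIZES:
            raise ValueError(f"unknown drink size: {self.size}")
        if self.kind not in DRINKS:
            raise ValueError(f"unknown drink: {self.kind}")


@dataclass
class Order:  # hypothetical class name
    # "zero or more pizzas and zero or more drinks"
    pizzas: list = field(default_factory=list)
    drinks: list = field(default_factory=list)


order = Order(
    pizzas=[Pizza("large", ["cheese", "pineapple"])],
    drinks=[Drink("small", "water")],
)
```

Each attribute of this model corresponds to an information token that the dialog has to obtain from the caller.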
Steps in Dialog Design
1. Make sure you understand what you are trying to achieve (use scenarios and first build a conceptual model).
2. See if you can decompose the task into smaller meaningful subtasks.
3. Identify the information tokens you need for each task or subtask.
4. Decide how you will obtain this information from the caller.
5. Sketch a call flow diagram with appropriate prompts that captures this information.
6. Test your call flow diagram in a Wizard of Oz simulation.
7. Revise your call flow diagram and repeat Step 6 …
Wizard of Oz Simulations
• A human experimenter (the Wizard) simulates an automated system.
• The dialog specifications (for example, flow charts with associated prompts) are spread out in front of the experimenter.
• The experimenter reads the appropriate prompts from the specs, waits for a response from the subject (or no response), checks the specs on how to proceed, and then speaks the next prompt.
• Very effective in uncovering problems with logic, navigation, awkward sequences of prompts, omissions, and so on.
Types of Spoken Output
Type            Function
Prompt          Indicates it is time for user input, and thus serves as a turn-taking cue
Feedback        Presents the app state that results from user input, allowing the user to compare original intent with result
Instructions    Provide information about operating the user interface or understanding the task
Help            Help instructions often adopt a separate mode or state aimed at coaching the user
App Data        Information presented to the user as part of the task: e.g. weather, stock information, flight times
Prompts
• Prompts are the turn-taking cues within spoken dialogs.
• Prompts have two purposes:
– cause the user to speak,
– convey to the user what may be spoken (optionally).
• Prompts should be as distinguishable as possible from
– instructions,
– feedback, and
– help.
• Prompts fall along a continuum from implicit to explicit.
Implicit versus Explicit Prompts
• Computer 1: Welcome to ABC Bank. What would you like to do?
• Computer 2: Welcome to ABC Bank. You can check an account balance, transfer funds, or pay a bill. What would you like to do?
• Computer 3: Welcome to ABC Bank. You can check an account balance, transfer funds, or pay a bill. Say one of the following choices: check balance, transfer funds, or pay bills.
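The three prompts above differ only in how much they tell the caller about what can be said. A minimal Python sketch of that continuum follows; the function names and the level labels "implicit", "intermediate" and "explicit" are assumptions chosen for illustration, while the wording of the prompts is taken from the examples.

```python
# Maps the phrase the caller may say to the way the capability is described.
ACTIONS = {
    "check balance": "check an account balance",
    "transfer funds": "transfer funds",
    "pay bills": "pay a bill",
}


def join_choices(items):
    """Join phrases as 'a, b, or c' (hypothetical helper)."""
    items = list(items)
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} or {items[1]}"
    return ", ".join(items[:-1]) + ", or " + items[-1]


def build_prompt(level, greeting="Welcome to ABC Bank."):
    """Build the opening prompt at one of three explicitness levels."""
    question = "What would you like to do?"
    if level == "implicit":
        return f"{greeting} {question}"
    capabilities = f"You can {join_choices(ACTIONS.values())}."
    if level == "intermediate":
        return f"{greeting} {capabilities} {question}"
    if level == "explicit":
        choices = join_choices(ACTIONS)  # iterates over the sayable phrases
        return f"{greeting} {capabilities} Say one of the following choices: {choices}."
    raise ValueError(f"unknown level: {level}")


for level in ("implicit", "intermediate", "explicit"):
    print(build_prompt(level))
```

Keeping the sayable phrases in one table means the explicit prompt and the recognition grammar can be derived from the same source.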
Synthesized versus Digitized Spoken Output
• Use digitized speech whenever possible.
• Use synthesized speech for unbounded information:
– yellow and white pages
– encyclopedia read-back
– rapidly changing information
– electronic mail.
• Avoid synthesis for single isolated words.
Timing
• Minimise time between end of prompt and beginning of recognition.
• Trim prompts aggressively if barge-in is not in use.
• Communicate errors quickly.
Prompt Design
• Precede prompts with instructions.
Computer: Your plan requires that you select a PIN to use the system. The PIN must be between 5 and 9 digits in length. At the tone, please say your PIN.
• Repeat only the prompt.
Computer: Sorry, I didn’t understand that. At the tone, please say your PIN.
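The two guidelines above (instructions before the first prompt, only the prompt on a retry) can be sketched as a short Python loop. The function names, the limit of three attempts, and the reading of "between 5 and 9 digits" as inclusive are all assumptions made for illustration; a list of strings stands in for real recognition results.

```python
INSTRUCTIONS = (
    "Your plan requires that you select a PIN to use the system. "
    "The PIN must be between 5 and 9 digits in length."
)
PROMPT = "At the tone, please say your PIN."
APOLOGY = "Sorry, I didn't understand that."


def is_valid_pin(utterance):
    """Accept 5 to 9 ASCII digits (bounds assumed to be inclusive)."""
    return utterance.isascii() and utterance.isdigit() and 5 <= len(utterance) <= 9


def collect_pin(responses, max_attempts=3):
    """Return (pin or None, list of everything the system said).

    `responses` simulates what the recogniser heard on each attempt.
    """
    spoken = []
    for attempt, response in enumerate(responses[:max_attempts]):
        if attempt == 0:
            # First attempt: instructions precede the prompt.
            spoken.append(f"{INSTRUCTIONS} {PROMPT}")
        else:
            # Retry: short error message, then only the prompt is repeated.
            spoken.append(f"{APOLOGY} {PROMPT}")
        if is_valid_pin(response):
            return response, spoken
    return None, spoken


pin, spoken = collect_pin(["um", "12345"])
```

Because the instructions are kept separate from the prompt, a retry stays short and the error is communicated quickly.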
Prompt Design
• Put key information immediately before expected user speech:
– if using barge-in, put key information at a phrase boundary,
– if not using barge-in, put key information before the tone.
Help
• Empower the user with help availability.
• Let the user know how to get help.
• Once help is declared available, keep it available.
• Return to a logical starting point after help.
• Use examples for help.
Developing Speech Interfaces
• Speech interfaces can be developed using
– general-purpose programming languages
– special-purpose (programming) languages.
• A special-purpose language such as VoiceXML can
– simplify application development
– reduce network traffic
– separate interaction code from application logic code
– provide portability and simplicity
– support prototyping and refinement.
A VoiceXML Example
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <!-- bargein="false": the caller cannot interrupt this prompt -->
      <prompt bargein="false"> Welcome to Ajax Travel.
        <audio src="http://www.prerecorded.audiofile ..."/>
      </prompt>
    </block>
  </form>
</vxml>
VoiceXML Architecture
[Figure: Phone, connected via PSTN or Internet (SIP) to the Gateway & Voice Server, connected via Internet (HTTP/VoiceXML) to the Web Server]
Phone
• regular phone
• wireless phone
• soft phone
Gateway & Voice Server
• telephony interface
• voice browser
• automated speech recognition
• text-to-speech synthesis
• touchtone
• audio play/record
Web Server
• VoiceXML documents
• audio files
• service logic (CGI)
• transaction processing
• database interface
A VoiceXML Scenario
• A customer dials the phone number of a travel agent.
• The VoiceXML gateway receives the call along with information about the dialed number.
• The VoiceXML gateway searches a database.
• If successful, it maps the dialed number to a URL.
• This URL is the location of the agent’s main page (ajax.vxml).
• The gateway retrieves the ajax.vxml page together with associated files such as grammars and recorded audio from the HTTP server.
• These associated files may be cached on the VoiceXML gateway.
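The lookup and caching steps above can be illustrated with a toy gateway in Python. Everything named here is hypothetical: the Gateway class, the phone number, the example.com URL and the injected fetch function are illustration only, and a real gateway would fetch over HTTP and would also retrieve grammars and audio files. Caching every retrieved resource, including the main page, is a simplifying assumption.

```python
class Gateway:
    """Toy model of the number-to-URL lookup and the resource cache."""

    def __init__(self, number_to_url, fetch):
        self.number_to_url = dict(number_to_url)  # stands in for the database
        self.fetch = fetch                        # stands in for an HTTP GET
        self.cache = {}

    def resolve(self, dialed_number):
        """Map the dialed number to a URL, or None if it is unknown."""
        return self.number_to_url.get(dialed_number)

    def retrieve(self, url):
        """Fetch a resource once, then serve it from the cache."""
        if url not in self.cache:
            self.cache[url] = self.fetch(url)
        return self.cache[url]

    def handle_call(self, dialed_number):
        url = self.resolve(dialed_number)
        if url is None:
            return None  # lookup failed, no page to run
        return self.retrieve(url)


fetched = []  # records every simulated HTTP request


def fake_fetch(url):
    fetched.append(url)
    return f"<vxml><!-- contents of {url} --></vxml>"


gateway = Gateway({"0298501234": "http://example.com/ajax.vxml"}, fake_fetch)
first = gateway.handle_call("0298501234")
second = gateway.handle_call("0298501234")  # served from the cache
unknown = gateway.handle_call("0000")
```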
A VoiceXML Scenario
• The VoiceXML interpreter parses and executes the VoiceXML document.
• The interpreter steps through ajax.vxml playing prompts, hearing responses and passing them on to a speech recognition engine.
• If necessary, additional VoiceXML documents and associated files are retrieved from the HTTP server.
• Recorded audio is served by specifying the URL of the WAV file.
• Communications between the voice gateway and the HTTP server follow standard HTTP protocols.
VoiceXML Documents
• A VoiceXML document forms a conversational finite state machine.
• The caller is always in one conversational state, or dialog, at a time.
• Each dialog determines the next dialog to transition to.
• Transitions are specified using URIs, which define the next document and dialog to use.
• Execution is terminated
– when a dialog does not specify a successor, or
– if it has an element that explicitly exits the conversation.
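The finite state machine view above can be sketched in a few lines of Python. This is not how a VoiceXML interpreter is implemented; it only mirrors the rules listed: one active dialog at a time, each dialog names its successor by URI, and execution stops when there is no successor or when a dialog exits explicitly. The dialog names and the EXIT sentinel are hypothetical.

```python
EXIT = object()  # sentinel for an explicit exit from the conversation


def welcome(answer):
    return "ajax.vxml#destination"


def destination(answer):
    if answer == "goodbye":
        return EXIT                    # explicit exit
    return "ajax.vxml#confirm"


def confirm(answer):
    return None                        # no successor: execution ends


# Each dialog is addressed by a URI of the form document#dialog.
DIALOGS = {
    "ajax.vxml#welcome": welcome,
    "ajax.vxml#destination": destination,
    "ajax.vxml#confirm": confirm,
}


def run(dialogs, start, answers):
    """Run dialogs from `start`; return the list of dialog URIs visited."""
    answers = iter(answers)
    visited = []
    current = start
    while current is not None and current is not EXIT:
        visited.append(current)        # the caller is in exactly one dialog
        dialog = dialogs[current]
        current = dialog(next(answers, ""))
    return visited


full_call = run(DIALOGS, "ajax.vxml#welcome", ["", "Sydney", "yes"])
short_call = run(DIALOGS, "ajax.vxml#welcome", ["", "goodbye"])
```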
Forms
• Forms collect values for a set of field item variables.
• Grammars define the allowable inputs for fields.
• Platform throws events if the input is out-of-grammar.
• Actions are performed when field items are filled.
Forms
<form id = Identifier>
<block> Message </block>
<field name = VariableName>
<prompt> Question </prompt>
<grammar src = URI type = MediaType/>
<catch event = EventType> HandlerMessage </catch>
<filled> Actions </filled>
</field>
</form>
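The way a single field is processed can be approximated in Python. This is only a sketch under simplifying assumptions: run_field, CARD_GRAMMAR, and the list of utterances are invented for illustration, and text strings stand in for recognized speech.

```python
# Hypothetical sketch of how one <field> is processed: prompt, listen,
# check the input against the grammar, then either raise an event or
# run the <filled> actions. Real platforms recognize audio, not text.
def run_field(grammar, utterances, max_attempts=3):
    """Return (value, log); value is None if the field was never filled."""
    log = []
    for attempt, heard in enumerate(utterances[:max_attempts], start=1):
        log.append(f"prompt #{attempt}")
        if heard is None:
            log.append("event: noinput")     # caller stayed silent
        elif heard.lower() not in grammar:
            log.append("event: nomatch")     # out-of-grammar input
        else:
            log.append("filled")             # field variable gets a value
            return heard.lower(), log
    return None, log

CARD_GRAMMAR = {"visa", "mastercard", "american express"}
value, log = run_field(CARD_GRAMMAR, [None, "diners", "Visa"])
print(value, log)
```

The cap on attempts is a design choice of this sketch; in a real application the catch handlers decide how often to reprompt.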
Menus
• Menus present the caller with a set of options.
• Transitions to another dialog are based on a choice.
• The <menu> element is a shortcut for a form with only one field.
• It is a convenient way to ask the user to pick one option from a list.
Menus
<menu id = Identifier>
<prompt> Question <enumerate/> </prompt>
<choice next = URI-1> Phrase-1 </choice>
<choice next = URI-2> Phrase-2 </choice>
<choice next = URI-3> Phrase-3 </choice>
<noinput> Message <enumerate/> </noinput>
</menu>
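The menu behaviour can be mimicked in a few lines of Python. The CHOICES list, its phrases and URIs, and both helper functions are assumptions made for this sketch; they only imitate what a menu, its choices, and the enumerate element do.

```python
# Hypothetical model of a <menu>: each choice pairs a phrase with a URI.
CHOICES = [
    ("sports", "#sports"),
    ("weather", "#weather"),
    ("news", "news.vxml"),
]

def enumerate_choices(choices):
    """Imitate <enumerate/>: list the choice phrases for use in a prompt."""
    return ", ".join(phrase for phrase, _ in choices)

def select(choices, heard):
    """Return ("goto", uri) for a match, otherwise ("event", name)."""
    if heard is None:
        return ("event", "noinput")
    for phrase, uri in choices:
        if heard.strip().lower() == phrase:
            return ("goto", uri)
    return ("event", "nomatch")

print("Please choose one of:", enumerate_choices(CHOICES))
print(select(CHOICES, "Weather"))
```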
Transitions
• Transitions are specified using URIs.
• URIs define the next document and dialog to use.
• If a URI does not refer to a document, the current document is assumed.
• If it does not refer to a dialog, the first dialog in the document is assumed.
• Transitions can be requested, for example, by:
<choice next = URI>
<goto next = URI>
<link next = URI>
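The two defaulting rules for transition URIs can be sketched with Python's standard urllib.parse module. The resolve function, the DOCS table, and the example.com URLs are hypothetical; a real interpreter would fetch the target document to find its first dialog.

```python
from urllib.parse import urldefrag, urljoin

def resolve(current_doc, dialogs_in, target):
    """Return (document URI, dialog id) for a transition target.

    dialogs_in is a hypothetical lookup from document URI to the
    ordered list of dialog ids in that document.
    """
    doc_part, dialog = urldefrag(target)
    # No document part: the current document is assumed.
    document = urljoin(current_doc, doc_part) if doc_part else current_doc
    # No fragment: the first dialog in the document is assumed.
    if not dialog:
        dialog = dialogs_in[document][0]
    return document, dialog

DOCS = {
    "http://example.com/app/main.vxml": ["welcome", "ask_card"],
    "http://example.com/app/order.vxml": ["confirm"],
}
HERE = "http://example.com/app/main.vxml"

print(resolve(HERE, DOCS, "#ask_card"))
print(resolve(HERE, DOCS, "order.vxml"))
```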
Prompt Element
• The <prompt> element controls the output of synthesized speech and prerecorded audio.
• Important attributes are:
– bargein: controls whether the caller can interrupt a prompt
– cond: an expression that determines whether the prompt should be spoken
– count: a number that allows different prompts to be emitted on repeated attempts
– timeout: the time to wait for the following caller input
Tapered Prompts
• The count attribute lets prompts vary the message given to the caller.
<field name = "card_type">
<prompt count = "1">
What kind of credit card do you have?
</prompt>
<prompt count = "2">
Type of card?
</prompt>
• Prompts may be tapered to be:
– more terse with use (field prompting)
– more explicit (help prompts).
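Prompt tapering can be illustrated by the selection rule: of the prompts available, play the one with the highest count that does not exceed the current prompt counter. This reflects the rule as described in the VoiceXML 2.0 specification; the select_prompt function and the CARD_PROMPTS dictionary are made up for this sketch.

```python
def select_prompt(prompts, counter):
    """Pick the prompt whose count is the highest one <= counter.

    prompts maps a count value to prompt text (hypothetical structure).
    """
    eligible = [count for count in prompts if count <= counter]
    if not eligible:
        return None
    return prompts[max(eligible)]

CARD_PROMPTS = {
    1: "What kind of credit card do you have?",
    2: "Type of card?",
}

for visit in (1, 2, 3):
    print(visit, select_prompt(CARD_PROMPTS, visit))
```

From the second visit onwards the terse prompt is used, because no prompt with a higher count exists.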
Catch and Help Element
• The <help> element is an abbreviation for
<catch event = "help">
…
</catch>
• For example:
<help>
Please say Visa, Mastercard, or American Express.
</help>
• Additional attributes are: "count" and "cond".
Conditions
• The <if> element is used for conditional logic.
• It has optional <else> and <elseif> elements.
• The expression language used is JavaScript.
<if cond = "(card_type == 'amex' ||
card_type == 'american express') &&
card_num.length != 15">
…
<elseif …/>
…
</if>
• In an actual document, the "&&" operator inside the "cond" attribute must be escaped as "&amp;&amp;".
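The escaping rule for the cond attribute can be checked with Python's standard XML tools. The sketch below assumes a shortened version of the condition; quoteattr and ElementTree are standard-library functions, while the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

cond = "card_type == 'amex' && card_num.length != 15"

# quoteattr escapes "&", "<" and ">" and adds the surrounding quotes.
markup = f"<if cond={quoteattr(cond)}/>"
print(markup)

# Parsing the markup gives back the original, unescaped expression.
parsed = ET.fromstring(markup)
assert parsed.get("cond") == cond

# An unescaped "&&" is not well-formed XML and is rejected by the parser.
try:
    ET.fromstring('<if cond="a && b"/>')
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False
print("unescaped && accepted:", unescaped_ok)
```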
Variables
• Variables are declared by <var> elements:
<var name = "mm"/>
<var name = "i" expr = "expiry_date.length"/>
• Variables are also declared by form items:
<field name = "card_type"> … </field>
• Attributes are:
– name: the name of the variable that will hold the result
– expr: the initial value (optional)
Assign Element
• The <assign> element assigns a value to a variable:
<assign name = "mm" expr = "expiry_date.substring(0, 1)"/>
• Variables need to be declared before making an assignment.
• Attributes are:
– name: the name of the variable being assigned to
– expr: the new value of the variable
Clear Element
• The <clear> element resets one or more variables.
• For example:
<clear namelist = "card_num"/>
• The attribute "namelist" contains the variables to be reset.
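The interplay of declaring, assigning, and clearing variables can be modelled in Python. The Scope class and SemanticError exception are hypothetical stand-ins: the exception loosely mirrors the error.semantic event, and treating an undeclared name in clear as an error is an assumption of this sketch rather than a statement about any platform.

```python
class SemanticError(Exception):
    """Stands in for the error.semantic event (modelling assumption)."""

class Scope:
    """Toy variable scope imitating <var>, <assign> and <clear>."""

    def __init__(self):
        self._vars = {}

    def var(self, name, expr=None):
        # <var name="..." expr="..."/>: declare, optionally initialise
        self._vars[name] = expr

    def assign(self, name, expr):
        # <assign name="..." expr="..."/>: only declared variables
        if name not in self._vars:
            raise SemanticError(f"{name} is not declared")
        self._vars[name] = expr

    def clear(self, namelist):
        # <clear namelist="a b"/>: None models the reset, undefined value
        for name in namelist.split():
            self.assign(name, None)

    def get(self, name):
        return self._vars[name]

scope = Scope()
scope.var("mm")
scope.assign("mm", "0")
print(scope.get("mm"))
scope.clear("mm")
print(scope.get("mm"))
```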
Throw Element
• The <throw> element throws an event.
• This can be a pre-defined one:
<throw event = "nomatch"/>
or an application-defined one:
<throw event = "com.att.portal.machine"/>
Submit Element
• The <submit> element is used to submit information to a server:
<submit next = "place_order.asp"
namelist = "card_type card_num expiry_date"/>
• It lets you submit a list of variables to the document server via an HTTP GET or POST request:
<submit next = "place_order.asp" method = "post"
namelist = "card_type card_num expiry_date"/>
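What the two submit methods put on the wire can be sketched with urllib.parse.urlencode from the Python standard library. The build_submit helper and the sample values (including the card number) are invented for illustration; only the names listed in namelist are sent.

```python
from urllib.parse import urlencode

def build_submit(next_uri, namelist, variables, method="get"):
    """Return (url, body) for a submit of the variables in namelist."""
    pairs = [(name, variables[name]) for name in namelist.split()]
    encoded = urlencode(pairs)
    if method.lower() == "post":
        return next_uri, encoded            # data travels in the body
    return f"{next_uri}?{encoded}", None    # data travels in the URL

values = {
    "card_type": "visa",
    "card_num": "4111111111111111",   # made-up sample number
    "expiry_date": "0406",
    "unused": "x",                    # not in namelist, so not sent
}
NAMELIST = "card_type card_num expiry_date"

print(build_submit("place_order.asp", NAMELIST, values))
print(build_submit("place_order.asp", NAMELIST, values, method="post"))
```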
What is JavaScript?
• JavaScript is an object-oriented scripting language.
• Client-side JavaScript is
– an implementation of ECMAScript
– usually embedded directly in HTML pages
– interpreted.
• Server-side JavaScript is
– used with Web servers
– proprietary and vendor-specific
– interpreted or compiled.
What is JavaScript good for?
• Basically, JavaScript is a scripting language for HTML designers.
• The JavaScript language has a very simple syntax.
• JavaScript can
– put dynamic text into an HTML page,
– react to events,
– read and write HTML elements,
– be used to validate data.
Are JavaScript and Java identical?
• No!
• JavaScript and Java are two completely different things!
• JavaScript is a lightweight scripting language.
• Java is a powerful programming language.
• Java belongs to the same category as C and C++.
Dynamic Scripting (CGI: script.py)
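#!/usr/bin/env python3
# CGI script: returns a VoiceXML document that depends on the "answer" parameter.
import os
from urllib.parse import parse_qs

# The cgi module used in the original Python 2 version was removed in
# Python 3.13, so the GET query string is parsed directly here.
form = parse_qs(os.environ.get("QUERY_STRING", ""))
said_yes = form.get("answer", [""])[0] == "true"
print("Content-type: text/xml\n")
print('<?xml version="1.0"?>')
print('<vxml version="2.0"><form>')
print("<block>You just said %s</block>" % ("yes" if said_yes else "no"))
print("</form></vxml>")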
VoiceXML Implementations
• Web-based VoiceXML development tools:
– Tellme at http://studio.tellme.com
– BeVocal at http://cafe.bevocal.com
– HeyAnita at http://www.heyanita.com
• VoiceXML platforms (and graphical development tools):
– Nuance at http://www.nuance.com
– OptimTalk at http://www.optimtalk.cz
– VocomoVoice Studio at http://www.vocomosoft.com/vvs.htm
Tellme Studio
• Tellme Studio is a suite of Web-based VoiceXML development tools.
• Tellme Studio enables you
– to build, test, and publish VoiceXML applications
– without buying or installing any hardware or software.
• By registering, you can develop your application for free.
• But first check which VoiceXML elements are supported by the Tellme voice interpreter.
MyStudio
• VoiceXML scratchpad
– You can write a phone application using the VoiceXML scratchpad.
• Application URL
– Alternatively, you can write a phone application using a text editor and store the result on a Web server.
– The application URL points to the initial VoiceXML document.
• VoiceXML terminal
– You can test the application logic and flow using the VoiceXML terminal.
[Figure: MyStudio]
[Figure: Application URL]
[Figure: VoiceXML Terminal]
Grammar Scratchpad
• The Tellme platform provides two choices when writing grammars:
– use a built-in grammar
– define your own grammar
• Supported grammar languages are:
– Nuance Grammar Specification Language (GSL)
– Speech Recognition Grammar Specification (SRGS)
• You can execute both GSL and SRGS grammars in the VoiceXML scratchpad.
• But the Tellme grammar tools support GSL grammars only.
[Figure: Grammar Scratchpad: GSL]
[Figure: Grammar Phrase Checker]
[Figure: Grammar Phrase Checker: Returned Value]
[Figure: Grammar Phrase Generator]
[Figure: Grammar Phrase Generator: Generated Phrase]
Connecting to Tellme Studio
• To preview your application, you can use a phone and call
– (408)-678-4465
or you can use a soft phone and call
– sip:[email protected]
GSL Grammar at Work
<?xml version = "1.0"?>
<vxml version = "2.0">
<form>
<field name = "destination">
<prompt>Do you want to fly to New York or Washington?</prompt>
<grammar type = "application/x-gsl" mode = "voice">
<![CDATA[
[[(new york) (big apple)] {<destination "new york">}
[washington (the capital)] {<destination "washington">}]
]]>
</grammar>
<catch event = "nomatch noinput">
<reprompt/>
</catch>
<filled>
<prompt>You said <value expr = "destination"/></prompt>
</filled>
</field>
</form>
</vxml>
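What the GSL grammar in this document does to caller input can be approximated in Python. The table below is a hand translation of the grammar into a dictionary, and the interpret function is a made-up helper; real recognition works on audio and returns confidence scores, which this sketch ignores.

```python
# Hand-translated from the GSL grammar above: each alternative phrase
# maps to the value assigned to the "destination" slot.
DESTINATION_GRAMMAR = {
    "new york": "new york",
    "big apple": "new york",
    "washington": "washington",
    "the capital": "washington",
}

def interpret(utterance):
    """Return the slot value, or None for out-of-grammar input (nomatch)."""
    return DESTINATION_GRAMMAR.get(utterance.strip().lower())

for said in ("Big Apple", "the capital", "Boston"):
    print(said, "->", interpret(said))
```

Several phrases can fill the slot with the same value, which is why the prompt in the filled block reports the slot value and not the caller's literal words.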
Speech Recognition
• Speech produces a sound pressure wave which forms an acoustic signal.
• The microphone
– receives the acoustic signal and
– converts it to an analogue signal.
• To store the analogue signal, it must be converted to a digital signal.
• A speech recognizer tries to transform a digitally encoded acoustic signal of a natural-language utterance into text in that language.
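The conversion from an analogue to a digital signal consists of sampling (measuring at fixed time steps) and quantization (rounding each measurement to one of a fixed number of levels). The sketch below assumes a 1 kHz sine wave as a stand-in for the microphone output and telephone-like settings of 8000 samples per second at 8 bits; all names and values are illustrative.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, bits):
    """Sample signal(t) at a fixed rate and quantize to signed integers."""
    max_level = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz               # sampling: discrete time steps
        amplitude = max(-1.0, min(1.0, signal(t)))
        samples.append(round(amplitude * max_level))  # quantization
    return samples

def tone(t):
    """A 1 kHz sine wave standing in for the analogue microphone signal."""
    return math.sin(2 * math.pi * 1000 * t)

pcm = digitize(tone, duration_s=0.001, sample_rate_hz=8000, bits=8)
print(pcm)
```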
Problems of Describing Speech
• Sentences consist of sequences of words; in text these are delimited by spaces.
• When we produce speech, there are no markers for word boundaries.
• Speech can be described as a sequence of phonemes.
• To identify words, we need to search for an interpretation within the phoneme sequence.
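The search for words in an unsegmented phoneme sequence can be illustrated with a small Python sketch. The lexicon and its informal, ARPAbet-like phoneme labels are assumptions chosen so that one sequence has two readings; the segmentations function is a plain exhaustive search, not how a production recognizer works.

```python
from functools import lru_cache

# Toy lexicon: word -> pronunciation as a tuple of phoneme symbols.
LEXICON = {
    "I": ("AY",),
    "ice": ("AY", "S"),
    "scream": ("S", "K", "R", "IY", "M"),
    "cream": ("K", "R", "IY", "M"),
}

def segmentations(phonemes, lexicon):
    """Return every way to split the phoneme sequence into lexicon words."""
    phonemes = tuple(phonemes)

    @lru_cache(maxsize=None)
    def from_position(start):
        if start == len(phonemes):
            return [[]]                      # nothing left: one empty split
        results = []
        for word, pron in lexicon.items():
            end = start + len(pron)
            if phonemes[start:end] == pron:  # word matches at this position
                for rest in from_position(end):
                    results.append([word] + rest)
        return results

    return from_position(0)

print(segmentations(["AY", "S", "K", "R", "IY", "M"], LEXICON))
```

The same phoneme sequence yields both "I scream" and "ice cream", so the sound alone does not fix the word boundaries.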
Phases of ASR
• Processing stages: Audio Input → Feature Extraction → Phoneme Identification → Word Identification → Words & Phrases
• Knowledge sources: Acoustic Model, Language Model
The Predictive Power of N-Grams
• Example input:
– Yesterday I went to the …
• Bigrams:
– The next word is something that can or is likely to follow 'the'
• Trigrams:
– The next word is something that can or is likely to follow 'to the'
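The difference between bigram and trigram prediction can be shown with simple counts. The five-sentence corpus below is invented for illustration, and the train and predict helpers assume n is at least 2; real language models are estimated from far larger corpora and use smoothing.

```python
from collections import Counter, defaultdict

def train(sentences, n):
    """Count which word follows each (n-1)-word context."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            model[context][words[i + n - 1]] += 1
    return model

def predict(model, history, n):
    """Return candidate next words, most frequent first."""
    context = tuple(history.lower().split()[-(n - 1):])
    return [word for word, _ in model[context].most_common()]

CORPUS = [
    "yesterday i went to the cinema",
    "i went to the cinema again",
    "she listened to the radio",
    "the radio was loud",
    "the dog barked",
]

bigrams = train(CORPUS, 2)
trigrams = train(CORPUS, 3)
print(predict(bigrams, "Yesterday I went to the", 2))
print(predict(trigrams, "Yesterday I went to the", 3))
```

The longer trigram context rules out "dog", which only ever follows "the" and never "to the" in this corpus.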
Why use N-Gram Language Modelling?
• N-gram models allow for large-vocabulary applications.
• A context-free grammar of reasonable complexity can never foresee all the different utterance patterns that callers may use in spontaneous speech input.
Development Life Cycle
• The development life cycle of speech applications consists of the following stages:
– investigation: identify application
– design: specify business and conceptual model plus technology
– development: develop application
– testing: test application
– sustaining: deploy application
Dialog Styles
[Diagram labels: Application Directed, Mixed Initiative, User Directed; Forms, DTMF Menus, Dictation, Query, Command and Control]
SALT
• SALT (= Speech Application Language Tags)
– is an extension of HTML
– consists of a small set of XML elements (tags)
– adds a powerful speech interface to Web pages.
• SALT can be used for both
– voice-only browsers
– multimodal browsers.
Multimodal Dialogs
• Today, spoken interfaces are possible that prompt users with speech and understand simple words or phrases.
• As the technology improves, we can expect richer conversations.
• Speech can be combined with other modes of interaction.
• Multimodal interaction will enable users to
– speak
– write and type
– hear and see
using a more natural interface than today's single-mode browsers.
Multimodal Browser
VoiceXML versus SALT
• VoiceXML and SALT are both
– markup languages
– that describe speech interfaces.
• VoiceXML is designed for telephony applications:
– interactive voice response applications are the focus.
• SALT targets speech applications across a whole spectrum:
– multimodal interactions are the focus.
VXML Solution
<?xml version = "1.0"?>
<vxml version = "2.0">
  <form id = "TravelForm">
    <field name = "OriginCity">
      <grammar src = "city.xml" />
      <prompt> Where would you like to leave from? </prompt>
      <nomatch> I didn't understand </nomatch>
    </field>
    <field name = "DestCity">
      <grammar src = "city.xml" />
      <prompt> Where would you like to go to? </prompt>
      <nomatch> I didn't understand </nomatch>
    </field>
    <filled> <submit ... /> </filled>
  </form>
</vxml>
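The VXML form contains no explicit control flow: the interpreter visits each unfilled field, plays its prompt, and reprompts after a nomatch. As a rough illustration of that implicit loop, here is a minimal JavaScript sketch. It is a simplification, not the full VoiceXML form interpretation algorithm; `runForm`, `recognize`, `say`, and the scripted city names are hypothetical stand-ins introduced only for this example.

```javascript
// Simplified sketch of the loop a VoiceXML interpreter runs over a form.
// Assumption: recognize() stands in for the platform's recogniser and
// returns the recognised city, or null to signal a nomatch.
function runForm(fields, recognize, say) {
  const filled = {};
  for (const field of fields) {
    // Revisit the field until the recogniser returns a match.
    while (!(field.name in filled)) {
      say(field.prompt);
      const heard = recognize(field.name);
      if (heard === null) {
        say(field.nomatch); // <nomatch> message, then reprompt
      } else {
        filled[field.name] = heard;
      }
    }
  }
  return filled; // the point at which <filled> would submit the form
}

const vxmlFields = [
  { name: "OriginCity", prompt: "Where would you like to leave from?", nomatch: "I didn't understand" },
  { name: "DestCity", prompt: "Where would you like to go to?", nomatch: "I didn't understand" },
];

// Scripted caller (made-up data): one failed recognition, then two cities.
const scripted = [null, "Sydney", "Perth"];
const spoken = [];
const result = runForm(vxmlFields, () => scripted.shift(), (text) => spoken.push(text));
console.log(result);
console.log(spoken);
```

In the VXML document all of this behaviour is supplied by the interpreter, which is why the markup needs no script.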
SALT Solution
<!-- HTML -->
<html xmlns:salt = "urn:saltforum.org/schemas/020124">
<body onload = "RunAsk()">
<form id = "travelForm">
  <input name = "txtBoxOriginCity" type = "text" />
  <input name = "txtBoxDestCity" type = "text" />
</form>
<!-- Speech Application Language Tags -->
<salt:prompt id = "askOriginCity"> Where would you like to leave from?
</salt:prompt>
<salt:prompt id = "askDestCity"> Where would you like to go to?
</salt:prompt>
<salt:prompt id = "sayDidntUnderstand" onComplete = "RunAsk()"> Sorry, I didn't understand.
</salt:prompt>
<salt:reco id = "recoOriginCity" onReco = "procOriginCity()"
    onNoReco = "sayDidntUnderstand.Start()">
  <salt:grammar src = "city.xml" />
</salt:reco>
<salt:reco id = "recoDestCity" onReco = "procDestCity()"
    onNoReco = "sayDidntUnderstand.Start()">
  <salt:grammar src = "city.xml" />
</salt:reco>
<!-- script -->
<script>
function RunAsk() {
  if (travelForm.txtBoxOriginCity.value == "") {
    askOriginCity.Start();
    recoOriginCity.Start();
  } else if (travelForm.txtBoxDestCity.value == "") {
    askDestCity.Start();
    recoDestCity.Start();
  }
}
function procOriginCity() {
  travelForm.txtBoxOriginCity.value = recoOriginCity.text;
  RunAsk();
}
function procDestCity() {
  travelForm.txtBoxDestCity.value = recoDestCity.text;
  travelForm.submit();
}
</script>
</body>
</html>
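Unlike the VXML form, the SALT page drives the dialog itself through script and event handlers. The following sketch replays that event chain outside a browser so the order of prompts and callbacks can be inspected. The `makePrompt` and `makeReco` helpers, the `log` array, and the scripted utterances are assumptions made for this illustration and are not part of SALT; a real SALT browser fires these events asynchronously, whereas this simulation calls the handlers synchronously. `RunAsk`, `procOriginCity`, and `procDestCity` follow the slide code.

```javascript
// Simulation only: stand-ins for the objects a SALT browser would provide.
const log = [];
const utterances = [null, "Sydney", "Perth"]; // null simulates a failed recognition

const travelForm = {
  txtBoxOriginCity: { value: "" },
  txtBoxDestCity: { value: "" },
  submit() { log.push("submit"); },
};

// Hypothetical helper: a prompt "speaks" by logging, then fires onComplete.
function makePrompt(text, onComplete) {
  return { Start() { log.push(text); if (onComplete) onComplete(); } };
}

// Hypothetical helper: a recogniser takes the next scripted utterance and
// fires onReco on success or onNoReco on failure.
function makeReco(onReco, onNoReco) {
  const reco = {
    text: "",
    Start() {
      const heard = utterances.shift();
      if (heard == null) { onNoReco(); } else { reco.text = heard; onReco(); }
    },
  };
  return reco;
}

const askOriginCity = makePrompt("Where would you like to leave from?");
const askDestCity = makePrompt("Where would you like to go to?");
const sayDidntUnderstand = makePrompt("Sorry, I didn't understand.", () => RunAsk());
const recoOriginCity = makeReco(() => procOriginCity(), () => sayDidntUnderstand.Start());
const recoDestCity = makeReco(() => procDestCity(), () => sayDidntUnderstand.Start());

// The three functions below mirror the script on the slides.
function RunAsk() {
  if (travelForm.txtBoxOriginCity.value == "") {
    askOriginCity.Start();
    recoOriginCity.Start();
  } else if (travelForm.txtBoxDestCity.value == "") {
    askDestCity.Start();
    recoDestCity.Start();
  }
}
function procOriginCity() {
  travelForm.txtBoxOriginCity.value = recoOriginCity.text;
  RunAsk();
}
function procDestCity() {
  travelForm.txtBoxDestCity.value = recoDestCity.text;
  travelForm.submit();
}

RunAsk(); // what the body's onload handler triggers
console.log(log);
```

The log shows the origin prompt being repeated after the failed recognition, which is the reprompt behaviour the `onNoReco` and `onComplete` handlers are wired up to produce.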