Post on 12-Jan-2016

Introduction to VoiceXML 2.0

Rob Marchand
Director of Product Management
VoiceGenie Technologies Inc.

Introduction to VoiceXML

• Audience
  o Managers and programmers with little experience with VoiceXML

• Attendees will learn
  o The basic principles of VoiceXML
  o Just enough syntax to design and code simple speech applications requiring voice menus and voice forms

VoiceXML in the Marketplace

• VoiceXML 2.0 is now ratified as a Recommendation (i.e., an official standard) by the W3C

• Hundreds of millions of VoiceXML calls are answered every day

VoiceXML is the standard for building speech-enabled applications

W3C and VoiceXML Forum

• W3C manages the technical evolution and development of the VoiceXML language

• VoiceXML Forum focuses on providing best practices, certification testing, resources, and tools

Together, the W3C and VoiceXML Forum accelerate the adoption of VoiceXML-based speech applications

Outline

• Motivation for VoiceXML
• W3C Speech Interface Framework Languages
  o Dialog: VoiceXML 2.0
  o Speech Synthesis: SSML
  o Grammars: SRGS
  o Semantic Interpretation: SI
  o Call Control

Motivation for Speech Applications

• Users access Web sites from any telephone, anywhere, any time.

• Speaking and listening are the natural usage modes for phones.

Speech-enabled Applications Are Possible Now

• Increased computing power at less expense
  o Due to improved chip design and manufacturing techniques

• Improved speech recognition
  o Due to refinements to basic speech recognition algorithms

• Improved dialog design using voice
  o Minimizes the number of words and phrases that the speech recognizer must process at any point during the dialog

Strength of VoiceXML Applications

• Traditional system-directed dialogs for novice users

• Mixed initiative dialogs for experienced users

• Novice users smoothly become experienced users at their own pace

Limitations of VoiceXML Applications

• No special analysis of speech input
  o Not suitable for training speech skills (reading, ESL, singing, etc.)

• VUI conversational bandwidth is lower than GUI conversational bandwidth
  o Using a VUI is like drinking from Lake Superior with a straw

Exercise 1

• Name or describe a speech application you could use at work.

• Name or describe a speech application you or a family member could use at home.

XML

o XML = eXtensible Markup Language

o Elements are surrounded by tags
  <prompt> Welcome to the voice system </prompt>

o Elements may be nested
  <prompt>
    Welcome to Ajax Travel <break/>
    we have the cheapest fares
  </prompt>

o Elements may have attributes
  <choice next="#boat">
  <grammar type="application/srgs+xml" version="1.0"
           root="by_boat" src="boat.grxml"/>

o Because "<", ">", and "&" have special meanings, write
  • "&lt;" in place of "<"
  • "&gt;" in place of ">"
  • "&amp;" in place of "&"
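The rules above can be seen together in one small document. This is a minimal sketch rather than production markup: the promotional wording and the example.com audio URL are assumptions made up for illustration; only the element names come from the slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <!-- Nested element: break sits inside prompt -->
        Welcome to Ajax Travel <break/>
        <!-- In text, a literal ampersand is written as &amp; -->
        flights &amp; cruises at the cheapest fares.
        <!-- The same escaping applies inside an attribute value (hypothetical URL) -->
        <audio src="http://www.example.com/promo.wav?lang=en&amp;voice=f">
          Ask about our weekend specials.
        </audio>
      </prompt>
    </block>
  </form>
</vxml>
```

An unescaped "&" in the query string would make the whole document ill-formed, so the parser would reject it before any prompt is played.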

[Figure: system architecture. Components shown: Web Browser; Voice Browser; Speech Server/Gateway (capture voice, ASR, DTMF, replay audio, TTS); Web Server (HTML scripts, VoiceXML scripts, grammars, audio files, multimedia files, documents); Database Server (DB).]

W3C Speech Interface Framework

[Figure: the languages of the framework: VoiceXML 2.0, Speech Synthesis, Grammar, Semantic Interpretation, Call Control, and others.]

Status of W3C Speech Interface Languages

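[Figure: chart placing each language on the W3C standards track.
 Stages: Requirements, Working Draft, Last Call Working Draft, Candidate Recommendation, Proposed Recommendation, Recommendation.
 Languages: VoiceXML 2.0, VoiceXML 2.1, V 3, Speech Synthesis, Grammar, Semantic Interpretation, Call Control, PLS.]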

VoiceXML 2.0 Fragment

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    …
    <field>
      <prompt>
        Which account <break/>
        <emphasis> savings </emphasis> or <emphasis> checking </emphasis>
      </prompt>
      <grammar type="application/srgs+xml" root="account_type" mode="voice">
        <rule id="account_type">
          <one-of>
            <item> savings </item>
            <item> checking </item>
            <item> CD </item>
            <item> certificate of deposit <tag>$ = "CD"</tag> </item>
          </one-of>
        </rule>
      </grammar>
    </field>
    ….
  </form>
  …
</vxml>

Four languages appear in this fragment:

• Dialog Language (VoiceXML 2.0): <vxml>, <form>, <field>, <prompt>
• Speech Synthesis Markup Language (SSML): <break/>, <emphasis>
• Speech Recognition Grammar Specification (SRGS): <grammar>, <rule>, <one-of>, <item>
• Semantic Interpretation (SI): the contents of <tag>

VoiceXML 2.0 features

• Menus, forms, sub-dialogs
  o <menu>, <form>, <subdialog>

• Inputs
  o Speech recognition <grammar>
  o Recording <record>
  o Keypad <grammar mode="dtmf">

• Output
  o Audio files <audio>
  o Text-to-speech <prompt>

• Variables
  o <var> <script> <assign>

• Events
  o <nomatch>, <noinput>, <help>, <catch>, <throw>

• Transition and submission
  o <goto>, <submit>

• Telephony
  o Connection control: <transfer>, <disconnect>
  o Telephony information

• Platform
  o Objects

• Performance
  o Fetch

A Typical Voice Menu

<menu>
  <prompt>
    <audio src="http://www.ajax.com/three_blind_mice.wav"/>
    Do you want to listen, next, prior, buy, or exit?
  </prompt>
  <choice next="http://www.ajax.com/listen.vxml"> listen </choice>
  <choice next="http://www.ajax.com/next.vxml"> next </choice>
  <choice next="http://www.ajax.com/prior.vxml"> prior </choice>
  <choice next="http://www.ajax.com/buy.vxml"> buy </choice>
  <choice next="http://www.ajax.com/exit.vxml"> exit </choice>
</menu>

Exercise 2: Write a menu that asks the user a "yes/no" question to confirm that the user wants to buy the audio "three blind mice".

Answer to Exercise 2: A "yes/no" menu

<menu>
  <prompt>
    Do you want to buy three blind mice now?
  </prompt>
  <choice next="http://www.ajax.com/yes.vxml"> yes </choice>
  <choice next="http://www.ajax.com/no.vxml"> no </choice>
</menu>
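The same yes/no menu can also accept keypad input. The sketch below assumes the platform assigns DTMF digits automatically when dtmf="true" is set, and uses <enumerate> with the _prompt and _dtmf variables to read the choices aloud; the wording of the <noinput> message is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- dtmf="true" gives the choices the keys 1, 2, ... in document order -->
  <menu dtmf="true">
    <prompt>
      Do you want to buy three blind mice now?
      <!-- Speaks one phrase per choice, e.g. "For yes, press 1." -->
      <enumerate>
        For <value expr="_prompt"/>, press <value expr="_dtmf"/>.
      </enumerate>
    </prompt>
    <choice next="http://www.ajax.com/yes.vxml"> yes </choice>
    <choice next="http://www.ajax.com/no.vxml"> no </choice>
    <!-- Shorthand handler: replay the menu prompt after a silence -->
    <noinput>
      Sorry, I did not hear you.
      <reprompt/>
    </noinput>
  </menu>
</vxml>
```

Callers in noisy environments can then press 1 or 2 instead of speaking, with no change to the target documents.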

Typical Form Fill-In

<form>
  <!-- A form-level greeting must sit inside a <block> -->
  <block>
    <prompt> Welcome to the electronic payment system. </prompt>
  </block>
  <field name="card_number">
    <prompt> Please enter your credit card number. </prompt>
    <grammar src="http://www.ajax.com/credit_card_number.grxml"/>
  </field>
  <field name="date">
    <prompt> Please enter your expiration date. </prompt>
    <grammar src="http://www.ajax.com/credit_card_date.grxml"/>
  </field>
</form>

Exercise 3: Write a form that solicits the month, day, and year of the user's birth date.

Answer to Exercise 3

<form>
  <!-- A form-level question must sit inside a <block> -->
  <block>
    <prompt> When were you born? </prompt>
  </block>
  <field name="month">
    <prompt> What month? </prompt>
    <grammar src="http://www.ajax.com/month.grxml"/>
  </field>
  <field name="day">
    <prompt> What day of the month? </prompt>
    <grammar src="http://www.ajax.com/day.grxml"/>
  </field>
  <field name="year">
    <prompt> What year? </prompt>
    <grammar src="http://www.ajax.com/year.grxml"/>
  </field>
</form>
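Once all three fields hold a value, the form usually has to hand them to the application. A minimal sketch of that step, using <filled> and <submit> from the feature list; the form id and the target URL http://www.ajax.com/birthdate are assumptions for illustration and do not appear in the slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="birth_date">
    <field name="month">
      <prompt> What month? </prompt>
      <grammar src="http://www.ajax.com/month.grxml"/>
    </field>
    <field name="day">
      <prompt> What day of the month? </prompt>
      <grammar src="http://www.ajax.com/day.grxml"/>
    </field>
    <field name="year">
      <prompt> What year? </prompt>
      <grammar src="http://www.ajax.com/year.grxml"/>
    </field>
    <!-- A form-level <filled> runs once every field has a value -->
    <filled>
      <!-- Hypothetical server URL; the three field values are sent with the request -->
      <submit next="http://www.ajax.com/birthdate" namelist="month day year"/>
    </filled>
  </form>
</vxml>
```

The server is expected to answer with the next VoiceXML document, much as a web server answers an HTML form submission with the next page.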

Event Handlers

• Deal with exceptional or error conditions

• Control mechanism for dialog turn retries
  o <catch event="noinput"> … </catch>
  o <catch event="nomatch"> … </catch>
  o <catch event="help"> … </catch>

• Shorthand notation available
  o <noinput> … </noinput>, etc.

• Scoped according to where they occur
  o <form>, <field>, etc.
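Retries are typically controlled with the count attribute, so that each repeated failure gets a more helpful response. The following is a sketch under stated assumptions: the prompt wording and the decision to give up on the third failed match are illustrative choices, and the grammar URL is taken from the month example used elsewhere in the deck.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="month">
      <prompt> What month? </prompt>
      <grammar src="http://www.ajax.com/month.grxml"/>
      <!-- First silence: apologize, then replay the field prompt -->
      <noinput count="1">
        I did not hear anything.
        <reprompt/>
      </noinput>
      <!-- Second and later silences: give more guidance instead -->
      <noinput count="2">
        Say the name of the month you were born in.
      </noinput>
      <!-- Shorthand for catch event="nomatch" -->
      <nomatch>
        Which month, for example, January, February, or March?
      </nomatch>
      <!-- Third failed match: stop retrying -->
      <nomatch count="3">
        Sorry, we are having trouble. Goodbye.
        <exit/>
      </nomatch>
    </field>
  </form>
</vxml>
```

Because these handlers are written inside the field, they apply only to that field; handlers placed on the form or the document act as fallbacks for everything they enclose.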

Adding Event Handlers

<form>
  <block>
    <prompt> When were you born? </prompt>
  </block>
  <field name="month">
    <catch event="noinput"> ….. </catch>
    <catch event="nomatch"> ….. </catch>
    <prompt> What month? </prompt>
    <grammar src="http://www.ajax.com/month.grxml"/>
  </field>
  …..
</form>

Default Event Handlers

<catch event="help">
  <prompt> Sorry, no help is available. </prompt>
</catch>

<catch event="nomatch">
  <prompt> I did not understand, please try again. </prompt>
</catch>

<catch event="noinput">
  <prompt> I did not hear anything, please speak again. </prompt>
</catch>

Exercise 4: Write event handlers for the month field

<catch event="help">
  <prompt> ____________________ </prompt>
</catch>

<catch event="nomatch">
  <prompt> ____________________ </prompt>
</catch>

<catch event="noinput">
  <prompt> ____________________ </prompt>
</catch>

Answer to Exercise 4: Write event handlers for the month field

<catch event="nomatch">
  <prompt> Which month, for example, January, February, or March? </prompt>
</catch>

<catch event="noinput">
  <prompt> Say the name of the month you were born in. </prompt>
</catch>

<catch event="help">
  <prompt> In what month were you born? </prompt>
</catch>

Speech Synthesis ML

Processing stages: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production

Structure Analysis
• Markup support: p, s
• Non-markup behavior: infer structure by automated text analysis

Before and after Structure Analysis

• Before structure analysis
  o Dr. Smith lives at 214 Elm Dr. He weighs 214 lb. He plays bass guitar. He also likes to fish; last week he caught a 19 lb. bass.

• After structure analysis

<p>
  <s> Dr. Smith lives at 214 Elm Dr. </s>
  <s> He weighs 214 lb. </s>
  <s> He plays bass guitar. </s>
  <s> He also likes to fish; last week he caught a 19 lb. bass. </s>
</p>

Speech Synthesis ML: Text Normalization

• Markup support: say-as for dates, times, etc.; sub for aliasing
• Non-markup behavior: automatically identify and convert constructs

After Text Normalization

<p>
  <s>
    <sub alias="doctor">Dr.</sub> Smith lives at
    <say-as interpret-as="address">214</say-as> Elm
    <sub alias="drive">Dr.</sub>
  </s>
  <s>
    He weighs <say-as interpret-as="number">214</say-as>
    <sub alias="pounds">lb.</sub>
  </s>
  <s> He plays bass guitar. </s>
  <s>
    He also likes to fish; last week he caught a
    <say-as interpret-as="number">19</say-as>
    <sub alias="pound">lb.</sub> bass.
  </s>
</p>

Speech Synthesis ML: Text-to-Phoneme Conversion

• Markup support: phoneme, say-as
• Non-markup behavior: look up in pronunciation dictionary

After text-to-phoneme conversion

<p>
  <s>
    <sub alias="doctor">Dr.</sub> Smith lives at
    <say-as interpret-as="address">214</say-as> Elm
    <sub alias="drive">Dr.</sub>
  </s>
  <s>
    He weighs <say-as interpret-as="number">214</say-as>
    <sub alias="pounds">lb.</sub>
  </s>
  <s>
    He plays <phoneme alphabet="ipa" ph="beɪs">bass</phoneme> guitar.
  </s>
  <s>
    He also likes to fish; last week he caught a
    <say-as interpret-as="number">19</say-as>
    <sub alias="pound">lb.</sub>
    <phoneme alphabet="ipa" ph="bæs">bass</phoneme>.
  </s>
</p>

Speech Synthesis ML: Prosody Analysis

• Markup support: emphasis, break, prosody
• Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax

Prosody Analysis (initial text)

<prompt>
  Environmental control menu.
  Do you want to adjust the lighting or temperature?
</prompt>

Prosody Analysis (add pause at phrase boundaries)

<prompt>
  Environmental control menu
  <break strength="medium"/>
  Do you want to adjust the lighting or temperature?
</prompt>

Prosody Analysis (de-emphasize familiar words)

<prompt>
  Environmental control menu
  <break strength="medium"/>
  <emphasis level="reduced"> Do you want to adjust </emphasis>
  the lighting or temperature?
</prompt>

Prosody Analysis (pause to let the listener catch up)

<prompt>
  Environmental control menu <break/>
  <emphasis level="reduced"> do you want to adjust </emphasis>
  the lighting <break/> or temperature?
</prompt>

Prosody Analysis (add emphasis to focus the listener's attention)

<prompt>
  Environmental control menu <break/>
  <emphasis level="reduced"> do you want to adjust the </emphasis>
  <emphasis level="strong"> lighting </emphasis>
  <break/> or
  <emphasis level="strong"> temperature? </emphasis>
</prompt>

Speech Synthesis ML: Waveform Production

• Markup support: voice, audio (audio icons, branding, advertising)

Waveform Production

<prompt>
  <audio src="http://www.example.com/adjust.wav">
    Environmental control menu.
    Do you want to adjust the lighting or temperature?
  </audio>
</prompt>
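The voice and prosody elements named in the stage summaries are not shown in the slides. The sketch below combines them with a timed break; the choice of a female voice, a 500 ms pause, and a slow rate are illustrative assumptions, and a given TTS engine may render them differently or ignore some of them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <!-- voice: request a speaking voice by characteristic -->
        <voice gender="female">
          Environmental control menu.
          <!-- break: an explicit pause length instead of a strength -->
          <break time="500ms"/>
          <!-- prosody: slow the rate for the part the caller must act on -->
          <prosody rate="slow">
            Do you want to adjust the
            <emphasis level="strong"> lighting </emphasis>
            or
            <emphasis level="strong"> temperature? </emphasis>
          </prosody>
        </voice>
      </prompt>
    </block>
  </form>
</vxml>
```

Unlike the <audio> example, nothing here is pre-recorded, so the wording can change without a new recording session.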

Exercise 5 (insert SSML commands)

<prompt>
  Welcome to Ajax Bank do you want to withdraw or deposit funds?
</prompt>

Answer to Exercise 5

<prompt>
  Welcome to Ajax Bank <break/>
  <emphasis level="reduced"> do you want to </emphasis>
  <emphasis level="strong"> withdraw </emphasis>
  <break/> or
  <emphasis level="strong"> deposit </emphasis>
  funds?
</prompt>

Grammars

• Describe what the user may say at a point in the dialog

• Enable the speech recognition engine to work faster and more accurately

• Consist of one or more “rules”

Example Grammar

<grammar type="application/srgs+xml" root="zero_to_ten" mode="voice">
  <rule id="zero_to_ten">
    <one-of>
      <item> zero </item>
      <item> <ruleref uri="#single_digit"/> </item>
      <item> ten </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item> <item> two </item> <item> three </item>
      <item> four </item> <item> five </item> <item> six </item>
      <item> seven </item> <item> eight </item> <item> nine </item>
    </one-of>
  </rule>
</grammar>

Points to note:

• This is the XML form of grammars
• The "zero_to_ten" rule describes the numbers zero through ten; the "single_digit" rule describes single digits
• root = "zero_to_ten": the grammar processor should start with the "zero_to_ten" rule
• mode = "voice": this is a grammar used by the speech recognizer (there may also be grammars for DTMF recognizers)
• <one-of> describes alternatives; each alternative is an <item>
• <ruleref> references another rule
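The same structure works for keypad input. As a rough sketch of a standalone DTMF grammar, with the rule name "pin_digit" made up for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- mode="dtmf": the tokens are telephone keys, not spoken words -->
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         mode="dtmf" root="pin_digit">
  <rule id="pin_digit" scope="public">
    <one-of>
      <item> 0 </item> <item> 1 </item> <item> 2 </item>
      <item> 3 </item> <item> 4 </item> <item> 5 </item>
      <item> 6 </item> <item> 7 </item> <item> 8 </item>
      <item> 9 </item>
    </one-of>
  </rule>
</grammar>
```

A field can reference a voice grammar and a DTMF grammar side by side, letting the caller either say "seven" or press 7.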

Exercise 6: Write a grammar that recognizes the numbers zero to nineteen

Answer to Exercise 6: Write a grammar for zero to nineteen

<grammar type="application/srgs+xml" root="zero_to_19" mode="voice">
  <rule id="zero_to_19">
    <one-of>
      <item> zero </item>
      <item> <ruleref uri="#single_digit"/> </item>
      <item> ten </item> <item> eleven </item> <item> twelve </item>
      <item> thirteen </item> <item> fourteen </item> <item> fifteen </item>
      <item> sixteen </item> <item> seventeen </item> <item> eighteen </item>
      <item> nineteen </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item> <item> two </item> <item> three </item>
      <item> four </item> <item> five </item> <item> six </item>
      <item> seven </item> <item> eight </item> <item> nine </item>
    </one-of>
  </rule>
</grammar>

More Grammar Elements

• Repeat and optional

<rule id="goodness" scope="public">
  <item repeat="0-3"> very </item> good
</rule>

• Sequence

<rule id="twenty_thru_twentynine">
  twenty <ruleref uri="#single_digit"/>
</rule>

• Garbage

<rule id="James_Lewis">
  <item> James <ruleref special="GARBAGE"/> Lewis </item>
</rule>
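A repeat range of "0-1" is the usual way to mark a phrase as optional. The sketch below combines optional phrases, a sequence, and alternatives in one rule; the pizza-ordering wording and the rule name "order" are assumptions invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="order">
  <rule id="order" scope="public">
    <!-- Optional lead-in: may be spoken zero times or once -->
    <item repeat="0-1"> I want a </item>
    <!-- Exactly one of the alternatives is required -->
    <one-of>
      <item> small </item>
      <item> medium </item>
      <item> large </item>
    </one-of>
    pizza
    <!-- Optional politeness at the end -->
    <item repeat="0-1"> please </item>
  </rule>
</grammar>
```

This accepts "large pizza", "I want a small pizza", and "I want a medium pizza please", but still keeps the recognizer's search space small.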

Exercise 7

• Write a grammar that recognizes the numbers zero to thirty-nine

Answer to Exercise 7: Write a grammar for zero to thirty-nine

<grammar type="application/srgs+xml" root="zero_to_39" mode="voice">
  <rule id="zero_to_39">
    <one-of>
      <item> zero </item>
      <item> <ruleref uri="#single_digit"/> </item>
      <item> <ruleref uri="#teens"/> </item>
      <item> twenty </item>
      <item> twenty <ruleref uri="#single_digit"/> </item>
      <item> thirty </item>
      <item> thirty <ruleref uri="#single_digit"/> </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> one </item> <item> two </item> <item> three </item>
      <item> four </item> <item> five </item> <item> six </item>
      <item> seven </item> <item> eight </item> <item> nine </item>
    </one-of>
  </rule>
  <rule id="teens">
    <one-of>
      <item> ten </item> <item> eleven </item> <item> twelve </item>
      <item> thirteen </item> <item> fourteen </item> <item> fifteen </item>
      <item> sixteen </item> <item> seventeen </item> <item> eighteen </item>
      <item> nineteen </item>
    </one-of>
  </rule>
</grammar>

Reusing existing grammars

<grammar type="application/srgs+xml" root="size"
         src="http://www.example.com/size.grxml"/>
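Reuse also works at the level of a single rule: one grammar can point into another with a <ruleref> whose URI ends in a rule name. This sketch assumes that size.grxml declares a public rule named "size"; that rule name and the surrounding wording are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="shirt">
  <rule id="shirt" scope="public">
    a
    <!-- External reference: grammar URI, then "#", then the rule name -->
    <ruleref uri="http://www.example.com/size.grxml#size"/>
    shirt
  </rule>
</grammar>
```

Only rules declared with scope="public" can be referenced from another grammar, which is why the slides mark reusable rules that way.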

Semantic Interpretation

• To create smart voice user interfaces, we need to extract the semantic information from speech utterances

• Example:
  o Utterance: "I want to fly from Dublin to Paris"
  o Semantic Interpretation:
      {
        origin: "Dublin",
        destination: "Paris"
      }

Semantic Interpretation

[Figure: processing pipeline. The ASR, using a grammar with semantic interpretation scripts, passes text to the Semantic Interpretation Processor; the processor passes an ECMAScript object to the VoiceXML Interpreter; the interpreter passes values to the application via <submit>.]

Example, following the utterance "fourteen" through the pipeline:

• The ASR recognizes the text: fourteen
• The matching grammar item carries a semantic interpretation script:
    <item> fourteen <tag>$.quantity="14";</tag></item>
• The Semantic Interpretation Processor produces the ECMAScript object:
    { quantity: "14" }
• The VoiceXML Interpreter submits to the application: quantity = "14"

Semantic Interpretation

• Semantic Interpretation defines the content of <tag>s in SRGS grammars

• Two kinds of syntax for <tag> contents:
  o Semantic Literals (literal values)
  o Semantic Scripts (ECMAScript)

Semantic Interpretation

• Semantic Literals example:

<rule id="drink">
  <one-of>
    <item> coca cola <tag> coke </tag> </item>
    <item> cola <tag> coke </tag> </item>
    <item> black fizzy stuff <tag> coke </tag> </item>
    <item> coke </item>
  </one-of>
</rule>

• Default assignment: the last item has no <tag>, so its result is the matched text itself ("coke")

Semantic Interpretation

• Semantic Scripts employ ECMAScript

• Advantages:
  o Richer structure (objects)
  o Ability to perform computations

Semantic Interpretation

• Example grammar rule with Script Syntax:

<rule id="action">
  <one-of>
    <item> small <tag> $.size = "small"; </tag> </item>
    <item> medium <tag> $.size = "medium"; </tag> </item>
    <item> large <tag> $.size = "large"; </tag> </item>
  </one-of>
  <one-of>
    <item> green <tag> $.color = "green"; </tag> </item>
    <item> blue <tag> $.color = "blue"; </tag> </item>
    <item> white <tag> $.color = "white"; </tag> </item>
  </one-of>
</rule>

• Utterance: "Large white"

• ECMAScript structure:

action: {
  size: "large",
  color: "white"
}
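The "$" notation in these slides follows a working draft of the Semantic Interpretation specification. In the version that later became the SISR 1.0 Recommendation, as I understand it, the rule's result object is named "out" and the grammar declares which tag syntax it uses through the tag-format attribute. A sketch of the same rule in that syntax, which platforms of the period may or may not accept:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- tag-format="semantics/1.0" selects script syntax for tag contents -->
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="action"
         tag-format="semantics/1.0">
  <rule id="action" scope="public">
    <one-of>
      <item> small <tag> out.size = "small"; </tag> </item>
      <item> medium <tag> out.size = "medium"; </tag> </item>
      <item> large <tag> out.size = "large"; </tag> </item>
    </one-of>
    <one-of>
      <item> green <tag> out.color = "green"; </tag> </item>
      <item> blue <tag> out.color = "blue"; </tag> </item>
      <item> white <tag> out.color = "white"; </tag> </item>
    </one-of>
  </rule>
</grammar>
```

For the utterance "large white" the result is the same object as before, with size "large" and color "white"; only the name of the result variable differs.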

Semantic Interpretation

• Example grammar rule with Script Syntax:

<rule id="calculator">
  What is <ruleref uri="#digit"/> <tag> $.total = $digit; </tag>
  <item repeat="1-">
    plus <ruleref uri="#digit"/> <tag> $.total = $.total + $digit; </tag>
  </item>
</rule>

• Utterance: "What is 1 plus 2 plus 3?"

• ECMAScript structure:

calculator: {
  total: 6
}

Exercise 8: Fill in the contents of <tag>

• Utterance: "From savings to checking"

• Grammar rule:

<rule id="transfer">
  from
  <one-of>
    <item> savings <tag> ________________________ </tag> </item>
    <item> checking <tag> ________________________ </tag> </item>
  </one-of>
  to
  <one-of>
    <item> savings <tag> ________________________ </tag> </item>
    <item> checking <tag> ________________________ </tag> </item>
  </one-of>
</rule>

• ECMAScript structure:

transfer: {
  source_account: "savings",
  target_account: "checking"
}

Answer to Exercise 8

• Utterance: "From savings to checking"

• Grammar rule:

<rule id="transfer">
  from
  <one-of>
    <item> savings <tag> $.source_account = "savings"; </tag> </item>
    <item> checking <tag> $.source_account = "checking"; </tag> </item>
  </one-of>
  to
  <one-of>
    <item> savings <tag> $.target_account = "savings"; </tag> </item>
    <item> checking <tag> $.target_account = "checking"; </tag> </item>
  </one-of>
</rule>

• ECMAScript structure:

transfer: {
  source_account: "savings",
  target_account: "checking"
}
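A structure like this is what makes mixed-initiative forms possible: when a form-level grammar returns an object, each property is copied into the field of the same name. The sketch below assumes the transfer rule is published at a hypothetical URL (transfer.grxml) and that a simple account grammar (account.grxml) exists for the per-field fallback; both URLs and all prompt wording are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="transfer">
    <!-- Form-level grammar: one utterance can fill both fields -->
    <grammar src="http://www.ajax.com/transfer.grxml" type="application/srgs+xml"/>
    <!-- Open question for experienced callers -->
    <initial name="start">
      <prompt> What transfer would you like to make? </prompt>
    </initial>
    <!-- Directed questions, asked only for fields still empty -->
    <field name="source_account">
      <prompt> From which account? </prompt>
      <grammar src="http://www.ajax.com/account.grxml" type="application/srgs+xml"/>
    </field>
    <field name="target_account">
      <prompt> To which account? </prompt>
      <grammar src="http://www.ajax.com/account.grxml" type="application/srgs+xml"/>
    </field>
    <filled>
      <prompt>
        Transferring from <value expr="source_account"/>
        to <value expr="target_account"/>.
      </prompt>
    </filled>
  </form>
</vxml>
```

A caller who says "from savings to checking" fills both fields at once; a novice who stays silent or is not understood can be led through the two fields one at a time.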

CCXML

• Provides call control support for VoiceXML and other dialog languages

• Separate interpreter from VoiceXML
  o Lives on its own thread
  o Handles asynchronous events

• May be used to create standalone applications

• Replaces <transfer> and <disconnect> currently in VoiceXML 2.0 (or provides the underlying support for them)

CCXML

[Figure: two diagrams, one showing a platform with VoiceXML alone and one showing VoiceXML + CCXML.]

CCXML

• Features
  o Multi-party conferencing (human and machine)
  o Sophisticated multi-call handling and control
  o Support for async external messages and events
  o More sophisticated call control than VoiceXML
  o Call control protocol independence

• Goal to support very high density and performance
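Before the conference example, it may help to see the smallest useful call flow: answer a call, run one VoiceXML dialog, and end the session. This is a sketch based on my reading of the CCXML 1.0 element and event names (accept, dialogstart, exit, connection.alerting, dialog.exit); draft versions of the language differ in detail, and the file name welcome.vxml and the state names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <var name="state" expr="'initial'"/>
  <eventprocessor statevariable="state">
    <!-- An incoming call is ringing: answer it -->
    <transition state="initial" event="connection.alerting">
      <accept/>
    </transition>
    <!-- The call is connected: hand it to a VoiceXML dialog -->
    <transition state="initial" event="connection.connected">
      <assign name="state" expr="'in_dialog'"/>
      <dialogstart src="'welcome.vxml'"/>
    </transition>
    <!-- The dialog finished: end the session -->
    <transition event="dialog.exit">
      <exit/>
    </transition>
    <!-- The caller hung up: end the session -->
    <transition event="connection.disconnected">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```

Each transition reacts to one asynchronous event, which is the division of labor the slides describe: CCXML decides what happens to the call, VoiceXML handles the conversation.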

CCXML<var name="state" expr="‘initial’"/>

<eventprocessor statevariable="state">

</eventprocessor>

CCXML<var name="state" expr="‘initial’"/>

<eventprocessor statevariable="state"> <transition state="initial" event=“connection.connected"> </transition>

</eventprocessor>

CCXML<var name="state" expr="‘initial’"/>

<eventprocessor statevariable="state"> <transition state="initial" event=“connection.connected"> <join id1="conf_id" id2="conn_id"/> <assign name="state" expr="‘joining’"/> </transition>

</eventprocessor>

CCXML<var name="state" expr="‘initial’"/>

<eventprocessor statevariable="state"> <transition state="initial" event=“connection.connected"> <join id1="conf_id" id2="conn_id"/> <assign name="state" expr="‘joining’"/> </transition> <transition state="joining" event="conference.joined"> </transition> </eventprocessor>

CCXML<var name="state" expr="‘initial’"/>

<eventprocessor statevariable="state"> <transition state="initial" event=“connection.connected"> <join id1="conf_id" id2="conn_id"/> <assign name="state" expr="‘joining’"/> </transition> <transition state="joining" event="conference.joined"> <dialogstart conferenceid="conf_id" src=“‘newcaller.vxml’"/> <assign name="state" expr="‘active’"/> </transition> </eventprocessor>

CCXML<var name="state" expr="‘initial’"/>

<eventprocessor statevariable="state"> <transition state="initial" event=“connection.connected"> <join id1="conf_id" id2="conn_id"/> <assign name="state" expr="‘joining’"/> </transition> <transition state="joining" event="conference.joined"> <dialogstart conferenceid="conf_id" src=“‘newcaller.vxml’"/> <assign name="state" expr="‘active’"/> </transition> </eventprocessor>

<!-- newcaller.vxml: the dialog started by <dialogstart> once the caller has joined -->
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <form>
    <block>
      A new participant has entered the conference.
    </block>
  </form>
</vxml>


Exercise 9

Announce when a caller leaves

Answer to Exercise 9

<var name="state" expr="'initial'"/>

<eventprocessor statevariable="state">
  <transition state="initial" event="connection.connected">
    <join id1="conf_id" id2="conn_id"/>
    <assign name="state" expr="'joining'"/>
  </transition>
  <transition state="joining" event="conference.joined">
    <dialogstart conferenceid="conf_id" src="'newcaller.vxml'"/>
    <assign name="state" expr="'active'"/>
  </transition>
  <transition state="active" event="connection.disconnected">
    <dialogstart conferenceid="conf_id" src="'callerleft.vxml'"/>
    <assign name="state" expr="'inactive'"/>
  </transition>
</eventprocessor>


<!-- callerleft.vxml: the dialog started by <dialogstart> when a caller disconnects -->
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <form>
    <block>
      A participant has left the conference.
    </block>
  </form>
</vxml>
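The event-driven logic of the CCXML listing can also be read as a small state-transition table. The Python below is only a sketch of that idea: it is not CCXML and says nothing about how a CCXML interpreter is built. The names `TRANSITIONS` and `run_events` are hypothetical, and the actions are recorded as strings rather than performed.

```python
# Hypothetical illustration of the <eventprocessor> logic from the slides.
# Each key is (current state, event name); each value is
# (list of action descriptions, next state). Nothing is actually executed.
TRANSITIONS = {
    ("initial", "connection.connected"): (
        ["join conf_id conn_id"], "joining"),
    ("joining", "conference.joined"): (
        ["dialogstart newcaller.vxml"], "active"),
    ("active", "connection.disconnected"): (
        ["dialogstart callerleft.vxml"], "inactive"),
}


def run_events(events, state="initial"):
    """Feed events through the table; return (final state, actions recorded)."""
    actions = []
    for event in events:
        match = TRANSITIONS.get((state, event))
        if match is None:
            # No transition matches this state/event pair: skip the event.
            continue
        step_actions, state = match
        actions.extend(step_actions)
    return state, actions


if __name__ == "__main__":
    final_state, log = run_events(
        ["connection.connected", "conference.joined", "connection.disconnected"])
    print(final_state)
    for entry in log:
        print(entry)
```

In this sketch an event with no matching transition for the current state is simply skipped, so a "conference.joined" event that arrives while the state is still "initial" changes nothing.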


Example Applications with CCXML-VoiceXML

• Alerts
  o Stock value changes, order is available, flight is delayed, road closure, school closure
• Conference
  o Add additional person to the conference
  o Whisper
  o Eject
• Find me
  o Try alternative telephone numbers
• Instant messaging
  o Notify me when John calls in to access his e-mail
• Control home applications
  o Turn on/off coffee pot, oven, air conditioner, lights, arm/disarm the security system
• Call Center/Customer Care Applications

VoiceXML 2.1

• VoiceXML’s success and popularity resulted in many implementations early in the standardization process

• Additional, innovative features were conceived after VoiceXML 2.0 content was agreed

• Goals of VoiceXML 2.1:
  o Ensure portability by specifying a set of commonly implemented extensions
  o Backwards-compatible with VoiceXML 2.0
  o Follow a “fast track” to standardization

VoiceXML 2.1

• Standardized extensions:
  o Locate barge-in occurrences within prompts
  o Interact directly with XML-based infrastructure
  o Access recognition utterances for analysis
  o Increase performance by reducing server round-trips
  o Extended call transfer types

Summary

• W3C Speech Interface Framework
  o Dialog: VoiceXML
  o Grammar: SRGS
  o Synthesis: SSML
  o Semantic Interpretation: SI
  o Call Control: CCXML
• Can work together or separately
• See http://www.w3.org/voice/ for details

Resources

Industry Organizations

• World Wide Web Consortium
  o http://www.w3.org
• W3C Voice Browser Working Group
  o http://www.w3.org/voice/
• W3C Multi-Modal Working Group
  o http://www.w3.org/2002/mmi/
• VoiceXML Forum
  o http://www.voicexml.org
• SALT Forum
  o http://www.saltforum.org
• Speech Technology Magazine
  o http://www.amcommexpos.com/

Books

• James A. Larson, VoiceXML: An Introduction to Developing Speech Applications, 2002, Upper Saddle River, NJ: Prentice Hall.
• Eve Astrid Andersson et al., Early Adopter VoiceXML, 2001, Birmingham, UK: Wrox.
• Bruce Balentine & David P. Morgan, How to Build a Speech Recognition Application: A Style Guide for Telephony Dialogues, 1999, San Ramon, CA: Enterprise Integration Group.
• Rick Beasley et al., Voice Application Development with VoiceXML, 2002, Indianapolis: Sams.
• Bob Edgar, The VoiceXML Handbook, 2001, New York: CMP.
• Susan Weinschenk & Dean T. Barker, Designing Effective Speech Interfaces, 2000, New York: John Wiley & Sons.
• Chetan Sharma & Jeff Kunins, VoiceXML: Strategies and Techniques for Effective Voice Application Development with VoiceXML 2.0, 2002, New York: John Wiley.
• Michael H. Cohen, James P. Giangola, & Jennifer Balogh, Voice User Interface Design, 2004, Addison Wesley.

Tutorials and Articles

• VoiceXML Forum
  o http://www.voicexmlforum.org/
• VoiceXML Review
  o http://www.voicexmlreview.org/
• World of VoiceXML
  o http://www.kenrehor.com/voicexml/

Online Voice SDKs

Name | URL
BeVocal Cafe | http://cafe.bevocal.com
Hey Anita FreeSpeech | http://www.heyanita.com
Tellme Studio | http://studio.tellme.com
VoiceGenie Developer Workshop | http://developer.voicegenie.com
Voxeo Community | http://www.voxeo.com
Voxpilot voxbuilder | http://www.voxbuilder.com

Downloadable Voice Interpreters

Name | URL
IBM WebSphere Voice Server SDK | http://www.ibm.com/software/voice

Public VoiceXML Interpreters

Interpreter | Source | URL
OpenVXI - VoiceXML Interpreter | Carnegie Mellon University Department of Computer Science Speech Group | http://www.speech.cs.cmu.edu/openvxi/index.html
PublicVoiceXML - VoiceXML platform | Public Voice Lab, Vienna, Austria | http://www.publicvoicexml.org/

Introduction to VoiceXML

• Questions?