Google TechTalk – March 6th, 2009
Voice Browser and Multimodal Interaction in 2009
Paolo Baggia, Director of International Standards
Overview
- A Bit of History
- W3C Speech Interaction Framework Today: ASR/DTMF, TTS, Lexicons, Voice Dialog and Call Control, Voice Platforms and Next Evolutions
- W3C Multimodal Interaction Today: MMI Architecture, EMMA and InkML, a language for Emotions
- Near Future
Company Profile
- Privately held company (fully owned by Telecom Italia), founded in 2001 as a spin-off from Telecom Italia Labs, capitalizing on 30 years of experience and expertise in voice processing.
- Global company, leader in Europe and South America for award-winning, high-quality voice technologies (synthesis, recognition, authentication and identification), available in 26 languages and 62 voices.
- Multilingual, proprietary technologies protected by over 100 patents worldwide.
- Financially robust: break-even reached in 2004, revenues and earnings growing year on year.
- Growth-plan investment approved for the evolution of products and services.
- Offices in New York; headquarters in Torino; local representative sales offices in Rome, Madrid, Paris, London, Munich.
- Flexible: about 100 employees, plus a vibrant ecosystem of local freelancers.
International Awards
- "Best Innovation in Automotive Speech Synthesis" Prize, AVIOS-SpeechTEK West 2007
- "Best Innovation in Expressive Speech Synthesis" Prize, AVIOS-SpeechTEK West 2006
- "Best Innovation in Multi-Lingual Speech Synthesis" Prize, AVIOS-SpeechTEK West 2005
- "2008 Frost & Sullivan European Telematics and Infotainment Emerging Company of the Year" Award
- Winner of "Market Leader – Best Speech Engine", Speech Industry Award 2007 and 2008
- Loquendo MRCP Server: winner of the 2008 IP Contact Center Technology Pioneer Award
A Bit of History
Standard Bodies
Two main standard bodies:

W3C – World Wide Web Consortium
Founded in 1994 by Tim Berners-Lee with a mission to lead the Web to its full potential. Staff based at MIT (USA), ERCIM (France) and Keio University (Japan). About 400 members all over the world; 50 Working, Interest and Coordination Groups. W3C is where the framework of today's Web is developed (HTML, CSS, XML, DOM, SOAP, RDF, OWL, VoiceXML, SVG, XSLT, P3P, Internationalization, Web Accessibility, Device Independence).

IETF – Internet Engineering Task Force
Founded in 1986, but grew from 1991 under the Internet Society. About 1300 members. HTTP, SIP, RTP and many other protocols. The Media Resource Control Protocol (MRCP) is very relevant for speech platforms.

Two industry forums:

VoiceXML Forum (www.voicexml.org)
Inventors of VoiceXML 1.0, later submitted to W3C for standardization. Its current goal is to promote, disseminate and support VoiceXML and related standards.

SALT Forum (www.saltforum.org)
Supported by Microsoft to define a lightweight markup for telephony and multimodal applications.

Other relevant bodies: 3GPP, OMA, ETSI, NIST
The (r)evolution of VoiceXML, 1998–2009:
- 1998: W3C Voice Browser Workshop
- 1999: VoiceXML Forum born (by AT&T, IBM, Lucent, Motorola); W3C charters the Voice Browser WG
- 2000: VoiceXML 1.0 released
- 2002: W3C charters the Multimodal Interaction WG; SALT Forum born (by Cisco, Comverse, Intel, Microsoft, Philips, SpeechWorks)
- 2004: VoiceXML 2.0, SRGS 1.0 and SSML 1.0 become W3C Recommendations
- 2007: SISR 1.0 and VoiceXML 2.1 become W3C Recommendations
- 2008: PLS 1.0 becomes a W3C Recommendation
- 2009: EMMA 1.0 becomes a W3C Recommendation

[Photo: preparing to announce VoiceXML 1.0, Friday Feb. 25th, 2000, Lucent, Naperville, Illinois. Left to right: Gerald Karam (AT&T), Linda Boyer (IBM), Ken Rehor (Lucent), Bruce Lucas (IBM), Pete Danielsen (Lucent), Jim Ferrans (Motorola), Dave Ladd (Motorola).]
Speech Interface Framework in 2000 (by Jim Larson)

[Diagram: the User interacts through the Telephone System with ASR, a DTMF tone recognizer, TTS and a pre-recorded audio player; a Dialog Manager, with Language Understanding, Context Interpretation, Media Planning and Language Generation components, connects to the World Wide Web. The associated languages: Speech Recognition Grammar Spec. (SRGS), N-gram Grammar ML, Semantic Interpretation for Speech Recognition (SISR), Natural Language Semantics ML, Speech Synthesis Markup Language (SSML), Pronunciation Lexicon Specification (PLS), Call Control XML (CCXML), VoiceXML 2.0/2.1 and EMMA.]
Speech Interface Framework – Today (by Jim Larson)

[Same diagram, updated: EMMA 1.0 now fills the Natural Language Semantics role.]
Speech Interface Framework – End of 2009 (by Jim Larson)

[Same diagram again: the expected end-of-2009 snapshot of the framework.]
W3C Process
Architectural Changes

Traditional (proprietary) architecture: the user talks to a speech application built with a proprietary SCE and running on a proprietary platform, which drives ASR/DTMF and TTS/audio directly.

VoiceXML architecture: the user talks to a VoiceXML browser running on a VoiceXML platform, which drives ASR/DTMF and TTS/audio and fetches the application over HTTP from a Web application: .vxml pages, .grxml/.gram grammars, .ssml prompts, .wav/.mp3 audio, and .pls lexicons.
The VoiceXML Impact
VoiceXML changed the landscape of IVRs and speech application creation: from proprietary to standards-based speech applications.

Before:
- Proprietary platforms (HW & SW)
- Proprietary applications (built with a proprietary SCE)
- Mainly DTMF and pre-recorded prompts
- First attempts to add speech into IVR

After:
- Standard VoiceXML platforms
- Standards for speech technologies
- Standard tools for VoiceXML applications
- Integration of DTMF and ASR
- Still a predominance of DTMF, but more and more speech applications
Overview
- A Bit of History
- W3C Speech Interaction Framework Today: ASR/DTMF, TTS, Lexicons, Voice Dialog and Call Control, Voice Platforms and Next Evolutions
- W3C Multimodal Interaction Today: MMI Architecture, EMMA and InkML, a language for Emotions
- Near Future
Standards for ASR and DTMF: SRGS 1.0, SISR 1.0
W3C Standards for Speech/DTMF Grammars

SYNTAX – SRGS defines constraints on the admissible sentences for a specific recognition turn. It covers voice and DTMF grammars, in two notations: ABNF and XML.
http://www.w3.org/TR/speech-grammar/

SEMANTICS – SISR describes how to produce results after an utterance is recognized, in two flavors: literal and script.
http://www.w3.org/TR/semantic-interpretation/
SRGS/SISR Grammars for “Torino”
SRGS ABNF, SISR script:

#ABNF 1.0 iso-8859-1;
mode voice;
tag-format <semantics/1.0>;
{var unused=7;};
public $main = Torino {out="10100";};

SRGS XML, SISR script:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xml:lang="en-US" version="1.0"
         xmlns="http://www.w3.org/2001/06/grammar"
         tag-format="semantics/1.0">
  <tag>var unused=7;</tag>
  <rule id="main" scope="public">
    <token>Torino</token><tag>out="10100";</tag>
  </rule>
</grammar>

SRGS ABNF, SISR literal:

#ABNF 1.0 iso-8859-1;
mode voice;
tag-format <semantics/1.0-literals>;
public $main = Torino {10100};

SRGS XML, SISR literal:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xml:lang="en-US" version="1.0"
         xmlns="http://www.w3.org/2001/06/grammar"
         tag-format="semantics/1.0-literals">
  <rule id="main" scope="public">
    <token>Torino</token><tag>10100</tag>
  </rule>
</grammar>
SRGS/SISR Standards – Pros
- Powerful syntax (CFG) and very powerful semantics (ECMAScript)
- DTMF and voice input are transparent to the application
- Wide and consistent adoption among technology vendors
- The two notations, XML and ABNF, are great:
  - Developers can choose (XML validation vs. compact format)
  - Transformations are possible: XML to ABNF (easy, a simple XSLT); ABNF to XML (requires an ABNF parser)
- Open-source tools might be created to:
  - Validate grammar syntax
  - Transform grammars
  - Debug grammars on written input
  - Run coverage tests: explode covered sentences, GenSem, SemTester, etc.
SRGS/SISR Standards – Small Issues
Semantics declaration: the tag-format attribute
- If the value is "semantics/1.0": SISR script semantics is mandated inside semantic tags
- If the value is "semantics/1.0-literals": SISR literal semantics is mandated inside semantic tags
- If it is missing: unclear! Risk of interoperability troubles

SISR script semantics
- Clumsy default assignment: returns the last referenced rule only
- The developer must properly propagate results up
- Be careful when redefining "out": assigning a scalar value might result in errors

SISR literal semantics
- Only useful for very simple word-list rules
- No support for encapsulating rules
- Use SISR literal grammars as external references only!
SRGS/SISR – Encapsulated Grammars
[Diagram: a script-semantics grammar (Gr1.grxml) referencing sub-grammars with mixed SISR flavors: Gr2.gram (literal) and Gr3.grxml (script), which in turn references Gr41.grxml (literal) and Gr42.gram (script).]
SRGS/SISR Standards – Rich XML Results
Section 7 of the SISR 1.0 specification (http://www.w3.org/TR/semantic-interpretation/#SI7) gives the serialization rules from SISR ECMAScript results into XML. Edge cases: arrays, the special variables "_attributes" and "_value", and the creation of namespaces and prefixes.

ECMAScript result:

{
  drink: {
    _nsdecl: {
      _prefix: "n1",
      _name: "http://www.example.com/n1"
    },
    _nsprefix: "n1",
    liquid: {
      _nsdecl: {
        _prefix: "n2",
        _name: "http://www.example.com/n2"
      },
      _attributes: {
        color: {
          _nsprefix: "n2",
          _value: "black"
        }
      },
      _value: "coke"
    },
    size: "medium"
  }
}

XML serialization:

<n1:drink xmlns:n1="http://www.example.com/n1">
  <liquid n2:color="black"
          xmlns:n2="http://www.example.com/n2">coke</liquid>
  <size>medium</size>
</n1:drink>
SRGS/SISR Standards – Next Steps
Adoption of the PLS 1.0 lexicon
- Clear entry point into PLS lexicons: the <token> element
- Missing: a role attribute on <token> to allow homograph disambiguation

Next extensions via errata
- XML 1.1 support and IRIs
- Update normative references

No major extensions are needed!
Speech Synthesis: SSML 1.0/1.1
TTS – Functional Architecture and Markup/Non-Markup support
Structure analysis
- Markup support: <p>, <s>
- Non-markup support: infer the structure by automatic text analysis

Text normalization
- Markup support: <say-as> for dates, times, phone numbers, numbers; <sub> for acronyms and transliterations
- Non-markup support: automatically identify and convert constructs

Text-to-phoneme conversion
- Markup support: <phoneme>, <lexicon>
- Non-markup support: look up in a pronunciation dictionary

Prosody analysis
- Markup support: <emphasis>, <break>, <prosody>
- Non-markup support: automatically generate prosody through analysis of document structure and sentence syntax

Waveform production
- Markup support: <voice>, <audio>

http://www.w3.org/TR/speech-synthesis/
SSML 1.0 – Language description (I)
Document structure
- <speak> root element, with a version attribute and the SSML namespace
- xml:lang selects the language

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <p>I don't speak Japanese.</p>
  <p xml:lang="ja">Nihongo-ga wakarimasen.</p>
</speak>

Processing and pronunciation
- <p> and <s> (paragraph and sentence): give a structure to the text
- <say-as>: indicates the type of text construct contained within the element (e.g. dates, numbers)
- <phoneme>: provides a phonetic pronunciation, in IPA, for the contained text
- <sub>: provides substitutions, e.g. for expanding acronyms into a sequence of words

http://www.w3.org/TR/speech-synthesis/
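The structural elements above are easy to inspect programmatically. A small Python sketch parsing the slide's own example; the only subtlety is that SSML elements live in a namespace, and xml:lang maps to the XML namespace:

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

ssml = """<speak version="1.0"
  xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <p>I don't speak Japanese.</p>
  <p xml:lang="ja">Nihongo-ga wakarimasen.</p>
</speak>"""

root = ET.fromstring(ssml)
paragraphs = root.findall(f"{{{SSML_NS}}}p")
# xml:lang on the second <p> overrides the document language
langs = [p.get(f"{{{XML_NS}}}lang") for p in paragraphs]
print(len(paragraphs), langs)  # 2 [None, 'ja']
```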
SSML 1.0 – Language description (II)
Style
- <voice> element; the voice selection attributes are name, xml:lang, gender, age, and variant

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0"
       xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  The moon is rising on the beach, when John says, looking Mary in the eyes:
  <voice name="simon">I love you!</voice>
  but she suddenly replies:
  <voice name="susan">Please, be serious!</voice>
</speak>

- <emphasis> element: requests that the contained text be spoken with emphasis; the level attribute can be set to strong, moderate, reduced, or none
- <break> element: controls the pausing between words; the time attribute takes time expressions ("5s", "20ms"), and the strength attribute takes none, x-weak, weak, medium (the default), strong, or x-strong

http://www.w3.org/TR/speech-synthesis/
SSML 1.0 – Language description (III)
Prosody
- <prosody> element: permits control of the pitch, speaking rate and volume of the speech output. Its attributes are:
  - volume: the volume for the contained text
  - rate: the speaking rate in words per minute for the contained text
  - duration: a value in seconds or milliseconds for the desired time to take to read the element contents
  - pitch: the baseline pitch for the contained text
  - range: the pitch range (variability) for the contained text, in Hertz
  - contour: sets the actual pitch contour for the contained text

Other elements
- <audio>: plays an audio file
- <mark>: places a marker into the text/tag sequence
- <desc>: provides a description of a non-speech audio source in <audio>

http://www.w3.org/TR/speech-synthesis/
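A hedged illustration of how these prosody attributes appear in markup; the attribute values below are invented for the example, not taken from the talk. The Python wrapper simply parses the snippet and reads the attributes back:

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2001/10/synthesis"

# Invented SSML snippet combining <prosody>, <break>, and a contour.
snippet = """<speak version="1.0"
  xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <prosody rate="slow" pitch="+10%" volume="soft">
    This sentence is spoken slowly, a bit higher, and quietly.
  </prosody>
  <break time="500ms"/>
  <prosody contour="(0%,+20Hz) (50%,+40Hz) (100%,+0Hz)">
    A rising-then-falling pitch contour.
  </prosody>
</speak>"""

root = ET.fromstring(snippet)
# Read the rate attribute of each <prosody> element in document order
rates = [p.get("rate") for p in root.iter(f"{{{NS}}}prosody")]
print(rates)  # ['slow', None]
```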
Towards SSML 1.1 – Motivations
Internationalization needs
- Three workshops: Beijing (Nov '05), Crete (May '06), Hyderabad (Jan '07)
- Results:
  - No major needs for Eastern and Western European languages
  - Many issues for Far East languages (Mandarin, Japanese, Korean)
  - Some specific issues for Semitic languages (Arabic, Hebrew), Farsi and many Indian languages: mark input with or without vowels; mark the transliteration scheme used for the input

Extensions required by the Voice Browser WG
- More powerful error handling, selection of fall-back strategies
- Trimming attributes
- volume attribute moved to a logarithmic scale (it was linear before)

Alignment with the PLS 1.0 specification for user lexicons

http://www.w3.org/TR/speech-synthesis11/
SSML 1.1 – Language Changes
Word and lexicon extensions
- <w> element to delimit words
- <lookup> element: applies a referenced pronunciation lexicon to the contained text

Phonetic Alphabet Registry creation and adoption
- "ipa" for the International Phonetic Alphabet
- A registration policy for other phonetic alphabets, similar to LTRU for language tags
- Candidates: Pinyin for Mandarin Chinese; JEITA for Japanese; X-SAMPA, an ASCII transliteration of IPA

http://www.w3.org/TR/speech-synthesis11/
Pronunciation Lexicon: PLS 1.0
Pronunciation Lexicons
A pronunciation lexicon is a mapping between words (or short phrases), their written representations, and their pronunciations, suitable for use by an ASR engine or a TTS engine.

Pronunciation lexicons are not only useful for voice browsers. They have also proven an effective mechanism to support accessibility for the differently abled, as well as greater usability for all users; they are used to good effect in screen readers and in user agents supporting multimodal interfaces.

The W3C Pronunciation Lexicon Specification (PLS) Version 1.0 is designed to enable an interoperable specification of pronunciation lexicons.

http://www.w3.org/TR/pronunciation-lexicon/
PLS 1.0 – Language Overview
- A PLS document is a container (<lexicon>) of several lexical entries (<lexeme>)
- Each lexical entry contains one or more spellings (<grapheme>) and one or more pronunciations (<phoneme>) or substitutions (<alias>)
- Each PLS document is related to a single language (xml:lang)
- SSML 1.0 and SRGS 1.0 documents can reference one or more PLS documents
- The current version doesn't include morphological, syntactic or semantic information associated with pronunciations

http://www.w3.org/TR/pronunciation-lexicon/
PLS 1.0 – An Example
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/pronunciation-lexicon/pls.xsd"
    alphabet="ipa" xml:lang="en-US">

  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>səˈpʌlvɪdə</phoneme>
  </lexeme>

  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>

</lexicon>

http://www.w3.org/TR/pronunciation-lexicon/
PLS 1.0 – Used for TTS
SSML 1.0:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" … xml:lang="en-US">
  <lexicon uri="http://www.example.com/SSMLexample.pls"/>
  The title of the movie is: "La vita è bella" (Life is beautiful),
  which is directed by Benigni.
</speak>

PLS 1.0:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>La vita è bella</grapheme>
    <phoneme>ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Benigni</grapheme>
    <phoneme>bɛˈniːnji</phoneme>
  </lexeme>
</lexicon>

http://www.w3.org/TR/pronunciation-lexicon/
PLS 1.0 – Used for ASR
SRGS 1.0:

<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xml:lang="en-US" root="movies" mode="voice">
  <lexicon uri="http://www.example.com/SRGSexample.pls"/>
  <rule id="movies" scope="public">
    <one-of>
      <item>Terminator 2: Judgment Day</item>
      <item>Pluto's Judgement Day</item>
    </one-of>
  </rule>
</grammar>

PLS 1.0:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
</lexicon>

http://www.w3.org/TR/pronunciation-lexicon/
Examples of Use
Multiple pronunciations for the same orthography
Multiple orthographies
Homophones
Homographs
Acronyms, Abbreviations, etc.
Detailed descriptions can be found in the W3C specification, on Wikipedia, and in Paolo Baggia's SpeechTEK 2008 and Voice Search 2009 talks.
PLS 1.0 – Open Issues
No wide support for IPA in speech engines
- Changes are slowly under way
- The Phonetic Alphabet Registry will open the door to other alphabets in a controlled and interoperable way

Integration in ASR/TTS
- SSML 1.1 will interoperate with PLS 1.0
- SRGS 1.0 is still missing support for a role attribute for PLS 1.0

No matching algorithm inside PLS, because it is mainly a data format

http://www.w3.org/TR/pronunciation-lexicon/
Pronunciation Alphabets: IPA, SAMPA
International Phonetic Alphabet
Pronunciation is represented by a phonetic alphabet:
- Standard phonetic alphabets: the International Phonetic Alphabet (IPA)
- Well-known phonetic alphabets: SAMPA (ASCII-based, simple to write), Pinyin (Mandarin Chinese), JEITA (Japanese), etc.
- Proprietary phonetic alphabets

International Phonetic Alphabet (IPA)
- Created by the International Phonetic Association (active since 1886), a collaborative effort by all the major phoneticians around the world
- A universally agreed system of notation for the sounds of languages
- Covers all languages
- Requires Unicode to write it
- Normatively referenced by PLS
IPA – Chart
The IPA was founded in 1886 and is the major international association of phoneticians. The IPA alphabet provides symbols making possible the phonemic transcription of all known languages.

IPA characters can be encoded in Unicode by supplementing ASCII with characters from other ranges, particularly:
- IPA Extensions (U+0250–U+02AF)
- Latin Extended-A (U+0100–U+017F)

See the detailed charts: http://www.unicode.org/charts
Phonetic Alphabets – Issues
The real problem is how to write pronunciations reliably, unless you are a trained phonetician.

There are issues with fonts, authoring tools and browsers, but Unicode fonts today support the IPA Extensions; see:
- http://www.phon.ucl.ac.uk/home/wells/phoneticsymbols.htm

There are very few tools to help write pronunciations and let you listen to what you have written.
- A way forward: make pronunciations available in IPA or other general phonetic alphabets.
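One practical workaround is to author pronunciations in an ASCII scheme like X-SAMPA and convert them to IPA mechanically. A toy Python sketch: the mapping table covers only the symbols needed for this one example, and a real converter must also handle multi-character X-SAMPA symbols.

```python
# Toy X-SAMPA -> IPA converter over a deliberately tiny table.
XSAMPA_TO_IPA = {
    '"': "ˈ",   # primary stress
    "@": "ə",   # schwa
    "V": "ʌ",
    "I": "ɪ",
    "s": "s", "p": "p", "l": "l", "v": "v", "d": "d",
}

def xsampa_to_ipa(text):
    """Map each X-SAMPA character to IPA; unknown chars pass through."""
    return "".join(XSAMPA_TO_IPA.get(ch, ch) for ch in text)

# "Sepulveda" as in the PLS 1.0 example: s@"pVlvId@
print(xsampa_to_ipa('s@"pVlvId@'))  # səˈpʌlvɪdə
```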
Voice Dialog Languages: VoiceXML 2.0, VoiceXML 2.1
VoiceXML 2.0 – Features, Elements
Menus, forms, sub-dialogs: <menu>, <form>, <subdialog>

Input
- Speech recognition: <grammar>
- Recording: <record>
- Keypad: <grammar mode="dtmf">

Output
- Audio files: <audio>
- Text-to-speech: <prompt>

Variables (ECMA-262): <var>, <assign>, <script>, with scoping rules

Events: <nomatch>, <noinput>, <help>, <catch>, <throw>

Transition and submission: <goto>, <submit>

Telephony
- Connection control: <transfer>, <disconnect>
- Telephony information

Platform specifics: <object>

Performance: fetch and caching properties

http://www.w3.org/TR/voicexml20/
VoiceXML 2.0 – Execution Model
Execution is synchronous
- Only the disconnect event is handled (somewhat) asynchronously

Execution is always in a single dialog, a <form> or a <menu>
- The Form Interpretation Algorithm drives <field> selection

Prompts are queued
- Played only when a waiting state is reached
- Played before a fetchaudio is started

Processing is always in one of two states:
- Waiting for input in an input item: <field>, <record>, <transfer>, etc.
- Transitioning between input items in response to an input

Event-driven:
- <nomatch>, <noinput>: handling of user input events
- <catch>, <throw>: generalized event mechanism
- connection.*: call event handling
- error.*: error event handling

http://www.w3.org/TR/voicexml20/
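The Form Interpretation Algorithm's select/collect/process loop can be sketched as follows. This is a heavy simplification with an assumed get_input callback standing in for the recognizer; the real FIA also handles guard conditions, events, and <filled> actions.

```python
def run_form(fields, get_input):
    """Minimal Form Interpretation Algorithm sketch:
    repeatedly select the first unfilled <field>, play its
    prompt, wait for input, and fill the form-item variable."""
    values = {name: None for name in fields}
    while True:
        # Select phase: first field whose variable is still undefined
        name = next((n for n in fields if values[n] is None), None)
        if name is None:
            return values                 # form is complete
        # Collect phase: queue the prompt, wait for the user's input
        utterance = get_input(fields[name])
        # Process phase: fill the variable (no <filled> actions here)
        values[name] = utterance

answers = iter(["Boston", "tomorrow"])
result = run_form({"city": "Where to?", "date": "When?"},
                  lambda prompt: next(answers))
print(result)  # {'city': 'Boston', 'date': 'tomorrow'}
```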
VoiceXML 2.1 – Extended Features
- Dynamically reference grammars and scripts: <grammar expr="…">, <script expr="…">
- Record the user's utterance during form filling: recordutterance property; new shadow variables recording, recordingsize, recordingduration
- Detect barge-in during prompt playback (SSML <mark>): markexpr attribute; new shadow variables markname and marktime
- Fetch XML data without a transition: <data>, with a read-only subset of the DOM
- Dynamically concatenate prompts: <foreach>, iterating through ECMAScript arrays and executing content
- Send data upon disconnect: <disconnect namelist="…">
- Additional transfer type: <transfer type="consultation">

http://www.w3.org/TR/voicexml21/
VoiceXML Applications
Static VoiceXML applications
- The VoiceXML page is always the same, so the user experience never changes
- No personalization or customization

Dynamic VoiceXML applications
- The user experience is customized, after authentication (PIN) or using the caller-id or SIP-id
- Data driven
- Dynamic pages generated at runtime, e.g. by JSP, ASP, etc.

http://www.w3.org/TR/voicexml20/
http://www.w3.org/TR/voicexml21/
A Drawback of VoiceXML 2.0
A drawback of VoiceXML is that the transition from one VoiceXML page to another is a costly activity:
- Fetch the new page, if not cached
- Parse the page
- Initialize the context, possibly loading and initializing a new application root document
- Load or pre-compile scripts

Transitions are also the only way to return data to the Web application (if the VoiceXML is dynamic), so pages must be created to include dynamic data.

VoiceXML 2.1 addresses part of this drawback by feeding dynamic data to a running VoiceXML page.

http://www.w3.org/TR/voicexml20/
http://www.w3.org/TR/voicexml21/
Advantages of VoiceXML 2.1 - AJAX
Two of the eight new features in VoiceXML 2.1 help to create more dynamic VoiceXML applications: the <data> element and the <foreach> element.

A static VoiceXML document can fetch user-specific data at runtime, without changing the VoiceXML document:
- <data> allows the retrieval of arbitrary XML data without a VoiceXML document transition; the returned XML data is accessible through a subset of DOM primitives
- <foreach> extends prompts to allow iteration over a dynamic array of information, to build a dynamic prompt

This is similar to AJAX programming for HTML services: it decouples the presentation layer (VoiceXML) from the business logic (accessed via <data>).

http://www.w3.org/TR/voicexml21/
VoiceXML 2.1 – <data> Element
Attributes:
- name: the variable to be filled with the DOM of the retrieved data
- src or srcexpr: the URI of the location of the XML data
- namelist: the list of variables to be submitted
- method: either "get" or "post"
- enctype: media encoding
- fetch and caching attributes

Like <var>, it may appear in executable content and in <form> and <vxml>. The value of name must be a declared variable; the platform fills that variable with the DOM of the fetched XML data. The <data> element is synchronous (the service stops to get the data).

http://www.w3.org/TR/voicexml21/
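The behaviour of <data> can be mimicked in Python with the standard DOM API. This is a sketch: the weather document shape is invented, and the HTTP fetch is replaced by a local string.

```python
from xml.dom.minidom import parseString

# Stand-in for what <data name="weather" src="..."/> would fetch
# over HTTP; the document shape is invented for the example.
fetched = "<weather><city>Torino</city><temp unit='C'>12</temp></weather>"

# The page script then reads the result with the read-only DOM
# subset that VoiceXML 2.1 exposes (getElementsByTagName,
# firstChild, getAttribute, ...).
dom = parseString(fetched)
city = dom.getElementsByTagName("city")[0].firstChild.data
temp = dom.getElementsByTagName("temp")[0]
print(city, temp.getAttribute("unit"))  # Torino C
```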
VoiceXML 2.1 – <foreach> Element
Attributes:
- array: an ECMAScript expression that must evaluate to an ECMAScript array
- item: the variable that stores the element being processed

<foreach> allows the application to iterate over an ECMAScript array and execute the content. It may appear:
- In executable content (all executable content elements may appear as content of <foreach>)
- In <prompt> (restrictions on the content apply)

<foreach> allows sophisticated concatenation of prompts.

http://www.w3.org/TR/voicexml21/
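The prompt concatenation that <foreach> enables can be sketched in Python; the flight array and the prompt template are invented for the example.

```python
def foreach_prompt(items, template):
    """Mimics <prompt><foreach item="f" array="flights">…</foreach></prompt>:
    one prompt fragment per array element, concatenated in order."""
    return " ".join(template.format(**item) for item in items)

# Invented data, as if fetched at runtime via <data>
flights = [
    {"time": "9:05", "dest": "Rome"},
    {"time": "11:30", "dest": "Madrid"},
]
prompt = foreach_prompt(flights, "Flight at {time} to {dest}.")
print(prompt)  # Flight at 9:05 to Rome. Flight at 11:30 to Madrid.
```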
VoiceXML – Final Remarks
The changed landscape for speech application development:
- Virtually all IVRs today support VoiceXML
- New options related to VoiceXML:
  - SIP-based VoiceXML platforms (Loquendo, Voxpilot, Voxeo, VoiceGenie)
  - Large-scale hosting of speech applications (TellMe, Voxeo)
  - Development tools (VoiceObjects, Audium, SpeechVillage, Syntellect, etc.)
- Further changes may come from CCXML adoption

… but:
- Mainly system-driven applications are actually deployed
- New challenges, incorporating more powerful dialog strategies and mixed initiative, are under discussion

http://www.w3.org/TR/voicexml20/
http://www.w3.org/TR/voicexml21/
VoiceXML Resources
Voice Browser Working Group (spec, FAQ, implementations, resources):
http://www.w3.org/Voice/

VoiceXML Forum site (resources, education, interest groups):
http://www.voicexml.org/

VoiceXML Forum Review (interesting articles related to VoiceXML and more; example code in the sections "First Words" and "Speak & Listen"):
http://www.voicexmlreview.org/

Ken Rehor's World of VoiceXML:
http://www.kenrehor.com/voicexml

Online documentation related to VoiceXML platforms:
Loquendo Café, Voxeo (http://www.vxml.org/), TellMe, VoiceGenie

Many books on VoiceXML:
- Jim Larson, "VoiceXML: Introduction to Developing Speech Applications", Prentice-Hall, 2002
- A. Hocek, D. Cuddihy, "Definitive VoiceXML", Prentice-Hall, 2002
Call Control: CCXML 1.0
CCXML 1.0 – Highlights
- Asynchronous event processing
- Acceptance or refusal of an incoming call
- Management of different types of call transfer
- Outbound call activation (interaction with an external entity)
- Use of ECMAScript, adding scripting capabilities to call control applications
- VoiceXML modularization
- Conferencing management
CCXML 1.0 – Elements Relationship

[Diagram showing the relationships among CCXML elements.]
CCXML 1.0 – Incoming Call
Event catching and processing: the CCXML interpreter receives a connection.alerting event, e.g.:

event$
  name: 'connection.alerting';
  connectionid: '0239023901903993';
  eventid: '00001'; …

and the CCXML document handles it with transitions:

<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0">
  […]
  <transition event="connection.alerting">
    […]
  </transition>
  <transition event="connection.disconnected">
    […]
  </transition>
</ccxml>

http://www.w3.org/TR/ccxml
CCXML 1.0 – connection.alerting Event
Basic telephony information is retrieved on the alerting event and is available in the CCXML document: local URI, remote URI, protocol used, redirection info, etc.

Based on the checked information, CCXML can accept or refuse the incoming call (<accept/> or <reject/>, after analyzing the event$ content), even before contacting the dialog server.

Any error that occurs during the phone call can be managed by the CCXML service (connection.failed, error.connection events).

[Diagram: the Call Control Adapter sends connection.alerting to the CCXML interpreter, which answers with <accept/> or <reject/> before the VoiceXML interpreter is involved.]

http://www.w3.org/TR/ccxml
CCXML 1.0 – How to activate a new dialog
CCXML actions:
- Receive the alerting event from the Call Control Adapter
- Ask the dialog server to prepare a new dialog
- Wait for the preparation
- If the dialog has been successfully prepared, accept the call
- Ask the dialog server to start the prepared dialog

[Sequence: alerting → prepare a new dialog → dialog prepared → call accepted → connected → start the prepared dialog → dialog started.]
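The handshake above can be sketched as a small state machine. Event and action names follow the slides; the sequential loop is an assumption for the sketch, since a real CCXML interpreter processes its event queue asynchronously.

```python
# Sketch of the CCXML-side handshake for an incoming call:
# alerting -> prepare dialog -> accept call -> start dialog.
def handle_call(events):
    state, actions = "idle", []
    for ev in events:
        if state == "idle" and ev == "connection.alerting":
            actions.append("dialogprepare")   # ask the dialog server
            state = "preparing"
        elif state == "preparing" and ev == "dialog.prepared":
            actions.append("accept")          # accept the incoming call
            state = "accepting"
        elif state == "accepting" and ev == "connection.connected":
            actions.append("dialogstart")     # start the prepared dialog
            state = "running"
    return state, actions

state, actions = handle_call(
    ["connection.alerting", "dialog.prepared", "connection.connected"])
print(state, actions)  # running ['dialogprepare', 'accept', 'dialogstart']
```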
Call Transfer

- CCXML supports call transfers of different modalities: "bridge", "blind", "consultation"
- Based on the features of each modality, the CCXML language allows the expected interaction with the Call Control Adapter to correctly perform the transfer
- During the different phases of a transfer, CCXML can receive any asynchronous event and manage it correctly, interrupting the call if requested

[Sequence between the CCXML interpreter, the Call Control Adapter and the VoiceXML interpreter: command1 → answer1 → […] → transfer complete.]
External Events
The CCXML Interpreter Context can receive events from any external entity able to use the HTTP protocol. Events generated in this way must be sent to a CCXML interpreter by an HTTP POST command. The event can be:
- Addressed to a new session, whose creation must be requested
- Addressed to an existing session, specifying its ID in the request

[Diagram: an external entity sends a basic HTTP event to the CCXML interpreter, which manages the event and returns the result.]

http://www.w3.org/TR/ccxml
External event on a new session: the Outbound Call
A particular request arrived to Call Control from an external entity;A particular CCXML service associated with the received event is started and a set of operations between Call Control Adapter, Call Control and Dialog Server is activated: the outbound call is so placed
[Sequence diagram: on an outbound call request, the CCXML Interpreter asks the Call Control Adapter to create a call (connection progressing, then connection connected), asks the VoiceXML Interpreter to prepare a dialog (prepared), and finally starts the prepared dialog.]
External event on a session: dialog termination request
An external entity performs an HTTP POST request to the CCXML Interpreter Context, specifying a session ID and requesting the termination of a particular dialog. The CCXML interpreter checks the session ID; if it is valid, it injects the received event into that session. The CCXML service has a transition on that event and performs the termination of the dialog identified by the given dialog identifier.
[Sequence diagram: the external entity posts the dialog termination request; the CCXML Interpreter issues dialogterminate(dialogid) to the VoiceXML Interpreter and receives dialog.exit; depending on the dialog.exit event management, it then disconnects the connection (disconnect(connId)) or prepares a new dialog (dialogprepare).]
Loading different CCXML documents: <fetch> and <goto> elements
The <fetch> and <goto> elements are used respectively to asynchronously fetch the content identified by the attributes of <fetch>, and to jump into a fetched document once it has been successfully loaded.
[Diagram: the CCXML Interpreter fetches the document "doc1.ccxml":
  <fetch next="'http://../Fetch/doc1.ccxml'" type="'application/ccxml+xml'" fetchid="result"/>
On fetch.done it can go into the new document; on error.fetch it continues to work in the current one. The first event occurring in a new document is ccxml.loaded.]
Benefits: modularization, source exemplification, more readability.
Simple CCXML Document
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <var name="currentState"/>
  <var name="myDialogId"/>
  <var name="myConnId"/>
  <eventprocessor statevariable="currentState">
    <transition event="connection.alerting">
      <assign name="myConnId" expr="event$.connectionid"/>
      <accept connectionid="event$.connectionid"/>
    </transition>
    <transition event="connection.connected">
      <dialogstart src="'http://www.example.com/flight.vxml'"
                   connectionid="myConnId" dialogid="myDialogId"/>
    </transition>
    <transition event="dialog.started">
      <log expr="'VoiceXML appl is running now'"/>
    </transition>
    <transition event="connection.disconnected">
      <dialogterminate dialogid="myDialogId"/>
    </transition>
    <transition event="dialog.exit">
      <disconnect connectionid="myConnId"/>
    </transition>
    <transition event="*">
      <log expr="'Closing, unexpected:' + event$.name"/>
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
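Since the document is plain XML, its transition table can be inspected with any XML parser. A quick Python check (the document below is trimmed to its transition skeleton; the namespace and event names come from the example itself):

```python
import xml.etree.ElementTree as ET

CCXML_NS = "{http://www.w3.org/2002/09/ccxml}"

# Transition skeleton of the CCXML example above.
doc = """<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor statevariable="currentState">
    <transition event="connection.alerting"/>
    <transition event="connection.connected"/>
    <transition event="dialog.started"/>
    <transition event="connection.disconnected"/>
    <transition event="dialog.exit"/>
    <transition event="*"/>
  </eventprocessor>
</ccxml>"""

root = ET.fromstring(doc)
events = [t.get("event") for t in root.iter(CCXML_NS + "transition")]
print(events)
```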
CCXML 1.0 – Next Steps
The CCXML specification is a Last Call Working Draft; all feature requests and clarifications have been addressed.
An Implementation Report test suite is under development.
It is very close to being published as a W3C Candidate Recommendation.
Member and external companies will be invited to send implementation reports on their CCXML platforms.
After that, the CCXML 1.0 specification will be able to become a Proposed Recommendation and then a W3C Recommendation.
Speech Interface Framework: Tour Complete!
Speech Interface Framework – End of 2009 (by Jim Larson)
[Figure: the User reaches the framework through the Telephone System; input is handled by ASR (Speech Recognition Grammar Spec., SRGS; Semantic Interpretation for Speech Recognition, SISR; N-gram Grammar ML) and a DTMF Tone Recognizer, then by Language Understanding (Natural Language Semantics ML, EMMA 1.0) and Context Interpretation; the Dialog Manager (VoiceXML 2.0, VoiceXML 2.1) drives Media Planning and Language Generation, with output through TTS (Speech Synthesis Markup Language, SSML) and a Pre-recorded Audio Player, both using the Pronunciation Lexicon Specification (PLS); Reusable Components and Call Control XML (CCXML) complete the framework, which connects to the World Wide Web.]
Architectural Changes
[Figure: VoiceXML architecture — the User talks to a VoiceXML Browser (ASR/DTMF, TTS/Audio) running on the VoiceXML platform; the browser fetches .vxml documents, .grxml/.gram grammars, .pls lexicons and .ssml / .wav / .mp3 prompts from the Web application over HTTP.]
VoxNauta – Internal Architecture
Loquendo MRCP Server/LSS 7.0 Architecture
[Figure: internal architecture — SIP (SDP) and RTSP (MRCPv1) front-ends feed MRCP v1/v2 parsers and a load balancer; an Audio Provider handles RTP; a TTS & ASR interface connects to the LTTS, LASR and LASR-SV engines and returns NLSML/EMMA results; configuration files, log files, SNMP management and a graphic management console complete the server, running on Win32/Linux.]
IETF MRCP Protocols
The Media Resource Control Protocols (MRCP) are IETF standards:
- MRCPv1 is RFC 4463, http://www.ietf.org/rfc/rfc4463.txt, based on RTSP/RTP
- MRCPv2 is an Internet Draft, http://tools.ietf.org/html/draft-ietf-speechsc-mrcpv2-17, based on SIP/RTP, offering new audio recording and Speaker Verification functionalities
An optimized client-server solution for the large-scale deployment of speech technologies in the telephony field (call centers, CRM, news and email reading, self-service applications, etc.); it allows a standard interface to speech technologies in all IVR platforms.
For more information read: Dave Burke, Speech Processing for IP Networks: Media Resource Control Protocol (MRCP), Wiley.
VoiceXML in a Call Center
[Figure: the Fixed/Mobile network reaches a PBX/ACD (with an optional Voice Gateway for non-SIP PBXs) and the VoxNauta IVR, which is connected to a CTI Server, a Data Server, a Web Server and human operators.]
VoiceXML in the IMS Architecture
[Figure: the Fixed/Mobile network reaches a Voice Gateway over TDM protocols; the gateway connects over the IP network (SIP protocols, RTP) to the VoxNauta MRF, which fetches VoiceXML on HTTPS from an Application Server.]
Overview
- A Bit of History
- W3C Speech Interaction Framework Today: ASR/DTMF, TTS, Lexicons, Voice Dialog and Call Control, Voice Platforms and Next Evolutions
- W3C Multimodal Interaction Today: MMI Architecture, EMMA and InkML, A Language for Emotions
- Next Future
Modes, Modalities and Technologies
- Speech
- Audio
- Stylus
- Touch
- Accelerometer
- Keyboard/keypad
- Mouse/touchpad
- Camera
- Geolocation
- Handwriting recognition
- Speaker verification
- Signature verification
- Fingerprint identification
- ...
Complement and Supplement
Speech: transient, linear, hands- and eyes-free, suffers noise
Visual: persistent, spatial, occupies the eyes, suffers light conditions
⇒ Enables the user to choose among different modalities or to mix them
⇒ Adaptable to different social and environmental conditions or to user preference
GUI + VUI → MUI (or MMUI)
[Figure: an Interaction Manager surrounded by many input channels — speech, text, mouse, handwriting, accelerometer, geolocation, fingerprint, drawing, video, photograph, vital signs, audio recording, speaker verification, face identification — grouped as user intent, sensor, recording and identification.]
MMI has an Intrinsic Complexity
Deborah Dahl, Voice Search 2009
MMI can Include Many Different Technologies
[Figure: an Interaction Manager connected to speech recognition, handwriting recognition, accelerometer, geolocation, touchscreen, keypad and fingerprint recognition components.]
Deborah Dahl, Voice Search 2009
Uniform Representation for MMI
Getting everything to work together is complicated. One simplification is to represent the same information from different modalities in the same format.
We need a common language for representing the same information from different modalities:
⇒ EMMA (Extensible MultiModal Annotation) 1.0, a uniform representation for multimodal information.
[Figure: the same components as before — speech recognition, handwriting recognition, accelerometer, geolocation, touchscreen, keypad, fingerprint recognition — each delivering its results to the Interaction Manager as EMMA documents.]
Deborah Dahl, Voice Search 2009
EMMA Structural Elements
Provide containers for application semantics and for multimodal annotation
<emma:emma ...>
  <emma:one-of>
    <emma:interpretation>
      ...
    </emma:interpretation>
    <emma:interpretation>
      ...
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
EMMA elements: emma:emma, emma:interpretation, emma:one-of, emma:sequence, emma:group, emma:lattice
http://www.w3.org/TR/emma/
EMMA Annotations
Characteristics and processing of input, e.g.:
- emma:medium, emma:mode, emma:function: medium, mode and function of input
- emma:start, emma:end: timestamps (absolute/relative)
- emma:source: annotation of input source
- emma:confidence: confidence scores
- emma:media-type: media type
- emma:signal: reference to signal
- emma:lang: human language of input
- emma:tokens: tokens of input
- emma:process: reference to processing
- emma:uninterpreted: uninterpretable input
- emma:no-input: lack of input
- emma:hook: hook
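For instance, an interaction manager can use the `emma:confidence` annotation to rank the interpretations inside an `<emma:one-of>`. A hedged Python sketch (the destinations and confidence values are invented for the example):

```python
import xml.etree.ElementTree as ET

EMMA = "http://www.w3.org/2003/04/emma"
NS = {"emma": EMMA}

# Invented example: two competing interpretations with confidences.
doc = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1">
    <emma:interpretation id="int1" emma:confidence="0.75">
      <destination>Boston</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.60">
      <destination>Austin</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""

root = ET.fromstring(doc)
interps = root.findall(".//emma:interpretation", NS)
# Pick the interpretation with the highest emma:confidence.
best = max(interps, key=lambda i: float(i.get("{%s}confidence" % EMMA)))
print(best.get("id"), best.find("destination").text)
```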
EMMA 1.0 – Example Travel Application
INPUT:"I want to go from Boston to Denver on March 11"
http://www.w3.org/TR/emma/ Deborah Dahl, Voice Search 2009
EMMA 1.0 – Same meaning
Speech:
<emma:interpretation medium="acoustic" mode="voice" id="int1">
  <origin>Boston</origin>
  <destination>Denver</destination>
  <date>11032009</date>
</emma:interpretation>

Mouse:
<emma:interpretation medium="tactile" mode="gui" id="int1">
  <origin>Boston</origin>
  <destination>Denver</destination>
  <date>11032009</date>
</emma:interpretation>
EMMA 1.0 – Handwriting Input
<emma:interpretation medium="tactile" mode="ink" id="int1">
  <origin>Boston</origin>
  <destination>Denver</destination>
  <date>11032009</date>
</emma:interpretation>
EMMA 1.0 – Biometrics Input
Photograph:
<emma:emma version="1.0">
  <emma:interpretation id="int1"
      emma:confidence=".75"
      emma:medium="visual" emma:mode="photograph"
      emma:verbal="false" emma:function="identification">
    <person>12345</person>
    <name>Mary Smith</name>
  </emma:interpretation>
</emma:emma>

Voice:
<emma:emma version="1.0">
  <emma:interpretation id="int1"
      emma:confidence=".80"
      emma:medium="acoustic" emma:mode="voice"
      emma:verbal="false" emma:function="identification">
    <person>12345</person>
    <name>Mary Smith</name>
  </emma:interpretation>
</emma:emma>
EMMA 1.0 – Representing Lattices
Speech recognizers, handwriting recognizers and other input-processing components may provide lattice output: a graph encoding a range of possible recognition results or interpretations.
[Figure: word lattice over nodes 1-8: flights → to → {boston | austin} → from → {portland | oakland} → {today → please | tomorrow}.]
http://www.w3.org/TR/emma/ From Michael Johnston, AT&T Research
Lattices can be represented using the EMMA elements:
<emma:lattice emma:initial="?" emma:final="?">
<emma:arc emma:from="?" emma:to="?">
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation>
    <emma:lattice emma:initial="1" emma:final="8">
      <emma:arc emma:from="1" emma:to="2">flights</emma:arc>
      <emma:arc emma:from="2" emma:to="3">to</emma:arc>
      <emma:arc emma:from="3" emma:to="4">boston</emma:arc>
      <emma:arc emma:from="3" emma:to="4">austin</emma:arc>
      <emma:arc emma:from="4" emma:to="5">from</emma:arc>
      <emma:arc emma:from="5" emma:to="6">portland</emma:arc>
      <emma:arc emma:from="5" emma:to="6">oakland</emma:arc>
      <emma:arc emma:from="6" emma:to="7">today</emma:arc>
      <emma:arc emma:from="7" emma:to="8">please</emma:arc>
      <emma:arc emma:from="6" emma:to="8">tomorrow</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
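The lattice is just a directed acyclic graph, so the distinct hypotheses it encodes can be enumerated with a simple depth-first walk. Illustrative Python over the example's arcs (this is not part of EMMA itself):

```python
# Arcs of the example lattice: (from_node, to_node, word).
arcs = [
    (1, 2, "flights"), (2, 3, "to"),
    (3, 4, "boston"), (3, 4, "austin"),
    (4, 5, "from"),
    (5, 6, "portland"), (5, 6, "oakland"),
    (6, 7, "today"), (7, 8, "please"),
    (6, 8, "tomorrow"),
]

def paths(node, final=8):
    """Enumerate every word sequence from `node` to the final state."""
    if node == final:
        return [[]]
    result = []
    for src, dst, word in arcs:
        if src == node:
            result += [[word] + rest for rest in paths(dst, final)]
    return result

hyps = [" ".join(p) for p in paths(1)]
print(len(hyps))   # 8 distinct hypotheses
```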
EMMA in the Multimodal Framework: http://www.w3.org/TR/mmi-framework
[Figure: the MMI framework diagram, with EMMA as the data flowing between its components.]
InkML 1.0 – Digital Ink
Ink Markup Language (InkML), http://www.w3.org/TR/InkML
A data format for representing digital ink (pen, stylus, etc.); allows the input and processing of handwriting, gestures, sketches, music, etc.
<ink>
  <trace>
    10 0, 9 14, 8 28, 7 42, 6 56, 6 70, 8 84, 8 98, 8 112, 9 126, 10 140,
    13 154, 14 168, 17 182, 18 188, 23 174, 30 160, 38 147, 49 135,
    58 124, 72 121, 77 135, 80 149, 82 163, 84 177, 87 191, 93 205
  </trace>
  <trace>
    130 155, 144 159, 158 160, 170 154, 179 143, 179 129, 166 125,
    152 128, 140 136, 131 149, 126 163, 124 177, 128 190, 137 200,
    150 208, 163 210, 178 208, 192 201, 205 192, 214 180
  </trace>
  <trace>
    227 50, 226 64, 225 78, 227 92, 228 106, 228 120, 229 134,
    230 148, 234 162, 235 176, 238 190, 241 204
  </trace>
  <trace>
    282 45, 281 59, 284 73, 285 87, 287 101, 288 115, 290 129,
    291 143, 294 157, 294 171, 294 185, 296 199, 300 213
  </trace>
  <trace>
    366 130, 359 143, 354 157, 349 171, 352 185, 359 197,
    371 204, 385 205, 398 202, 408 191, 413 177, 413 163,
    405 150, 392 143, 378 141, 365 150
  </trace>
</ink>
http://www.w3.org/TR/InkML/
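Each `<trace>` body is a comma-separated list of "x y" points, so basic geometry such as a per-stroke bounding box falls out of a few lines of parsing. A hedged Python sketch (the trace fragment is taken from the example above):

```python
def parse_trace(trace_text):
    """Parse an InkML <trace> body: 'x1 y1, x2 y2, ...' -> list of (x, y)."""
    points = []
    for pair in trace_text.split(","):
        x, y = pair.split()
        points.append((int(x), int(y)))
    return points

def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) of a stroke."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

stroke = parse_trace("10 0, 9 14, 8 28, 7 42, 6 56, 6 70, 8 84")
print(bounding_box(stroke))   # (6, 0, 10, 84)
```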
InkML 1.0 – Status and Advances
Rich annotation for ink: traces, trace formats and trace collections; contextual information; canvases; etc.
The result of classifying InkML traces may be a semantic representation in EMMA 1.0.
Current status is Last Call Working Draft; next will be Candidate Recommendation, together with the release of an Implementation Report test suite. Raising interest from major industries.
MMI Architecture Specification
“Multimodal Architecture and Interfaces“, W3C Working Draft,http://www.w3.org/TR/mmi-arch/
The Runtime Framework provides the basic infrastructure and controls communication among the constituents. The Interaction Manager (IM) coordinates the Modality Components (MCs) through life-cycle events and contains the shared data (context). Communication between the IM and the MCs is event-based.
[Figure: the Runtime Framework contains the Interaction Manager, a Data Component and a Delivery Context Component, and talks to Modality Components 1..N through the Modality Component API.]
Ingmar Kliche, SpeechTEK 2008 – http://www.w3.org/TR/mmi-arch/
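A minimal sketch of the IM side of that contract — dispatching life-cycle events to registered modality components — might look like the following. The event names loosely follow the MMI life-cycle events; the class design is invented for illustration:

```python
class ModalityComponent:
    """Toy modality component that records the life-cycle events it receives."""
    def __init__(self, name):
        self.name = name
        self.received = []

    def handle(self, event):
        self.received.append(event)

class InteractionManager:
    """Toy IM: keeps a registry of components and broadcasts events to them."""
    def __init__(self):
        self.components = []

    def register(self, component):
        self.components.append(component)

    def broadcast(self, event):
        for c in self.components:
            c.handle(event)

im = InteractionManager()
voice = ModalityComponent("voice")
gui = ModalityComponent("gui")
im.register(voice)
im.register(gui)
# Event names loosely modeled on the MMI Architecture life-cycle events.
im.broadcast("PrepareRequest")
im.broadcast("StartRequest")
print(voice.received)
```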
MMI Arch – Laboratory Implementation
Implementation of components using W3C markup languages.
[Figure: the same Runtime Framework, with the Interaction Manager implemented in SCXML, one Modality Component in HTML (for the GUI) and one in VoiceXML (for the VUI), behind the Modality Component API.]
MMI Arch – Laboratory Implementation
SCXML-based Interaction Manager; VoiceXML + HTML modality components.
[Figure: on the server, an SCXML interpreter with an HTTP I/O Processor acts as Interaction Manager; a client-side HTML browser is the GUI modality component (Modality Component API: HTTP + XML, using AJAX), and a server-side CCXML/VoiceXML browser with a telephony interface to the phone client is the voice modality component (Modality Component API: HTTP + XML, EMMA).]
MMI Architecture – Open Issues
- Profiles
- Start-up, registration and delegation in a distributed environment
- Transport of events
- Extensibility of events
Emotion in Wikipedia
From Wikipedia definition:
“An emotion is a mental and physiological state associated with a wide variety of feelings, thoughts, and behaviours. It is a prime determinant of the sense of subjective well-being and appears to play a central role in many human activities. As a result of this generality, the subject has been explored in many, if not all of the human sciences and art forms. There is much controversy concerning how emotions are defined and classified.”
General goal: make interaction between humans and machines more natural for the humans.
Machines should become able:
- to register human emotions (and related states)
- to convey emotions (and related states)
- to “understand” the emotional relevance of events
Emotional States are Numerous
[Figure: circumplex of emotional states (Scherer et al., Univ. Geneva), arranged along the axes Active/Passive, Positive/Negative, Hi/Lo Power-Control and Conducive/Obstructive: major labels range from EXCITED, AROUSED, ASTONISHED, DELIGHTED, HAPPY, PLEASED, GLAD, SERENE, CONTENT, AT EASE, SATISFIED, RELAXED, CALM and SLEEPY to TENSE, ALARMED, ANGRY, AFRAID, ANNOYED, DISTRESSED, FRUSTRATED, MISERABLE, SAD, GLOOMY, DEPRESSED, TIRED, BORED and DROOPY, with dozens of finer-grained states (adventurous, triumphant, enthusiastic, passionate, hostile, envious, hateful, suspicious, bored, anxious, lonely, melancholic, peaceful, friendly, etc.) in between.]
HUMAINE Project
European Network of Excellence, active 01/2004 - 12/2007, with 33 partner institutions from many disciplines.
Today: HUMAINE Association (since June 2007), 125 members.
Web site: http://emotion-research.net
Online Speaker Classification
Classification Techniques
- Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA): preprocessing step to reduce feature-vector dimension
- K-nearest Neighbor
- Gaussian Mixture Models (GMMs): model training data as Gaussian densities
- Artificial Neural Networks (ANN), e.g. MLP: interesting training algorithms
- Support Vector Machines (SVM): use “kernel functions” to separate non-linear decision boundaries
- Classification and Regression Trees (CART)
- Hidden Markov Models (HMMs): used to model temporal structure
Felix Burkhardt, Colloquium Hochschule Zittau/Görlitz, 4.8.2008
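Of the techniques listed, k-nearest neighbor is simple enough to sketch in full. A toy Python illustration with made-up 2-D feature vectors and labels (real speaker-classification systems use far richer acoustic features):

```python
from collections import Counter
from math import dist

# Toy training data: (feature_vector, label). Features and labels
# are invented for illustration only.
training = [
    ((0.9, 0.8), "angry"), ((0.8, 0.9), "angry"),
    ((0.1, 0.2), "neutral"), ((0.2, 0.1), "neutral"),
    ((0.5, 0.9), "happy"),
]

def knn_classify(sample, k=3):
    """Label a sample by majority vote among its k nearest neighbors."""
    neighbors = sorted(training, key=lambda item: dist(item[0], sample))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_classify((0.85, 0.85)))   # 'angry'
```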
Expressive TTS – Two Approaches
1. Different speech databases, one for each expressive style: an effective solution, feasible only for a very limited range of emotions. [Diagram: text + expressive tags → selection among style 1..n databases → waveform.]
2. Speech-signal manipulation according to style-dependent prosodic models: a flexible solution, but it requires accurate models and effective signal-processing capabilities. [Diagram: text + expressive tags → selection from a neutral-style database → signal processing driven by a prosodic model → waveform.]
From Enrico Zovato, Loquendo
Expressive TTS – Example Prosodic Patterns
Synthesis of two basic emotional styles (POS "happy", NEG "sad") through prosodic modification:
- different intonation contours
- different acoustic-unit durations
[Figure: spectrograms (frequency 0-500 Hz vs. time 0-1.8 s) of POS and NEG renderings, with audio examples for Male-UK and Female-UK voices.]
From Enrico Zovato, Loquendo
Emotions in ECAs
From Piero Cosi, CNR, Padova
W3C Emotion Incubator
“The W3C Incubator Activity fosters rapid development, on a time scale of a year or less, of new Web-related concepts. Target concepts include innovative ideas for specifications, guidelines, and applications that are not (or not yet) clear candidates as Web standards developed through the more thorough process afforded by the W3C Recommendation Track.”
W3C Emotion Incubator aims:
First Charter XG (2006-2007):
“...to investigate the prospects of defining a general-purpose Emotion annotation and representation language...”
“...which should be usable in a large variety of technological contexts where emotions need to be represented.”
Second Charter XG (Nov. 2007 – Nov. 2008):
- Prioritize the requirements
- Release a first specification draft
- Illustrate how to combine the Emotion Markup Language with existing markup languages
W3C Emotion Incubator – Members
W3C Members: DFKI, Loquendo, Deutsche Telekom, SRI International, NTUA, Fraunhofer, Chinese Acad. Science
Invited Experts: Emotion AI, Univ. Paris 8, Univ. Basque Country, Univ. C. Cork, OFAI (Austria), IPCA (Portugal), Tech. Univ. Munich
Chairman: Marc Schröder, DFKI
Web space: http://www.w3.org/2005/Incubator/emotion
Results:
- Use case description document
- Requirements document
- Final Report (20 Nov 2008): Elements of an EmotionML 1.0, http://www.w3.org/2005/Incubator/emotion/XGR-emotionml/
W3C Emotion Incubator – EmotionML 1.0
Document structure: container element (<emotionml>), single emotion annotation (<emotion>)
Representation of emotions: <category>, <dimensions>, <appraisals>, <action-tendency>, <intensity> elements
Meta information: confidence attribute, <modality> element, <metadata> element
Links and time: <link> element, <timing> element
Scale values: value attribute, <traces> element
EmotionML 1.0 – Examples
Expression of emotions in SSML 1.1:
<?xml version="1.0"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:emo="http://www.w3.org/2008/11/emotionml"
       xml:lang="en-US">
  <s>
    <emo:emotion>
      <emo:category set="everydayEmotions" name="doubt"/>
      <emo:intensity value="0.4"/>
    </emo:emotion>
    Do you need help?
  </s>
</speak>
Detection of emotions in EMMA 1.0:
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
           xmlns="http://www.w3.org/2008/11/emotionml">
  <emma:interpretation start="12457990" end="12457995"
                       mode="voice" verbal="false">
    <emotion>
      <intensity value="0.1" confidence="0.8"/>
      <category set="everydayEmotions" name="boredom" confidence="0.1"/>
    </emotion>
  </emma:interpretation>
</emma:emma>
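Extracting the detected state from such an annotation is straightforward with an XML parser. A hedged Python sketch against the detection example above:

```python
import xml.etree.ElementTree as ET

EMO = "http://www.w3.org/2008/11/emotionml"

# The EMMA + EmotionML detection example from the slide.
doc = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
           xmlns="http://www.w3.org/2008/11/emotionml">
  <emma:interpretation start="12457990" end="12457995"
                       mode="voice" verbal="false">
    <emotion>
      <intensity value="0.1" confidence="0.8"/>
      <category set="everydayEmotions" name="boredom" confidence="0.1"/>
    </emotion>
  </emma:interpretation>
</emma:emma>"""

root = ET.fromstring(doc)
# <category> lives in the default (EmotionML) namespace.
category = root.find(".//{%s}category" % EMO)
print(category.get("name"), category.get("confidence"))
```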
Overview
- A Bit of History
- W3C Speech Interaction Framework Today: ASR/DTMF, TTS, Lexicons, Voice Dialog and Call Control, Voice Platforms and Next Evolutions
- W3C Multimodal Interaction Today: MMI Architecture, EMMA and InkML, A Language for Emotions
- Next Future
W3C VBWG/MMIWG – Next Future
Specifications for the next generation of voice browsing:
SCXML 1.0
VoiceXML 3.0
State Charts - SCXML
State Chart XML (SCXML): http://www.w3.org/TR/scxml/
A powerful state-machine language:
- Based on David Harel's statecharts
- Adopted in UML
- Standard under development by the W3C VBWG
States, transitions, events:
- The data model extends the basic finite state automaton
- Conditions on transitions
Nested states:
- Represent task decomposition
- The machine can be in multiple dependent states at the same time
Parallel states:
- Represent fork/join logic
Wide interest: VBWG, MMI WG, other W3C groups, universities, industries; open-source implementations are already available.
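The core state-machine semantics can be illustrated in a few lines of Python. This is a toy flat interpreter, not SCXML: nested and parallel states, guard conditions and the data model are omitted, and the event names are borrowed from the CCXML examples earlier in the talk:

```python
# Toy flat state machine in the spirit of SCXML: states, events and a
# transition table. Nesting, parallelism, guards and datamodel omitted.
transitions = {
    ("idle",      "connection.alerting"):  "answering",
    ("answering", "connection.connected"): "in_dialog",
    ("in_dialog", "dialog.exit"):          "idle",
}

def run(events, state="idle"):
    """Feed events through the machine; unknown events are ignored."""
    for event in events:
        state = transitions.get((state, event), state)
    return state

print(run(["connection.alerting", "connection.connected", "dialog.exit"]))
```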
SCXML 1.0 – Parallel State Charts
SCXML as MMI Interaction Manager
[Figure: an SCXML Interaction Manager coordinating voice, visual and gesture modality components.]
SCXML for VoiceXML 3.0
[Figure: the same arrangement, with the SCXML Interaction Manager driving the voice modality (VoiceXML 3.0) alongside the visual and gesture modalities.]
SCXML 1.0 – Open Issues
- Data model: ECMAScript (ECMA-262) or other formats?
- Definition of profiles
- Other
Re-Thinking VoiceXML – VoiceXML 3.0
- Well-founded: from a syntactic description to a semantic model
- Extensible: SIV, EMMA support, rich media, VCR control, etc.
- Profiled: light profile (mobile?), media profile (scalability), VoiceXML 2.1 profile (interoperability), etc.
- Flexible: customization of the FIA (Form Interpretation Algorithm)
VoiceXML 3.0 – Separation of Concerns
- SCXML 1.0: application and interaction logic
- VoiceXML 3.0: voice interaction only, under the control of SCXML
VoiceXML 3.0 has been published as a First Working Draft, http://www.w3.org/TR/2008/WD-voicexml30-20081219/; send your public comments.
THANK YOU
for clarifications or questions:
paolo.baggia@loquendo.com
For more information:
Keep an eye on: www.loquendo.com
Contact: paolo.baggia@loquendo.com

THANK YOU

Loquendo S.p.A.
745 Fifth Ave, 27th Floor, New York, NY 10151, USA
Tel. +1 212.310.9075, Fax +1 212.310.9001

Loquendo S.p.A.
Via Olivetti, 6, 10148 Torino, Italy
Tel. +39 011 291 3111, Fax +39 011 291 3199
www.loquendo.com
Keep in touch with Loquendo news, subscribe to the Loquendo Newsletter
Try our interactive TTS demo: insert your text, choose a language, and listen
The latest News at a click
Consult the Loquendo Newsletter online
Keep up to date on events and initiatives
For further information, fill in our Contacts Form