Evalita Workshop 2009, Paolo Baggia
EVALITA 2009: Loquendo Spoken Dialog System
Paolo Baggia
Director of International Standards
Speech Luminary at SpeechTEK 2009
Evalita Workshop
December 12th, 2009
Company Profile
• Privately held company (fully owned by Telecom Italia), founded in 2001 as a spin-off from Telecom Italia Labs, capitalizing on 30 years of experience and expertise in voice processing.
• Global company, leader in Europe and South America for award-winning, high-quality voice technologies (synthesis, recognition, authentication and identification), available in 26 languages and 62 voices.
• Multilingual, proprietary technologies protected by over 100 patents worldwide.
• Financially robust: break-even reached in 2004, with revenues and earnings growing year on year.
• Growth-plan investment approved for the evolution of products and services.
• Headquarters in Torino, offices in New York, and local sales representative offices in Rome, Madrid, Paris, London and Munich.
• Flexible: about 100 employees, plus a vibrant ecosystem of local freelancers.
Overview
• A Bit of Context
• Loquendo Spoken Dialog System
  • Loquendo VoxNauta VoiceXML/CCXML platform
  • SDS – Ingredients
  • Grammar Handling
  • Evaluation
A Bit of Context
A Brief History of Speech Technologies
[Timeline figure (by Roberto Pieraccini), 1769 to 2000. Dated milestones: Von Kempelen's Talking Machine (1769), Radio Rex (1920s), Bell Labs' Voder (1936), Bell Labs' Audrey (1952), start of the DARPA SUR program (1971). Other milestones: FFT, cepstrum, DTW and TTS; Hidden Markov Models and their widespread use; dictation systems and the dictation industry; DARPA Resource Management, ATIS and COMMUNICATOR; the spoken dialog industry. The field progresses from isolated words through continuous speech recognition, speech understanding and dialog to NLU, conversational and multimodal systems.]
The (r)evolution of VoiceXML, 1998 - 2004
• 1998: W3C Voice Browser Workshop
• 1999: VoiceXML Forum birth (by AT&T, IBM, Lucent, Motorola); W3C charters the Voice Browser WG
• 2000: VoiceXML 1.0 released
• 2001-2002: SALT Forum birth (by Cisco, Comverse, Intel, Microsoft, Philips, SpeechWorks); W3C charters the Multimodal Interaction WG
• 2004: VoiceXML 2.0, SSML 1.0 and SRGS 1.0 become W3C Recommendations
• 2007: SISR 1.0 and VoiceXML 2.1 become W3C Recommendations
• 2008: PLS 1.0 becomes a W3C Recommendation
• 2009: EMMA 1.0 becomes a W3C Recommendation
[Photo: Preparing to announce VoiceXML 1.0, Friday Feb. 25th, 2000, Lucent, Naperville, Illinois. Left to right: Gerald Karam (AT&T), Linda Boyer (IBM), Ken Rehor (Lucent), Bruce Lucas (IBM), Pete Danielsen (Lucent), Jim Ferrans (Motorola), Dave Ladd (Motorola).]
Speech Interface Framework in 2000 (by Jim Larson)
[Figure. Components: User, Telephone System, World Wide Web, ASR, DTMF Tone Recognizer, Language Understanding, Context Interpretation, Dialog Manager, Media Planning, Language Generation, TTS, Pre-recorded Audio Player. Associated markup languages: VoiceXML 2.0, VoiceXML 2.1, EMMA, Speech Recognition Grammar Spec. (SRGS), Semantic Interpretation for Speech Recognition (SISR), N-gram Grammar ML, Natural Language Semantics ML, Speech Synthesis Markup Language (SSML), Pronunciation Lexicon Specification (PLS), Call Control XML (CCXML), Reusable Components.]
Speech Interface Framework - End of 2009 (by Jim Larson)
[Figure: the same components and markup languages as in the 2000 framework, with EMMA now at version 1.0; Context Interpretation and Language Understanding are emphasized.]
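Architectural Changes
Traditional (proprietary) architecture:
• The user talks to a proprietary platform (ASR / DTMF, TTS / Audio) that runs the speech application, developed with a proprietary Service Creation Environment (SCE).
VoiceXML architecture:
• The user talks to a VoiceXML platform (VoiceXML Browser, ASR / DTMF, TTS / Audio), which fetches the application from a Web application over HTTP.
• The Web application serves dialogs (.vxml), grammars and lexicons (.grxml/.gram, .pls) and prompts (.ssml, .wav/.mp3, .pls).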
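In the VoiceXML architecture the platform fetches documents over HTTP, so the application side can be an ordinary web application. The following is a minimal sketch of that idea in Python, assuming the standard-library `http.server` module in place of a JSP stack; the page, its path and the `resolve` helper are hypothetical, while the media types are the ones registered for the W3C voice formats in RFC 4267.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Media types registered for the W3C voice formats (RFC 4267).
MEDIA_TYPES = {
    ".vxml": "application/voicexml+xml",
    ".grxml": "application/srgs+xml",
    ".ssml": "application/ssml+xml",
    ".pls": "application/pls+xml",
}

# Hypothetical static page; a real application could generate it dynamically.
WELCOME_VXML = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="welcome">
    <block><prompt>Welcome.</prompt></block>
  </form>
</vxml>
"""

DOCUMENTS = {"/welcome.vxml": WELCOME_VXML}


def resolve(path: str) -> Tuple[int, str, str]:
    """Map a request path to (HTTP status, Content-Type, body)."""
    body = DOCUMENTS.get(path)
    if body is None:
        return 404, "text/plain", "not found"
    extension = path[path.rfind("."):]
    return 200, MEDIA_TYPES.get(extension, "application/octet-stream"), body


class VoiceAppHandler(BaseHTTPRequestHandler):
    """Serves the documents that the VoiceXML browser fetches over HTTP."""

    def do_GET(self) -> None:
        status, content_type, body = resolve(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, one could run `HTTPServer(("localhost", 8080), VoiceAppHandler).serve_forever()` and point a VoiceXML browser at `http://localhost:8080/welcome.vxml`.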
The Landscape Changed!
VoiceXML changed the landscape of speech application development: from proprietary to standards-based speech applications.
Before:
• Proprietary platforms (HW & SW)
• Proprietary applications (built with a proprietary SCE)
• Mainly DTMF and pre-recorded prompts
• First attempts to add speech to IVR
After:
• Standard VoiceXML platforms
• Standards for speech technologies
• Standard tools for VoiceXML applications
• Integration of DTMF and ASR
• DTMF still predominates, but speech applications are increasingly common
VoiceXML key features:
• It brings the web paradigm to the core of speech application development
• It is a powerful abstraction and easy to author
Loquendo Spoken Dialog System
VoxNauta – Internal Architecture
Loquendo SDS – Ingredients
Dialog implementation:
• VoiceXML, mostly static application (JSP for dynamic pages)
• VoiceXML mixed initiative for multi-slot input
• Barge-in always enabled (to interrupt prompts and shift focus)
• Data in a MySQL DB
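Mixed initiative means a single utterance such as “from Rome to Venice” can fill several fields of a VoiceXML form at once, after which the Form Interpretation Algorithm (FIA) selects the first field that is still empty. The sketch below models only that selection step in Python; the `TravelForm` class, its slot names and the dictionary-shaped recognition result are hypothetical, and a real platform performs this inside the VoiceXML interpreter.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TravelForm:
    """Hypothetical three-field form; None marks an unfilled field."""
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"origin": None, "destination": None, "time": None}
    )

    def fill(self, result: Dict[str, str]) -> None:
        """Copy every recognized slot into the form, as a form-level grammar allows."""
        for name, value in result.items():
            if name in self.slots and value is not None:
                self.slots[name] = value

    def next_field(self) -> Optional[str]:
        """Return the first unfilled field (simplified FIA selection), or None when complete."""
        for name, value in self.slots.items():
            if value is None:
                return name
        return None
```

After `fill({"origin": "Rome", "destination": "Venice"})`, `next_field()` returns `"time"`, so the dialog only needs to prompt for the missing information.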
Speech grammar development:
• SRGS grammars, no SLM
• Mostly dynamic grammars (JSP generated)
• Extensive use of the SRGS Garbage rule
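Most grammars were generated dynamically by JSP pages from data in MySQL. A comparable sketch in Python, assuming an in-memory `sqlite3` database as a stand-in for MySQL and a hypothetical `brands` table, emits an SRGS XML grammar in which each item carries an SISR tag returning the database id.

```python
import sqlite3
from xml.sax.saxutils import escape

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def sample_db() -> sqlite3.Connection:
    """In-memory stand-in for the MySQL data (hypothetical table and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE brands (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO brands VALUES (?, ?)",
                     [(1, "Acme"), (2, "Bolt & Co")])
    return conn


def build_grammar(conn: sqlite3.Connection, root: str = "brand") -> str:
    """Generate an SRGS XML grammar with one item per brand name.

    Each item carries an SISR tag so the recognition result is the brand id.
    """
    rows = conn.execute("SELECT id, name FROM brands ORDER BY name").fetchall()
    items = "\n".join(
        f"      <item>{escape(name)}<tag>out = {bid};</tag></item>"
        for bid, name in rows
    )
    return (
        f'<grammar xmlns="{SRGS_NS}" version="1.0" xml:lang="en-US"\n'
        f'         root="{root}" tag-format="semantics/1.0">\n'
        f'  <rule id="{root}" scope="public">\n'
        f"    <one-of>\n{items}\n    </one-of>\n"
        f"  </rule>\n"
        f"</grammar>\n"
    )
```

Regenerating the grammar on each request keeps it in sync with the database, at the cost of recompiling it on the recognizer side.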
Prompting:
• Pure TTS, untuned
• Development / tuning: 1 week
Speech Grammar with Garbage Rules
Typical garbage topologies (garbage in parentheses):
• Prefix garbage + content part: “ (Well… I’m leaving… er… sorry) from Boston ”
• Content part + postfix garbage: “ at 5pm (please) ”
• Prefix garbage + content part + postfix garbage: “ (I’d like to travel) from Rome to Venice (please) ”
• An attempt to explore the limits and advantages of garbage rules in conversational dialogs
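These topologies can be written with the SRGS special rule `GARBAGE`, which matches speech that the grammar does not otherwise model. This Python sketch builds a rule of the form [garbage] content [garbage]; wrapping `GARBAGE` in an optional item (`repeat="0-1"`) and the rule names used in the example are assumptions, not necessarily how the Loquendo grammars were written.

```python
# SRGS special rule GARBAGE, made optional with repeat="0-1" (an assumption).
OPTIONAL_GARBAGE = '<item repeat="0-1"><ruleref special="GARBAGE"/></item>'


def garbage_wrapped_rule(rule_id: str, content_ref: str,
                         prefix: bool = True, postfix: bool = True) -> str:
    """Build an SRGS rule matching [garbage] content [garbage].

    content_ref is the id of the rule that recognizes the content part,
    e.g. a hypothetical "from_city" rule for "from Boston".
    """
    parts = []
    if prefix:
        parts.append(OPTIONAL_GARBAGE)  # e.g. "Well... I'm leaving... er... sorry"
    parts.append(f'<ruleref uri="#{content_ref}"/>')
    if postfix:
        parts.append(OPTIONAL_GARBAGE)  # e.g. "please"
    return f'<rule id="{rule_id}" scope="public">{"".join(parts)}</rule>'
```

For instance, `garbage_wrapped_rule("departure", "from_city", postfix=False)` corresponds to the first topology and `garbage_wrapped_rule("route", "from_to")` to the third.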
Loquendo SDS – Grammar Encapsulation
• Robust domain concept grammars: for brands, categories, codes, names
• Component grammars: first level of composition of domain concepts
• Combinatorial grammars: combination of components in different orders
• Insertion of garbage rules
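One way to express a combinatorial grammar is an SRGS `one-of` with one alternative per ordering of the component rules, with optional garbage inserted between components. The sketch below, in Python, assumes hypothetical component rule ids and enumerates orderings with `itertools.permutations`.

```python
from itertools import permutations
from typing import List

# Optional SRGS GARBAGE between components (an assumption about the layout).
OPTIONAL_GARBAGE = '<item repeat="0-1"><ruleref special="GARBAGE"/></item>'


def combinatorial_rule(rule_id: str, components: List[str],
                       with_garbage: bool = True) -> str:
    """SRGS rule accepting the component rules in any order.

    Each ordering becomes one <item> alternative inside a <one-of>;
    optional garbage is inserted between consecutive components.
    """
    separator = OPTIONAL_GARBAGE if with_garbage else ""
    alternatives = []
    for order in permutations(components):
        refs = [f'<ruleref uri="#{c}"/>' for c in order]
        alternatives.append(f"<item>{separator.join(refs)}</item>")
    return (f'<rule id="{rule_id}" scope="public"><one-of>'
            f'{"".join(alternatives)}</one-of></rule>')
```

The number of alternatives grows as n! in the number of components (3 components give 6 orderings, 5 give 120), so this enumeration only stays practical for small component sets.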
Loquendo SDS – Evaluation
• Short dialogs: mixed initiative and flexible grammars were effective
• Task success rate: very high for the implemented tasks
• Some tasks were not implemented: the initial requirements were a bit vague
• Very precise testing scenarios
• No last-minute tuning of Loquendo SDS
Final Evaluation Table
| Task | UniTN Tsr (corr/req) | UniTN Duration (turns) | Loquendo Tsr (corr/req) | Loquendo Duration (turns) | UniNA Tsr (corr/req) | UniNA Duration (turns) |
|---|---|---|---|---|---|---|
| Identify representative | 90.5% (19/21) | 3.1 ± 0.5 | 95.0% (19/20) | 2.4 ± 0.8 | 100.0% (19/19) | 1.9 ± 0.4 |
| Ask customer detail | 54.6% (12/22) | 3.4 ± 1.6 | 88.9% (8/9) | 2.3 ± 0.5 | 83.3% (5/6) | 2.0 ± 0.0 |
| List orders | 75.0% (3/4) | 3.0 ± 0.0 | 80.0% (4/5) | 2.0 ± 0.0 | 0.0% (0/8) | 2.5 ± 1.5 |
| List customers | 66.7% (2/3) | 3.0 ± 0.0 | 0.0% (0/8) | 2.0 ± 0.0 | 50.0% (2/4) | 2.0 ± 0.0 |
| New order | 63.2% (12/19) | 7.5 ± 2.8 | 42.9% (9/21) | 4.3 ± 1.8 | 36.4% (4/11) | 4.6 ± 1.5 |
| List products - other | 44.4% (4/9) | 3.8 ± 1.6 | 25.0% (2/8) | 3.0 ± 0.8 | 0.0% (0/4) | 2.0 ± 0.0 |
| Search single product | 78.6% (11/14) | 3.5 ± 2.5 | 77.8% (14/18) | 2.8 ± 1.6 | 55.6% (5/9) | 2.3 ± 0.4 |
| Overall (corr/req) | 63.5% (73/115) | - | 62.2% (56/90) | - | 58.4% (45/77) | - |
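The Tsr columns are task success rates, computed as correct tasks over requested tasks (corr/req). A small Python helper reproduces that arithmetic; the `duration_summary` function assumes that the ± figures are a mean and a population standard deviation, which the table does not state, and the per-dialog turn counts it would need are not given here.

```python
from statistics import mean, pstdev
from typing import List


def tsr(correct: int, requested: int) -> float:
    """Task success rate in percent (corr/req), rounded to one decimal."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return round(100.0 * correct / requested, 1)


def duration_summary(turns: List[int]) -> str:
    """Format per-dialog durations in turns as "mean ± std".

    Assumption: the table's ± is the population standard deviation.
    """
    return f"{mean(turns):.1f} ± {pstdev(turns):.1f}"
```

For example, `tsr(56, 90)` returns 62.2, Loquendo's overall figure, and `tsr(19, 20)` returns 95.0.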
THANK YOU
for clarifications or questions: