1
A Spoken Dialog System to Access a Newspaper Web Site
César González Ferreras (UVA)
Rubén San-Segundo Hernández (UPM)
Valentín Cardeñoso Payo (UVA)
Universidad Politécnica de Madrid
Dialog Systems based on XML Technologies
Berliner XML Tage 2004
2
Contents
Introduction and Related Work
System Overview
System Architecture
Interaction Model
Information Model
Sample Interaction
Conclusions and Future Work
3
Introduction
Provide vocal access to already existing Internet contents.
Advantages of vocal interaction over traditional visual-only web browsing:
– Speech is more natural for most people.
– Suitable for users with special needs (e.g. blind people).
– Ideal for hands-free, eyes-busy environments.
– A solution for mobile devices, which allow web access anytime, anywhere, but still have limited display capabilities.
4
Introduction
Maturity of spoken dialog systems for accessing structured information stored in databases [La99, Zu00].
Textual information is massive, and a speech interface has limitations: it is sequential and not persistent.
An efficient and natural way of interaction is required.
5
Related Work
Approaches to making web contents available using speech:
– Add a vocal interface to an existing web browser [HT95, Ve03].
– Convert HTML contents into VoiceXML [Go00, FKL01].
– Restrict the solution to selected on-line resources [La97, PCS03].
– Extend a traditional Information Retrieval System with a speech interface [Cr99, Ch02].
6
System Overview
Objective: develop a spoken dialog system to access a newspaper web site.
We use two strategies to access information:
– Browse: review which information is available.
– Query: specific information need.
To describe each strategy, we use two models:
– Interaction model: describes how the system dialogs with the user.
– Information model: describes how the web contents must be processed and structured in order to support that interaction.
7
Browse
Browse: the user does not have a specific information need and wants to know which information is available.
– Interaction Model: the information must be presented gradually, at different levels of detail.
– Information Model: the information must be organized in groups of items, with each item available at different levels of detail: first a headline, next a short description and finally the full text.
[Figure: browsing tree (TREE)]
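A minimal sketch of such a browsing tree, assuming hypothetical class names (`NewsItem`, `Section`) and a block size of five headlines. The sample interaction later in the slides reads five headlines per block, but the real block size and data layout are not stated, so both are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewsItem:
    """One news story at three levels of detail (hypothetical layout)."""
    headline: str
    summary: str
    body: str


@dataclass
class Section:
    """A group of news stories, e.g. 'international'."""
    name: str
    news: List[NewsItem] = field(default_factory=list)

    def blocks(self, size: int = 5) -> List[List[NewsItem]]:
        """Split the stories into blocks so headlines are read a few at a time.

        The default size of 5 is an assumption, not a documented value.
        """
        return [self.news[i:i + size] for i in range(0, len(self.news), size)]


# Ten placeholder stories, presented gradually: block -> headline -> summary -> body.
section = Section("international", [
    NewsItem(f"Headline {n}", f"Summary {n}", f"Body {n}") for n in range(1, 11)
])
first_block = section.blocks()[0]
print([item.headline for item in first_block])
```

Keeping the three levels of detail on each item lets the dialog read only headlines first and descend to the summary or body on request.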
8
Query
Query: the user has a specific information need that can be expressed as a query.
– Interaction Model: the system searches and presents the results to the user.
– Information Model: an inverted index is used. It contains, for each term in the lexicon, a list of the documents in which that term appears. We have used the vector space model [SWY75].
INDEX
term1: Doc1
term2: Doc2, Doc3
term3: Doc4
term4: Doc3
...
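As an illustration of the inverted index idea, the sketch below maps each term to the set of documents that contain it. The function name `build_inverted_index`, the whitespace tokenizer and the toy documents are assumptions made for the example; it does not include the stemming or vector space weighting used by the system.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every term to the ids of the documents in which it appears."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization on whitespace; a real system would normalize further.
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


# Toy collection (invented for this example).
docs = {
    "doc1": "elections delayed in Afghanistan",
    "doc2": "regional elections in France",
    "doc3": "summit cancelled",
}
index = build_inverted_index(docs)
print(sorted(index["elections"]))  # → ['doc1', 'doc2']
```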
9
System Architecture
[Figure: system architecture. Components: Internet, Internet Crawler, Information Manager, Local Repository, Dictionaries, Dialog Manager, VoiceXML Browser. The Information Model box contains the browsing tree (TREE) and the inverted index (INDEX).]
10
System Architecture
Information Manager:
– HTML pages are converted into XML using Tidy and XSLT.
– Browsing tree is built (based on sections and news).
– Inverted index is built.
Dialog Manager:
– VoiceXML is used as language to describe dialogs.
– Java Servlet technology (Tomcat).
VoiceXML Browser:
– The system works for Spanish Language.
– Our own VoiceXML interpreter.
– Speech recognition and synthesis from Universidad Politécnica de Cataluña.
– Dialogic telephone card.
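To give a feel for what a dynamically generated VoiceXML dialog might look like, here is a hedged sketch that builds a VoiceXML 2.0 `<menu>` for section selection. It is written in Python for brevity, whereas the system itself uses Java servlets; the function name `section_menu`, the menu id and the `browse?section=...` URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import quote

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def section_menu(sections: List[str]) -> str:
    """Return a VoiceXML document with one <choice> per newspaper section."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu", {"id": "section"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "What section do you want? " + ", ".join(sections) + "."
    for name in sections:
        # The target URL is an assumption; the real servlet paths are not given.
        choice = ET.SubElement(menu, "choice", {"next": "browse?section=" + quote(name)})
        choice.text = name
    return ET.tostring(vxml, encoding="unicode")


print(section_menu(["local", "Spain", "international"]))
```

Generating the document per request is what allows the recognition vocabulary of each dialog state to follow the current newspaper contents.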
11
Interaction Model
A system-initiative strategy controls the dialog flow (finite state diagrams mapped into VoiceXML).
The large, dynamically generated vocabulary (2000 words) is divided into several smaller ones (50-100 words), each associated with one state of the dialog. Smaller vocabularies give a higher speech recognition rate.
The system uses two confirmation strategies, depending on the size of the vocabulary: implicit (<25 words) and explicit (>25 words).
The user can interrupt the system at any time (barge-in).
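The choice between the two confirmation strategies can be sketched as a simple threshold test. The function name `confirmation_strategy` is hypothetical, and since the slide only gives "<25" and ">25", treating a vocabulary of exactly 25 words as explicit is an assumption.

```python
def confirmation_strategy(vocabulary_size: int, threshold: int = 25) -> str:
    """Pick a confirmation strategy from the size of the active vocabulary.

    Small vocabularies are recognized more reliably, so the system can confirm
    implicitly (by echoing the choice); larger ones get an explicit yes/no question.
    The behavior at exactly `threshold` words is an assumption.
    """
    return "implicit" if vocabulary_size < threshold else "explicit"


print(confirmation_strategy(6))   # e.g. a handful of section names
print(confirmation_strategy(80))  # e.g. query terms for one section
```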
12
Interaction Model (Browse)
[Figure: browse dialog state diagram. States: SECTION, BLOCK, SUMMARY, NEWS. Transition labels: <section>, <news>, body, Next, Previous, back.]
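The browse dialog can be read as a finite state machine over the states SECTION, BLOCK, SUMMARY and NEWS. The transition table below is a reconstruction from the diagram labels and the sample interaction, not the authors' actual specification; in particular, which states the Next, Previous and back arcs attach to is an assumption.

```python
from typing import Dict, Tuple

# (current state, user input) -> next state. Assumed wiring, see lead-in.
BROWSE_TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("SECTION", "<section>"): "BLOCK",   # user names a section
    ("BLOCK", "<news>"): "SUMMARY",      # user picks a headline from the block
    ("BLOCK", "next"): "BLOCK",          # move to the next block of headlines
    ("BLOCK", "previous"): "BLOCK",
    ("BLOCK", "back"): "SECTION",
    ("SUMMARY", "body"): "NEWS",         # user asks for the full story
    ("SUMMARY", "next"): "SUMMARY",      # move to the next story's summary
    ("SUMMARY", "previous"): "SUMMARY",
    ("SUMMARY", "back"): "BLOCK",
    ("NEWS", "back"): "SUMMARY",
}


def step(state: str, user_input: str) -> str:
    """Return the next dialog state; unrecognized input keeps the current state."""
    return BROWSE_TRANSITIONS.get((state, user_input), state)


state = "SECTION"
for user_input in ["<section>", "<news>", "body", "back"]:
    state = step(state, user_input)
print(state)  # → SUMMARY
```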
13
Interaction Model (Query)
[Figure: query dialog state diagram. States: SECTION, TERM, OPTIONS, SUMMARY, NEWS. Transition labels: <section>, <term> AND results>1, <term> AND results=1, <news>, body, Next, Previous, back.]
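The two conditions on the `<term>` transition suggest that the state reached after a query depends on the number of results. A minimal sketch of that routing follows; the mapping of "results>1" to OPTIONS and "results=1" to SUMMARY is inferred from the diagram labels and the sample interaction, and returning to TERM when nothing matches is purely an assumption.

```python
def state_after_term(result_count: int) -> str:
    """Choose the dialog state that follows a recognized query term."""
    if result_count > 1:
        return "OPTIONS"   # several stories: let the user pick one
    if result_count == 1:
        return "SUMMARY"   # a single story: present it directly
    return "TERM"          # no results: ask for another term (assumed behavior)


print(state_after_term(3))
```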
14
Information Model
Built by extracting information from the web site of a local newspaper (El Norte de Castilla).
Decision tree:
– The contents of the newspaper are divided into sections.
– Each section contains several news stories.
– Each news story is composed of several elements: a headline, a short summary and a body.
15
Information Model (Browse)
[Figure: browsing tree. Nodes: SECTION1, SECTION2, ...; BLOCK1, B2, B3; SUMMARY1, S2 to S5; NEWS1, N2 to N5. Edge labels: Select, Back, Next, Previous.]
16
Information Model (Query)
Inverted index:
– For each news story we extract all the terms.
– A stemmer is used to remove affixes.
– For each term we calculate the weight using tf-idf.
– We keep the 25 highest-weighted components (terms) of each news story.
[Figure: one inverted index per section (INDEX FOR SECTION1, INDEX FOR SECTION2, INDEX FOR SECTION3), each listing terms (term1, term2, ...) with the documents in which they appear.]
17
Information Model (Query)
Term weight: term frequency-inverse document frequency (tf-idf). The following formula is used to compute the weight w of each term in the document:
$$ w = (1 + \log(tf)) \cdot \log\frac{N}{df} $$
– tf is the number of times the term occurs in the document.
– df is the number of documents in which that term appears.
– N is the number of documents in the collection.
Document collection: stories collected from the newspaper web site over more than a year (71,141 news stories).
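A small sketch of the weighting, reading the formula as w = (1 + log tf) · log(N / df). The base of the logarithm is not stated on the slide, so the natural logarithm is assumed here; the helper names `tfidf_weight` and `top_terms` are hypothetical, and `top_terms` mirrors the step of keeping the 25 most relevant terms of each story.

```python
import math
from typing import Dict, List


def tfidf_weight(tf: int, df: int, n_docs: int) -> float:
    """Weight of a term in one document: (1 + log tf) * log(N / df).

    Natural logarithm is an assumption. A term that does not occur
    (tf == 0) or is unknown to the collection (df == 0) gets weight 0.
    """
    if tf <= 0 or df <= 0:
        return 0.0
    return (1 + math.log(tf)) * math.log(n_docs / df)


def top_terms(weights: Dict[str, float], k: int = 25) -> List[str]:
    """Return the k highest-weighted terms of a document, best first."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Invented counts: a term occurring 3 times in a story and in 100 of 71,141 stories.
print(tfidf_weight(tf=3, df=100, n_docs=71141))
```

A term that appears in every document gets weight 0 because log(N / N) = 0, which is why very common words contribute nothing to the ranking.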
18
Sample Interaction (Browse)
System: Welcome to “El Norte de Castilla” newspaper. Remember that you can interrupt the system anytime. You can ask for help whenever you need it. Choose browse or query to access information.
User: Browse.
System: What section do you want? local, Spain, international, life, culture or television.
User: International
System: International section. There are ten news. Block one. First: Left parties victory in regional French elections. Second: Tunisia cancels the summit of the Arab League about political reforms. Third: New leader of Hamas in Gaza describes George Bush as “enemy of God and Islam”. Fourth: Karzai announces a delay until September of Afghanistan elections. Fifth: United States gives the keys of the first of the twenty five Iraqi ministries. Please, choose the news you want, next or back.
User: Fourth
System: Headline: Karzai announces a delay until September of Afghanistan elections. International Section. Summary: He justifies the delay because of ...
19
Sample Interaction (Query)
System: Welcome to “El Norte de Castilla” newspaper. Remember that you can interrupt the system anytime. You can ask for help whenever you need it. Choose browse or query to access information.
User: Query.
System: What section do you want? local, Spain, international, life, culture or television.
User: International
System: International section. Please, tell me the query term.
User: Elections
System: I understood elections. Is it correct?
User: Yes
System: International Section. There are three news about elections. First: Left parties victory in regional French elections. Second: Karzai announces a delay until September of Afghanistan elections. Third: At least 46 dead in the elections. Please, choose the news you want.
User: Second
System: Headline: Karzai announces a delay until September of Afghanistan elections. International Section. Summary: He justifies the delay because of ...
20
Conclusions
We have presented a system which allows speech access to a newspaper web site.
– The Interaction Model combines browse and query mechanisms to let the user access the information.
– Information Model supports that interaction using two data structures: a decision tree and an inverted index.
All the contents used by the system are automatically obtained from the web.
We used VoiceXML as a language to describe dialogs.
21
Future Work
We are working on the evaluation of system performance and user satisfaction.
We will study how users respond to the system and this will allow us to validate the adequacy of the models proposed to access the information.
22
QUESTIONS
23
References
[Ch02] Chang, E. et al.: A System for Spoken Query Information Retrieval on Mobile Devices. IEEE Transactions on Speech and Audio Processing. 10(8). November 2002.
[Cr99] Crestani, F.: Vocal access to a Newspaper Archive: Design Issues and Preliminary Investigations. In: ACM Digital Libraries. 1999.
[FKL01] Freire, J.; Kumar, B.; Lieuwen, D. F.: WebViews: Accessing Personalized Web Content and Services. In: International World Wide Web Conference. 2001.
[Go00] Goose, S. et al.: Enhancing Web Accessibility Via the Vox Portal and a Web Hosted Dynamic HTML & VoxML Converter. In: International World Wide Web Conference. May 2000.
[HT95] Hemphill, C. T.; Thrift, P. R.: Surfing the Web by Voice. In: ACM International Conference on Multimedia. 1995.
[La97] Lau, R. et al.: WebGalaxy - Integrating Spoken Language And Hypertext Navigation. In: European Conference on Speech Communication and Technology (Eurospeech). 1997.
24
References
[La99] Lamel, L. et al.: The Limsi Arise System For Train Travel Information. In: International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 1999.
[PCS03] Polifroni, J.; Chung, G.; Seneff, S.: Towards the Automatic Generation of Mixed-Initiative Dialogue Systems from Web Content. In: European Conference on Speech Communication and Technology (Eurospeech). 2003.
[SWY75] Salton, G.; Wong, A.; Yang, C. S.: A vector space model for automatic indexing. Communications of the ACM. 18(11). November 1975.
[Ve03] Vesnicer, B. et al.: A Voice-driven Web Browser for Blind People. In: European Conference on Speech Communication and Technology (Eurospeech). 2003.
[Zu00] Zue, V. et al.: JUPITER: A Telephone-Based Conversational Interface for Weather Information. IEEE Transactions on Speech and Audio Processing. January 2000.