Realizing the Interactive Speech Interface in
a Multi-user Virtual Environment
Advisor: Tsai-Yen Li
Author: Chun-Feng Liao
NCCU Department of Computer Science, Intelligent Media Lab
July 2004
Agenda
Introduction
Related Work
Dialog Management in MUVE (Multi-user Virtual Environment)
The Design of XAML-V Dialog Scripting Language
System Implementation and Design
Conclusion
[Slide annotation: the topics run from "More Abstract / High Level" to "More Concrete / Low Level".]
Introduction
Applications of 3D virtual environments and voice user interfaces (VUI) have received significant attention recently.
Incorporating VUI into virtual environments can enhance user interaction and immersion.
Most related research does not provide an effective mechanism for multi-user dialog management.
Contributions of this Research
1. Suggest a MUVE dialog model based on the VoiceXML dialog model.
2. Propose a way to integrate a speech interface into MUVE.
3. XAML-V: extend XAML to provide a speech-enabled interactive animation scripting language.
4. Deal with implementation problems of XAML-V using software patterns as recipes.
MUVE = Multi-user Virtual Environment
VUI / VE Integration Problems
[McGlashan 95] identified 3 types of problems in integrating VUI with virtual environments.
• Speech Recognition
• Language Understanding
• Interaction Metaphor
Scott McGlashan is the editor-in-chief of W3C VoiceXML 2.0.
Integration Considerations
Client Interface
• Ad hoc [Cernak02]
• VRML - EAI - JSAPI [Wauchope03] [O.Apaydin02] [Descamps01]
Dialog Management
• Database: [Wauchope03]
• IDE: [Cernak02]
• Scripting Language:
- Based on VoiceXML: DialogXML [Nyberg02] and Galatea [Sagayama03]
- Customized: MPML-VR [Descamps01]
IMNet - A Client-Server MUVE System
[Figure: an IMNet Server connected to IMClientA, IMClientB, and IMClientC; the links are labeled "send", "broadcast", and "broadcast".]
Animation Script Language
Using high-level scripts to control animated characters is not a new idea.
AML focuses on synchronization of facial expression and voice.
• It lacks a function to extract or modify an existing animation.
STEP can compose new animations from existing animation components.
• It falls short on specifying detailed animation attributes.
STEP = Scripting Technology for Embodied Persona
AML = Avatar Markup Language
XAML (eXtensible Animation Markup Language)
Describes character animations at various command levels.
Developers can compose a new animation from existing animation clips.
The syntax is extensible by providing plug-in modules.
VoiceXML
VoiceXML 1.0 was published by the VoiceXML Forum and submitted to the W3C in 2000.
Used in interactive telephony applications.
Based on HTTP, using a form-based dialog model.
[Figure: VoiceXML server and client.]
VoiceXML: An Example
<vxml version="2.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
    </field>
    <block>
      <submit next="http://www.drink.example.com/drink2.asp"/>
    </block>
  </form>
</vxml>
Definitions & Notations
Dialog: Exactly two avatars concentrate on interacting with each other.
Subjects: Avatars in a dialog.
Observers: Avatars not in a dialog.
U: Avatars controlled by a human.
S: Avatars controlled by the system.
Suffix s: Subject avatars.
Suffix i (i = 1, 2, 3, …): Observer avatars.
[Figure: subjects Ss and Us, with observer Ui.]
VoiceXML Dialog Model
VoiceXML was designed originally for dialogs in telephony systems.
Most telephony applications involve only two interacting parties.
Problems with VoiceXML Dialog Model in MUVE (1)
[Figure: Ss, Us, the Document Server, and the IMNet Server, with links labeled "conceptually" and "actually". Speech bubble: "What is the dialog status with Us?"]
Problems with VoiceXML Dialog Model in MUVE (2)
[Figure: Ss, Us1, Us2, the Document Server, and a VRML Browser, with links labeled "conceptually" and "actually". Speech bubbles: "Who is talking with me?", "I'm talking with Ss.", "What should I draw?"]
Problems with VoiceXML Dialog Model in MUVE (3)
Conceptually, Us is having a dialog with Ss.
Actually, Us is interacting with the Document Server, which carries Ss's dialog script.
VoiceXML lacks a dialog locking mechanism.
The unmodified VoiceXML dialog model does not fit MUVE.
Proposed Dialog Model in MUVE
We enhance the original VoiceXML dialog model to fix these problems.
• Proxy Request
• Dialog Lock
• Dialog State
• Dialog Negotiation
Benefits of Proxy Request Model
Applying this model in MUVE has the following benefits:
• Us is not aware of the Document Server, which provides the flexibility to switch between different roles.
• Ss is aware of the dialog status with Us.
Dialogs without Dialog Lock
[Figure: avatars A, B, and C. Legend: speech output from A, speech input to A.]
It is impossible for A to accept speech input from multiple avatars at the same time.
A will be confused if B and C talk to it at the same time.
Dialog Lock
We propose that only two avatars can be in a dialog at the same time.
The Dialog Lock mechanism is used to enforce this constraint.
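The constraint above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Avatar`, `DialogLockManager`, `try_lock`, and `release` are hypothetical and do not come from the IMNet or XAML-V implementation, and the sketch keeps all state in one process rather than on a server.

```python
import threading


class Avatar:
    """An avatar that can take part in at most one dialog at a time."""

    def __init__(self, name):
        self.name = name
        self.partner = None  # the avatar we are locked into a dialog with


class DialogLockManager:
    """Hypothetical helper that grants a dialog lock to pairs of avatars."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the partner fields

    def try_lock(self, a, b):
        """Lock a and b into a dialog; return False if either is busy."""
        with self._guard:
            if a is b or a.partner is not None or b.partner is not None:
                return False
            a.partner, b.partner = b, a
            return True

    def release(self, a):
        """End the dialog that a takes part in, freeing both avatars."""
        with self._guard:
            b = a.partner
            if b is not None:
                a.partner = b.partner = None


manager = DialogLockManager()
A, B, C = Avatar("A"), Avatar("B"), Avatar("C")
print(manager.try_lock(A, C))  # A and C start a dialog
print(manager.try_lock(B, A))  # B is refused while A is locked
manager.release(C)             # either subject can end the dialog
print(manager.try_lock(B, A))  # A is free again
```

Refusing the second request, rather than queueing it, is a design choice of this sketch; the slides do not say what happens to an avatar that is turned away.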
Dialog with Dialog Lock
A is currently in dialog with C.
[Figure: avatars A, B, and C, with a dialog lock between A and C. Labels: Dialog Scripts, Broadcasting Scripts. Legend: speech output from A, speech input to A.]
Initialize a Dialog
[Sequence diagram between the two avatars entering a dialog; steps in order of appearance:]
Enter negotiation-state
Send dialog request message
Enter negotiation-state
Send dialog accept message
Enter in-dialog-state
Send dialog ack message
Enter in-dialog-state
Fetch first xaml-v script
Send first xaml-v script
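The negotiation steps can be read as a small state machine on each side. The Python sketch below assumes that the human-controlled avatar Us initiates and the system-controlled avatar Ss responds and fetches the script; the slide does not label the two sides, so this assignment, the `Peer` class, and the `handshake` helper are all hypothetical. Only the state and message names are taken from the slide.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    NEGOTIATION = "negotiation-state"
    IN_DIALOG = "in-dialog-state"


class Peer:
    """One side of the handshake; message names follow the slide."""

    def __init__(self, name):
        self.name = name
        self.state = State.IDLE

    def start(self):
        """Initiator: enter negotiation and ask for a dialog."""
        self.state = State.NEGOTIATION
        return "dialog request"

    def receive(self, message):
        """Advance the state machine; return the reply to send, or None."""
        if message == "dialog request" and self.state is State.IDLE:
            self.state = State.NEGOTIATION
            return "dialog accept"
        if message == "dialog accept" and self.state is State.NEGOTIATION:
            self.state = State.IN_DIALOG
            return "dialog ack"
        if message == "dialog ack" and self.state is State.NEGOTIATION:
            self.state = State.IN_DIALOG
            return "first xaml-v script"  # stands in for the fetched script
        return None  # nothing to answer, or an unexpected message


def handshake(initiator, responder):
    """Pass messages back and forth until neither side has a reply."""
    log = []
    sender, receiver = initiator, responder
    message = initiator.start()
    while message is not None:
        log.append((sender.name, message))
        message = receiver.receive(message)
        sender, receiver = receiver, sender
    return log


us, ss = Peer("Us"), Peer("Ss")
for who, what in handshake(us, ss):
    print(f"{who} sends: {what}")
```

A real system would also need the time-out handling that the slides list under synchronization problems; this sketch leaves it out.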
The XAML Scripting Language
<AnimItem DEF="WaveWalk" cycle="2000">
  <AnimImport src="Walk" />
  <AnimItem DEF="SimpleWave" cycle="1000">
    <Node target="r_shoulder">
      <OrientationInterpolator key="…" keyValue="…" />
    </Node>
  </AnimItem>
</AnimItem>
XAML-V Features
Extension of the XAML scripting language.
Subset of VoiceXML.
Supports form-level and field-level animations.
Realizes the concepts discussed in the previous section.
• Dialog negotiation
• Proxy request
• Broadcasting
How XAML-V Realizes the Proxy Request Model
[Figure: message flow among Us, Ss, and the Document Server.]
1. Send proxy request message
2. Issue HTTP command: http://xxx/helloFormResponse.jsp?helpType=no%20thanks
3. HTTP response
4. Return requested dialog script
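The four steps can be sketched from the point of view of Ss. This Python sketch assumes that Ss is the party that talks to the Document Server on behalf of Us, which matches the stated benefits of the proxy request model. The `SubjectAvatar` class and the `fake_document_server` stand-in are hypothetical; the stand-in replaces the real HTTP call so that the sketch runs offline.

```python
from urllib.parse import urlencode, quote


class SubjectAvatar:
    """Ss: forwards the user's field values to the document server."""

    def __init__(self, base_url, fetch):
        self.base_url = base_url
        self.fetch = fetch        # callable(url) -> script text
        self.last_request = None  # lets Ss keep track of the dialog status

    def handle_proxy_request(self, page, fields):
        # Step 2: build the HTTP command from the recognized field values.
        query = urlencode(fields, quote_via=quote)  # spaces become %20
        url = f"{self.base_url}/{page}?{query}"
        self.last_request = url
        # Steps 3 and 4: take the HTTP response and hand it back to Us.
        return self.fetch(url)


def fake_document_server(url):
    """Stand-in for the real server so the sketch runs without a network."""
    return f"<!-- dialog script generated for {url} -->"


ss = SubjectAvatar("http://xxx", fake_document_server)
# Step 1: Us sends a proxy request carrying what the user said.
script = ss.handle_proxy_request("helloFormResponse.jsp", {"helpType": "no thanks"})
print(ss.last_request)
print(script)
```

Because every request passes through Ss, it can record the dialog status, and Us never needs to know the address of the Document Server.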
Summary: Benefits of XAML-V
Extended from the XAML animation script, XAML-V inherits its strong animation functions.
Can be dynamically generated by various server-side scripting technologies (e.g., JSP or ASP).
The dialog model works in MUVE.
XAML-V Implementation
XAML Platform delegates XAML-V scripting elements to VoicePluginObject.
Embedded animations are sent back to the Animation Manager.
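The delegation described above might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the sample script mixes VoiceXML-style `form`, `field`, and `prompt` elements with XAML's `AnimItem`, but the real XAML-V element set is not shown in the slides, and the classes only mimic the roles of the XAML platform, the VoicePluginObject, and the Animation Manager.

```python
import xml.etree.ElementTree as ET

# Hypothetical XAML-V fragment with a form-level and a field-level animation.
SCRIPT = """
<form>
  <AnimItem DEF="Greet"/>
  <field name="helpType">
    <prompt>May I help you?</prompt>
    <AnimItem DEF="Nod"/>
  </field>
</form>
"""


class AnimationManager:
    """Collects the animations it is asked to play."""

    def __init__(self):
        self.queue = []

    def play(self, element):
        self.queue.append(element.get("DEF"))


class VoicePlugin:
    """Stand-in for the VoicePluginObject: handles the voice elements."""

    VOICE_TAGS = {"form", "field", "prompt"}

    def __init__(self, animation_manager):
        self.animation_manager = animation_manager
        self.prompts = []

    def handle(self, element):
        if element.tag == "prompt":
            self.prompts.append((element.text or "").strip())
        for child in element:
            if child.tag in self.VOICE_TAGS:
                self.handle(child)
            else:
                # Embedded animations go back to the animation manager.
                self.animation_manager.play(child)


class Platform:
    """Stand-in for the XAML platform: routes elements by tag."""

    def __init__(self):
        self.animations = AnimationManager()
        self.voice = VoicePlugin(self.animations)

    def run(self, source):
        root = ET.fromstring(source)
        if root.tag in VoicePlugin.VOICE_TAGS:
            self.voice.handle(root)  # delegate voice elements to the plug-in
        else:
            self.animations.play(root)


platform = Platform()
platform.run(SCRIPT)
print(platform.voice.prompts)
print(platform.animations.queue)
```

Keeping the voice handling in a plug-in matches the earlier point that XAML syntax is extensible through plug-in modules.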
Conclusion
We believe that integrating a speech interface will let users communicate in a more natural way in MUVE.
In this thesis, we
• Enhance the MUVE by integrating it with a speech interface.
• Suggest a new dialog model, based on the VoiceXML dialog model, that works properly in MUVE.
• Design the XAML-V dialog scripting language to realize the suggested dialog model.
• Implement the XAML-V platform using software patterns.
Future Work
Face animation.
Consider range between avatars.
Consider 3D sound (sound direction and volume).
Add camera control into XAML-V.
More sophisticated synchronization between animation and speech interface.
Performance optimization.
Research Objective
Provide a solution for VUI integration.
Dialog management mechanism in a multi-user virtual environment (MUVE).
Realizing such a mechanism.
Solving Synchronization Problems when Establishing a Dialog
Dialog States
Synchronization Mechanism
Time-out