Conversational Pointing Gestures for Virtual Reality Interaction: Implications from an Empirical Study

Thies Pfeiffer ([email protected]), Marc E. Latoschik ([email protected]), Ipke Wachsmuth ([email protected])
AI Group, Faculty of Technology, Bielefeld University, Germany

Conversational interface agents facilitate natural interactions in Virtual Reality environments.

Deictic expressions (such as "put that there") are fundamental in human communication to refer to entities in the environment. In situated contexts, deictic expressions often comprise pointing gestures directed at regions or objects. One of the primary tasks in Virtual Reality applications is the visually perceivable manipulation of objects. Thus VR research has focused on developing metaphors that optimize the tradeoff between a swift and a precise selection of objects. Prominent examples are ray casting, occlusion, or arm extension. These technologies are well suited for interacting directly with the system. When the interaction with the system is mediated, e.g., by an Embodied Conversational Agent (ECA), the primary focus lies on a smooth understanding of natural communication and natural gestures. It is thus recommended to improve the robustness and accuracy of the interpretation of natural pointing gestures, i.e., gestures without facilitation by visual aids or other auxiliaries. To attain these ends, we contribute results from a study on pointing and draw conclusions for the implementation of pointing-based conversational interactions in immersive Virtual Reality.
Motivation

Aim

How accurate are pointing gestures?
- Improved models for human pointing
- Advances for the interpretation and production of pointing gestures
- Contributing to more robust multimodal conversational interfaces

Applications
- Human-Computer Interaction: multimodal interfaces
- Human-Agent Interaction: multimodal conversational interfaces, Human-Robot Interaction
- Assistive Technology
- Empirical research and usability studies: automatic multimodal annotation, automatic grounding of gestures based on a world model
Method

How to determine the parameters of the pointing extension model?
- What is the opening angle?
- What defines the anchor of the model? Do we aim with the index finger (IFP) or by gazing over the index finger (GFP)? (See the geometric sketch below.)

Study on object pointing
- Interaction of two participants
- Two conditions: speech + gesture, and gesture only
- Real objects
- Study with 62 participants
- Cooperative effort with linguists

Technology
- Audio + video recordings
- Motion capturing using an ART GmbH optical tracking system
- Automatic adaptation of a model of the user's posture
- Special hand-made soft gloves

[Figure: The study setup, comprising a spotlight, a camera, the tracking system, and three displays (M1: task; M2 + M3: system time).]

Interaction Game
- The Description Giver is presented with the object to demonstrate
- The Description Giver utters a deictic expression (speech + gesture, or gesture only)
- The Object Identifier tries to identify the object
- The Description Giver gives feedback (yes/no)
- Proceed with the next object
- No corrections or repairs!
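To make the two anchor hypotheses concrete, here is a minimal geometric sketch of how the two pointing rays could be constructed from tracked positions. The marker names, coordinates, and the simple vector math are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def ifp_ray(index_root: np.ndarray, index_tip: np.ndarray):
    """Index-Finger-Pointing (IFP): the ray extends the index finger,
    anchored at the fingertip and directed from finger root to tip."""
    return index_tip, normalize(index_tip - index_root)

def gfp_ray(eye: np.ndarray, index_tip: np.ndarray):
    """Gaze-Finger-Pointing (GFP): the ray runs from the eye over the
    index fingertip, i.e., the finger is aimed like a gun sight."""
    return index_tip, normalize(index_tip - eye)

# Hypothetical tracked positions in meters (world coordinates).
eye = np.array([0.00, 1.65, 0.00])
index_root = np.array([0.25, 1.20, 0.35])
index_tip = np.array([0.30, 1.25, 0.45])

print("IFP:", ifp_ray(index_root, index_tip))
print("GFP:", gfp_ray(eye, index_tip))
```

Both variants share the fingertip as the ray origin; they differ only in the second point that fixes the direction.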
Exploring the data

The study combines multimodal data comprising audio, video, motion capture, and annotation data. A coherent, synchronized view of all data sources is provided by the Interactive Augmented Data Explorer (IADE), developed at Bielefeld University.

[Figure: A session with the Interactive Augmented Data Explorer. The scientist on the right interactively explores the recorded data for qualitative analysis. The model on the left shows the table with the objects and a stick-figure driven by the motion capture data. The video taken from one camera perspective is displayed on a floating panel, together with the audio recordings, and information from specific annotation tiers is presented as floating text.]

The scientist can, e.g., take the perspective of the Description Giver or the Object Identifier. All elements are interactive; e.g., the video panels and annotations can be resized or repositioned to allow for a comfortable investigation.

[Figure: A visualization of the intersections of the pointing-ray (dots) for four different objects over all participants. The data is grouped via bagplots: the asterisk marks the mean, the darker area clusters 50 percent, the brighter area 75 percent of the demonstrations. In the depicted setting, the person pointing was standing to the left, the person identifying the objects to the right.]

Observations
- Pointing is fuzzy, even at short-range distances
- Fuzziness increases with distance (as expected)
- Overshooting at the edge of the domain (intentional)
- Still, the human object identifier shows a good performance of 83.9% correct identifications

Modelling approach
- The ellipse shapes of the bagplots suggest a cone-based model of the extension of pointing (the ray-table intersection behind the bagplot dots is sketched below)
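The intersection dots in the bagplot figure can be obtained with a standard ray-plane intersection of the pointing ray with the table top. A minimal sketch follows; the table height and all other numbers are assumptions for illustration, not values from the corpus.

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Return the point where the pointing ray hits the plane,
    or None if the ray is parallel to it or points away from it."""
    denom = np.dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None  # ray parallel to the plane
    t = np.dot(plane_point - origin, plane_normal) / denom
    return origin + t * direction if t >= 0 else None

# Table top modeled as a horizontal plane at 0.75 m height (assumption).
table_point = np.array([0.0, 0.75, 0.0])
table_normal = np.array([0.0, 1.0, 0.0])
origin = np.array([0.0, 1.60, 0.0])     # e.g., fingertip position
direction = np.array([0.1, -0.5, 0.6])  # e.g., an IFP direction
print(ray_plane_intersection(origin, direction, table_point, table_normal))
```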
Simulations

Several simulation runs have been conducted to test different approaches to model the pointing extension on the collected data.

Results

Strict Semantic Model

For a strict semantic model the pointing extension has to single out one and only one object. In the simulation runs we determined the optimal opening angle for such a pointing cone per row and for the overall area. The results are depicted in the table below. For the strict semantic model, GFP offers better performance while having a narrower opening angle: GFP is more accurate than IFP.
row | IFP α (°) | IFP perf. (%) | GFP α (°) | GFP perf. (%)
1   | 84 | 70.27 | 86 | 68.92
2   | 80 | 61.84 | 68 | 75.00
3   | 71 | 71.43 | 69 | 81.82
4   | 60 | 53.95 | 38 | 65.79
5   | 36 | 43.84 | 24 | 57.53
6   | 24 | 31.15 | 25 | 42.62
7   | 14 | 23.26 | 17 | 23.26
8   | 10 |  7.14 | 10 | 14.29
all | 71 | 38.54 | 61 | 48.12

Table: The optimal opening angles per row for a strict semantic pointing cone model of the pointing extension. In addition, the performance in terms of correctly identified objects, in percent of all objects within the specified area, is given for both IFP and GFP. The row titled "all" shows the performance for rows 1-7; row 8 has been excluded because of the overshooting behavior.
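As an illustration of the strict semantic model, the following sketch resolves a reference only if exactly one object lies inside the pointing cone. The labeled 3D object positions and the 14° angle are assumptions for the example, not fitted values.

```python
import numpy as np

def angular_distance(origin, direction, obj):
    """Angle in degrees between the pointing ray and the direction to obj."""
    to_obj = obj - origin
    cos = np.dot(direction, to_obj) / (np.linalg.norm(direction) * np.linalg.norm(to_obj))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def strict_semantic_reference(origin, direction, objects, opening_angle):
    """Strict semantic cone: the reference resolves only if exactly one
    object falls within half the opening angle of the pointing ray."""
    inside = [name for name, pos in objects.items()
              if angular_distance(origin, direction, pos) < opening_angle / 2]
    return inside[0] if len(inside) == 1 else None  # empty or ambiguous -> fail

# Hypothetical scene: objects spaced 20 cm apart, as in the study setting.
objects = {"A": np.array([0.0, 0.0, 1.0]),
           "B": np.array([0.2, 0.0, 1.0]),
           "C": np.array([0.4, 0.0, 1.0])}
print(strict_semantic_reference(np.array([0.0, 0.4, 0.0]),
                                np.array([0.05, -0.3, 1.0]), objects, 14.0))
```

With a wider cone, two or more objects fall inside and the strict model fails, which is exactly why the optimal semantic angles in the table shrink as rows get more distal.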
Pragmatic Model

When the pointing extension is handled on the level of pragmatics, we can allow for inference mechanisms to disambiguate between several objects and, hence, use heuristics. For the simulation, we used a basic heuristic based on the angular distance between the objects and the pointing ray. The table below depicts the results of these simulation runs. This time IFP performs better than GFP: IFP is more precise than GFP.

row | IFP α (°) | IFP perf. (%) | GFP α (°) | GFP perf. (%)
1   | 120 |  98.65 | 143 |  98.65
2   | 109 | 100.00 | 124 | 100.00
3   |  99 |  94.81 |  94 |  93.51
4   | 109 |  98.68 |  89 |  93.42
5   |  72 |  97.26 |  75 |  94.52
6   |  44 |  91.80 |  50 |  90.16
7   |  38 |  86.05 |  41 |  67.44
8   |  31 |  52.38 |  26 |  69.05
all | 120 |  96.04 | 143 |  92.71

Table: The optimal opening angles per row for a pragmatic pointing cone model of the pointing extension. In addition, the performance in terms of correctly identified objects, in percent of all objects within the specified area, is given for both IFP and GFP. The row titled "all" shows the performance for rows 1-7; row 8 has been excluded because of the overshooting behavior.
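The pragmatic heuristic can be sketched in a few lines, continuing the previous sketch (it reuses angular_distance and the example scene from above): among all objects inside the cone, pick the one closest in angle to the pointing ray, so multiple candidates no longer cause a failure.

```python
def pragmatic_reference(origin, direction, objects, opening_angle):
    """Pragmatic cone: rank objects inside the cone by angular distance
    to the pointing ray and pick the closest one."""
    candidates = sorted(
        (angular_distance(origin, direction, pos), name)
        for name, pos in objects.items()
        if angular_distance(origin, direction, pos) < opening_angle / 2)
    return candidates[0][1] if candidates else None

# Same hypothetical scene as above: with a wide 120° cone all three objects
# are inside, but A wins because it is angularly closest to the ray.
print(pragmatic_reference(np.array([0.0, 0.4, 0.0]),
                          np.array([0.05, -0.3, 1.0]), objects, 120.0))
```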
The opening angles in the proximal rows are rather large, while the angles in the more distal rows are much smaller. This motivates us to distinguish between proximal and distal pointing.

[Figure: A combined model for the extension of proximal and distal pointing, consisting of a proximal cone and a distal cone. The boundary between proximal and distal pointing is defined by the personal distance d.]
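A sketch of the combined model suggested by the figure, again continuing the sketches above: objects nearer than the personal distance d are tested against a wide proximal cone, objects beyond d against a narrow distal cone. The threshold and the two angles are illustrative placeholders, not fitted values.

```python
def combined_reference(origin, direction, objects, d=1.0,
                       proximal_angle=120.0, distal_angle=40.0):
    """Combined model: wide cone for proximal referents (distance <= d),
    narrow cone for distal referents (distance > d)."""
    candidates = []
    for name, pos in objects.items():
        angle = proximal_angle if np.linalg.norm(pos - origin) <= d else distal_angle
        deviation = angular_distance(origin, direction, pos)
        if deviation < angle / 2:
            candidates.append((deviation, name))
    return min(candidates)[1] if candidates else None
```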
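Finally, the optimal opening angles reported in the tables above could be determined by a simple parameter sweep over recorded trials, as sketched below; the trial tuple layout is an assumption, and `resolve` is one of the reference models sketched earlier.

```python
def optimal_opening_angle(trials, resolve, angles=range(1, 181)):
    """Sweep candidate opening angles (in degrees) and return the angle
    maximizing the share of correctly identified referents.
    Each trial is a tuple (origin, direction, objects, target_name)."""
    def performance(angle):
        hits = sum(resolve(origin, direction, objs, angle) == target
                   for origin, direction, objs, target in trials)
        return hits / len(trials)
    best = max(angles, key=performance)
    return best, performance(best) * 100  # optimal angle, percent correct
```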
Conclusion

Primary
- Pointing is best interpreted at the level of pragmatics and not semantics
- Index-Finger-Pointing is more precise
- Gaze-Finger-Pointing is more accurate
- The results stated in the tables above and our qualitative observations using IADE suggest a dichotomy of proximal vs. distal pointing; this fits nicely with the dichotomy common in many languages (here vs. there)

Secondary
- Taking the direction of gaze into account does not always improve performance (contrary to the mainstream opinion)
- Humans display a non-linear behavior at the borders of the domain; at least in the setting used in our study, with widely spaced objects (20 cm), it can be ignored when going for high overall success

Future Work
- Confirm the results in a mixed setting with one human and one embodied conversational agent over virtual objects
Research Background

The work is embedded in the EC project PASION and in the Collaborative Research Center 360 (SFB 360), "Situated Artificial Communicators".

Acknowledgements

This work has been funded by the EC in the project PASION, FP6 IST program, reference number 27654, and by the Deutsche Forschungsgemeinschaft (DFG) in the Collaborative Research Center 360, "Situated Artificial Communicators".
Bibliography

Kranstedt, Lücking, Pfeiffer, Rieser & Staudacher. Measuring and Reconstructing Pointing in Visual Contexts. In Proceedings of the Brandial 2006.
Kranstedt, Lücking, Pfeiffer, Rieser & Wachsmuth. Deictic Object Reference in Task-oriented Dialogue. In Situated Communication. Mouton de Gruyter, Berlin, 2006.
Weiß, Pfeiffer, Eikmeyer & Rickheit. Processing Instructions. In Situated Communication. Mouton de Gruyter, Berlin, 2006.
Kranstedt, Lücking, Pfeiffer, Rieser & Wachsmuth. Deixis: How to Determine Demonstrated Objects Using a Pointing Cone. In 6th International Gesture Workshop. Springer-Verlag GmbH, Berlin Heidelberg, 2006.
Pfeiffer & Latoschik. Resolving Object References in Multimodal Dialogues for Immersive Virtual Environments. In Proceedings of the IEEE Virtual Reality 2004.
Pfeiffer, Voss & Latoschik. Resolution of Multimodal Object References Using Conceptual Short Term Memory. In Proceedings of the EuroCogSci03.
D. A. Bowman, E. Kruijff, J. J. LaViola Jr., and I. Poupyrev. 3D User Interfaces: Theory and Practice. Addison-Wesley, 2005.
E. Kaiser, A. Olwal, D. McGee, H. Benko, A. Corradini, X. Li, P. Cohen, and S. Feiner. Mutual Disambiguation of 3D Multimodal Interaction in Augmented and Virtual Reality. In Proceedings of the 5th International Conference on Multimodal Interfaces, pages 12–19. ACM Press, 2003.
M. E. Latoschik. A Gesture Processing Framework for Multimodal Interaction in Virtual Reality. In Proceedings of the 1st International Conference on Computer Graphics, Virtual Reality and Visualisation in Africa, AFRIGRAPH 2001, pages 95–100. ACM SIGGRAPH, 2001.
A. Olwal, H. Benko, and S. Feiner. SenseShapes: Using Statistical Geometry for Object Selection in a Multimodal Augmented Reality System. In Proceedings of the Second IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2003), pages 300–301, Tokyo, Japan, October 7–10, 2003.
I. Wachsmuth, B. Lenzmann, T. Jörding, B. Jung, M. Latoschik, and M. Fröhlich. A Virtual Interface Agent and its Agency. In Proceedings of the First International Conference on Autonomous Agents, pages 516–517, 1997.
C. A. Wingrave, D. A. Bowman, and N. Ramakrishnan. Towards Preferences in Virtual Environment Interfaces. In EGVE '02: Proceedings of the Workshop on Virtual Environments 2002, pages 63–72, Aire-la-Ville, Switzerland, 2002. Eurographics Association.