
Gaze Tracking and Non-Touch Gesture Based Interaction Method for Mobile 3D Virtual Spaces

Matti Pouke, Antti Karhu, Seamus Hickey, Leena Arhippainen
Intel and Nokia Joint Innovation Center

Center for Internet Excellence P.O. Box 1001, FI-90014 University of Oulu

[email protected]

ABSTRACT This paper introduces a continuous gaze tracking and non-touch gesture recognition based interaction method for 3D virtual spaces on tablet devices. The user can turn his/her viewpoint, select objects with gaze, and grab and manipulate objects with non-touch hand gestures. The interaction method does not require a mouse or a keyboard. We created a test scenario with an object manipulation task and compared the completion times of the combined gaze tracking and non-touch gesture interaction method with those of a touch screen only input method. Short interviews were conducted with 13 test subjects and data was gathered through questionnaires. The touch screen method was generally faster than or as fast as the combined gaze and non-touch gesture method. The users thought, however, that gaze tracking was more interesting and showed potential. Gaze tracking would nevertheless require more stability to be suitable for use with mobile devices.

Author Keywords 3D User Interfaces, Gaze Tracking, Non-Touch Gesture Interaction

ACM Classification Keywords H.5.2 [User Interfaces]: Input Devices and Strategies – gaze tracking, gestures.

INTRODUCTION Mobile tablet devices have high-powered graphics processing units (GPUs) which support a growing number of high-performance 3D spaces, primarily games and Computer Aided Design (CAD) tools. Interaction with 3D spaces requires the user to handle viewpoint control (navigating the space), object manipulation (selection, movement and rotation) and application control (Hand 1997). Eye tracking, touch gestures and non-touch gestures are input techniques which can help users interact with the 3D space. Tablets have some inbuilt multi-modal support, such as accelerometers and cameras. Francone and Nigay demonstrate the use of such sensors, which utilise head tracking1 to view 3D content (Francone, Nigay, 2011). A standard approach is to use a 'gaze selects, touch confirms' combination to deal with the 'Midas touch' problem. In this approach, gaze is used to navigate a scene and highlight objects, while a second input means, such as a mouse or voice (van der Kamp, Sundstedt 2011), confirms selections.

While others have explored the use of gaze and non-touch gestures, they have largely involved large displays (Yoo et al. 2010) or desktop computers (Stellmach, Dachselt 2012; Stellmach et al. 2011) and focus more on 2D interactive spaces. There has been less work on the use of gaze and non-touch gestures for 3D spaces on tablet devices. In this paper, we present a 6-Degree-of-Freedom (6DOF) sensor-based non-touch gesture and eye tracking based control method for 3D spaces on a tablet device. Our system allows the user to interact in a 3D virtual space using only his/her eyes and one hand. The user can pick up virtual objects and use a throwing gesture to send them to another network-connected device. For this work, the authors utilise eye tracking software to track the user's point of regard as they move about the 3D space. This allows the users to pan the scene or highlight objects. While the use of touch screen gestures is natural for many tasks, complex 3D interaction gestures such as rotation may be more easily performed by hand gestures, which are the most natural way we have of manipulating real 3D objects. However, most techniques for detecting hand position involve data gloves or camera-based sensors (e.g. Kinect) to detect hand motions. We did not consider camera sensors a realistic option for tablet devices, as the user's hand is likely to be hovering just over the screen and would be too close for most camera sensors attached to the tablet. For this reason, we utilise a simple 6DOF sensor attached to the hand to detect basic gestures and hand rotations.

This paper contributes to the knowledge of multi-modal interactions for tablet devices in a 3D environment by combining eye tracking software and a non-touch gesture system. We study the applicability of gaze tracking and non-touch gesture interaction methods in current mobile technologies. We analyse the difficulty of the methods in object manipulation with a user study. We compare the performance of users operating the gaze and non-touch gesture based interaction method with those completing the same task with touch screen gestures.

1 http://www.youtube.com/watch?v=bBQQEcfkHoE

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. OZCHI'12, November 26–30, 2012, Melbourne, Victoria, Australia. Copyright 2012 ACM 978-1-4503-1438-1/12/11…$10.00.

RELATED RESEARCH Utilizing accelerometer and gyro sensors for activity recognition and 3D gestures is a very active research field. One of the most exhaustive studies on the current state of hand gesture interaction is presented by LaViola and Keefe (LaViola, Keefe 2011). Non-conventional methods for moving and interacting in virtual spaces have been studied especially in conjunction with immersive virtual environments, where the user is engulfed within the environment by virtual glasses or similar (Bowman, Johnson & Hodges, 1999). Dang et al. also incorporate a spatial interaction device as one of their control methods (Dang et al. 2007).

Various machine learning techniques for accelerometer gesture recognition exist, ranging from Hidden Markov Models (Kela et al. 2006) to simpler but effective methods such as using path distances between 3D gesture trails as a decision function (Kratz, Rohs 2010; Kratz, Rohs 2011).

Eye-gaze tracking has been under research for decades because of its potential as a UI input device for various applications. Traditionally, eye-gaze tracking has been used for point-and-select operations in 2D UIs (Hyrskykari, Majaranta & Räihä, 2005; Jacob, 1991). However, recent studies show interest in gaze based interaction in 3D virtual spaces. In (Castellina, Corno, 2008), various input methods, including gaze, keyboard, and mouse, were compared in a 3D game. The results showed interest towards eye based control applied to 3D environment navigation and game control. The study in (Smith, Graham, 2006) showed that participants felt more immersed when using an eye-gaze tracker for input in 3D games. In (Istance, Vickers & Hyrskykari 2009), eye-gaze tracking was compared to the standard keyboard and mouse as an input method for special use cases in the Massively Multiplayer Online Role-Playing Game (MMORPG) World of Warcraft and the virtual world Second Life. The results showed that task completion times were very similar between gaze and traditional methods. Although remote eye-gaze tracking has been successfully used in studies before, the experiments are usually done in controlled environments with accessories such as a chin rest, which greatly reduces the user's comfort (Hansen, Ji 2010). Due to the individuality of eyes and susceptibility to head motion and lighting conditions, remote eye-gaze tracking remains a challenging task (Hansen, Ji 2010). Because of these limitations, the technology still fails to meet usability requirements, which prevents its widespread use as an input device.

SYSTEM DEVELOPMENT The test system was built using eye tracking software developed for this project and a 6DOF sensor attached to the dominant hand. For the eye control, we installed four infrared (IR) LED lights on the face of the device and an additional monochrome camera with an IR filter underneath (Figure 1). The main consequence of doing this was that touch screen gestures blocked some of the sensors, causing the eye tracking software to fail. This effectively excluded the use of touch screen gestures until the system could be redesigned. A proprietary solution was chosen over commercial systems, such as Tobii mobile2, because we needed to integrate the gaze results in real time for a fully interactive experience. Our eye tracking solution differs from the EyePhone (Miluzzo, Wang & Campbell 2010) in that we calculate a more accurate gaze vector representing actual screen coordinates, whereas the EyePhone divides the gaze position into 9 non-overlapping regions.

In our setup, we aimed for a semi-realistic setting by emulating a generic tablet device with gaze tracking and non-touch gesture recognition capabilities. We did not use an off-the-shelf tablet such as the Apple iPad or Samsung Galaxy Tab due to a lack of processing power and software support, and our need for a more powerful camera. We used a touch screen enabled HP EliteBook 2760p hybrid tablet/laptop. The device is equipped with a physical keyboard, but it can be folded in such a way that the keyboard is hidden from the user. Once the camera and sensors were attached, the overall weight of the device was significant, so it was mounted on a stand to prevent excessive arm fatigue, as shown in Figure 1. The non-touch gesture control was implemented with a Bluetooth-connected WAA-010 6DOF sensor attached to the back of the user's hand. The device drivers of the HP EliteBook did not support multi-touch, so we mimicked the gaze and non-touch gesture actions with similar single-touch actions.

Figure 1. The operating device.

2 http://www.tobii.com/eye-tracking-research/global/library/videos/all/mobile-device-setups-with-tobii-mobile-device-stand/


The user test scene ran on the realXtend Tundra virtual space application (Alatalo, 2011). The gaze tracking and non-touch gesture recognition were separate components communicating with realXtend through a custom interface based on TCP and JSON. A network interface and a context server were also used for object sharing.
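The paper does not document the message format of this custom interface, so the following Python sketch only illustrates how the gaze and gesture components might push events to the 3D scene over a TCP/JSON channel; the host, port, field names and event types are assumptions, not the authors' protocol.

```python
import json
import socket

# Assumed endpoint and message fields: the actual protocol of the custom
# realXtend Tundra bridge is not documented in the paper.
TUNDRA_HOST, TUNDRA_PORT = "127.0.0.1", 9999

def send_event(sock, event_type, payload):
    """Serialize one interaction event as a newline-delimited JSON message."""
    message = {"type": event_type, **payload}
    sock.sendall((json.dumps(message) + "\n").encode("utf-8"))

if __name__ == "__main__":
    with socket.create_connection((TUNDRA_HOST, TUNDRA_PORT)) as sock:
        # Gaze component: current point of regard in screen coordinates.
        send_event(sock, "gaze", {"x": 512, "y": 300})
        # Gesture component: a recognized Grab/Switch gesture.
        send_event(sock, "gesture", {"name": "grab_switch"})
```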

Gaze Tracking Implementation The gaze tracking system was implemented using the Pupil-Corneal Reflection (PCR) technique, where four infrared LED lights placed around the screen of the tablet device generate corneal reflections (glints) which are recorded by a front-facing camera, as shown in Figure 1.

The user's gaze is estimated from these features by calculating feature vectors from the glints' positions relative to the pupil centre. The feature vectors are mapped through the model parameters; the output is the user's point of regard, i.e., the screen coordinates the user's gaze is directed at. The model parameters are determined through calibration. In the calibration process, the user's head is held in a fixed position and he/she gazes at 16 fixed points on the screen sequentially, as shown in Figure 2.
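The paper does not specify the mapping model between feature vectors and screen coordinates. As an illustration only, the sketch below uses a choice common in PCR systems: a second-order polynomial regression fitted by least squares from the 16 calibration points. The polynomial form and function names are assumptions.

```python
import numpy as np

def design_matrix(features):
    """Second-order polynomial terms of the pupil-glint vector (dx, dy)."""
    dx, dy = features[:, 0], features[:, 1]
    return np.column_stack([np.ones_like(dx), dx, dy, dx * dy, dx**2, dy**2])

def calibrate(features, screen_points):
    """Least-squares fit of the mapping parameters from calibration data.

    features:      (16, 2) pupil-glint vectors recorded while the user
                   fixates the 16 calibration targets.
    screen_points: (16, 2) known screen coordinates of those targets.
    """
    A = design_matrix(features)
    params, *_ = np.linalg.lstsq(A, screen_points, rcond=None)
    return params  # shape (6, 2): one column of coefficients for x, one for y

def point_of_regard(params, feature):
    """Map a new pupil-glint vector to estimated screen coordinates."""
    return design_matrix(np.atleast_2d(feature)) @ params
```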

Non-Touch Gesture Implementation The non-touch gesture control uses both direct measurement of accelerometer/gyro values and pattern recognition for interpreting the user's commands. This approach is quick to implement and thus suitable for rapid prototyping with small gesture vocabularies. The gestures consist of Tilt, Grab/Switch, Shake and Throw. Tilting is used for object movement and object rotation. Grab/Switch is a fast downwards jerk used for selecting objects and switching between interaction modes (movement and rotation). Shake is performed by turning the hand quickly left and right in a doorknob-opening fashion and is used for releasing the object. Throw is triggered by raising the arm upwards and then lowering it, and is used for sharing the grabbed object with another device. The throw action is not covered in the user study, as it is discussed in a separate study. Because gesture recognition and object manipulation ran simultaneously and continuously, the control gestures were made as short as possible. The Grab/Switch, Shake and Tilting gestures can be seen in Figure 3.
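The exact features, thresholds and classifier for this gesture vocabulary are not published in the paper, so the following sketch is only a rough threshold-based illustration of how tilt could be read directly while the short gestures (plus the 'invisible' Still class introduced in the next paragraph) are detected from windows of 6DOF samples. All constants and function names are made up for illustration.

```python
import numpy as np

# Illustrative thresholds only; the paper does not publish the actual
# feature extraction or classifier parameters.
GRAB_DOWNWARD_ACCEL = 15.0   # m/s^2, fast downwards jerk
SHAKE_ROLL_RATE = 6.0        # rad/s, quick doorknob-style twist
THROW_PITCH_RISE = 1.0       # rad, arm raised and then lowered

def classify_window(accel, gyro):
    """Classify one short window of 6DOF samples into a control gesture.

    accel: (N, 3) accelerometer samples, gyro: (N, 3) gyro samples.
    Returns 'grab_switch', 'shake', 'throw' or 'still'; tilt is read
    continuously from the mean orientation, not treated as a gesture.
    """
    if np.max(-accel[:, 2]) > GRAB_DOWNWARD_ACCEL:
        return "grab_switch"
    if np.max(np.abs(gyro[:, 0])) > SHAKE_ROLL_RATE:
        return "shake"
    if np.ptp(np.arcsin(np.clip(accel[:, 1] / 9.81, -1, 1))) > THROW_PITCH_RISE:
        return "throw"
    return "still"   # the 'invisible' class keeping recognition continuous

def tilt_control(accel):
    """Direct tilt measurement used for continuous object movement/rotation."""
    g = accel.mean(axis=0)
    pitch = np.arctan2(g[1], g[2])
    roll = np.arctan2(-g[0], np.hypot(g[1], g[2]))
    return pitch, roll
```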

In addition to the control gestures, there was also a fourth "invisible" class, Still, which was classified when none of Grab, Shake or Throw was taking place. This invisible class allowed the gesture recognition to run continuously, so the users did not have to trigger the gestures with a button press or similar. The feature space for Still was trained to be large enough that it was still classified while the user was tilting the accelerometer, so that unwanted gestures were not triggered accidentally. We validated the gesture recognition accuracy by measuring the results using a similar classifier with 10-fold cross-validation in the Weka machine learning toolkit, reaching an overall accuracy of approximately 98%. However, the classification accuracy in itself does not necessarily correspond to the actual accuracy experienced by the user. For example, the Shake gesture meant for dropping an object was recognized with 100% accuracy, but this does not mean the users could not still accidentally trigger it.

Figure 2. Gaze Calibration.

Figure 3. Grab/Switch, Shake and Tilting gestures.

Figure 4. A user is rotating the object by tilting.
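The validation above was run in Weka, and the paper does not name the classifier or feature set. As a minimal sketch of an equivalent 10-fold procedure, the following uses scikit-learn with an assumed random-forest classifier and placeholder feature files.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical recorded data: one fixed-length feature vector per labelled
# gesture window (Grab/Switch, Shake, Throw, Still). The file names and the
# classifier choice are assumptions; the paper ran the validation in Weka.
X = np.load("gesture_features.npy")   # shape (n_samples, n_features)
y = np.load("gesture_labels.npy")     # shape (n_samples,)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10)
print(f"10-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```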

EXPERIMENT SETUP To test object manipulation with gaze and non-touch gesture interaction, we performed a small-scale user test. First, we conducted a pilot study with 2 subjects to iron out potential problems. Thirteen participants were involved in the test (9 male and 4 female).

Figure 5. The start and the finish of the task.

The subjects were asked to move 6 dice from one part of the 3D world to a goal area and turn them so that the pips would face from 1 to 6 towards the user, as shown in Figure 5. This requires the user to navigate around the world, select objects, move them and then rotate them. Subjects were asked to do this using two techniques: the gaze and non-touch gestures described in this paper, and a touch screen based approach. The touch screen based approach acted as a baseline to compare the task difficulty. To counter the effect of learning, half of the users performed the gaze and non-touch gesture trial first and half the touch gesture trial first (see test procedure steps 4 and 6). The test procedure was the following:

1. Fill out a background questionnaire.
2. Short introduction to the control methods and the task.
3. Gaze calibration (Figure 2).
4. Task I (gaze + non-touch gesture): move 6 dice into the goal area and turn them so that the pips face from 1 to 6 towards you (the user) (Figure 4).
5. Fill out questionnaire A.
6. Task II (touch): move 6 dice into the goal area and turn them so that the pips face from 1 to 6 towards you (the user).
7. Fill out questionnaire B.
8. Short interview afterwards.

We did not ask the users to perform the tasks as fast as they could, but we nevertheless compared the task completion times of both methods. We also encouraged the users to express their thoughts during the tasks and interviewed them both verbally and with questionnaires (5-point Likert scale). The control taxonomies for both methods, as inspired by (Bowman, Johnson & Hodges 1999), are described in Figures 6 and 7. Each user got one chance to complete the task. There was a short introduction to both control methods at the beginning of the trial, but the users did not get a separate practice round.

Figure 6. Gaze and gesture taxonomy.

Figure 7. Touch control taxonomy.

RESULTS Our questionnaires had 25 questions in total. From the answers and the qualitative data gathered in interviews, we focused specifically on the following factors: disorientation, comfort, ease of learning, naturalness, speed, accuracy, ease of use, sense of control and fun. We performed analysis of variance (ANOVA) on the quantitative data and compared the gaze, non-touch and touch gesture methods individually, or gaze and non-touch gesture as a combination against the touch method where applicable. Some of the questions were more appropriate for comparing the two test runs against each other, while others yielded more information when gaze, gesture and touch were treated as separate modalities. We also measured the completion times of the task using both interaction methods.
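As a minimal sketch of the quantitative analysis, a one-way ANOVA across the three modalities for one questionnaire item could be computed as follows. The Likert scores below are placeholders, not the study's data, and the paper does not state which statistics software was used.

```python
from scipy import stats

# Placeholder 5-point Likert answers for one question (e.g. perceived
# accuracy of selection), one value per subject and modality; these are
# not the study's data.
gaze    = [2, 3, 1, 2, 4, 2, 1, 3, 2, 2, 3, 1, 2]
gesture = [4, 4, 5, 3, 4, 4, 5, 3, 4, 4, 3, 4, 4]
touch   = [4, 5, 4, 4, 5, 4, 4, 5, 4, 3, 4, 5, 4]

f_stat, p_value = stats.f_oneway(gaze, gesture, touch)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```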


Disorientation and Comfort The users reported no or very little disorientation. As the users could only turn horizontally, and not reposition themselves in the virtual space, they rarely felt lost. Some users claimed they have a tendency to suffer nausea or dizziness when using 3D virtual spaces, but they did not report any during or after the trial. Some users reported discomfort from having to keep their arm raised during the touch screen trial, but it has to be noted that the situation was not entirely realistic because the device was fixed in one place (Figure 8).

Ease of Learning and Naturalness The majority of the users thought all of the control methods were fairly easy to learn, with a few exceptions. According to the questionnaires, the non-touch gesture control method was the easiest to learn, although the difference was not statistically significant. Typically the subjects thought either that all of the methods were reasonably easy to learn or that all of them were equally difficult. Naturalness was a difficult question, as the users had very different opinions on what is natural and what is not. Some of the users thought that eye and hand movements are natural in essence, while some thought naturalness is linked to ease of use. The answers varied heavily, leaving gaze as the most natural interaction method on average, but with no statistically significant difference to the other groups.

Accuracy and Speed of Selection In the gaze and non-touch gesture control method, an object was highlighted by fixing the gaze upon it and selected for manipulation by performing the Grab/Switch hand gesture. In the touch interaction method, the finger was placed on top of the object and a long press selected it. In the questionnaires, we asked about the accuracy and speed of object selection individually for gaze, non-touch gesture and touch.

There was a significant difference between the control methods, with f(2,24) = 17.45, p < 0.05 for accuracy and f(2,24) = 7.85, p < 0.05 for speed. The reason for this was the instability of the gaze tracking accuracy. As we did not constrain the users' head movements, their gaze tracking accuracy varied a lot within and between experiments, from satisfactory to intolerable. Some users reported a bias in the gaze location on the screen; some had it right from the start and for others it developed during the experiment. A bias at the beginning of the experiment is due to the calibration not succeeding entirely. The users were not familiar with the calibration process and sometimes either did not move their gaze fast enough or anticipated the calibration points and thus moved their gaze prematurely. A good number of samples is taken from each calibration point to prevent bias, but if enough calibration points fail, the quality of the calibration suffers, which results in bad gaze accuracy. The main reason for a sudden or slow decrease in accuracy was the user's head movement. The orientation and distance of the user's head from the camera can cause significant errors, which is a common problem in remote gaze tracking systems. Some users reported that the accuracy actually increased towards the end of the experiment, which is probably due to the user learning how to focus their gaze correctly. In the case of the two unsuccessful experiments, the first had an obvious reason for poor accuracy: the gaze tracking implementation was failing to locate the user's pupil.

Figure 8. Moving dice into the target area using touch gestures.

Figure 9. Accuracy of selection in each control method according to individual subjects.

Figure 10. Speed of selection in each control method according to individual subjects.

The users who reported that the tracking accuracy remained constantly good usually performed almost as fast with both control methods. However, sometimes the users could overcome the bad accuracy and operate the dice either by adjusting their head position so that the accuracy increased towards the end, or by positioning the gaze pointer at the dice even when their actual gaze was elsewhere. For example, it can be seen from Figures 9 and 14 that subject no. 7 reported very bad gaze accuracy but still performed equally fast with both control methods. On the contrary, there was more than 15 minutes of difference for one user who had very poor gaze tracking accuracy but completed the test nevertheless (after the experiment the user reported that she suffers from a squint, which might have contributed to the poor gaze tracking performance). Of course, Figure 9 represents the users' opinions, not the absolute difference in accuracy between trials.

The users thought the non-touch gesture control was fairly fast and accurate. The sensor reacted to the user's input quickly and there were few false positives and false negatives. The opinions on speed and accuracy are fairly similar between the non-touch gesture and touch gesture interaction methods, as can be seen in Figures 9 and 10. However, some users thought the touch interaction was somewhat slower than the non-touch gesture interaction.

Ease of Use and Sense of Control Even though the users thought the non-touch gesture control was fairly fast and accurate, they would have preferred some changes in the execution of the gestures. The most notable flaw was the difficulty of turning the sensor to the left for right-handed persons (and to the right for left-handed persons) (Figure 11). As tilting the sensor in each direction had the same exponential effect on object turning and movement, it was significantly more difficult to move the objects to the left at the same speed as towards the right. Some users also noticed that backwards movement was more difficult than forwards movement. The users wished that the absolute motion required to move the objects were smaller when moving leftwards or backwards. In that case, there should be a possibility to mirror the sensor scales for left-handed users. As the task required the users to move the objects mostly to their right, the single left-handed user had more difficulty completing the task than the others (Figure 12). Some users wished the motions required to move the objects were smaller in general, i.e. they wanted to reduce the dead zone of the balanced sensor. Also, though the users did not report much difficulty moving the dice by turning the sensor, some first tried to move the objects in a data-glove-style fashion. Sometimes the users also accidentally dropped objects when they tilted the sensor quickly, which was perceived as the Shake gesture by the system.

Even with these flaws, however, the users thought that moving the objects with the non-touch gesture controls was significantly easier than with the touch screen interface (f(1,16) = 5.24, p < 0.05). The average score for object turning was slightly better with the touch interface, but with no statistical significance. In the turning tasks, the object orientation was not mapped directly to the pitch and roll of the sensor, although some users would have expected this.

There was no significant difference in the sense of control between the interaction methods. The mean score for sense of control is slightly higher with the gaze and non-touch gesture controls, but both methods are below 4 on average. The mean answers on the ease of object manipulation and sense of control can be seen in Table 1. The gaze and non-touch gestures are combined in the first column, although gaze itself contributes little to the mean scores for Ease of Movement and Ease of Turning.

Figure 11. A right-handed user is moving the dice into the target area using non-touch gestures.

Figure 12. A left-handed user is moving the dice into the target area using non-touch gestures.

Method/Factor       Gaze + non-touch gesture mean   Touch mean
Ease of Movement    4.1                             2.8
Ease of Turning     3.4                             4.8
Sense of Control    3.8                             3.4

Table 1. Ease of Object Manipulation and Sense of Control.

Fun Both in the questionnaires and during the interviews, we asked whether the participants thought the control methods were fun. In the questionnaires, there was no significant difference in the answers, as can be seen in the questionnaire overview in Figure 13. During the interviews, however, the subjects consistently mentioned that they found the gaze interaction especially interesting or fun. There was also one user who stated that the gaze control task itself was very stressful and not fun in any way, but still thought the gaze and gesture interaction was far more interesting than more traditional interaction methods. The non-touch gesture interaction method as a separate component received the fewest complaints and got good scores in the questionnaires. Still, the gaze input was always the centre of the users' interest. We also asked whether the users could think of applications in which they would like to use the interaction methods. The majority of the users described gaze tracking based gaming applications.

Figure 13. Mean answers of questionnaires on each control method.

Figure 14. Average dice operation time for each subject (repositioning and turning) in seconds.

Completion Times The arithmetic mean and median completion times were 9:47 and 7:59 minutes for the gaze and non-touch gesture experiments, and 5:08 and 4:54 minutes for the touch screen control, respectively. The standard deviation was 4:14 for the gaze and non-touch gesture method and 1:01 for the touch method. The large standard deviation in gaze and non-touch gesture control comes mainly from the inconsistency of the gaze tracking accuracy. To get some idea of how much the sheer unfamiliarity of the interaction methods affects performance, an author also performed the trials. The completion times were approximately 3 minutes when trying both tasks for the first time. These times were not included in the results.

We also measured the average time of moving and orienting a single dice after removing the slowest and fastest dice of each trial. This brought the completion times of the two control methods slightly closer to each other, as it was not rare for a user to spend most of the time on a single problematic dice while the others went smoothly. On average, the touch screen method still remained faster. The average dice operation times can be seen in Figure 14.
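As a small worked example of these summary statistics, the sketch below converts mm:ss completion times to seconds, computes the mean, median and standard deviation, and averages per-dice times after dropping the fastest and slowest dice. The numbers in the usage lines are placeholders, not the study's raw data, and the standard-deviation variant is an assumption.

```python
import statistics

def to_seconds(mmss):
    """Convert a 'mm:ss' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mmss(seconds):
    """Format a number of seconds back as 'mm:ss'."""
    return f"{int(seconds) // 60}:{int(seconds) % 60:02d}"

def summarize(times_mmss):
    """Mean, median and (population) standard deviation of completion times."""
    secs = [to_seconds(t) for t in times_mmss]
    return (to_mmss(statistics.mean(secs)),
            to_mmss(statistics.median(secs)),
            to_mmss(statistics.pstdev(secs)))

def trimmed_dice_average(dice_times_s):
    """Average per-dice time after dropping the fastest and slowest dice."""
    trimmed = sorted(dice_times_s)[1:-1]
    return statistics.mean(trimmed)

# Placeholder times, not the study's measurements.
print(summarize(["9:47", "7:59", "5:08"]))
print(trimmed_dice_average([35, 42, 51, 60, 88, 190]))
```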

People with a lot of prior experience of 3D video games generally seemed to perform best, but the single most important factor for success in the gaze and non-touch gesture control task was the accuracy of the gaze tracking.

CONCLUSION This paper contributes to the knowledge of multi-modal interactions for tablet devices in a 3D environment by combining eye tracking software and a non-touch gesture system. Even though the users mostly enjoyed using the gaze and non-touch gesture interaction method, the gaze recognition would require significant improvements in accuracy and reliability for the system to become generally usable. More specifically, the drift in gaze calibration is a major issue that requires future research. The non-touch gesture control would also require some minor tweaks, such as direction-specific sensitivity and pitch- and roll-mapped turning. The touch screen was clearly the faster control method, but it did not outperform in every aspect. Generally, the users felt that the gaze tracking was the most exciting thing about the UI, even though it was the least accurate control method. However, some users stated that if they had to perform 'real world' operations they would rather use the touch interface. All users felt that the gaze and non-touch gesture tracking would have potential if it worked well in every condition. For gaze tracking to become suitable for mobile devices, significant effort should be put into the stability of the calibration algorithm. As of now, the user needs to remain very stationary, which cripples the versatile ways of using mobile devices.

In future research, the flaws perceived in this initial test should be corrected, after which a quantitative test with more users should be conducted. We will also study the throw gesture described briefly in this paper with users. Future tests could also include multiple users performing simultaneously and sharing objects to complete the tasks.

ACKNOWLEDGMENTS This work has been carried out in the TEKES Chiru project. We would like to thank our test users.


REFERENCES Alatalo, T. An Entity-Component Model for Extensible Virtual Worlds. IEEE Internet Computing, vol. 15, no. 5 (2011), 30-37.

Bowman, D.A., Johnson, D.B. and Hodges, L.F. Testbed evaluation of virtual environment interaction techniques, Proc. VRST (1999), 26-33.

Castellina, E. and Corno, F. Multimodal gaze interaction in 3D virtual environments. Proc. COGAIN 2008: Communication, Environment and Mobility Control by Gaze (2008), 33-37.

Dang, N.T., Tavanti, M., Rankin, I. and Cooper, M. A comparison of different input devices for a 3D environment. Proc. ECCE'07, ACM Press (2007), 153-160.

Francone, J. and Nigay, L. Using the user's point of view for interaction on mobile devices. Proc. IHM 2011. ACM Press (2011), New York, NY, USA.

Hand, C. A Survey of 3D Interaction Techniques. Computer Graphics Forum, vol. 16, no. 5 (1997), 269-281.

Hansen, D.W. and Ji, Q. In the Eye of the Beholder: A Survey of Models for Eyes and Gaze. IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 3 (2010), 478-500.

Hyrskykari, A., Majaranta, P. and Räihä, K. From Gaze Control to Attentive Interfaces. Proc. HCII 2005.

Istance, H., Vickers, S. and Hyrskykari, A. Gaze-based interaction with massively multiplayer on-line games. Extended abstracts CHI 2009, ACM Press (2009), 4381-4386.

Jacob, R.J.K. The use of eye movements in human-computer interaction techniques: What you look at is what you get. ACM Transactions on Information Systems, vol. 9 (1991), 152-169.

Kela, J., Korpipää, P., Mäntyjärvi, J., Kallio, S., Savino, G., Jozzo, L. and Marca, D. Accelerometer-based gesture control for a design environment. Personal Ubiquitous Comput., vol. 10, no. 5 (2006), 285-299.

Kratz, S. and Rohs, M. Protractor3D: a closed-form solution to rotation-invariant 3D gestures. Proc. IUI 2011, ACM Press (2011), 371- 374.

Kratz, S. and Rohs, M. The $3 recognizer: simple 3D gesture recognition on mobile devices. Proc. IUI'10, ACM (2010), 419-420.

LaViola, J.J. and Keefe, D.F. 3D spatial interaction: applications for art, design, and science. ACM SIGGRAPH 2011 Courses, ACM Press (2011), 75 p.

Miluzzo, E., Wang, T. and Campbell, A.T. EyePhone: activating mobile phones with your eyes. Proc. MobiHeld'10, ACM Press (2010), 15-20.

Smith, J.D. and Graham, T.C.N. Use of eye movements for video game control. Proc. ACE'06, ACM Press (2006).

Stellmach, S. and Dachselt, R. Look & Touch: Gaze-supported Target Acquisition. Proc. CHI 2012, ACM Press (2012), 2981-2990.

Stellmach, S., Stober, S., Nürnberger, A. and Dachselt, R. Designing gaze-supported multimodal interactions for the exploration of large image collections. Proc. NGCA'11, ACM Press (2011).

van der Kamp, J. and Sundstedt, V. Gaze and voice controlled drawing. Proc. NGCA'11, ACM Press (2011).

Yoo, B., Han, J., Choi, C., Yi, K., Suh, S., Park, D. and Kim, C. 3D user interface combining gaze and hand gestures for large-scale display. Proc. Extended Abstracts CHI'10, ACM Press (2010), 3709-3714.

