
KINECT TO ARCHITECTURE

Leroy Savio D’Souza

Department of Electrical and Computer Engineering
University of Auckland, Auckland, New Zealand

Abstract

Navigating virtual environments such as 3D architectural models, using a mouse or a keyboard, can prove to be very unintuitive and unnatural. To overcome this issue, researchers have developed systems that incorporate the use of hand gestures for human computer interaction. This report details the development of a similar gesture-based interface, using the Microsoft Kinect for full body 3D motion capture. The interface allows users to import 3D architectural models, and explore them using hand gestures and body poses. The interface supports standard 3D model file formats and consists of a start screen, a model selection screen and a model navigation screen. An on-screen avatar also provides visual feedback to the user. A user study was performed to test the naturalness of the application. The study indicated that the system detected gestures with high levels of accuracy, and that most subjects found the system very easy to use. The interface can be further improved with the implementation of additional gestures.

1. Introduction

Three-dimensional models play an important role in architectural design, since they allow architects and customers to review, communicate and present design proposals. They also give users a good impression of space and geometry. However, navigation of the model using traditional interface devices such as keyboards and mice inherently limits the naturalness and the speed of interaction with the model.

The objective of this project is to design a Natural User Interface that would allow users to navigate through 3D architectural models. This would provide a more interactive environment in which to conceptualise 3D space. Existing systems, such as immersive virtual environments, have shown great potential in this area. However, these systems often require users to wear additional devices, thereby compromising the naturalness of the system. The focus of this project is to incorporate the use of hand gestures and body poses to allow users to interact with 3D models.

In recent years many systems have implemented the movement of the human arm, or hand gestures, as a means of interacting with computers. However, several of these systems, such as glove-based devices, compromise convenience by requiring the user to be instrumented with bulky devices [1, 2]. They can also prove to be too expensive for the consumer market [3]. Alternatively, vision-based recognition of hand gestures promises natural and non-obstructive interaction, but can be challenging due to the complexity of the human hand structure and motion.

The recent release of Microsoft's Kinect controller [4] has generated significant interest, with its ability to detect human proximity and motion. Consequently, it was decided that the Kinect would prove to be a useful tool for gesture detection. The final program is a gesture-based interface that can be used by architects to import 3D architectural models, allowing their clients to navigate through them using the Kinect. The program also gives visual feedback to users through an on-screen avatar.

The report is structured as follows. Section 2 discusses previous research and work completed in this field, and notes their conclusions. Section 3 discusses the requirements that must be met by the final design. Section 4 details the overall design of the system, including the application architecture, the toolkits used and the gestures that have been implemented. Section 5 describes the user study that was conducted, followed by an outline of the results and discussion in Section 6. The limitations of the system and further work are described in Sections 7 and 8 respectively. Finally, conclusions and acknowledgements are given in Sections 9 and 10 respectively.

2. Related Work

2.1. Gesture Recognition

3D gesture recognition in virtual environments has been an important research topic over the years. Researchers have proposed many different methods for resolving this issue [5]. However, most systems use either motion or video sensor-based recognition.


Motion sensor-based gesture recognition utilises data obtained from devices such as acceleration measurements from accelerometers and angular velocity from gyroscope sensors [1, 6]. One such design is demonstrated in Figure 1. This approach has the advantage of being unaffected by environmental factors such as lighting, but can prove to be cumbersome for the user.

Figure 1: Data glove designed for gesture recognition

Video sensor-based gesture recognition uses the images from one or more video sensors [7, 8]. It then uses complex image processing algorithms to obtain gesture information. This method is relatively inexpensive, but is mainly used for indoor activities due to its dependency on the level of lighting. The Microsoft Kinect (shown in Figure 3) uses the video sensor-based approach. The Kinect has an advantage over traditional video sensor-based systems because it uses infrared light to measure proximity, making it immune to ambient light [9]. The depth data collected by the Kinect has been used to control quadrocopters [10] and perform finger tracking [11]. The Kinect also has a microphone array and an RGB camera; however, the focus of the project will be on its depth sensor.

2.2. Natural User Interface

The term "natural user interface" or NUI describes an interface that focuses on human abilities such as touch, vision, voice, motion and other higher cognitive functions [12]. The main goal of a NUI is to ensure that the system conforms to the users' needs rather than the user having to conform to the system. This essentially means that a successful natural user interface can be described as one where the user takes very little time to transition from a novice to an expert. This is a significant step in the area of HCI because the use of NUIs avoids the restrictions currently present in complex interaction with digital objects, such as those being faced in the field of architecture. An example of such an interface is discussed below.

2.3. Navigation of Virtual Environments

Virtual environments have proved efficient for training in three-dimensional spaces. It is therefore not surprising to find architects using virtual environments to intuitively conceptualise 3D space. Traditionally, virtual environments have required users to familiarise themselves with the complex techniques and equipment required to fully experience the 3D space. These systems can at times prove to be counter-intuitive, as it can take a long time for the user to get accustomed to the system.

Arch-Explore was a similar project designed to provide a natural interface for navigating 3D models using immersive environments [13]. The system allowed users to walk through a given miniature architectural model with the use of a Head Mounted Display (HMD) or a complex immersive environment. The downside to this technique was that users had to wear additional equipment. This gives the Kinect a more natural feel, as the user is not weighed down with bulky devices.

3. Requirements

The aim of this research project is to develop an interface which can be used by architects and their clients to navigate a 3D model of a house or a building. The interface will be programmed using C++ and make use of OpenGL, the Autodesk FBX Software Development Kit (SDK), OpenNI, NITE Middleware and Ogre3D. The hardware components for the system consist of the Microsoft Kinect and a computer. Additional equipment such as a projector or a screen can be utilised if available.

Ideally, the system should emulate the feeling of walking through the model. This means that users should be able to perform gestures that feel natural to them. An important aspect of the project was to retain this idea of naturalness. Since the term is subjective and unique to each user, research was conducted to find out what factors determine naturalness. It was found to comprise four major components: accuracy, ease of use, how memorable the gestures are, and whether the actions are fatiguing. However, the time given to complete the project limits the proper investigation required to find and develop a palette of gestures that every user would define as natural. In addition, the toolkit being utilised for gesture detection is unable to perform finger tracking, hence the gestures that have been implemented are restricted to hand gestures and body poses. The most basic feature that the program must allow users to perform is forward and backward motion for model navigation. To fully experience the architectural space, users also need a way to turn or pan the view. Since most architectural models would be multi-storey structures, the program must also implement a method of moving up and down through different storeys.

For successful navigation of a 3D environment, some form of visual feedback must be provided to the user, to ensure they know how their actions are being translated into the virtual world. This can be done with the use of an avatar. It would also give users an opportunity to test the responsiveness of the system, as the avatar would replicate all the actions that the user is performing, regardless of whether a given gesture has been implemented in the system.

The program should also let the user choose the model that they are navigating, and give them the option to change models without having to restart the program. Ideally this changeover should happen with the use of gestures. Using gestures only for navigation, but not for the remainder of the program, risks undermining the intuitiveness of the interface. Hence the interface as a whole should make use of gestures and body poses, rather than requiring input devices such as a mouse or a keyboard. The support of standard 3D model file formats is key to the long-term success of the program. This ensures that architects are not inconvenienced by having to convert their designs to a special format solely for the use of this platform, thereby making it both versatile and scalable.

4. Design

4.1. System Overview

The Kinect sensor is used as an input device to capture the full body 3D motion that the user is performing. The data is then transmitted to the computer and processed by the NITE framework to perform gesture detection and skeletal tracking. The OpenNI library acts as a communication layer between the Kinect, the NITE framework and the main program. The program consists of two main modules, a Gesture Manager and a User Interface (refer to Figure 2). The Gesture Manager accesses skeletal data via OpenNI APIs to check whether any of the gestures the user has performed corresponds to one that is used in the program. It then flags these events to be processed by the User Interface.

Figure 2: System Architecture

The User Interface consists of three main parts: a start screen, a model selection screen and a model navigation view. The entire user interface incorporates the use of hand gestures and body poses. Upon starting the program, the user is taken to the start screen where he/she can select different options. Selecting the 'Start' option takes them to the model selection screen, where users can browse through available models. Once they have found a model they want to view, they simply select it, leading them into the model navigation view. Here the user can use various hand gestures and body poses to navigate through the given model. The 3D models are rendered onto the screen using the Ogre3D library. An avatar is also located on the screen, providing visual feedback to the user about the actions they are performing. They can then perform an exit gesture to return to the model selection screen, allowing them to select a different model to explore.

4.2. Gestures

To achieve a natural user interface, hand gestures and body poses were incorporated into the program to allow users to experience the architectural space in a 3D model. Gestures were also used to interface with the other aspects of the program, such as the start screen and the model selection. As previously mentioned, the Kinect sensor is used to collect depth data in order to perform gesture detection.

4.2.1. Gesture Detection

The Microsoft Kinect, shown in Figure 3, was chosen as the video sensor for the system. It was released by Microsoft for the Xbox 360 video game platform in 2010. The Kinect is used to receive hand motion and gesture input from the user.


Figure 3: Microsoft Kinect Controller

The Kinect sensor consists of an RGB camera, a depth sensor and a multi-array microphone, which provide capabilities such as full-body 3D motion capture, facial recognition and voice recognition. The depth sensor consists of an infrared laser projector combined with a monochrome complementary metal-oxide semiconductor (CMOS) sensor, which captures video data in 3D. The sensing range of the depth sensor can be adjusted.

The depth sensor first illuminates the scene with an infrared depth pattern, which is invisible to the human eye. The CMOS image sensor reads back the depth pattern, which is now distorted by the various objects in the scene. This distortion is processed by a System on Chip (SoC) connected to the CMOS image sensor, producing a depth image of the scene. The data is then transmitted to the computer via a Universal Serial Bus (USB) interface. The highlight of this method is that it is unaffected by ambient light and works in any indoor environment [9].

4.2.2. Gesture Recognition

The raw data from the Kinect is obtained via the SensorKinect driver. This raw data is processed to extract the skeletal and gesture data of the user. The OpenNI library is used to interface between SensorKinect and the NITE framework, which performs skeletal tracking and gesture detection. The Gesture Manager accesses this skeletal data via OpenNI APIs. OpenNI was chosen as the toolkit for this purpose because it provides an abstraction layer allowing rapid and easy development of a natural user interface, which made testing much easier during the early stages of development.
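As a rough illustration of this pipeline, the following minimal C++ sketch shows how an OpenNI 1.x context and user generator could be initialised so that NITE can supply skeletal data. It is an assumed, simplified setup rather than the project's actual code, and error handling is trimmed.

// Minimal sketch (OpenNI 1.x): create a context and a user generator and
// request full-skeleton tracking, so that NITE can supply joint data once
// the user has performed the calibration pose.
#include <XnCppWrapper.h>

bool InitTracking(xn::Context& context, xn::UserGenerator& userGen)
{
    if (context.Init() != XN_STATUS_OK)          return false;
    if (userGen.Create(context) != XN_STATUS_OK) return false;

    // NITE provides the skeleton capability; request all 15 joints.
    userGen.GetSkeletonCap().SetSkeletonProfile(XN_SKEL_PROFILE_ALL);

    return context.StartGeneratingAll() == XN_STATUS_OK;
}

// Per frame: call context.WaitAndUpdateAll(), then query joint positions
// as in the gesture sketches below.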

For skeletal tracking, the NITE framework requires the user to first perform a calibration pose, as shown in Figure 4. The user's body proportions are then mapped onto a skeletal model, which consists of 15 different points denoted by the circles in Figure 4, to achieve skeletal tracking. NITE uses skeletal tracking to help facilitate gesture detection. For example, to detect a push gesture, the motion of the hand point in relation to the torso point would be tracked. When the relative distance between the two points reaches a given limit, a successful push gesture event can be triggered. However, NITE is unable to perform finger detection, which inherently limits the number of hand gestures that can be used in the program.
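The hand-relative-to-torso idea described above can be sketched as follows; the joint queries are real OpenNI 1.x calls, but the threshold value and function name are illustrative assumptions rather than the project's own.

// Sketch: flag a point-forward/push-style gesture when the right hand moves
// a threshold distance in front of the torso. OpenNI reports positions in
// millimetres, with Z increasing away from the sensor, so a hand extended
// towards the Kinect has a smaller Z value than the torso.
#include <XnCppWrapper.h>

bool IsPointingForward(xn::UserGenerator& userGen, XnUserID user)
{
    if (!userGen.GetSkeletonCap().IsTracking(user))
        return false;

    XnSkeletonJointPosition hand, torso;
    userGen.GetSkeletonCap().GetSkeletonJointPosition(user, XN_SKEL_RIGHT_HAND, hand);
    userGen.GetSkeletonCap().GetSkeletonJointPosition(user, XN_SKEL_TORSO, torso);

    // Ignore low-confidence joints to avoid spurious triggers.
    if (hand.fConfidence < 0.5f || torso.fConfidence < 0.5f)
        return false;

    const XnFloat kForwardThresholdMm = 350.0f;  // assumed value, not the project's
    return (torso.position.Z - hand.position.Z) > kForwardThresholdMm;
}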

Figure 4: The calibration pose

4.2.3. Gesture Classification

When the user performs a given gesture, it is processed to see if it is one of the gestures implemented in the system. If it is, the gesture is added to a First-In-First-Out (FIFO) buffer. This enables the User Interface to accurately respond to a given sequence of gestures. In total, 9 gestures were implemented in the final design, consisting of 7 hand gestures and 2 body poses. To detect each of the gestures, both the orientations and positions of the user's hands and feet relative to their torso are tracked. Each gesture requires the user's hands or feet to be in a certain position to be detected. Part of the development process included testing the system at various stages to informally ask participants which gestures they felt were natural. This ensured that the final design had gestures that most users would find natural.
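A minimal sketch of such a gesture buffer is shown below; the enum values mirror the gestures listed in this section, but the type and method names are illustrative assumptions.

// Sketch: the Gesture Manager pushes recognised gestures into a FIFO queue
// and the User Interface drains it once per frame, preserving the order in
// which the gestures were performed.
#include <queue>

enum class Gesture {
    PointForward, PointUp, PointDown, PointLeft, PointRight,
    Push, Cross,                     // the 7 hand gestures
    StepForward, StepBackward        // the 2 body poses
};

class GestureBuffer {
public:
    void Push(Gesture g) { events_.push(g); }

    // Returns true and fills 'g' if an unprocessed gesture is waiting.
    bool Pop(Gesture& g) {
        if (events_.empty()) return false;
        g = events_.front();
        events_.pop();
        return true;
    }

private:
    std::queue<Gesture> events_;     // first-in-first-out ordering
};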

The hand gestures consist of pointing the right arm in various orientations, namely up, down, left, right and directly forward. They also include a cross gesture (see Figure 5), which involves crossing both arms in front of the torso in an 'X' shape, and a push gesture using the right arm. The pointing forward gesture is detected by measuring the relative distance between the right hand point and the torso in the Z direction, i.e. directly in front of the user. This is then compared to a minimum value to trigger a pointing forward gesture (see Figure 6). Similarly, the pointing left/right and up/down gestures are detected by measuring the location of the hand point with respect to the torso point in the Z and X (sideways) directions, and the Z and Y (up or down) directions respectively. An additional constraint was placed on the down gesture: the user's hand must be placed some distance below and away from the torso. This was done to ensure that the pointing down gesture was not accidentally triggered when the user was in the rest position (with his/her hands at their sides). The cross gesture is detected by checking the positions of the hands and elbows and their angular orientation. Finally, the push gesture detection is provided by the NITE framework and is detected by sensing a continuous push motion towards the Kinect.

Figure 5: Cross gesture

Figure 6: Point forward gesture

The body poses consist of either stepping forward (see Figure 7), by placing the right foot ahead of the left foot, or stepping backward, by placing the right foot behind the left foot. The relative distance between the left and right feet in the Z direction is measured, and if it is greater than a minimum value, the pose is triggered.
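Under the same assumptions as the earlier sketches, the two poses could be detected as follows; the threshold here is a fixed illustrative value, whereas the project derives per-user values as described below.

// Sketch: compare the right and left foot joints along Z. A positive
// difference means the right foot is closer to the sensor (ahead of the
// left foot); a negative difference means it is behind.
#include <XnCppWrapper.h>

enum class StepPose { None, Forward, Backward };

StepPose DetectStep(xn::UserGenerator& userGen, XnUserID user)
{
    XnSkeletonJointPosition rightFoot, leftFoot;
    userGen.GetSkeletonCap().GetSkeletonJointPosition(user, XN_SKEL_RIGHT_FOOT, rightFoot);
    userGen.GetSkeletonCap().GetSkeletonJointPosition(user, XN_SKEL_LEFT_FOOT, leftFoot);
    if (rightFoot.fConfidence < 0.5f || leftFoot.fConfidence < 0.5f)
        return StepPose::None;

    const XnFloat kStepThresholdMm = 250.0f;                   // assumed value
    XnFloat dz = leftFoot.position.Z - rightFoot.position.Z;   // > 0: right foot ahead

    if (dz > kStepThresholdMm)  return StepPose::Forward;
    if (dz < -kStepThresholdMm) return StepPose::Backward;
    return StepPose::None;
}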

Figure 7: Step Forward Pose

Initially, the minimum detection values were set to fixed values. However, upon preliminary testing it was found that, due to variation in users' heights, the system had trouble detecting some of the gestures. It was then decided that, to improve the system's capabilities, the minimum values needed to be dynamically generated. Hence, a preliminary study of 6 users was conducted. The study involved measuring the participants' heights using the Kinect, by calculating the difference between the head point and the feet points. They were then asked to perform all the different gestures implemented in the system to the extent to which they felt comfortable; for example, pointing left up to the point where he/she did not feel any additional strain or discomfort. The ratios of the relative X and Y distances of their hands/feet from their torso to their height were measured for each gesture. These ratios were then averaged to find an overall ratio for each gesture. The gesture detection incorporates these ratios and the height of the current user to dynamically tailor the hand and feet positions for each user.
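The dynamic thresholds described above can be sketched as a simple scaling of a per-gesture ratio by an estimated user height; the ratio value below is made up for illustration, and the height estimate simply takes the vertical head-to-foot distance.

// Sketch: estimate the user's height from the head and foot joints (in
// millimetres), then scale a study-averaged ratio by that height to obtain
// a per-user detection threshold for one gesture.
#include <XnCppWrapper.h>
#include <algorithm>

XnFloat EstimateHeightMm(xn::UserGenerator& userGen, XnUserID user)
{
    XnSkeletonJointPosition head, leftFoot, rightFoot;
    userGen.GetSkeletonCap().GetSkeletonJointPosition(user, XN_SKEL_HEAD, head);
    userGen.GetSkeletonCap().GetSkeletonJointPosition(user, XN_SKEL_LEFT_FOOT, leftFoot);
    userGen.GetSkeletonCap().GetSkeletonJointPosition(user, XN_SKEL_RIGHT_FOOT, rightFoot);
    return head.position.Y - std::min(leftFoot.position.Y, rightFoot.position.Y);
}

XnFloat PointForwardThresholdMm(XnFloat userHeightMm)
{
    const XnFloat kPointForwardRatio = 0.20f;  // illustrative ratio, not the study's
    return kPointForwardRatio * userHeightMm;
}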

4.3. 3D Model Rendering

Architects use 3D models to represent the three-dimensional geometric data of a building or structure. These models would be imported into the proposed system, so that users may navigate through them. However, before they can be used by the system, they must be parsed and rendered.

During the parsing process, information about the model is obtained, such as the location of each vertex in the model, the direction of the normal for each vertex (used for lighting), and the mapping of the textures onto the model. The toolkit used to perform this task was the Autodesk FBX SDK. This is a C++ SDK and was chosen because it supports several standard 3D file formats such as FBX, OBJ and 3DS. This helps achieve one of the major requirements: being able to support standard 3D model file formats.
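A minimal sketch of loading a scene with the FBX SDK is shown below. It uses the current FbxManager/FbxImporter naming (the SDK of that era used a KFbx prefix), and error handling is trimmed; the mesh data itself would then be read by traversing the scene graph.

// Sketch: import an FBX/OBJ/3DS file into an FbxScene. Vertices, normals and
// texture mappings can afterwards be read from the FbxMesh attributes found
// by walking the node hierarchy under scene->GetRootNode().
#include <fbxsdk.h>

FbxScene* LoadScene(const char* filename)
{
    FbxManager* manager = FbxManager::Create();
    FbxIOSettings* ios = FbxIOSettings::Create(manager, IOSROOT);
    manager->SetIOSettings(ios);

    FbxImporter* importer = FbxImporter::Create(manager, "");
    if (!importer->Initialize(filename, -1, manager->GetIOSettings()))
        return nullptr;                    // unsupported or missing file

    FbxScene* scene = FbxScene::Create(manager, "scene");
    importer->Import(scene);               // parse the file into the scene graph
    importer->Destroy();
    return scene;
}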

Once this data has been obtained, the model is ready to be rendered. During the rendering process, the 3D model is converted into 2D images while retaining 3D photorealistic effects. Initially, OpenGL and the Autodesk FBX SDK were used for rendering and setting up the scene. However, it became increasingly difficult to expand the program in terms of features, as both required programming low-level features. Hence, they were replaced by Ogre3D, which provided a much more abstracted framework, allowing faster development and increased flexibility.

4.4. Avatar

One of the objectives of the project is to provide some manner of visual feedback to the user. This was achieved by the implementation of an avatar. The avatar mirrors the user's body movements to let the user know how their actions are being translated into the virtual domain. It was also very useful in the testing of the system, as it provided information on how certain poses were difficult to detect using the Kinect. For example, turning sideways caused the avatar to get confused and behave unexpectedly.

It was implemented by mapping different points of the humanoid model to the 15 different skeletal points provided by NITE. This allows the avatar to track the movement of the user. Initially the avatar was implemented using a third-person camera view at the centre of the screen. However, after preliminary testing it was found that users focussed more on the avatar than on the model itself. Hence, the camera view was changed to a first-person view, and the avatar was made smaller and moved to the bottom right corner of the screen.

4.5. System Design

4.5.1. Start Screen

The start screen (shown in Figure 8) was designed using the Ogre3D library. It consists of four different options: 'Start', 'Help', 'About' and 'Exit'. Once the user has performed the calibration pose successfully, they can highlight the different options by simply performing the point up or point down gestures. They can then select an option by performing the push gesture. Selecting the 'Start' option takes the user to the model selection screen. The 'Help' option is currently not implemented, but it was planned that selecting it would either provide a list of the gestures available in the system or a tutorial video. The 'About' option provides brief information about the context in which this project was designed, and the 'Exit' option closes the program.

Figure 8: Start Screen
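The option handling just described could be driven directly from the gesture buffer sketched earlier; the class below is an illustrative assumption (reusing the hypothetical Gesture enum), not the project's actual UI code.

// Sketch: point up/down moves the highlight through the start-screen options
// and push selects the highlighted one. Other gestures are ignored here.
#include <string>
#include <vector>

class StartScreen {
public:
    StartScreen() : options_{"Start", "Help", "About", "Exit"} {}

    void OnGesture(Gesture g) {
        switch (g) {
            case Gesture::PointUp:
                if (highlighted_ > 0) --highlighted_;
                break;
            case Gesture::PointDown:
                if (highlighted_ + 1 < options_.size()) ++highlighted_;
                break;
            case Gesture::Push:
                selected_ = static_cast<int>(highlighted_);  // e.g. "Start"
                break;
            default:
                break;
        }
    }

    // -1 until the user performs a push gesture on an option.
    int SelectedOption() const { return selected_; }

private:
    std::vector<std::string> options_;
    std::size_t highlighted_ = 0;
    int selected_ = -1;
};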

4.5.2. Model Selection Screen

The model selection screen consists of previews of all the different models made available for the user to explore and navigate through. The previews can either be automatically generated, or users can use a pre-existing picture. The model selection screen was inspired by the Coverflow browsing designed by Apple Inc. [14]. Users can browse through the previews by performing the point left and point right gestures. To select a given model, the user simply needs to perform a push gesture, and to return to the start screen a cross gesture can be performed. Initially, a swipe gesture consisting of a sideways motion of the right hand was used to browse through the models. However, this proved to be fairly unreliable, as swipes were hard to detect.

Figure 9: Model Selection Screen

4.5.3. Model Navigation Screen

The model navigation screen is the main part of the program. It consists of a view of the model and an on-screen avatar in the bottom right corner (shown in Figure 10). Several gestures are used to navigate and explore the 3D model. Throughout the development process, various gestures were informally trialled by users. This ensured that the gestures chosen for the final design would feel natural to most users. A list of the available gestures and their functionalities is shown in Table 1.

To move forward in the model, two gestures were found to be equally popular: the pointing forward gesture and the step forward pose. Users commented that pointing forward to move directly ahead felt very intuitive, and that using the step forward pose felt as though they were stepping into the model itself. The obvious choice to move back in the model was the step back pose. A key requirement for navigation was that users should be able to look around the model or pan the camera view. This was implemented via the pointing left or right gestures, which move the camera left or right respectively.

An additional requirement was that, since not all architectural models are single-storey structures, the program had to provide a means of moving through different levels. Initially, it was proposed that users could choose a level to navigate at the start of the model navigation and could then switch between them. However, due to time constraints, a simple gesture was implemented to move up and down a model. This is accomplished when the user performs the pointing up or pointing down gestures. During preliminary testing it was found that users would often get lost in the model if they were unfamiliar with it. It was then decided that a return-to-original-position gesture would be implemented, allowing the user to return to the starting point. This functionality is achieved by performing the cross gesture. If the cross gesture is performed and held for 2 seconds, the user exits model navigation and is taken to the model selection screen.

To improve the ease of use of the program, it was decided that each of the gestures described above, apart from the cross gesture, would continue performing its function for as long as it is held in the same position. For example, if the pointing forward gesture is performed, the user continues moving forward through the model until he/she stops performing that gesture. This ensures that the user is not fatigued by having to repeatedly perform the same gesture to achieve the same functionality over a period of time. Incorporating both body poses and hand gestures allows users to perform multiple gestures at the same time. For example, a user can move forward using the step forward pose and pan the camera by pointing left or right at the same time. This provides the illusion of walking through the building and turning one's head to view the surroundings.
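A per-frame navigation update along these lines is sketched below using the Ogre3D camera; the speeds are assumed values, and the flags would come from whichever gestures are currently being held.

// Sketch: run once per rendered frame. While a gesture is held, its flag
// stays true, so motion continues without the user repeating the gesture.
// A held body pose (step forward) and a held hand gesture (point left/right)
// can both be active in the same frame.
#include <OgreCamera.h>
#include <OgreMath.h>
#include <OgreVector3.h>

void UpdateNavigation(Ogre::Camera* camera,
                      bool pointForward, bool stepForward, bool stepBackward,
                      bool pointLeft, bool pointRight,
                      float dtSeconds)
{
    const float kMoveSpeed = 2.0f;   // metres per second (assumed)
    const float kTurnSpeed = 45.0f;  // degrees per second (assumed)

    float forward = 0.0f;
    if (pointForward || stepForward) forward += kMoveSpeed * dtSeconds;
    if (stepBackward)                forward -= kMoveSpeed * dtSeconds;

    // Ogre cameras look down their local -Z axis, so moving "ahead" means
    // translating along -Z in camera-relative space.
    camera->moveRelative(Ogre::Vector3(0.0f, 0.0f, -forward));

    if (pointLeft)  camera->yaw(Ogre::Degree( kTurnSpeed * dtSeconds));
    if (pointRight) camera->yaw(Ogre::Degree(-kTurnSpeed * dtSeconds));
}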

Figure 10: Model Navigation View

Gesture                       Functionality
Point forward                 Move forward
Step forward                  Move forward
Step backward                 Move backward
Point up                      Move to an upper storey
Point down                    Move to a lower storey
Point left/right              Pan camera
Cross                         Return to starting position
Cross (hold for 2 seconds)    Return to model selection

Table 1: Available gestures and their functionalities

5. User Study

A user study was conducted to test how intuitive and natural it was for users to navigate 3D architectural models using the designed program. The user study consisted of 7 participants, all of whom were students from the University of Auckland. Six participants had previously used gesture-based interfaces in the context of gaming. The study involved first allowing the participants to become familiar with the interface and try out all the different gestures available. They were then instructed to complete a timed test. To prevent bias, subjects who took part in any preliminary testing were deemed unsuitable for this user study.

The test setup included a MacBook Pro and a Kinect. The testing application required users to collect 8 hoops located at various points in the model. The time taken to collect all the hoops was recorded. The average response time, i.e. the time between the user seeing a given hoop and collecting it, was also recorded. To ensure fairness, the locations of the hoops were randomised for each test; however, the relative distance between the hoops was kept the same. The test was then repeated using a keyboard, with the functionality of the system mapped onto keyboard keys (shown in Table 2). The times using both the Kinect and the keyboard were analysed.

Key    Functionality
W      Move forward
S      Move backward
I      Move to an upper storey
K      Move to a lower storey
A      Pan camera left
D      Pan camera right

Table 2: Keyboard keys and their functionalities

The users were then asked to complete a questionnaire about the system, which consisted of two main sections. The first section evaluated each of the gestures in five aspects: accuracy, ease of use, the level of fatigue experienced, how memorable the gestures were, and how responsive the system was to a given gesture. This was to determine how natural and intuitive users found the system. The second section required them to complete an evaluation of the system as a whole. Both sections were quantitative, making use of a 5-point Likert scale, with 5 representing either Excellent or Strongly Agree and 1 either Poor or Strongly Disagree. The participants were also given the opportunity to provide qualitative feedback about the feature(s) or gesture(s) they preferred the most and the one they found most difficult to use. The results of these questionnaires are detailed in the next section.

6. Results and Discussion

Each individual gesture and the system overall were quantitatively analysed. For a given gesture, all the scores for a given factor, for example accuracy, were added together and the sum was divided by the maximum total score possible. This was used to calculate a percentage. The results are as follows.
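For illustration, with 7 participants rating a gesture on the 5-point scale, the maximum total for a factor is 7 x 5 = 35, so a summed score of 28 would correspond to 28 / 35 = 80%. (The summed score of 28 is an invented example; only the participant count and the scale come from the study.)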

From Figures 11 and 12 it can be seen that the push gesture was rated lowest in terms of accuracy, ease of use, memorability and responsiveness. During the course of the project, it was found that built-in gestures such as Swipe or Push, which are provided by NITE, could not be successfully integrated into the program. A possible solution is to design a method to detect these gestures manually. In terms of ease of use the cross gesture had the highest rating, and the up gesture was the most accurate. Based on these results, the step forward gesture was added to the other parts of the interface and can be used in place of the push gesture. This would increase the reliability of the system.

Figure 11: Graph displaying the Accuracy and Ease of Use of each gesture

It is interesting to note in Figure 12 that pointing left/right to browse through different models was more memorable than using the same gesture to pan the camera. This indicates that the naturalness of a gesture is not simply dictated by the action itself, but rather by the context in which it is used. The average accuracy across all the gestures was 79.68%, the average ease of use was 81.26%, the average responsiveness was 86.67% and the average memorability was 89.52%.

The users also rated how fatiguing each of the gestures was; the higher the percentage, the more fatiguing the gesture. As Figure 13 shows, the push and cross gestures caused the least amount of fatigue, and pointing left/right was the most tiring. Most of the users stated that after 8-10 minutes of continual usage they were slightly fatigued. This is not surprising, as navigation currently involves continual movement of the right arm.

Comment                                        Average Rating
Easy to manoeuvre through models               4.0
System detects gestures accurately             4.1
System is responsive                           4.1
Program has a low learning curve               4.0
System is reliable                             3.9
Prefer this interface over a mouse/keyboard    2.6

Table 3: System evaluation

Figure 12: Graph displaying the Responsiveness and Memorability of each gesture

Figure 13: Graph displaying the Level of Fatigue of each gesture

The time taken to collect the hoops is shown in Table 4. The average response time with the keyboard was found to be 1.71 seconds, and with the Kinect 4.14 seconds. Both the times to collect the hoops and the average response times are higher on the Kinect than on the keyboard. This is to be expected, since all the users had prior experience using keyboard interfaces. However, it should be noted that the main aim of the project was not to demonstrate that a Kinect-based interface is faster, but rather to design a more natural interface for users.

In terms of qualitative feedback, most of the users felt the interface could be improved by incorporating the use of the left hand in gestures. Although the program currently does not support this, it could easily be added; it was not implemented earlier because it was deemed a low-priority feature. The use of both arms simultaneously for navigation was not implemented, as it was assumed that requiring both hands to control the system would greatly fatigue the user. Users also found it difficult to have precise control over movements and thought it would be useful to vary the movement speed based on the position of their arms.

Overall the results have been very positive, indicating that a gesture-based navigation interface for 3D architectural models using the Kinect is both feasible and would prove to be a very useful tool.

Time Taken    Kinect    Keyboard
Minimum       57s       40s
Average       81s       60s
Maximum       125s      115s

Table 4: Time taken to complete test

7. Limitations

Some limitations of the NITE framework include the requirement to perform the calibration pose at the start of the program and the lack of support for finger tracking. This forced us to implement gestures that required full body or limb movements. These movements, as indicated in the user study, fatigued the user after 8-10 minutes of continual usage. The Kinect also had difficulty detecting people wearing jackets or other forms of loose clothing. It was also discovered that loading and rendering more complicated models took proportionally longer (the maximum time measured was 15 seconds). Rendering a 3D model is a very resource-intensive process and can only be improved with the use of a better computer.

8. Further Work

In the future, the addition of features such as finger tracking would provide users with a much easier and less fatiguing method of navigation. An investigation could be carried out to see whether gestures involving both hands would prove to be natural or very fatiguing. To aid movement through a given model, a mini-map could be added to the model navigation screen, including a marker showing the user's current location. At the moment the camera view can only be moved forward, backward and panned sideways. This could be expanded to include a tilt feature, with an appropriate gesture being used to perform this. Further improvements to the current system, such as varying the speed of motion based on the extent of hand positions, or a tutorial mode to familiarise users with the interface, can also be developed. The other capabilities of the Kinect, such as the microphone array and RGB camera, could be used to control the interface using voice commands or to design a customised avatar of the user.

9. Conclusion

In this project, a natural user interface was developed to provide architects and their clients with a more intuitive way to interact with and explore 3D architectural models. All the requirements for this project have been successfully achieved. Seven hand gestures and two body poses are recognised by the system. Users can use these gestures to experience the architectural space in a given model. The program also has the capability of importing various 3D model file formats such as 3DS, OBJ and FBX. The program has also been designed to be cross-platform compatible. Users are provided constant feedback about their mapping into the 3D domain with the use of an avatar. The avatar mirrors the user's actions, allowing the user to know how their actions are being translated in the virtual space. A user study was conducted to test the naturalness of the system. The study yielded positive results, indicating that most users found the interface intuitive and easy to use. However, it was noted that extensive use of full body and arm movements fatigued the user after 8 to 10 minutes of usage. This can be improved by using finger tracking to develop less strenuous hand gestures. Thus, a natural user interface using the Kinect to explore 3D models has been shown to have great potential due to its ease of use and intuitiveness.

10. Acknowledgements

Special thanks to our examiners Dr. Robert Amor and Dr. Robert Sheehan for their support and guidance throughout the project. Also special thanks to Dr. Dermott McMeel from the School of Architecture for his continual feedback.

11. References

[1] J.-H. Kim, N. D. Thang, and T.-S. Kim, "3D hand motion tracking and gesture recognition using a data glove," in Industrial Electronics, 2009. ISIE 2009. IEEE International Symposium on, July 2009, pp. 1013-1018.

[2] Y. Han, "A low-cost visual motion data glove as an input device to interpret human hand gestures," Consumer Electronics, IEEE Transactions on, vol. 56, no. 2, pp. 501-509, May 2010.

[3] B. Takacs, "How and why affordable virtual reality shapes the future of education," The International Journal of Virtual Reality, vol. 7, no. 1, p. 53, 2008.

[4] (2011) Microsoft Xbox website. [Online]. Available: http://www.xbox.com/en-US/kinect

[5] Z.-G. Xu and H.-L. Zhu, "Vision-based detection of dynamic gesture," in Test and Measurement, 2009. ICTM '09. International Conference on, vol. 1, Dec. 2009, pp. 223-226.

[6] U.-X. Tan, K. Veluvolu, W. T. Latt, C. Y. Shee, C. Riviere, and W. T. Ang, "Estimating displacement of periodic motion with inertial sensors," Sensors Journal, IEEE, vol. 8, no. 8, pp. 1385-1388, Aug. 2008.

[7] M. Hasanuzzaman, V. Ampornaramveth, T. Zhang, M. Bhuiyan, Y. Shirai, and H. Ueno, "Real-time vision-based gesture recognition for human robot interaction," in Robotics and Biomimetics, 2004. ROBIO 2004. IEEE International Conference on, Aug. 2004, pp. 413-418.

[8] Y. Liu, L. Tang, K. Song, S. Wang, and J. Lin, "A multicolored vision-based gesture interaction system," in Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference on, vol. 2, Aug. 2010, pp. V2-281-V2-284.

[9] (2010) PrimeSense Inc. website. [Online]. Available: http://www.primesense.com/?p=535

[10] J. Stowers, M. Hayes, and A. Bainbridge-Smith, "Altitude control of a quadrotor helicopter using depth map from Microsoft Kinect sensor," in Mechatronics (ICM), 2011 IEEE International Conference on, April 2011, pp. 358-362.

[11] V. Frati and D. Prattichizzo, "Using Kinect for hand tracking and rendering in wearable haptics," in World Haptics Conference (WHC), 2011 IEEE, June 2011, pp. 317-321.

[12] W. Liu, "Natural user interface - next mainstream product user interface," in Computer-Aided Industrial Design Conceptual Design (CAIDCD), 2010 IEEE 11th International Conference on, vol. 1, Nov. 2010, pp. 203-205.

[13] G. Bruder, F. Steinicke, and K. Hinrichs, "Arch-Explore: A natural user interface for immersive architectural walkthroughs," in 3D User Interfaces, 2009. 3DUI 2009. IEEE Symposium on, March 2009, pp. 75-82.

[14] (2011) Apple Inc. website. [Online]. Available: http://www.apple.com/itunes/features/

