XBOX 360 SYSTEM
Seminar Report
Virtual reality (VR) is the creation of a highly interactive, computer-based multimedia environment in which the user becomes a participant with the computer in what is known as a synthetic environment. Virtual reality uses computers to immerse one inside a three-dimensional program rather than simulate it in two dimensions on a monitor. Utilizing the concept of virtual reality, the computer engineer integrates video technology, high-resolution image processing, and sensor technology into the data processor so that a person can enter into and react with three-dimensional spaces generated by computer graphics. The goal of computer engineers is to create an artificial world that feels genuine and responds to every movement one makes, just as the real world does. Naming discrepancies aside, the concept remains the same: using computer technology to create a simulated, three-dimensional world that a user can manipulate and explore while feeling as if he were in that world. Scientists, theorists and engineers have designed dozens of devices and applications to achieve this goal. Opinions differ on what exactly constitutes a true VR experience, but in general it should include: (1) three-dimensional images that appear to be life-sized from the perspective of the user, and (2) the ability to track a user's motions, particularly his head and eye movements, and correspondingly adjust the images on the user's display to reflect the change in perspective.
Virtual realities are a set of emerging electronic technologies with applications in a wide range of fields, including education, training, athletics, industrial design, architecture and landscape architecture, urban planning, space exploration, medicine and rehabilitation, entertainment, and model building and research in many fields of science. Virtual reality (VR) can be defined as a class of computer-controlled multisensory communication technologies that allow more intuitive interactions with data and involve human senses in new ways. Virtual reality can also be defined as an environment created by the computer in which the user feels present. This technology was devised to enable people to deal with information more easily. Virtual reality provides a different way to see and experience information, one that is dynamic and immediate. It is also a tool for model building and problem solving, and potentially a tool for experiential learning. The virtual world is interactive; it responds to the user's actions.
Virtual Reality is defined as a highly interactive, computer-based multimedia environment in which the user becomes the participant in a computer-generated world. It is the simulation of a real or imagined environment that can be experienced visually in the three dimensions of width, height, and depth, and that may additionally provide an interactive experience in full real-time motion with sound and possibly with tactile and other forms of feedback. VR incorporates 3D technologies that give a real-life illusion; it creates a simulation of a real-life situation. The emergence of augmented reality technology in the form of interactive games has produced a valuable tool for education. One of the emerging strengths of VR is that it enables objects and their behaviour to be more accessible and understandable to the human user.
DIFFERENT KINDS OF VIRTUAL REALITY
There is more than one type of virtual reality. Furthermore, there are different schemas for classifying the various types. Jacobson (1993a) suggests six types of virtual reality: (1) immersive virtual reality, (2) desktop virtual reality, (3) projection virtual reality, (4) simulation virtual reality, (5) augmented virtual reality, and (6) text-based virtual reality.
Immersive Virtual Reality An immersive VR system offers the most direct experience of virtual environments. Here the user either wears a head-mounted display (HMD) or uses some form of head-coupled display such as a Binocular Omni-Orientation Monitor (BOOM) to view the virtual environment, in addition to tracking devices and haptic devices. It is a type of VR in which the user becomes immersed (deeply involved) in a virtual world, and it is a form of VR that uses computer-related components.
A variation of immersive virtual reality is Augmented Reality, where a see-through layer of computer graphics is superimposed over the real world to highlight certain features and enhance understanding. Augmented virtual reality is the idea of taking what is real and adding to it in some way so that the user obtains more information from the environment. Azuma (1999) explains that Augmented Reality is about augmentation of human perception: supplying information not ordinarily detectable by human senses. According to Isdale (2001), there are four types of augmented reality (AR) that can be distinguished by their display type: 1. Optical See-Through AR uses a transparent head-mounted display (HMD) to display the virtual environment (VE) directly over the real world. 2. Projector-Based AR uses real-world objects as the projection surface for the VE. 3. Video See-Through AR uses an opaque HMD to display video of the VE merged with the view from cameras on the HMD. 4. Monitor-Based AR also uses merged video streams, but the display is a more conventional desktop monitor or a hand-held display. Monitor-Based AR is perhaps the least difficult to set up, since it eliminates HMD issues.
Text-based Virtual Reality In this type of virtual reality, the reader of a certain text forms a mental model of the virtual world in his or her head from the description of people, places and things.
Through the Window With this kind of system, also known as desktop VR, the user sees the 3-D world through the window of the computer screen and navigates through the space with a control device such as a mouse. Like immersive virtual reality, this provides a first-person experience. One low-cost example of a through-the-window virtual reality system is the 3-D architectural design planning tool Virtus Walkthrough, which makes it possible to explore virtual reality on a Macintosh or IBM computer. Another example of through-the-window virtual reality comes from the field of dance, where a computer program called Life Forms lets choreographers create sophisticated human-motion animations.
Projected Realities Projected realities (Mirror worlds) provide a second-person experience in which the viewer stands outside the imaginary world, but communicates with characters or objects inside it. Mirror world systems use a video camera as an input device. Users see their images superimposed on or merged with a virtual world presented on a large video monitor or video projected image.
2. LITERATURE SURVEY
2.1. EXISTING SYSTEMS
Head-Mounted Display (HMD)
The head-mounted display (HMD) was the first device providing its wearer with an immersive experience. Evans and Sutherland demonstrated a head-mounted stereo display in 1965. A typical HMD houses two miniature display screens and an optical system that channels the images from the screens to the eyes, thereby, presenting a stereo view of a virtual world. A motion tracker continuously measures the position and orientation of the user's head and allows the image generating computer to adjust the scene representation to the current view. As a result, the viewer can look around and walk through the surrounding virtual environment. To overcome the often uncomfortable intrusiveness of a head-mounted display, alternative concepts (e.g., BOOM and CAVE) for immersive viewing of virtual environments were developed.
Fig 1: HMD
BOOM
The BOOM (Binocular Omni-Orientation Monitor) is a head-coupled stereoscopic display device. Screens and optical system are housed in a box that is attached to a multi-link arm. The user looks into the box through two holes, sees the virtual world, and can guide the box to any position within the operational volume of the device. Head tracking is accomplished via sensors in the links of the arm that holds the box.
Fig 2: BOOM
CAVE
The CAVE (Cave Automatic Virtual Environment) was developed at the University of Illinois at Chicago and provides the illusion of immersion by projecting stereo images on the walls and floor of a room-sized cube. Several persons wearing lightweight stereo glasses can enter and walk freely inside the CAVE. A head-tracking system continuously adjusts the stereo projection to the current position of the leading viewer. The advantages of the CAVE are that it gives a wide surrounding field of view and that it can provide a shared experience to a small group. A variety of input devices like data gloves, joysticks, and hand-held wands allow the user to navigate through a virtual environment and to interact with virtual objects. Directional sound, tactile and force-feedback devices, voice recognition and other technologies are being employed to enrich the immersive experience and to create more "sensualized" interfaces.
Fig 3: CAVE
Data Glove
A data glove is outfitted with sensors on the fingers as well as overall position/orientation tracking equipment. The data glove enables natural interaction with virtual objects by hand-gesture recognition. Modern VR gloves are used to communicate hand gestures (such as pointing and grasping) and in some cases return tactile signals to the user's hand.
Fig 4: Data glove
Concerned about the high cost of the most complete commercial solutions, Pamplona et al. propose a new input device: an image-based data glove (IBDG). By attaching a camera to the hand of the user and a visual marker to each fingertip, they use computer vision techniques to estimate the relative positions of the fingertips. Once they have information about the tips, they apply inverse kinematics techniques in order to estimate the position of each finger joint and recreate the movements of the fingers of the user in a virtual world. Adding a motion-tracker device, one can also map pitch, yaw, roll and XYZ translations of the hand of the user, (almost) recreating every gesture and posture performed by the hand of the user in a low-cost device.
2.2. PROPOSED SYSTEM
Microsoft's Xbox 360 Kinect has revolutionized gaming in that you are able to use your entire body as the controller. Conventional controllers are not required because the Kinect sensor picks up on natural body movements as inputs for the game. Three major components play a part in making the Kinect function as it does: the movement tracking, the speech recognition, and the motorized tilt of the sensor itself. The name Kinect is a blend of two words: kinetic and connect.
The Kinect was first announced on June 1, 2009 at E3 (Electronic Entertainment Expo) as Project Natal; the name stems from the hometown of one of the key project leaders, Natal in Brazil. The software that makes the Kinect function was by and large developed by Rare, a Microsoft subsidiary, while a company based in Israel known as PrimeSense developed the 3D sensing technology; Microsoft purchased the rights to use the technology for its gaming system. In the first 60 days on the market, Microsoft shipped 8 million units to retailers around the globe. The bill of materials for the Kinect is estimated at $56, which does not include research and development or marketing costs, merely the cost of the hardware.
(a) Sensing technology
Behind the scenes of PrimeSense's 3D sensing technology there are three main parts that make it work: an infrared laser projector, an infrared camera, and the RGB color camera. The projector simply floods the room with IR laser beams, creating a depth field that can be seen only by the IR camera. Due to infrared's insensitivity to ambient light, the Kinect can be played in any lighting conditions. However, because the face-recognition system depends on the RGB camera along with the depth sensor, light is needed for the Kinect to recognize a calibrated player accurately. The following image shows a generalized concept of how the Kinect's depth sensing works.
Figure 5: How the sensor sees in 3D
In more detail, the IR depth sensor is a monochrome complementary metal-oxide-semiconductor (CMOS) camera. This means that it sees only two colors, in this case black and white, which is all that is needed to create a "depth map" of any room. The IR camera used in the Kinect has VGA resolution (640x480), refreshing at a rate of 30 Hz. Each camera pixel has a photodiode connected to it, which receives the IR light beams bounced off objects in the room. The voltage level of each photodiode depends on how far the object is from the camera: an object that is closer to the camera appears brighter than an object that is farther away, and the photodiode's voltage varies accordingly with the object's distance. Each photodiode voltage is then amplified and sent to an image processor for further processing. With this process being updated 30 times per second, the Kinect has no problem detecting full-body human movements very accurately, provided the player is within the recommended distance.
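The sensor itself reports raw 11-bit disparity values rather than metric distances. As an illustrative sketch only, the conversion below uses an empirical fit published by the Kinect reverse-engineering community (the constants are that community's approximation, not an official Microsoft calibration):

```python
import math

def raw_to_meters(raw_depth):
    """Convert an 11-bit Kinect raw depth reading to an approximate
    distance in meters (empirical tangent fit, valid for raw < 2047)."""
    return 0.1236 * math.tan(raw_depth / 2842.5 + 1.1863)
```

Nearer objects produce smaller raw readings, so the converted distance grows monotonically with the raw value.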
Although the hardware is the basis for creating an image that the processor can interpret, the software behind the Kinect is what makes everything possible. Using statistics, probability, and hours of testing different natural human movements, the programmers developed software to track the movements of 20 main joints on the human body. This software is how the Kinect can differentiate a player from, say, a dog that happens to run in front of the IR projector, or between different players who are playing a game together. The Kinect is capable of tracking up to six different players at a time, but as of now the software can only track up to two active players. One of the main features of the Kinect is that it can recognize you individually. When calibrating yourself with the Kinect, the depth sensing and the color camera work together to develop an accurate digital image of how your face looks. The 8-bit color camera, also VGA resolution, detects and stores the skin tone of the person it is calibrating. The depth sensor helps make the facial recognition more accurate by creating a 3-D shape of your face. Storing these images of your face and skin tone is how the Kinect can recognize you when you step in front of the projected IR beams. As mentioned earlier, for the facial recognition to work accurately there needs to be a certain amount of light. Another added feature of the color camera is that it takes videos or snapshots at key moments during game play, so you can see how you look while playing.
Figure 7: Facial Recognition
(b) Speech recognition
The Xbox 360 Kinect is also capable of speech recognition: it will not only respond to natural body movements, but to voice commands as well. This technology was designed for the Kinect solely by Microsoft. Microsoft engineers travelled to an estimated 250 different homes to test their voice-recognition system. They placed 16 microphones all over each room to test the acoustics, echoing, etc., to get a feel for how the Kinect would respond in different environments. The end result was placing 4 downward-facing microphones on the bottom of the Kinect unit to listen for human voices. This is also part of why the Kinect is so physically wide; the 3D sensing portion only needs about half the width the Kinect has now. The combination of the microphone placement and the motion-sensing technology allows the Kinect to zero in on the user's voice and tell where the sound is coming from while cancelling out other ambient noise. Since there are 4 microphones, the audio portion of the Kinect has 4 separate channels. The resolution of the audio is 16 bits, and the audio is sampled at 16 kHz. Three major languages are supported by the Kinect thus far: English, Spanish, and Japanese, with plans to support other popular languages soon. The Kinect is always listening as long as it is turned on; when the user says "Xbox", the user is prompted to select one of the options from the screen. Popular options are "Play Game", "Watch a Movie" or "Sign In". One of the major techniques involved in the Kinect's ability to block out noise is known as echo cancellation.
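Echo cancellation is commonly implemented with an adaptive filter that learns the room's echo path and subtracts the predicted echo of the loudspeaker signal from what the microphones pick up. The sketch below is a textbook normalized-LMS canceller, not the Kinect's proprietary implementation; all names and parameter values are illustrative:

```python
import numpy as np

def nlms_echo_cancel(far_end, mic, taps=32, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of the far-end (loudspeaker)
    signal from the microphone signal; returns the residual signal."""
    w = np.zeros(taps)                          # adaptive echo-path estimate
    out = np.zeros(len(mic))
    for n in range(taps - 1, len(mic)):
        x = far_end[n - taps + 1:n + 1][::-1]   # newest far-end sample first
        y_hat = w @ x                           # predicted echo
        e = mic[n] - y_hat                      # residual after cancellation
        w += mu * e * x / (x @ x + eps)         # normalized LMS update
        out[n] = e
    return out
```

After the filter converges, the residual signal contains mostly the near-end speech (here, the player's voice commands) rather than the game's own audio.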
Figure 8: The Kinect Sensor
(c) Motorized tilt
The Kinect comes equipped with a built-in motor that is able to tilt the entire unit up or down, expanding its field of view. Without moving, the Kinect has a 43° vertical viewing angle and a 57° horizontal viewing angle; the motor can tilt the sensor up or down by a further 27°, extending its vertical coverage. The Kinect is powered via a standard USB connection; however, it also requires a special type of connector for the motor. USB can supply 2.5 W, which is not enough power to run the sensor and the motor simultaneously, so Microsoft developed a special connector that draws power from the Xbox's power supply. This connector comes only with the newer Xbox models; older Xbox models must have a separate power supply for the Kinect.
(d) Human Detection Using Depth Information by Kinect
Overview of the human detection method
Pre-processing
To prepare the data for processing, some basic pre-processing is needed. In the depth image taken by the Kinect, all the points at which the sensor is not able to measure depth are set to 0 in the output array. We regard this as a kind of noise, and to avoid its interference we want to recover the true depth value. It is assumed that space is continuous, so a missing point is likely to have a depth value similar to its neighbours. With this assumption, we regard all the 0 pixels as vacant and in need of filling.
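The paper does not list code for this step; a minimal sketch of the neighbour-based hole filling, under the stated continuity assumption, might look like this (function name and the 4-neighbour choice are our own):

```python
import numpy as np

def fill_depth_holes(depth):
    """Iteratively replace zero-valued (unmeasured) pixels with the mean
    of their valid 4-neighbours until no fillable holes remain."""
    d = depth.astype(float).copy()
    while True:
        holes = d == 0
        if not holes.any():
            break
        acc = np.zeros_like(d)
        cnt = np.zeros_like(d)
        for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1)):
            nb = np.roll(d, shift, axis=axis)   # note: wraps at the borders
            valid = nb > 0
            acc += np.where(valid, nb, 0.0)
            cnt += valid
        fillable = holes & (cnt > 0)
        if not fillable.any():                  # no valid neighbour anywhere
            break
        d[fillable] = acc[fillable] / cnt[fillable]
    return d
```

Each pass fills the hole pixels that border at least one measured pixel, so large holes are filled from their rim inward over successive iterations.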
2D chamfer distance matching
The first stage of the method is to use the edge information embedded in the depth array to locate the possible regions that may indicate the presence of a person. It is a rough scanning approach: we need a detection result with a false-negative rate as low as possible, while a comparatively high false-positive rate is acceptable, since the candidates are passed on to the next stage. We use 2D chamfer distance matching in this stage for quick processing. Chamfer distance matching is a good 2D shape-matching algorithm that is invariant to scale, and it utilizes the edge information in the depth array, which marks the boundaries of all the objects in the scene. We use the Canny edge detector to find all edges in the depth array. To reduce calculation and the disturbance from surrounding irregular objects, we eliminate all edges whose sizes are smaller than a certain threshold. We use a binary head template and match it to the resulting edge image. To increase efficiency, a distance transform is calculated before the matching process. This results in a distance map of the edge image, where each pixel contains the distance to the closest edge pixel. Matching consists of translating and positioning the template at various locations of the distance map; the matching measure is determined by the pixel values of the distance image which lie under the data pixels of the transformed template. The lower these values are, the better the match between image and template at that location.
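The two steps above, a distance transform followed by template sliding, can be sketched as follows. This is a generic 3-4 chamfer implementation for illustration, not the authors' code; function names and the template used are our own:

```python
import numpy as np

def chamfer_distance_transform(edges):
    """Two-pass 3-4 chamfer approximation of each pixel's distance to the
    nearest edge pixel in a binary edge map."""
    h, w = edges.shape
    d = np.where(edges, 0.0, 1e9)
    for i in range(h):                        # forward pass
        for j in range(w):
            if j > 0:
                d[i, j] = min(d[i, j], d[i, j - 1] + 3)
            if i > 0:
                d[i, j] = min(d[i, j], d[i - 1, j] + 3)
                if j > 0:
                    d[i, j] = min(d[i, j], d[i - 1, j - 1] + 4)
                if j < w - 1:
                    d[i, j] = min(d[i, j], d[i - 1, j + 1] + 4)
    for i in range(h - 1, -1, -1):            # backward pass
        for j in range(w - 1, -1, -1):
            if j < w - 1:
                d[i, j] = min(d[i, j], d[i, j + 1] + 3)
            if i < h - 1:
                d[i, j] = min(d[i, j], d[i + 1, j] + 3)
                if j < w - 1:
                    d[i, j] = min(d[i, j], d[i + 1, j + 1] + 4)
                if j > 0:
                    d[i, j] = min(d[i, j], d[i + 1, j - 1] + 4)
    return d / 3.0                            # normalize unit step to ~1

def chamfer_match(dist_map, template):
    """Slide a binary template over the distance map; the position with the
    lowest mean distance under the template pixels is the best match."""
    th, tw = template.shape
    h, w = dist_map.shape
    best_score, best_pos = float("inf"), None
    n = template.sum()
    for i in range(h - th + 1):
        for j in range(w - tw + 1):
            score = (dist_map[i:i + th, j:j + tw] * template).sum() / n
            if score < best_score:
                best_score, best_pos = score, (i, j)
    return best_pos, best_score
```

In practice one would keep every location whose score falls below a threshold (rather than only the minimum), since this stage is meant to over-generate head candidates for the 3D verification stage.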
Generate 3D model
Considering that the computational complexity of 3D model fitting is comparatively high, we want the model to be view-invariant so that we don't have to use several different models or rotate the model and run it several times. The model should generalize the characteristics of the head from all views: front, back, side, and also higher and lower views, for when the sensor is placed higher or lower or when the person is taller or shorter. To meet these constraints in the simplest way, we use a hemisphere as the 3D head model.
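A hemisphere depth template of a given pixel radius can be generated directly, since its surface height is known in closed form. This small sketch (our own illustration of the idea, not the authors' code) produces the template that would be compared against the depth array around each head candidate:

```python
import numpy as np

def hemisphere_template(radius):
    """Depth template of a hemisphere of the given pixel radius seen
    head-on: each value is the height of the surface toward the sensor
    (0 outside the circular footprint). Because a hemisphere looks the
    same from front, back, side, or above, one template covers all views."""
    r = radius
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return np.sqrt(np.maximum(r * r - (x * x + y * y), 0.0))
```

The template would be scaled by the candidate's measured depth (a nearer head spans more pixels) before computing the fitting error.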
We extract the overall contour of the person so that we may track his/her hands and feet and recognize the activity. In an RGB image, even though the person is standing on the ground, it is not much of a problem to detect the boundary between the feet and the ground plane using gradient features. However, in a depth array, the values at the person's feet and the local ground plane are the same. Therefore, it is not feasible to compute a human's whole-body contour from a depth array using regular edge detectors. The same applies when the person touches any other object that is partially at the same depth as the person. To resolve this issue, we take advantage of the fact that a person's feet generally appear upright in a depth array regardless of posture.
We use the filter response to extract the boundary between the person and the ground, and we develop a region-growing algorithm to extract the whole-body contour from the processed depth array. It is assumed that the depth values on the surface of a human object are continuous and vary only within a specific range. The algorithm starts with a seed location, which is the centroid of the region detected by 3-D model fitting. The rule for growing a region is based on the similarity between the region and its neighbouring pixels. The similarity between two pixels x and y in the depth array is defined as S(x, y) = |depth(x) - depth(y)|.
(a) Original depth array; some parts of the body are merged with the ground plane and wall. (b) The input depth array to the region-growing algorithm.
Table 1: Region Growing Algorithm
(c) Result of the region-growing algorithm. (d) The extracted whole-body contours superimposed on the depth map.
The depth of a region R is defined as the mean depth of all the pixels in that region: depth(R) = (1/|R|) Σx∈R depth(x).
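Putting the similarity rule and the region-mean definition together, the region growing can be sketched as a breadth-first flood fill (our own illustrative reconstruction; the threshold name tau and its value are assumptions, not from the paper):

```python
import numpy as np
from collections import deque

def region_grow(depth, seed, tau=50.0):
    """Grow a region from the seed pixel, absorbing 4-neighbours whose
    similarity S(x, y) = |depth(x) - depth(y)| to the region's running
    mean depth is below tau; returns a boolean mask of the region."""
    h, w = depth.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    total, count = float(depth[seed]), 1     # running mean of the region
    q = deque([seed])
    while q:
        i, j = q.popleft()
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= ni < h and 0 <= nj < w and not mask[ni, nj]:
                if abs(depth[ni, nj] - total / count) < tau:
                    mask[ni, nj] = True
                    total += depth[ni, nj]
                    count += 1
                    q.append((ni, nj))
    return mask
```

Because the ground plane around the feet sits at a very different mean depth than the body surface, the growth stops at that boundary instead of leaking into the floor.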
Tracking
Finally, we give preliminary results on tracking using depth information based on our detection result. Tracking in RGB images is usually based on color; the assumption is that the color of the same object in different time frames should be similar. But in depth images we don't have such color information. What we do have is the 3D spatial information of the objects, so we can measure the movements of the objects in 3D space. We assume that the coordinates and speed of the same object in neighbouring frames change smoothly.
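The smooth-motion assumption can be sketched as a constant-velocity predictor with nearest-neighbour data association (an illustrative simplification of this idea; the function and its interface are our own, not the authors'):

```python
import numpy as np

def track_step(prev_pos, prev_vel, detections, dt=1.0):
    """One tracking step: predict the next 3D position assuming constant
    velocity, associate the detection closest to the prediction, and
    update the velocity estimate from the chosen detection."""
    predicted = prev_pos + prev_vel * dt
    dists = np.linalg.norm(detections - predicted, axis=1)
    best = int(np.argmin(dists))             # nearest detection wins
    new_pos = detections[best]
    new_vel = (new_pos - prev_pos) / dt
    return new_pos, new_vel, best
```

Gating the association distance (rejecting detections too far from the prediction) would let the tracker coast through brief occlusions.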
As the technologies of virtual reality evolve, the applications of VR become literally unlimited. It is assumed that VR will reshape the interface between people and information technology by offering new ways for the communication of information, the visualization of processes, and the creative expression of ideas. A virtual environment can represent any three-dimensional world that is either real or abstract. This includes real systems like buildings, landscapes, underwater shipwrecks, spacecraft, archaeological excavation sites, human anatomy, sculptures, crime scene reconstructions, solar systems, and so on. Of special interest is the visual and sensual representation of abstract systems like magnetic fields, turbulent flow structures, molecular models, mathematical systems, auditorium acoustics, stock market behaviour, population densities, information flows, and any other conceivable system, including artistic and creative work of an abstract nature. These virtual worlds can be animated, interactive, shared, and can expose behaviour and functionality.
Useful applications of VR include training in a variety of areas (military, medical, equipment operation, etc.), education, design evaluation (virtual prototyping), architectural walk-through, human factors and ergonomic studies, simulation of assembly sequences and maintenance tasks, assistance for the handicapped, study and treatment of phobias (e.g., fear of heights), entertainment, and much more. Virtual reality appears to offer educational potential in the following areas: (1) data gathering and visualization, (2) project planning and design, (3) the design of interactive training systems, (4) virtual field trips, and (5) the design of experiential learning environments.
Virtual reality also offers many possibilities as a tool for non-traditional learners, including the physically disabled and those undergoing rehabilitation who must learn communication and psychomotor skills. In industry, VR has proven to be an effective tool for helping workers evaluate product designs. In 1999, BMW explored the capability of VR for verifying product designs. They concluded that VR has the potential to reduce the number of physical mock-ups needed, to improve overall product quality, and to obtain quick answers in an intuitive way during the concept phase of a product. In addition, Motorola developed a VR system for training workers to run a pager assembly line (Wittenberg, 1995). In the past decade, medical applications of virtual reality technology have been rapidly developing, and the technology has changed from a research curiosity to a commercially and clinically important area of medical informatics technology. Virtual reality is under exploration as a therapeutic tool for patients. For example, psychologists and other professionals are using virtual reality as a tool with patients who are afraid of heights. NASA has developed a number of virtual environment projects. These include the Hubble Telescope Rescue Mission training project, the Space Station Cupola training project, and the shared virtual environment where astronauts can practice reconnoitring outside the space shuttle for joint training, human factors, and engineering design. NASA researcher Bowen Loftin has developed the Virtual Physics Lab, where learners can explore conditions such as changes in gravity. Virtual reality can make it possible to reduce the time lag between receiving equipment and implementing training by making possible virtual prototypes or models of the equipment for training purposes. In the entertainment field, virtual realities are used in movies and games. One of the advantages of using VR games is that they create a level playing field.
These virtual environments eliminate contextual factors that create inequalities between learners, thereby interfering with the actual learning skills featured in the training program, that is, interpersonal skills, collaboration, and team-building. Serious games are being deployed more and more in such diverse areas as public awareness, military training, and higher education. One of the driving forces behind this stems from the rapidly growing availability of game technologies, providing not only better, faster, and more realistic graphics, physics, and animations, but above all making the language of game development accessible to increasingly more people. Game-based simulations propose an architecture for a professional fire-fighter training simulator that incorporates novel visualization and interaction modes. The serious game, developed in cooperation with the government agency responsible for the training of fire and rescue personnel, is a good example of how virtual reality and game technology help make the delicate combination of engaging level design and carefully tuned learning objectives. The emergence of augmented reality technology in the form of interactive games has produced a valuable tool for education. The live communal nature of these games, blending virtual content with global access and communication, has resulted in a new research arena previously called "edutainment" but more recently called "learning games". Windows Live combined with Xbox 360 with Kinect technology provides an agile, real-time environment with case-based reasoning, where learners can enjoy games, simulations and face-to-face chat, and stream HD movies and television, music, sports and even Twitter and Facebook, with others around the world, or alone, in the privacy of the home.
This seminar deals with virtual reality technology and its application in the entertainment field. Very recently, the most advanced and revolutionary technology related to virtual reality and entertainment, the Kinect, was introduced by Microsoft for its Xbox 360 gaming console.
A lot of advancements have been made using VR and VR technology. VR has cut across all facets of human endeavour: manufacturing/business, exploration, defence, leisure activities, and medicine, among others. The exciting field of VR has the potential to change our lives in many ways. There are many applications of VR at present, and there will be many more in the future. Many VR applications have been developed for manufacturing, education, simulation, design evaluation, architectural walk-through, ergonomic studies, simulation of assembly sequences and maintenance tasks, assistance for the handicapped, study and treatment of phobias, entertainment, rapid prototyping and much more. VR technology is now widely recognized as a major breakthrough in the technological advance of science. VR is changing our lives, and it will increasingly become a part of them.
In the past few decades, scientists have focused most of their attention on developing technologies that sharpen only the minds or relieve minds and bodies of certain duties. Most people love games. Play is a fundamental mode of expression, fulfils the human need to connect with the "other," and can even be fun. Serious play is also great exercise for the mind and spirit.
Virtual reality gaming has always been thought of as something that will be presented to the general public in the future; the only problem was we didn't know when. Well, the future is here, and Microsoft's Xbox Kinect has now made this a reality. Kinect uses technology that science fiction hasn't even written about, and gaming has been brought to a whole new level of intelligence and interaction. Moreover, the technologies used within Kinect are not only revolutionizing the gaming world, but have laid a foundation for all consumer electronics, specifically user-interface applications.
1. Hilary McLellan (McLellan Wyatt Digital), Virtual Realities; E.J. May, Virtual Reality: Today and Beyond, ENC 1101 (Professor Smith), December 1996.
2. Brooks Jr., F.P. (1999). What's Real About Virtual Reality? IEEE Computer Graphics and Applications, 19(6), 16-27.
4. Lu Xia, Chia-Chih Chen and J. K. Aggarwal, Human Detection Using Depth Information by Kinect. The University of Texas at Austin, Department of Electrical and Computer Engineering, 2011.
5. "How You Become the Controller", Xbox.com. Web. 20 Feb. 2011.