Page 1: [IEEE 2011 2nd International Conference on Intelligent Systems, Modelling and Simulation (ISMS) - Phnom Penh, Cambodia (2011.01.25-2011.01.27)] 2011 Second International Conference

Information Measure Ratio Based Real Time Approach for Hand Region Segmentation With a Focus on Gesture Recognition

Ayan Chaki Innovation Lab

TATA Consultancy Services Ltd. Kolkata, India

[email protected]

Pragya Jain Computer Science Department

IEM Kolkata, India

[email protected]

Rohit Kumar Gupta Innovation Lab

TATA Consultancy Services Ltd. Kolkata, India

[email protected]

Abstract—This paper presents an efficient, real-time approach to hand region segmentation aimed at hand gesture recognition under varying illumination. The methodology is a two-step process: (a) segment the hand from a complex background and (b) recognize the hand gesture efficiently and accurately. Hand segmentation uses a block-based picture information ratio, and recognition uses Principal Component Analysis (PCA). In the experiments, four basic hand gestures were recognized consistently against a complex but nearly constant background and under varying illumination. The prototype was developed on x86 and tested with live video captured by low-cost webcams. It achieves 98% accuracy in real time.

Keywords- hand gesture; Information Measure Ratio; PCA; segmentation; human computer interface

I. INTRODUCTION

Gestures have always played an important role in human life. Over the past two decades there have been major advances in vision-based action recognition. A major challenge in achieving higher accuracy in hand gesture recognition lies in segmenting the hand region under varying illumination and against a complex background [1].

Such a system is worth developing because gesture recognition has applications in a wide range of fields. It has been used in areas ranging from motion analysis to machine learning, and it serves purposes ranging from in-flight entertainment to medical uses.

Different systems implement gesture recognition in different ways. On one side are vision-based action recognition systems, which use one or more cameras to build a 3D hand model for higher accuracy. On the other side are appearance-based systems, which use only a single camera and build a set of templates from training data supplied to the machine beforehand.

This paper proposes real-time hand region segmentation aimed at gesture recognition. The proposed system segments the hand from a complex background using an efficient, novel approach and then applies Principal Component Analysis (PCA). It uses a single camera, with the aim of an algorithm that is fast, simple and accurate. The simplicity of the system allows it to run on an ordinary desktop computer. PCA reduces the dimensionality of the image, which lowers processing time, a significant advantage in real-time systems.

The rest of the paper is organized as follows: Section II reviews previous work in this field, Section III describes the proposed algorithm, Section IV presents the results and discussion, and Section V concludes the paper.

II. STATE OF THE ART

Gesture recognition can be viewed as a classification problem in which a set of images is given as input and the desired actions are obtained as output. Much work has been done in this field, each effort taking a different approach toward the same goal. Lu and Little [2] used a hybrid Hidden Markov Model (HMM) with two first-order processes, which allowed them to combine tracking and recognition in a single frame. They introduced dependencies between the current and previous actions to improve recognition.

One of the strongest approaches is the Histogram of Oriented Gradients (HoG), as Dalal and Triggs [3] demonstrated experimentally. They used a linear Support Vector Machine (SVM) as a baseline classifier for speed and simplicity. Zhu et al. [4] integrated a cascade-of-rejectors approach with HoG to obtain a faster human detection system. An integral image approach with variable-size blocks allowed them to achieve this. They then used AdaBoost to select the best-suited blocks and used these blocks to construct the rejector-based cascade. Lu et al. [10] also proposed a HoG-based approach to represent objects, together with an efficient off-line learning algorithm to learn templates from training data.

2011 Second International Conference on Intelligent Systems, Modelling and Simulation

978-0-7695-4336-9/11 $26.00 © 2011 IEEE

DOI 10.1109/ISMS.2011.36

Kaaniche [6] proposed a gesture recognition method based on local motion learning, using a 2D HoG descriptor along with a temporal one. He proposed a new tracking algorithm based on a frame-to-frame HoG tracker and an extended Kalman filter. Different uses of gesture recognition have emerged: Amstutz et al. [5] proposed integrating gesture recognition with a watch device, while Freeman and Weissman [7] suggested implementing it in a television. In their system the open palm is the trigger gesture; once the TV recognizes the open hand, it acts on the gesture shown afterwards. Whether the television is off or playing a program, it continuously looks for this trigger gesture. Gesture recognition has also been used in medical applications [8], where it can provide sterile human-machine interfaces and so avoids the hassle of sterilizing the interfaces after each operation. Shi and Tomasi [11] proposed new contrast-based and motion-based features for efficient tracking. However, their approach has difficulty tracking glossy surfaces.

Hai Wu [9] also developed a hand gesture recognition system in which PCA is used along with a hierarchical decision tree based on multi-scale theory. That system most closely resembles the one implemented in this paper. The main difference is that it produces a binary tree of principal component spaces, where each level of the tree represents a different degree of blurring, which reduces the search time. Patwardhan and Roy [12] proposed dynamic hand gesture recognition, with a novel approach for selecting a proper set of hand gestures from a vocabulary.

Our approach is designed to give reliable results under a changing background, within a tolerance level.

III. PROPOSED METHODOLOGY

The main idea of the algorithm is to keep only the relevant information in an image and encode it as efficiently as possible, so that each encoded image can be compared with others encoded in the same manner. This is done by capturing the variation across the set of images and using that variation to encode and compare them, which amounts to performing PCA on the set of images. PCA can be thought of as finding the eigenvectors of the covariance matrix of the image set; each image contributes to the eigenvectors, which form the eigen images. Before PCA is performed, segmentation separates the hand region from the background.

The proposed methodology is described in Figure 1.

Figure 1: Overview of the algorithm

A. Segmentation

By segmentation we mean dividing the image into fixed-size n × n blocks and retaining those that contain relevant information. This makes the image easier to analyze, and it is what allows the algorithm to work efficiently. Segmentation consists of three distinct but related activities: (i) background subtraction, (ii) block-based Information Measure Ratio (IMR) computation and (iii) noise removal. The IMR is used mainly to cope with varying illumination. The steps for segmentation are given in Figure 2.

Figure 2: Steps for Hand Region Segmentation

The pseudo-code for segmentation is given below:

• Compute the difference image between the empty frame and the frame containing the hand, i.e. after the gesture detection algorithm has been initiated.

• Divide the difference image into n × n blocks.

• Calculate the Information Measure Ratio (IMR) for each block. The IMR is computed as:

$\mu_L = \max(h_j(i)) / n_L$

where $\mu_L$ is the Information Measure Ratio for block $L$, $\max(h_j(i))$ is the maximum value of the histogram, $j$ is the value and $n_L$ is the number of pixels in block $L$.

• Set the threshold ($t_h$) based on the ratio $\mu_L$ to segment the relevant information in each block from the background. In our experiment the threshold is calculated as $t_h = 0.5 \, \mu_L$.

• Segment each block based on $t_h$.

[Figure 1 blocks: Image Measurement Ratio Based Segmentation; Offline Training; Real time recognition and decision making]

[Figure 2 blocks: Difference Frame Computation; Sub-division of each frame to n*n blocks; Information Measure Ratio (μ) Calculation for each block; Segmented image output using threshold (th)]
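The IMR computation in the steps above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the function name `block_imr`, the default block size `n=8`, 8-bit grayscale frames and a 256-bin histogram are all assumptions. The sketch stops at computing $\mu_L$ and $t_h$ for each block and does not apply the final per-block segmentation rule.

```python
import numpy as np


def block_imr(reference, frame, n=8):
    """Return (mu, th): per-block IMR and threshold th = 0.5 * mu.

    reference, frame: 2D uint8 grayscale images of equal shape (assumed).
    n: block side length in pixels (assumed value, not from the paper).
    """
    # Difference image between the empty (reference) frame and the current frame.
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16)).astype(np.uint8)

    # Number of whole blocks; partial blocks at the borders are ignored here.
    rows, cols = diff.shape[0] // n, diff.shape[1] // n
    mu = np.zeros((rows, cols), dtype=np.float64)

    for r in range(rows):
        for c in range(cols):
            block = diff[r * n:(r + 1) * n, c * n:(c + 1) * n]
            # Histogram of the block's pixel values (256 bins for 8-bit data).
            hist = np.bincount(block.ravel(), minlength=256)
            # IMR: largest histogram count divided by the pixels in the block.
            mu[r, c] = hist.max() / block.size

    return mu, 0.5 * mu
```

A block whose difference values are all identical gives $\mu_L = 1$, while a block whose values are spread over many histogram bins gives a smaller ratio.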

B. Training

The training module of the proposed system is an offline, one-time process and is based on Principal Component Analysis (PCA). The main idea of PCA is to provide a lower-dimensional picture (a shadow) of the object from the most informative point of view. This is done by concentrating the variance in the first few components, so that the later dimensions can be dropped without a significant loss of information. Figure 3 shows the steps for training.

Figure 3: PCA Based Training Module

The pseudo-code for the training module is given below:

• Each segmented image is converted to a single column vector ($I_n$).

• The average matrix ($\Gamma$) of all the images is calculated using

$\Gamma = \frac{1}{N} \sum_{n=1}^{N} I_n$

where $N$ is the number of training images.

• The difference matrix, $\Phi$, is calculated using $\phi_n = I_n - \Gamma$.

• The covariance matrix, $C$, is found using

$C = \frac{1}{N} \sum_{n=1}^{N} \phi_n \phi_n^T$

• The eigenvectors and eigenvalues of the covariance matrix are calculated.
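The training steps above map directly onto a few NumPy operations. The sketch below is illustrative only: the function name `train_pca` and the `n_components` parameter are assumptions, and it forms the full covariance matrix exactly as written in the steps, which is only practical for small images because the matrix has one row and column per pixel.

```python
import numpy as np


def train_pca(images, n_components):
    """Return (gamma, eigvecs, eigvals) for a list of equally sized images.

    gamma:   D x 1 average vector.
    eigvecs: D x n_components matrix, one eigenvector per column.
    eigvals: matching eigenvalues in descending order.
    """
    # Each image becomes one column vector I_n; the result is a D x N matrix.
    I = np.stack([img.ravel().astype(np.float64) for img in images], axis=1)
    N = I.shape[1]

    # Average vector and difference vectors phi_n = I_n - Gamma.
    gamma = I.mean(axis=1, keepdims=True)
    phi = I - gamma

    # Covariance C = (1/N) * sum_n phi_n phi_n^T, a D x D symmetric matrix.
    C = (phi @ phi.T) / N

    # eigh is intended for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:n_components]
    return gamma, eigvecs[:, order], eigvals[order]
```

Keeping only the leading `n_components` eigenvectors is what yields the reduced dimension used later during recognition.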

C. Recognition

The recognition module is an online process. It starts by encoding the input image in the same way as in the training module. The online recognition module is shown in Figure 4.

Figure 4: Online Recognition Module

The pseudo-code for hand gesture recognition is as below:

• Transform the new image into its components:

$\omega_k = \lambda_k^T (I_{test} - \Gamma)$, where $k = 1, \ldots, N'$

$N'$ is the reduced dimension, $\lambda_k$ are the corresponding eigenvectors and $I_{test}$ is the test vector.

• Compute the vector $\Omega^T$ such that $\Omega^T = [\omega_1 \; \omega_2 \; \ldots \; \omega_{N'}]$.

• Find the Euclidean distance $\epsilon_k$:

$\epsilon_k = \| \Omega - \Omega_k \|$

• Sort the Euclidean distances and choose the image with the least Euclidean distance. If the least Euclidean distance is within the threshold, classify the test image as the image corresponding to that distance. The threshold is selected based on experiments.
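A minimal Python sketch of the recognition steps above follows. It assumes that each $\Omega_k$ is the component vector of a training image projected in the same way as the test image; the function names `project` and `recognize`, the `labels` list and returning `None` for a rejected image are illustrative choices rather than details taken from the paper.

```python
import numpy as np


def project(image, gamma, eigvecs):
    """Project an image onto the retained eigenvectors (the omega_k values)."""
    x = image.ravel().astype(np.float64)[:, None]  # D x 1 test vector
    return (eigvecs.T @ (x - gamma)).ravel()       # length N' vector Omega


def recognize(image, gamma, eigvecs, train_weights, labels, threshold):
    """Return the label of the nearest training image, or None if too far.

    train_weights: M x N' array, one projected training image per row
                   (assumed to have been produced with project()).
    labels:        list of M gesture labels (hypothetical names).
    threshold:     maximum accepted Euclidean distance, chosen experimentally.
    """
    omega = project(image, gamma, eigvecs)
    # Euclidean distance from the test vector to every training vector.
    dists = np.linalg.norm(train_weights - omega, axis=1)
    best = int(np.argmin(dists))
    return labels[best] if dists[best] <= threshold else None
```

Taking the minimum with `np.argmin` gives the same result as sorting the distances and picking the first entry, without the cost of a full sort.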

IV. RESULTS AND DISCUSSION

The proposed prototype was developed on x86 and tested using a webcam with 320×240 resolution. The aim of the project is to develop a low-cost system that is accurate and precise in detecting hand gestures within a tolerance level of background distortion. In our case, the tolerance is up to 70% similarity between the background of the reference frame and that of the frame under test. In this section we describe the experimental results in terms of two parameters: (i) recall and (ii) precision.

Recall (r) is defined as:

$r = \frac{c}{c + m}$

Precision (p) is defined as:

$p = \frac{c}{c + fp}$

where $c$ is the number of correct recognitions, $m$ is the number of misses and $fp$ is the number of false positives. We tested the system on 150 frames covering 4 different gestures. The results are given in Table 1.

[Figure 3 blocks: Convert Image matrix to a Column Vector (In); Calculate Average matrix (Γ) and Difference matrix (Φ); Calculate the Covariance matrix (C); Find the Eigenvector vn and Eigenvalue λk]

[Figure 4 blocks: Transform new image into its components ω; Vector ΩT computation; Euclidean distance (Єk) based hand gesture recognition]

TABLE 1: DETAILS OF TEST RESULTS

Gesture Type   # Frames   Precision   Recall
Right          50         0.98        0.94
Left           25         0.97        0.92
Up             40         0.98        0.92
Down           35         0.98        0.95
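The recall and precision definitions, and the way the per-gesture figures in Table 1 combine into overall averages, can be checked with a short Python sketch. The function and variable names are illustrative, and the paper does not state whether its averages are plain or frame-weighted, so both are computed here as an assumption.

```python
def recall(c, m):
    """Recall r = c / (c + m): correct recognitions over correct plus misses."""
    return c / (c + m)


def precision(c, fp):
    """Precision p = c / (c + fp): correct over correct plus false positives."""
    return c / (c + fp)


# Table 1 rows: gesture -> (frames, precision, recall).
TABLE_1 = {
    "Right": (50, 0.98, 0.94),
    "Left": (25, 0.97, 0.92),
    "Up": (40, 0.98, 0.92),
    "Down": (35, 0.98, 0.95),
}


def mean_scores(table):
    """Plain (unweighted) average precision and recall over the gestures."""
    n = len(table)
    p = sum(row[1] for row in table.values()) / n
    r = sum(row[2] for row in table.values()) / n
    return p, r


def weighted_scores(table):
    """Average precision and recall weighted by the number of frames."""
    total = sum(row[0] for row in table.values())
    p = sum(row[0] * row[1] for row in table.values()) / total
    r = sum(row[0] * row[2] for row in table.values()) / total
    return p, r
```

Under either averaging, the Table 1 values round to 98% precision and 93% recall.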

The system achieves an average recall of 93% and an average precision of 98%. We found that errors occur when

• there are abrupt changes between the reference background and the background of the frame under test

• the video quality is severely degraded

Similarly, the reasons for false positives are

• there is considerable similarity between two gestures

• a particular gesture has not been trained adequately with a sufficient number of samples

Some of the test results, with the output of the segmentation module, are given in Figure 5. The recall rate shows that there is scope for improvement, mainly in the segmentation module.

V. CONCLUSION

The results above support the reliability and efficiency of the segmentation algorithm, and the method can also be extended to face recognition. It is the real-time segmentation algorithm using IMR that reduces the time complexity of the overall algorithm. Although the algorithm is promising, the thresholding technique can be improved further. Future work includes taking up further challenges, such as tracking dynamic hand gestures, and further reducing the time complexity so that the algorithm can be implemented easily in hard real-time systems.

Figure 5: Result after segmentation of the 4 input gestures

REFERENCES

[1] Carl A. Pickering, Keith J. Burnham, and Michael J. Richardson (Jaguar), “A Research Study of Hand Gesture Recognition Technologies and Applications for Human Vehicle Interaction,” 3rd Conference on Automotive Electronics, 2007.

[2] W. L. Lu and J. J. Little, “Tracking and recognizing actions at a distance,” in Proceedings of the ECCV Workshop on Computer Vision Based Analysis in Sport Environments (CVBASE ’06), Graz, Austria, May 2006.

[3] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in International Conference on Computer Vision and Pattern Recognition, vol. 1. San Diego, CA, USA: IEEE Computer Society Press, June 20-25, 2005.

[4] Qiang Zhu, Shai Avidan, Mei-Chen Yeh, and Kwang-Ting Cheng, “Fast Human Detection Using a Cascade of Histograms of Oriented Gradients,” 2006.

[5] Roman Amstutz, Oliver Amft, Brian French, Asim Smailagic, Dan Siewiorek, and Gerhard Troster, “Performance analysis of an HMM-based gesture recognition using a wristwatch device.”

[6] Mohamed Becha Kaaniche, “Tracking HoG Descriptors for Gesture Recognition.”

[7] William T. Freeman and Craig D. Weissman, “Television Control by Hand Gestures,” TR94-24, December 1994.

[8] J. C. Andreshak, S. Lumelsky, I. F. Chang, T. P. Mears, A. A. Stone, and W. W. Stead, “Medication Charting Via Computer Gesture Recognition.”

[9] Hai Wu, “Dynamic Gesture Recognition Using PCA with Multi-scale Theory and HMM.”

[10] Wei-Lwun Lu, Kenji Okuma, and James J. Little, “Tracking and Recognizing Multiple Hockey Players using the Boosted Particle Filter,” Master's Thesis, The University of British Columbia, 2007.

[11] J. Shi and C. Tomasi, “Good features to track,” in International Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: Springer, June 1994, pp. 593–600.

[12] Kaustubh S. Patwardhan and Sumantra Dutta Roy, “Dynamic Hand Gesture Recognition using Predictive EigenTracker.”