8/19/2019 Application specific Gesture Recognition
1/58
Application Specific Hand Gesture Recognition System

Ankit Mishra
ELECTRICAL ENGG. DEPT., GAUTAM BUDDHA UNIVERSITY
ABSTRACT

A hand gesture recognition system can be used as an interface between a computer and a human through hand gestures. This work presents a technique for a human-computer interface based on hand gesture recognition; the aim is to recognize static hand gestures performed by an individual in a video sequence. Many techniques have already been proposed in the literature for gesture recognition in specific environments (e.g. a laboratory) using the cooperation of several sensors (e.g. a camera network, an individual equipped with markers).

In this dissertation, a Gesture Recognition Method based on the Edge Detection method for feature extraction is proposed. The proposed algorithm works on a basic observation: as the number of fingers increases for a gesture, the hand area also increases, and hence the area can be used as an important criterion for recognition systems.

The results of simulation clearly indicate that the recognition performance is good only for a particular application, and the system needs to be modified if the application area changes.

The current accuracy rate of the system is 89%, at the particular intensity level at which the images were taken, in a MATLAB environment.
LIST OF FIGURES
Figure No.  Caption

1.1  Proposed Algorithm for the Project
2.1  Taxonomy of Gesture Categories
2.2  Instrumented Glove equipped with Potentiometer and Optic Fiber
2.3  Tools for Gesture Recognition: Clustering & Classifying Algorithms
2.4  Different Representations of Gestures
3.1  System Architecture
3.2  (a) Original Image; (b) Converted Greyscale Image
3.3  Edge Detection Illustration
3.4  Edge Detection using Pre-defined Operators
3.5  Edge
3.9  Gradient Generating Approach
3.10 Convolution Mask used in the Project
3.11 Final Image after Convolving with the Mask
LIST OF TABLES
Table No.  Particulars

2.1  Comparison between Contact Device and Vision Device
4.1  Result
LIST OF ABBREVIATIONS

HCI  Human Computer Interaction
LDA  Linear Discriminant Analysis
MMI  Man-Machine Interaction
DoF  Degree of Freedom
HOG  Histogram of Oriented Gradients
ISM  Implicit Shape Model
CONTENTS
3.1  Introduction  24
3.2  Acquisition of Data  24
3.3  Gesture Modelling  27
3.4  Image Preprocessing  28
3.5  Edge Detection  30
3.5.1  Generating Gradient Images using Predefined Filters  32
3.5.2  Approach for Gradient  34
3.5.3  Edge Operator used in the Project  35
3.6  Feature Extraction  36
Chapter 4  RESULT  43
ACKNOWLEDGEMENT

I would like to express my gratitude and thanks to my supervisor, Dr. Shabana Urooj, Assistant Professor, School of Engineering, Gautam Buddha University, for her valuable guidance and constant support throughout my project work, which provided the much-needed continuity for its completion.
CANDIDATE'S DECLARATION

I hereby declare that the work embodied in this dissertation report entitled "Application Specific Gesture Recognition System", submitted in partial fulfilment of the requirements for the award of the degree of 5-Year Integrated Dual Degree Programme B.Tech. (Electrical Engineering) & M.Tech. (Instrumentation & Control), to the School of Engineering, Gautam Buddha University, Greater Noida.
Dedicated to
My Parents for their
Immense Moral Support

To Whom it may concern:
Keep reaching for that Rainbow!!
CHAPTER - 1
INTRODUCTION
CHAPTER - 1
INTRODUCTION

1.1 General
Gesture was the first mode of communication for primitive cave men, and nowadays gesture recognition has become a prominent domain of research. Gestures are an important form of human interaction and communication: hands are usually used to interact with things (pick up, move), and our body gesticulates to communicate with others (no, yes, stop). Thus, a wide range of gesture recognition applications has been developed up to now, thanks to a certain level of maturity reached by sub-fields of machine intelligence (Machine Learning, Cognitive Vision, Multi-modal Monitoring). For example, humans can interact with machines through gesture recognition devices such as the Wii-mote [1], the CyberGlove [2] and multi-touch screens [3].
The main concepts related to the topic of gesture recognition from video sequences are: computer vision, behaviour understanding, people and body part detection, people and body part tracking, and posture detection.
1.1.1 Computer Vision

Computer Vision, also called Machine Vision (when focusing on industrial applications), is the broader research field of gesture recognition. On the frontiers of Artificial Intelligence, Machine Learning (Cognitive Vision) and Image/Signal Processing, it aims at developing artificial systems that analyze and understand video streams (i.e. sequences of images) or static images, and which are not, generally, intended to emulate human vision. Computer vision is considered as the crossroad of several research fields: Mathematics (Geometry, Statistical Analysis, and Optimization problems), Physics (Optics), Imaging (smart cameras), Robotics (robot vision), and
1.1.3 People Detection and Body Part Detection

People detection and body part detection are concerned with the detection of people and/or their body parts. These middle-level algorithms often require low-level algorithms like background updating and foreground segmentation. The main challenge of people detection is to cope with different styles of clothing, various types of posture and occlusions (partial/total) with static objects or with other people. The three main categories of people detectors are: (1) Holistic Detectors, (2) Part-based Detectors and (3) Hybrid Detectors using both global and local cues. In holistic detection, a global search in the whole frame is performed. People are detected when the features, considered around a local window, meet some criteria. Global features can be used, such as edge templates [4], or local features like the Histogram of Oriented Gradients (HOG) [5]. Concerning part-based methods, people are considered as collections of parts. Part hypotheses are generated by learning local features such as edgelet features [6] and orientation features [7]. Then the part hypotheses are joined to form the best assembly of people hypotheses.
feature points, motion of body parts) or global (e.g. the whole body motion signature). The main goal is to extract people motion features in order to analyze them for gesture recognition. Once the movement of the body or its parts is detected, computations are made to identify the type of motion; this is known as the motion analysis step. This analysis may then be used by different middle-level algorithms: object trackers (when we deal with object motion) and gesture recognition (when we deal with object and body part motion).
1.1.5 Posture Detection

Posture detection can be viewed as a sub-field of gesture recognition, since a posture is a "static" gesture. In practice, posture recognition is usually at the crossroad between people detection and gesture recognition. Sometimes we are only interested in the posture at a given time, which can be obtained by a people detector [11]. In other cases, posture detection can be considered as a first step for gesture recognition, for instance by associating postures with the states of a Finite State Machine (FSM) [10]. The challenges of posture recognition are essentially the same as those of gesture recognition, except that the temporal aspect is not accounted for. Like the equivalent trade-off in gesture recognition, an adequate balance between accuracy, precision and processing time is usually difficult to find.
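The posture-to-FSM idea can be illustrated with a minimal sketch; the posture labels, state names and the "wave" gesture below are hypothetical examples, not taken from [10]:

```python
# Minimal sketch of gesture recognition as a Finite State Machine (FSM):
# each recognized posture drives a transition, and reaching an accepting
# state means a gesture was completed. All labels below are invented.

class GestureFSM:
    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # {(state, posture): next_state}
        self.start = start
        self.accepting = accepting
        self.state = start

    def feed(self, posture):
        """Consume one detected posture; return True if a gesture completed."""
        self.state = self.transitions.get((self.state, posture), self.start)
        if self.state in self.accepting:
            self.state = self.start     # reset after recognition
            return True
        return False

# A "wave" gesture: open hand seen left, then right, then left again.
wave = GestureFSM(
    transitions={
        ("idle", "open_left"): "s1",
        ("s1", "open_right"): "s2",
        ("s2", "open_left"): "done",
    },
    start="idle",
    accepting={"done"},
)

detected = ["fist", "open_left", "open_right", "open_left"]
results = [wave.feed(p) for p in detected]
print(results)  # [False, False, False, True]
```

Any unexpected posture sends the machine back to the start state, which is the simplest policy; a real system might instead tolerate a few spurious detections before resetting.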
1.1.6 Proposed Algorithm
Figure 1.1: Flowchart of the Project Approach

The camera is made to monitor and record the hand movements continually. The system framework is initially in standby mode. A particular trigger is passed to the framework to initialize the process. This trigger should be some suitably chosen, previously specified gesture. On activation, the camera performs image acquisition for subsequent processing.

Before moving forward with the processing of images, we performed movement detection using the frames of the recorded gesture sequence, to discard intermediate and inappropriate frames between two successive, legitimate gestures. In this way, just those frames in which the hand is static for at least a certain amount of time are used for image processing, and all others with motion or blurring are discarded. This time interval is entirely
reliant on the convenience of the user. Only the frames of primary importance, i.e. those containing a good, held gesture, are kept as required by the system; every other frame is unneeded.
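The still-frame selection described above can be sketched with simple frame differencing; a minimal illustration assuming greyscale frames as NumPy arrays (the threshold and hold-time values are invented for illustration, not the project's actual parameters):

```python
import numpy as np

def select_static_frames(frames, diff_thresh=8.0, hold_frames=5):
    """Keep only frames where the hand is static for a minimum time.

    frames      : list of 2-D uint8 greyscale images
    diff_thresh : mean absolute inter-frame difference below which
                  the scene counts as static (illustrative value)
    hold_frames : consecutive static frames needed to call a gesture "held"
    """
    static_run = 0
    kept = []
    for prev, curr in zip(frames, frames[1:]):
        diff = np.mean(np.abs(curr.astype(np.int16) - prev.astype(np.int16)))
        if diff < diff_thresh:
            static_run += 1
            if static_run >= hold_frames:
                kept.append(curr)   # hand held still long enough
        else:
            static_run = 0          # motion or blur: restart the count
    return kept

# Example: 10 identical frames -> static from the 5th frame pair onward.
frames = [np.zeros((4, 4), dtype=np.uint8)] * 10
print(len(select_static_frames(frames)))  # 5
```

In practice the hold time would be derived from the user-chosen interval mentioned above rather than a fixed frame count.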
1.2 Work Objective and Motivation

Gesture recognition from video sequences is one of the most important challenges in computer vision and behaviour understanding, since it offers the machine the ability to identify, recognize and interpret human gestures in order to control devices, to interact with human-machine interfaces (HMI) or to monitor human activities. Generally defined as any meaningful body motion, gestures play a central role in everyday communication and often convey emotional information about the gesticulating person. During the last decades, researchers have been interested in automatically recognizing human gestures for several applications.

This dissertation project aims at building a simple application-based gesture recognition system, which does not involve the complexity of Artificial Intelligence algorithms yet gives an accuracy of 89%. Here the environment consists of a physical world in which a sensor (here a camera) interfaces with the software for the processing of images.

The main challenge of vision-based gesture recognition is to cope with the large variety of gestures, which can be generated by changes in a number of parameters (e.g. light intensity, frame sequence speed, etc.). Recognizing gestures involves handling a considerable number of degrees of freedom (DoF), huge variability of the 2D appearance depending on the camera viewpoint, different silhouette scales and many resolutions for the temporal dimension.
1.3 Organization of the Dissertation Work
Chapter 1: In this section we give an overview of the remaining contents of this dissertation, which is structured into four main chapters. The next chapter presents the state of the art of human gesture recognition. The proposed method is overviewed and detailed in chapters 3, 4 and 5. The last chapter consists of a conclusion, where a review of the contributions and an overview of perspectives are presented.

Chapter 2 recalls the previous work on gesture recognition by presenting an up-to-date state of the art, after a brief presentation of the types of gesture and of the technologies currently available for the recognition of these gestures.

Chapter 3 presents a broader view of how the work has been approached. First there is a brief discussion of the types of acquisition devices available, and then the pre-processing stages are briefly explained.

Chapter 4 presents the Result and Conclusion of this project work.
CHAPTER - 2
LITERATURE REVIEW
CHAPTER - 2
LITERATURE REVIEW

2.1 Introduction
Human gesture recognition consists of identifying and interpreting human gestures automatically using a set of sensors (e.g. cameras, gloves). Here we present a literature review of the state of the art in human gesture recognition, which includes gesture representations, recognition techniques and applications. Before we proceed with the literature on gesture recognition, it is important to understand the definition and the nature of gesture as seen by the literature.
2.2 Definition and Nature of Gesture

Generally speaking, we can define a gesture as a body movement. A gesture is a non-vocal communication, used instead of or in combination with verbal communication, intended to express meaning. Gestures constitute a major and important means of human communication. Indeed, [11] enumerates seven hundred thousand non-verbal communication signals, which include fifty thousand two hundred facial expressions [12] and five thousand hand gestures [13]. However, the significance of a gesture strongly differs from one culture to another: there is no invariable or universal meaning for a gesture, i.e. the semantic interpretation of a gesture depends strictly on the given culture. In addition, a gesture can depend on an individual's state: for example, hand gestures are synchronous and co-expressive with speech, glance and facial expressions, which reflect the individual's mood. According to [14], when two people engage in a discussion, thirty-five per cent of their communication is verbal and sixty-five per cent is
non-verbal.

Haptics: touching as non-verbal communication depends on the context of the situation, the relationship between the communicators and the manner of touch. Touching is a particular type of gesture: handshakes, holding hands, kissing (cheek, lips, hand), high fives, licking, scratching.
Gestures can be categorized with respect to different criteria. For instance, [15] distinguishes five types of gestures:

1. Emblems: an emblem (or quotable gesture, or emblematic gesture) is a gesture which can be directly translated into short verbal communication, such as a goodbye wave, in order to replace words. These gestures are very culture-specific.

2. Illustrators: an illustrator is a gesture that depicts what the communicator is saying verbally (e.g. emphasizing a key point in the speech, or illustrating a throwing action when pronouncing the words "he threw"). These gestures are inherent to the communicator's thoughts and speech. Also called gesticulations, they can be classified into five subcategories, as proposed by [16]:

• Beats: rhythmic and often repetitive flicks (short and quick) of the hand or the fingers.

• Deictic gestures: pointing gestures, which can be concrete (pointing to a real location, object or person) or abstract (pointing to an abstract location or a period of time).
• Iconic gestures: hand movements that depict a figural representation or an action (e.g. a hand moving upward with wiggling fingers to depict tree climbing).

• Metaphoric gestures: gestures depicting abstractions.

• Cohesive gestures: thematically related but temporally separated gestures, due generally to an interruption of the current communicator by another one.

3. Affect displays: an affect display is a gesture that conveys emotion or the communicator's intentions (e.g. if the communicator is embarrassed). This type of gesture is less dependent on the culture.

4. Regulators: a regulator is a gesture that controls interaction (e.g. controlling turn-taking in conversation).

5. Adaptors: an adaptor is a gesture that enables the release of body tension (e.g. head shaking, quickly moving one's leg). These gestures are not used intentionally during communication or interaction: they were at one point used for personal convenience and have turned into a habit.
Gestures can also be conscious (intentional) or non-conscious (reflexes, adaptors). In addition, a gesture can be dynamic or static; in the latter case, the gesture becomes a posture. Finally, we can classify gestures according to the body parts involved in the gesture: (1) hand gestures, (2) head/face gestures and (3) body gestures. In our work we focus on static gestures of the hand.
Figure 2.1: Taxonomy of Gesture Categories [43]

2.3 Technology Available for Recognition

There are two main kinds of devices: (1) contact-based devices and (2) vision-based devices. Hereafter we discuss the two kinds of devices.

1. Contact-based Devices: Contact-based devices are various: accelerometers, multi-touch screens and instrumented gloves are, for instance, the main technologies used. Some devices, like the Apple iPhone®, include several detectors, for instance a multi-touch screen and an accelerometer. Other devices use only one detector, e.g. the accelerometers of the
• Inertial: these devices measure the variation of the earth's magnetic field in order to detect motion. Two types of device are available: accelerometers (e.g. the Wii-mote®) and gyroscopes (e.g. the IG-500®). [1] proposes to recognize gestures with a Wii controller, independently of the target system, using Hidden Markov Models (HMM). The user can learn personalized gestures for multimodal intuitive media browsing. [17] and [18] propose to detect falls among normal gestures using accelerometers.

• Haptic: multi-touch screens are becoming more and more common in our life (e.g. tablet PCs, the Apple iPhone®). [3] proposes to recognize multi-touch gestural interactions using HMM.

• Magnetic: these devices measure the variation of an artificial magnetic field for motion detection. Unlike inertial devices, magnetic devices have some health issues due to the artificial electromagnetism.

• Ultrasonic: motion trackers from this category are composed of three kinds of device: (1) sonic emitters that send out the ultrasound, (2) sonic discs that reflect the ultrasound (worn by the person) and (3) multiple sensors that time the return pulse. The position is computed according to the time of propagation/reflection and the speed of sound. The orientation is then triangulated. These devices are not precise and have low resolution, but they are useful for environments that lack light or that have magnetic obstacles or noise.
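The position computation described for ultrasonic trackers reduces, for a single sensor, to a time-of-flight range equation; a small sketch (the speed-of-sound constant and the timing value are illustrative):

```python
# Sketch of the ultrasonic ranging computation described above:
# distance follows from the round-trip time of the pulse and the
# speed of sound (~343 m/s in air at 20 degrees C, an assumed constant).

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C

def range_from_echo(round_trip_s):
    """Distance to the reflector from emitter -> disc -> sensor timing."""
    return SPEED_OF_SOUND * round_trip_s / 2.0   # halve the round trip

# A 10 ms round trip corresponds to about 1.7 m.
print(round(range_from_echo(0.010), 3))  # 1.715
```

With several sensors, the same per-sensor ranges feed the triangulation step mentioned in the text.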
Figure 2.2: Instrumented Glove equipped with Potentiometer and Optic Fiber [44]
2. Vision-Based Technology: Vision-based gesture recognition systems rely on one or several cameras in order to analyze and interpret the motion from the captured video sequences. Similarly to contact devices, vision-based devices are varied. For instance, we can distinguish the following sensors:

• Infrared cameras: typically used for night vision, infrared cameras generally give a brittle view of the human silhouette.

• Traditional monocular cameras: the most common cameras due to their cheaper cost. Specific variants can be used, such as fish-eye cameras for wide-angle vision and time-of-flight cameras for depth (distance from the camera) information.

• Stereo cameras: stereovision directly delivers 3D world information by embedding the triangulation process.

• PTZ cameras: Pan-Tilt-Zoom cameras enable the vision system to focus on particular details in the captured scene, in order to identify their nature more precisely.

• Body markers: some vision systems require placing body markers in order to detect the human body motion. There are two types of marker: (1) passive, such as reflective markers shining when strobes hit them, and (2) active, such as markers flashing LED lights (in sequence). In such systems, each camera, lighting with strobe lights or normal lights, delivers 2D frames with the marker positions from its view. Eventually, a pre-processing step is in charge of interpreting the views and positions into 3D space.
2.4 Advantages and Disadvantages of Both Technologies

Both of the enabling technologies have their pros and cons. For instance, contact devices require the user's cooperation and can be uncomfortable to wear for a long time, but they are precise. Vision-based devices do not require user cooperation, but they are more difficult to configure and suffer from the occlusion problem; on the other hand, contact devices are more precise, except for the ultrasonic ones. Also, they generally do not have occlusion problems
except for the magnetic sensors (metal obstacles) and ultrasonic sensors (mechanical obstacles). Concerning health issues, we notice that some contact devices can raise problems: allergy to the mechanical sensor material, and cancer risk for magnetic devices.

Table 2.1: Comparison between Contact and Vision Based Devices
Criterion Contact
Here we review only the three most common approaches, on which current research focuses: (1) particle filtering and the condensation algorithm, (2) learning algorithms for statistical modelling and (3) automata-based approaches (such as Finite State Machines (FSM)).
Particle Filtering and the Condensation Algorithm for Gesture Recognition

The goal of particle filtering, also called the Sequential Monte Carlo method (SMC), is the probabilistic inference of the object motion given a sequence of measurements. Introduced by [19], condensation (i.e. Conditional Density Propagation) is an improvement of particle filtering for visual tracking, which has been extended to gesture recognition [20], [21]. The main idea behind condensation is to estimate the future probability density by sampling from the current density and weighting the samples by some measure of their likelihood. Recently, [22] extended the latter method to two-hand motion models. The author describes the state of a particle at a given time by four parameters: the integer index of the predictive model, the current position in the model, a scaling factor of amplitude and a time-dimension scale factor. The three latter parameters are duplicated to take into account the motion of each hand. The recognition of gestures is done through three filtering stages: initialization, prediction and updating. A motion model, consisting of the average horizontal and vertical projections of the object velocities, is associated with the filtering process in order to recognize gestures.
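The sample-weight-resample cycle behind condensation can be illustrated with a minimal one-dimensional particle filter; this is a generic sketch, not the cited authors' implementation, and the motion and noise parameters are invented:

```python
import math
import random

def particle_filter_step(particles, measurement, motion=1.0,
                         process_noise=0.5, meas_noise=1.0):
    """One predict-weight-resample cycle of a 1-D particle filter."""
    # Predict: push every particle through the motion model plus noise.
    predicted = [p + motion + random.gauss(0, process_noise) for p in particles]
    # Weight: Gaussian likelihood of the measurement for each particle.
    weights = [math.exp(-((p - measurement) ** 2) / (2 * meas_noise ** 2))
               for p in predicted]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # Resample: draw a new particle set proportionally to the weights.
    return random.choices(predicted, weights=weights, k=len(predicted))

random.seed(0)
particles = [random.uniform(-5.0, 5.0) for _ in range(500)]
for t in range(1, 20):                     # object truly moves +1.0 per step
    z = float(t) + random.gauss(0, 1.0)    # noisy measurement of its position
    particles = particle_filter_step(particles, z)

estimate = sum(particles) / len(particles)
print(round(estimate, 1))  # close to the true position 19.0
```

Condensation applies the same cycle to gesture-model states rather than to a scalar position, but the estimate-by-weighted-samples principle is identical.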
Learning Algorithms for Gesture Statistical Modelling

Learning algorithms are essentially used for feature-extraction-based methods. There are two main variants of learning algorithms: (1) linear learners and (2) non-linear learners. The former are suited to linearly separable data and the latter to the other cases. Another way to categorize learning algorithms is to consider their outcome. Thus, we distinguish supervised learning (i.e. matching samples to labels), unsupervised learning (i.e. only sample clusters, without labels), semi-supervised learning (i.e. a mix of labelled and unlabelled data), reinforcement learning (i.e. learning policies given observations) [23], transduction (i.e. supervised learning with
prediction) [24] and learning to learn (i.e. learning one's own inductive bias based on previous experience) [25], [26]. The choice of the learning algorithm depends mainly on the chosen gesture representation. For example, [27] proposes to recognize static hand gestures by learning the contour-line Fourier descriptors of a segmentation image obtained by the mean shift algorithm [28]. The classification is done by a support vector machine combined with the minimum enclosing ball (MEB) criterion.
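The contour-descriptor idea of [27] can be sketched as follows; for brevity, a nearest-template classifier stands in for the SVM with the MEB criterion used in the cited work, and the circle/square contours are toy shapes:

```python
import numpy as np

def fourier_descriptor(contour, n_coeffs=8):
    """Translation/scale/rotation-invariant descriptor of a closed contour.

    contour : (N, 2) array of ordered (x, y) boundary points.
    """
    z = contour[:, 0] + 1j * contour[:, 1]   # boundary as a complex signal
    mags = np.abs(np.fft.fft(z))
    mags[0] = 0.0                            # drop DC term -> translation invariant
    mags = mags / (mags[1] + 1e-12)          # normalize     -> scale invariant
    return mags[1:n_coeffs + 1]              # magnitudes    -> rotation invariant

def make_circle(radius, n=64):
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    return np.stack([radius * np.cos(t), radius * np.sin(t)], axis=1)

def make_square(side, n=64):
    t = np.linspace(0, 4, n, endpoint=False)   # walk the unit-square perimeter
    pts = [(u, 0) if u < 1 else
           (1, u - 1) if u < 2 else
           (3 - u, 1) if u < 3 else
           (0, 4 - u) for u in t]
    return side * np.array(pts)

def classify(contour, templates):
    d = fourier_descriptor(contour)
    return min(templates, key=lambda name: np.linalg.norm(templates[name] - d))

templates = {"circle": fourier_descriptor(make_circle(1.0)),
             "square": fourier_descriptor(make_square(1.0))}

# Scale invariance: a much larger circle still matches the circle template.
print(classify(make_circle(3.0), templates))  # circle
```

A hand contour would replace the toy shapes, with the descriptors of known gestures serving as templates or as SVM training data.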
Automata-Based Approaches

Along with learning algorithms, automata-based methods are the most common approaches in the literature. For instance, FSMs, HMMs, P
Figure 2.3: Tools for Gesture Recognition: Clustering & Classifying Algorithms [43]

2.6 Gesture Representation

Several gesture representations and models have been proposed to abstract and model the motion of human body parts. We distinguish two main categories of method: (1) 3D-model-based methods and (2) appearance-based methods. Moreover, we can split the proposed models into two kinds according to the spatial and temporal aspects of gestures: (1) posture automaton models, in which the spatial and the temporal aspects are modelled separately, and (2) motion models, in which there is a unique spatio-temporal model.
Figure 2.4: Different Representations of Gestures [43]

1. 3D-Model-Based Methods: A 3D model defines the 3D spatial description of the human body parts. The temporal aspect is generally handled by an automaton, which generally divides the gesture time into three phases [16]: (1) the preparation or pre-stroke phase, (2) the nucleus or stroke phase and (3) the retraction or post-stroke phase. Each phase can be represented as one or several transition(s) between the spatial states of the 3D human model. The main advantage of 3D-model-based methods is the ability to recognize gestures by synthesis: during the recognition process, one or more camera(s) look at the real target, compute the parameters of the model that matches the target spatially, and then follow the latter's motion (i.e. update the model parameters and check whether they match a transition in the temporal model). Thus, the gesture recognition is generally precise (especially the start and the end time of the gesture). However, these methods tend to be computationally expensive unless implemented directly in dedicated hardware. Some methods (e.g. [33]) combine silhouette extraction with 3D model projection fitting by finding the target's self-orientation. Generally, three kinds of model are used:
• Textured kinematic/volumetric models: these models contain very high details of the human body: skeleton and skin surface information.

• 3D geometric models: these models are less precise than the former in terms of skin information, but still contain essentially the skeleton information.

• 3D skeleton models: these are the most common 3D models due to their simplicity and higher adaptability. The skeleton contains only the information about the articulations and their 3D degrees of freedom (DoF).
2. Appearance-Based Methods: Concerning appearance-based methods, two main sub-categories exist: (1) 2D static-model-based methods and (2) motion-based methods. Each sub-category contains several variants. For instance, the most used 2D models are:

• Colour-based models: methods with this kind of model generally use body markers to track the motion of the body or the body part. For example, [34] proposes a method for hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering.

• Silhouette-geometry-based models: such models may include several geometric properties of the silhouette, such as perimeter, convexity, surface, compactness, bounding box/ellipse, elongation, rectangularity, centroid and orientation. [35] used the geometric properties of the bounding box of the hand skin to recognize hand gestures.

• Deformable-gabarit-based models: these are generally based on deformable active contours (i.e. snakes, parametrized with motion) and their variants [36]. [37] used snakes for the analysis of gestures and actions in technical talks for video indexing.
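Several of the silhouette properties listed above are straightforward to compute from a binary hand mask; a minimal sketch using a NumPy boolean array (the toy mask is illustrative):

```python
import numpy as np

def silhouette_features(mask):
    """Geometric properties of a binary silhouette (True = foreground)."""
    ys, xs = np.nonzero(mask)
    surface = int(len(xs))                      # area in pixels
    h = int(ys.max()) - int(ys.min()) + 1       # bounding-box height
    w = int(xs.max()) - int(xs.min()) + 1       # bounding-box width
    return {
        "surface": surface,
        "bbox": (w, h),
        "elongation": max(w, h) / min(w, h),    # bounding-box aspect ratio
        "rectangularity": surface / (w * h),    # fill ratio of the box
        "centroid": (float(xs.mean()), float(ys.mean())),
    }

# Toy silhouette: a filled 4x8 block inside a 10x10 frame.
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 1:9] = True
f = silhouette_features(mask)
print(f["surface"], f["bbox"], f["elongation"], f["rectangularity"])
# 32 (8, 4) 2.0 1.0
```

A feature vector of this kind, computed per frame, is exactly what the bounding-box method of [35] classifies.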
CHAPTER - 3
GESTURE MODELLING
CHAPTER - 3
GESTURE MODELLING

3.1 Introduction
The Association for Computing Machinery defines human-computer interaction as "a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them."

Gesture recognition is an important, yet difficult task. It is important because it is a versatile and intuitive way to develop new, more natural and more human-centered forms of human-machine interaction. Moreover, it is difficult because it involves the solution of many challenging subtasks, such as robust identification of hands and other body parts, motion modeling, tracking, pattern recognition and classification.

A human hand is an articulated object with 27 bones and 5 fingers. Each of these fingers consists of three joints. Human hand joints can be classified as flexion, twist, directive or spherical, depending upon the type of movement or the possible rotation axes. In total, the human hand has approximately 27 degrees of freedom. As a result, a large number of gestures can be generated [38], [39], as tabulated in Figure 2.1; the different approaches that can be used for these different kinds of gesture are tabulated in Figure 2.3.
3.2 Acquisition of Data

The first stage of any vision system is the image acquisition stage. Only after the image has been satisfactorily obtained can the different approaches be successfully applied. However, if the image has not been acquired satisfactorily, then the intended tasks may not be achievable, even with the aid of some form of image enhancement.
Figure 3.1: System Architecture

There are a number of input devices for data acquisition.

A first choice for a two-dimensional image input device may be a television camera, whose output is a video signal:

• The image is focused onto a photoconductive target.
• The target is scanned line by line horizontally by an electron beam.
• An electric current is produced as the beam passes over the target.
• The current is proportional to the intensity of light at each point.
• The current is tapped to give a video signal.

However, this form of device has several disadvantages:

• Limited resolution: a finite number of scan lines (about 625) and a limited frame rate (30 to 60 frames/sec).
• Distortion:
• Less geometric distortion.
• More linear video output.

For a 3-dimensional image, much more complex acquisition devices are used:

• Laser Ranging Systems: laser ranging works on the principle that the surface of the object reflects laser light back towards a receiver, which then measures the time (or phase difference) between transmission and reception in order to calculate the depth. Most systems work at long distances and therefore have inadequate depth resolution.

Methods based on shape from shading employ photometric stereo techniques to produce depth measurements. Using a single camera, two or more images are taken of an object in a fixed position but under different lighting conditions. By studying the changes in brightness over a surface and employing constraints on the orientation of surfaces, certain depth information may be calculated.

Stereoscopy is a technique for measuring range by triangulation to selected locations in a scene imaged by two cameras; the primary computational problem of stereoscopy is to find the correspondence of various points in the two images.
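Once the correspondence is found, the triangulation behind stereoscopy reduces, for rectified cameras, to the standard relation depth = focal length x baseline / disparity; a small sketch with made-up camera parameters:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from a rectified stereo pair via triangulation.

    focal_px     : focal length in pixels
    baseline_m   : distance between the two camera centers (meters)
    disparity_px : horizontal shift of the matched point between the images
    """
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: 700 px focal length, 6 cm baseline.
# A point matched with a 35 px disparity lies about 1.2 m away.
print(round(stereo_depth(700, 0.06, 35), 3))  # 1.2
```

The inverse relation between disparity and depth is why stereo range resolution degrades quickly for distant points.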
%ata gloves are the devices for perfect data input with high accuracy and high speed. !t
can provide accurate data of 1oint angle, rotation, location etc. for application in different
virtual reality environments. These gloves are commercially available in the market.
These have already been discussed in chapter 2.
Coloured markers attached to the human skin are also used as an input technique, with hand localization done by colour localization.
Low cost web cameras, as preferred in this project work, can also be used as input devices.
3.3 Gesture Modelling
After acquiring the data from the input device, the next step involves the modelling of the hand, which further includes the steps described in Fig. 2.
Segmentation is the process of dividing the input image into regions separated by boundaries [42]. The segmentation process depends on the type of gesture: if it is a dynamic gesture, the hand needs to be located and tracked [42]; if it is a static gesture (posture), the input image only has to be segmented.
To locate the hand, a bounding box is generally used to specify the hand region depending on the skin colour [43]. For tracking the hand there are two main approaches: either the video is divided into frames and each frame is processed alone, or tracking information such as shape and skin colour is used with tools like the Kalman Filter [42].
In [42], the hand is segmented using skin colour; this is the easiest possible way, as skin colour is invariant to translation and rotation changes. The Gaussian Model is a parametric technique, while the histogram-based technique is non-parametric.
A drawback of skin segmentation is that it is affected by changes in illumination conditions.
In [44], segmentation is done using infrared cameras and range information generated by a Time-of-Flight (ToF) camera; these can detect different skin colours but are affected by changes in temperature.
Data gloves and coloured markers can also be used for segmentation, as they provide exact information about the orientation and position of the palm and fingers.
The colour space used in a specific application plays an essential role in the success of the segmentation process. However, colour spaces are sensitive to lighting changes; for this reason, researchers tend to use only the chrominance components and neglect the luminance component, as in the r-g and HS colour spaces. Factors that obstruct the segmentation process include complex backgrounds, illumination changes, and low video quality.
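As an illustrative sketch of skin-colour segmentation (not the thesis's actual classifier or threshold values), the following Python function labels pixels with a simple rule-of-thumb RGB test; the thresholds are assumptions chosen only to demonstrate the idea:

```python
def skin_mask(rgb_rows, r_min=95, g_min=40, b_min=20):
    """Return a binary mask (1 = skin) for an image given as rows of
    (R, G, B) tuples, using a rule-of-thumb RGB skin classifier.
    Thresholds here are illustrative, not values from the project."""
    mask = []
    for row in rgb_rows:
        mask_row = []
        for (r, g, b) in row:
            is_skin = (r > r_min and g > g_min and b > b_min
                       and r > g and r > b
                       and (max(r, g, b) - min(r, g, b)) > 15)
            mask_row.append(1 if is_skin else 0)
        mask.append(mask_row)
    return mask

# A tiny 1x2 test image: one skin-like pixel, one blue background pixel.
img = [[(200, 120, 90), (30, 60, 200)]]
print(skin_mask(img))  # -> [[1, 0]]
```

Such fixed RGB rules fail exactly where the text warns: under illumination changes, which is why chrominance-only spaces are preferred in practice.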
3.4 Image Pre-Processing
Image pre-processing is essential to enhance the image quality for better results. A coloured image consists of three planes of colours, Red-Green-Blue; this can be considered as a 3D matrix in which two dimensions give the pixel position and the third dimension is dedicated to the colour plane.
The planes of each colour can be accessed separately using the following MATLAB commands:
>> I = Image(:, :, plane)
>> % To access the Red plane:   I = Image(:,:,1)
>> % To access the Green plane: I = Image(:,:,2)
>> % To access the Blue plane:  I = Image(:,:,3)
Images consume a large amount of memory space. To save memory space, the images are first resized using downsampling. The method preferred for this process is bicubic interpolation, because in this method the yielded output pixel value is a weighted average of the pixels in the adjacent 4-by-4 neighborhood. The following MATLAB command uses bicubic interpolation by default:
>> Image = imresize(Image, [256 256]);
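The 4-by-4 weighted average can be made concrete with the one-dimensional interpolation kernel that bicubic resizing applies along each axis in turn. Below is an illustrative Python sketch (not project code) of Keys' cubic convolution kernel with the commonly used parameter a = -0.5:

```python
def cubic_kernel(x, a=-0.5):
    """Keys' cubic convolution kernel (a = -0.5): the weighting
    function behind bicubic interpolation."""
    x = abs(x)
    if x < 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def interp_1d(samples, t):
    """Interpolate between samples[1] and samples[2] (0 <= t < 1) from
    4 neighbouring samples. 2-D bicubic resizing applies this along rows
    and then columns, giving the 4-by-4 neighbourhood described above."""
    offsets = (-1 - t, -t, 1 - t, 2 - t)  # distances to the 4 neighbours
    return sum(s * cubic_kernel(d) for s, d in zip(samples, offsets))

# At t = 0 the weights collapse onto the second sample:
print(interp_1d([10, 20, 30, 40], 0.0))  # -> 20.0
```

A useful property of this kernel is that the four weights always sum to one, so flat regions of the image stay flat after resizing.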
After resizing the image, the Region of Interest (ROI) is selected in the image. In computer vision and optical character recognition (OCR), the ROI describes the borders of the object under consideration and is a subset of an image which contains the desired information. The following command was used:
>> BW = roipoly(Image(:,:,1), uint8(c), uint8(r));
roipoly creates an interactive tool for selecting a polygon: the mouse is used to identify the region by selecting the vertices of the polygon. This function returns a binary image as the output, which can be used as a mask for mask filtering if required. These functions are discussed in detail in the MathWorks documentation [42].
Figure 3.2: (a) Original Image; (b) Greyscale Image
3.5 Edge Detection
Edge detection is a very important field in image processing and image segmentation. Edges in digital images are areas with strong intensity contrasts, and a jump in intensity from one pixel to the next can create a major variation in the picture quality. For these reasons, edges form the outline of an object and also indicate the boundary between overlapping objects; these points, when joined with a line, form the edge of the image (Canny [40]). Edge detection is a very important mathematical tool for feature detection and feature extraction, as discontinuities in image brightness correspond to different aspects of the image, such as discontinuities in depth, discontinuities in surface orientation, changes in material properties, and variations in scene lighting.
Figure 3.3: Edge Detection Illustration
In the above figure, consider a subset of a pixel array with its intensity values. Observing the intensities of the pixels, we can say that an edge must exist between the 4th and the 5th pixel, as the difference between the intensity values of the 4th and 5th pixels is large. Another benefit of edge detecting an image is that it shrinks the amount of data and filters out unwanted information while simultaneously protecting the important structural properties of the image [41].
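The intensity-jump criterion just described can be sketched in Python (for illustration; the project itself works in MATLAB) as a simple threshold on adjacent pixel differences. The threshold value here is an assumption chosen for the example:

```python
def find_edges(pixels, threshold=50):
    """Return indices i where an edge lies between pixels[i] and
    pixels[i+1], i.e. where adjacent intensities differ by more than
    `threshold` (an illustrative value)."""
    return [i for i in range(len(pixels) - 1)
            if abs(pixels[i + 1] - pixels[i]) > threshold]

# A row like the figure: nearly flat, then a sharp jump between
# the 4th and 5th pixels.
row = [20, 22, 21, 23, 180, 182, 181]
print(find_edges(row))  # -> [3]  (0-based: between the 4th and 5th pixels)
```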
The gradient [45][46][47] of the image is one of the fundamental building blocks in image processing; the first-order derivative of choice in image processing is the gradient. Mathematically, the gradient of a two-variable function (here the image intensity function) at each image point is a 2D vector [47][48] with the components given by the derivatives in the horizontal and vertical directions. Several edge detector operators [49][51] exist for generating gradient images, such as Sobel, Prewitt, Laplacian and Laplacian of Gaussian (LoG). These edge detectors work better under different conditions [50][51].
Figure 3.4: Edge detection using predefined operators
Edges in digital images are areas with strong intensity contrasts, and a jump in intensity from one pixel to the next can create a major variation in the picture quality. With the help of first- and second-order derivatives, such discontinuities are detected. The first-order derivative of choice in image processing is the gradient. The magnitude of the gradient of a 2-D function f(x, y) can be approximated as

    |∇f| = sqrt(gx^2 + gy^2) ≈ |gx| + |gy|

These approximations still behave as derivatives; that is, they are zero in areas of constant intensity, and their values are related to the degree of intensity change in areas of variable intensity. It is common practice to refer to the magnitude of the gradient or its approximations simply as "the gradient".
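A minimal Python helper (illustrative only) makes the two forms of the magnitude concrete, given the horizontal and vertical gradient components at a point:

```python
import math

def gradient_magnitude(gx, gy, approx=False):
    """Magnitude of the image gradient at a point from its components:
    sqrt(gx^2 + gy^2), or the cheaper |gx| + |gy| approximation."""
    if approx:
        return abs(gx) + abs(gy)
    return math.hypot(gx, gy)

print(gradient_magnitude(3, 4))               # -> 5.0
print(gradient_magnitude(3, 4, approx=True))  # -> 7
```

The absolute-value form over-estimates diagonal edges slightly but avoids the square root, which mattered historically when gradients were computed at every pixel.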
3.5.1 Generating Gradient Images using Predefined Filters
An image gradient is a directional change in the intensity or colour in an image. Image gradients may be used to extract information from images. An example of a small image neighborhood is shown below.
Figure 3.5: Edge Mask
Prewitt Operator: Prewitt operator edge detection masks are among the oldest and best understood methods of detecting edges in images. The Prewitt edge detector uses the following masks to digitally approximate the first derivatives Gx and Gy, computing the gradient in the x (vertical) and y (horizontal) directions.
Figure 3.7: Prewitt Operator neighborhood
The working of these 3×3 masks is quite simple: the mask is slid over an area of the image, it changes that pixel's value and shifts one pixel to the right, continuing until the end of the row is reached; it then starts again from the beginning of the next row. These 3×3 masks cannot manipulate the first and last rows and columns, because the mask would move outside the image boundary if placed over a pixel in the first or last row or column [7].
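The sliding-mask procedure just described can be sketched in Python (illustrative only; the project uses MATLAB). Border pixels are left at zero, mirroring the limitation noted above:

```python
# Prewitt 3x3 masks for the x and y derivatives.
PREWITT_GX = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_GY = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

def convolve3x3(image, mask):
    """Slide a 3x3 mask over `image` (a list of rows of intensities).
    Border pixels are left as 0 because the mask cannot be centred
    on the first or last row/column."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = sum(mask[u][v] * image[i - 1 + u][j - 1 + v]
                            for u in range(3) for v in range(3))
    return out

# A vertical step edge: dark left half, bright right half.
img = [[0, 0, 10, 10]] * 4
gx = convolve3x3(img, PREWITT_GX)
print(gx[1])  # -> [0, 30, 30, 0]: strong response along the step
```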
Laplacian of Gaussian (LoG): This detector finds edges by looking for zero crossings after filtering f(x, y) with a Laplacian of Gaussian filter. In this method, Gaussian filtering is combined with the Laplacian to break down the image where the intensity varies, so that edges are detected effectively. It finds the correct place of edges by testing a wider area around the pixel. Below, a standard 5×5 Laplacian of Gaussian edge detection mask is given.
Figure 3.8: Laplacian of Gaussian 5×5 mask
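The zero-crossing idea behind the LoG detector can be illustrated in one dimension with a Python sketch (illustrative only): after a smoothed signal is filtered with a discrete Laplacian, an edge is declared where the response changes sign. The Gaussian smoothing step is assumed to have been applied already:

```python
def laplacian_1d(signal):
    """Discrete 1-D Laplacian (second derivative):
    s[i-1] - 2*s[i] + s[i+1]."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]

def zero_crossings(values):
    """Indices where consecutive values change sign --
    the LoG edge criterion."""
    return [i for i in range(len(values) - 1)
            if values[i] * values[i + 1] < 0]

# A smoothed step edge: the second derivative changes sign at its centre.
step = [0, 0, 2, 8, 10, 10]
lap = laplacian_1d(step)
print(lap)                  # -> [2, 4, -4, -2]
print(zero_crossings(lap))  # -> [1]
```

The sign change falls exactly at the middle of the ramp, which is why LoG localizes edges well even though it examines a wider neighbourhood.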
3.5.2 Approach for Gradient
The flowchart of the approach for generating gradient images is given below. At the very beginning a coloured image is chosen and processed further, here using MATLAB v2012b. In the immediate next step, the image is converted into greyscale. A greyscale image is mainly a combination of two colours, black and white. It carries intensity information, where black has the lowest or weakest intensity and white has the highest or strongest intensity. Variation of these intensity levels forms the edges of the object or objects. In the final step, different edge detection operators are applied to detect the object boundaries and gradients.
Figure 3.9: Gradient Generating Approach
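The colour-to-greyscale step of this flowchart can be sketched in Python (illustrative; MATLAB's rgb2gray performs the equivalent weighted sum of the three colour planes):

```python
def to_gray(rgb_rows):
    """Convert rows of (R, G, B) tuples to greyscale intensities using
    the standard luminance weights (the same weighting used by
    MATLAB's rgb2gray)."""
    return [[round(0.2989 * r + 0.5870 * g + 0.1140 * b)
             for (r, g, b) in row]
            for row in rgb_rows]

# White, black, and pure red pixels:
print(to_gray([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]))
# -> [[255, 0, 76]]
```

Green contributes the most because the human eye is most sensitive to it; this is why a pure red pixel maps to a fairly dark grey.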
3.5.3 Edge Operator used in the Project
In this system design, a 3×3 convolution mask was used to approximate the edges in our Region of Interest. The MATLAB command fspecial() is used to create a two-dimensional filter 'h' of a specified type and returns a correlation kernel 'h':
>> h = fspecial('gaussian');
Figure 3.10: Convolution Mask Used in the Project
Figure 3.11: Final Image after Convolution with Mask
3.6 Feature Extraction
Feature vectors of a segmented image can be extracted in different ways according to the application, and there are various methods that can be used for feature extraction. [8] used the aspect ratio of the bounding box as a feature vector; [9] used a self-growing and self-organized neural gas (SGONG) network.
>> Totalsum(i) = sum(sum(BW2(:,:,i)));
The above MATLAB command sums up all the ones in the 2D matrix of the image, giving us the approximated perimeter of the hand.
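The Python equivalent of the summation above, for a binary edge image stored as a list of rows, is a straightforward double sum:

```python
def count_ones(bw):
    """Sum all the ones in a binary image (list of lists) -- the Python
    equivalent of MATLAB's sum(sum(BW)), used here as the approximate
    perimeter feature."""
    return sum(sum(row) for row in bw)

edge_image = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
print(count_ones(edge_image))  # -> 4
```

This single scalar per image is what makes the method fast, and also what makes it application specific: gestures with similar outlines produce similar sums.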
CHAPTER - 4
RESULT & CONCLUSION
4.1 RESULT
The processed images are classified into different classes of gestures, and the total perimeter of each gesture is saved in a database under its respective class. In the succeeding step, a query image is requested from the user and its features are matched against the image features saved in the database.
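The matching step can be sketched as a nearest-perimeter lookup in Python (illustrative only). The database values below are made up for the example; the actual system matches against perimeters computed from its training images:

```python
def classify(query_perimeter, database):
    """Assign the query to the gesture class whose stored perimeter is
    nearest to the query's perimeter. `database` maps gesture value ->
    stored perimeter (hypothetical values here)."""
    return min(database, key=lambda cls: abs(database[cls] - query_perimeter))

db = {1: 120, 2: 180, 3: 240, 4: 300, 5: 360}  # hypothetical class perimeters
print(classify(250, db))  # -> 3
```

This also makes the observed failure mode visible: a gesture whose perimeter lands near a class boundary (e.g. between classes 4 and 5) is easily misclassified.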
Following are the results obtained from this work.
The following images show the successful recognition of a query image:
The following images show the unsuccessful cases of recognition:
Table 4.1: Results obtained
GESTURE VALUE   INPUT IMAGES   No. of SUCCESSFUL CASES   RATE of RECOGNITION (%)
1               20             20                        100
2               20             20                        100
3               20             20                        100
4               20             16                        80
5               20             13                        65
From the above table it is observed that the accuracy of the system is 89%, i.e. there are 11 cases where there is a mismatch between the input image and the class to which it is recognized.
4.2 CONCLUSION
From the above discussions we can conclude that the technique applied for recognition of the gestures works well for the chosen application, without using any artificial intelligence algorithms for recognition.
For 20 images of each gesture, the 1st gesture value had 100% successful recognition; the 2nd and 3rd gesture values also had 100% recognition; the 4th gesture value, however, had 4 mismatches with the 3rd gesture value and was therefore recognized with 80% accuracy; similarly, the 5th gesture value had 7 mismatches with the 4th gesture value, for an accuracy of 65%.
To increase the efficiency of the recognition algorithm used above, the orientation of the gesture images should be similar to each other, with only minute differences. Another important point that should be taken care of is that the algorithm is application specific, i.e. the threshold value for classifying the gestures needs to be revised for different applications.
BIBLIOGRAPHY
[1] Schlomer, T., Poppinga, B., Henze, N.
[3] Webel, S., Keil, J. & Zoellner, M. (2008), Multi-touch gestural interaction in x3d using hidden markov models, in 'VRST '08: Proceedings of the 2008 ACM symposium on Virtual reality software and technology', ACM,
[7] Mikolajczyk, K., Schmid, C. & Zisserman, A. (2004), Human detection based on a probabilistic assembly of robust part detectors, in 'Computer Vision - ECCV 2004', Vol. 3021, Springer Berlin / Heidelberg, pp. 69-82.
[8] Leibe, B., Seemann, E. & Schiele, B. (2005), Pedestrian detection in crowded scenes, in 'International Conference on Computer Vision and Pattern Recognition', Vol. 1, pp. 878-885.
[9] Boulay, B. (2007), Human posture recognition for behaviour understanding, PhD thesis, Universite de
[17] 'Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm', Gait & Posture 26(2), 194-199.
[19] Isard, M. & Blake, A. (1996), Contour tracking by stochastic propagation of conditional density, in 'European Conference on Computer Vision', Vol. 1064 of Lecture Notes.
[25] Thrun, S. & Pratt, L. (1998), Learning to Learn, Kluwer Academic Publishers,
[26] Ren, Y. & Zhang, F. (2009), Hand gesture recognition based on meb-svm, in 'Embedded Software and Systems, Second International Conference on', Vol. 0, IEEE Computer Society, Los Alamitos, CA, USA, pp. 344-349.
[28] Cheng, Y. (1995), 'Mean shift, mode seeking, and clustering', IEEE Trans. Pattern Analysis and Machine Intelligence 17(8), 790-799.
[29] Lee, H.-K. & Kim, J. H. (1999), 'An hmm-based threshold model approach for gesture recognition', IEEE Trans. Pattern Analysis and Machine Intelligence 21(10), 961-973.
[30] Lu, W.-L. & Little, J. J. (2006a), Simultaneous tracking and action recognition using the pca-hog descriptor, in 'Computer and Robot Vision, 2006. The 3rd Canadian Conference on', Quebec, Canada, pp. 6-13.
[31] Yamato, J., Ohya, J. & Ishii, K. (1992), Recognizing human action in time-sequential images using hidden markov model, in 'International Conference on Computer Vision and Pattern Recognition', pp. 379-385.
[32] Pinhanez, C. & Bobick, A. (1997), Human action detection using pnf propagation of temporal constraints, in 'International Conference on Computer Vision and Pattern Recognition', pp. 898-904.
[33] Boulay, B. (2007), Human posture recognition for behaviour understanding, PhD thesis, Universite de
[34] Bretzner, L., Laptev, I. & Lindeberg, T. (2002), Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering, in 'Automatic Face and Gesture Recognition, 2002. Proceedings. Fifth IEEE International Conference on', pp. 405-410.
[35] Birdal, A. & Hassanpour, R. (2008), Region based hand gesture recognition, in '16-th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision', pp. 1-7.
[36] Leitner, F. & Cinquin, P. (1993), From splines and snakes to snake splines, in 'Selected Papers from the Workshop on Geometric Reasoning for Perception and Action', Springer-Verlag, London, UK, pp. 264-281.
[37] Ju, S. X., Black, M. J., Minneman, S. & Kimber, D. (1997), Analysis of gesture and action in technical talks for video indexing, Technical report, American Association for Artificial Intelligence. AAAI Technical Report SS-97-03.
[38] Moeslund, T. B., Hilton, A. and Kruger, V. A Survey of Advances in Vision-Based Human Motion Capture and Analysis. Computer Vision and Image Understanding, 104(2) (2006), 90-126.
[39] Bhuyan, M. K.
[41] Regina Lionnie, Ivanna K. Timotius, Iwan Setyawan, 2011. An Analysis of Edge Detection as a Feature Extractor in Hand Gesture Recognition System based on
[42] Computer Vision. Upper Saddle River, 158, 215-216, 299-300.
J9>K %ubrovin, ..' .T. 4omenko, .0. , /**/.
[48] Haralick, R. M. 'Zero-crossings of second directional derivative operator'. SPIE Proc. on Robot Vision, 1982.
[49] Canny, J. F. 'A variational approach to edge detection'. Submitted to AAAI Conference, Washington, D.C., September, 1983.
[50] Marr, D., Hildreth, E. 'Theory of edge detection'. Proc. R. Soc. Lond. B, 207, 187-217, 1980.
[14] Rosenfeld, A., Thurston, M. 'Edge and curve detection for visual scene analysis'. IEEE Trans. Comput., C-20, 562-569, 1971.
[51] L. Davis, 'A survey of edge detection techniques', Computer Graphics and Image Processing, vol 4, no. 3, pp 248-260, 1975.