8/19/2019 Application specific Gesture Recognition
1/58
Application Specific Hand Gesture Recognition System

Ankit Mishra
ELECTRICAL ENGG. DEPT., GAUTAM BUDDHA UNIVERSITY
ABSTRACT

A hand gesture recognition system can be used as an interface between a computer and a human through hand gestures. This work presents a technique for a human-computer interface based on hand gesture recognition; the aim is to recognize static hand gestures performed by an individual in a video sequence. Many techniques have already been proposed in the literature for gesture recognition in specific environments (e.g. a laboratory) using the cooperation of several sensors (e.g. a camera network, an individual equipped with markers).

In this dissertation, a Gesture Recognition Method based on the Edge Detection method for feature extraction is proposed. The proposed algorithm works on a basic observation: as the number of fingers increases for a gesture, the hand area also increases, and hence the area can be used as an important criterion for recognition systems.

The results of simulation clearly indicate that the recognition performance is good only for a particular application, and the system needs to be modified if the application area changes.

The current accuracy rate of the system is 89%, at the particular intensity level at which the images were taken, in a MATLAB environment.
LIST OF FIGURES
Figure No.  Caption

1.1  Proposed Algorithm for the Project
2.1  Taxonomy of Gesture Categories
2.2  Instrumented Glove equipped with Potentiometer and Optic Fiber
2.3  Tools for Gesture Recognition: Clustering & Classifying Algorithms
2.4  Different Representations of Gestures
3.1  System Architecture
3.2  (a) Original Image; (b) Converted Greyscale Image
3.3  Edge Detection Illustration
3.4  Edge Detection using Pre-defined Operators
3.5  Edge
3.9  Gradient Generating Approach
3.10 Convolution Mask used in the Project
3.11 Final Image after Convolving with the Mask
LIST OF TABLES
Table No.  Particulars

2.1  Comparison between Contact Device and Vision Device
4.1  Result
LIST OF ABBREVIATIONS

HCI  Human Computer Interaction
LDA  Linear Discriminant Analysis
MMI  Man-Machine Interaction
DoF  Degree of Freedom
HOG  Histogram of Oriented Gradients
ISM  Implicit Shape Model
CONTENTS
3.1  Introduction  24
3.2  Acquisition of Data  24
3.3  Gesture Modelling  27
3.4  Image Preprocessing  28
3.5  Edge Detection  30
3.5.1  Generating Gradient Images using Predefined Filters  32
3.5.2  Approach for Gradient  34
3.5.3  Edge Operator used in the Project  35
3.6  Feature Extraction  36
Chapter 4  RESULT  43
ACKNOWLEDGEMENT

I would like to express my gratitude and thanks to my supervisor, Dr. Shabana Urooj, Assistant Professor, School of Engineering, Gautam Buddha University, for her valuable guidance and constant support throughout my project work, which provided the much-needed continuity for its completion.
CANDIDATE'S DECLARATION

I hereby declare that the work embodied in this dissertation report entitled "Application Specific Gesture Recognition System", submitted in partial fulfilment of the requirements for the award of the degree of 5-Year Integrated Dual Degree Programme B.Tech. (Electrical Engineering) & M.Tech. (Instrumentation & Control), to the School of Engineering, Gautam Buddha University, Greater Noida.
Dedicated to
My Parents for their
Immense Moral Support

To Whom it may concern:
Keep reaching for that Rainbow!!
CHAPTER - 1
INTRODUCTION
CHAPTER - 1
INTRODUCTION

1.1 General
Gesture was the first mode of communication for primitive cave men, and nowadays gesture recognition has become a prominent domain of research. Gestures are an important form of human interaction and communication: hands are usually used to interact with things (pick up, move), and our body gesticulates to communicate with others (no, yes, stop). Thus, a wide range of gesture recognition applications has been developed up to now, thanks to a certain level of maturity reached by sub-fields of machine intelligence (Machine Learning, Cognitive Vision, Multi-modal Monitoring). For example, humans can interact with machines through gesture recognition devices such as the Wii-mote [1], the CyberGlove [2] and multi-touch screens [3].
The main concepts related to the topic of gesture recognition from video sequences are: computer vision, behaviour understanding, people and body part detection, people and body part tracking, and posture detection.
1.1.1 Computer Vision

Computer Vision, also called Machine Vision (when focusing on industrial applications), is the broader research field of gesture recognition. On the frontiers of Artificial Intelligence, Machine Learning (Cognitive Vision) and Image/Signal Processing, it aims at developing artificial systems that analyze and understand video streams (i.e. sequences of images) or static images, and which are not, generally, intended to emulate human vision. Computer vision is considered as the crossroad of several research fields: Mathematics (Geometry, Statistical Analysis, and Optimization problems), Physics (Optics), Imaging (smart cameras), Robotics (robot vision), and
1.1.3 People Detection and Body Part Detection

People detection and body part detection are concerned with the detection of people and/or their body parts. These middle-level algorithms often require low-level algorithms like background updating and foreground segmentation. The main challenge of people detection is to cope with different styles of clothing, various types of posture and occlusions (partial/total) with static objects or with other people. The three main categories of people detectors are: (1) Holistic Detectors, (2) Part-based Detectors and (3) Hybrid Detectors using both global and local cues. In holistic detection, a global search in the whole frame is performed. People are detected when the features, considered around a local window, meet some criteria. Global features can be used, such as edge templates [4], or local features like the Histogram of Oriented Gradients (HOG) [5]. Concerning part-based methods, people are considered as collections of parts. Part hypotheses are generated by learning local features such as edgelet features [6] and orientation features [7]. Then the part hypotheses are joined to form the best assembly of people hypotheses.
feature points, motion of body parts) or global (e.g. the whole body motion signature). The main goal is to extract people motion features in order to analyze them for gesture recognition. Once the movement of the body or its parts is detected, computations are made to identify the type of motion; this is known as the motion analysis step. This analysis may then be used by different middle-level algorithms: object trackers (when we deal with object motion) and gesture recognition (when we deal with object and body part motion).
1.1.5 Posture Detection

Posture detection can be viewed as a sub-field of gesture recognition, since a posture is a "static" gesture. In practice, posture recognition is usually at the crossroad between people detection and gesture recognition. Sometimes we are only interested in the posture at a given time, which can be obtained by a people detector [11]. In other cases, posture detection can be considered as a first step for gesture recognition, for instance by associating postures with the states of a Finite State Machine (FSM) [10]. The challenges of posture recognition are essentially the same as those of gesture recognition, except that the temporal aspect is not accounted for. Like the equivalent trade-off in gesture recognition, an adequate balance between accuracy, precision and processing time is usually difficult to find.
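The posture-to-FSM idea can be illustrated with a minimal sketch; the posture labels, state names and the "wave" gesture below are hypothetical examples, not taken from [10]:

```python
# Minimal sketch of gesture recognition as a Finite State Machine (FSM):
# each recognized posture drives a transition, and reaching an accepting
# state means a gesture was completed. All labels below are invented.

class GestureFSM:
    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # {(state, posture): next_state}
        self.start = start
        self.accepting = accepting
        self.state = start

    def feed(self, posture):
        """Consume one detected posture; return True if a gesture completed."""
        self.state = self.transitions.get((self.state, posture), self.start)
        if self.state in self.accepting:
            self.state = self.start     # reset after recognition
            return True
        return False

# A "wave" gesture: open hand seen left, then right, then left again.
wave = GestureFSM(
    transitions={
        ("idle", "open_left"): "s1",
        ("s1", "open_right"): "s2",
        ("s2", "open_left"): "done",
    },
    start="idle",
    accepting={"done"},
)

detected = ["fist", "open_left", "open_right", "open_left"]
results = [wave.feed(p) for p in detected]
print(results)  # [False, False, False, True]
```

Any unexpected posture sends the machine back to the start state, which is the simplest policy; a real system might instead tolerate a few spurious detections before resetting.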
1.1.6 Proposed Algorithm
Figure 1.1: Flowchart of the Project Approach

The camera is made to monitor and record the hand movements continually. The system framework is initially in standby mode. A particular trigger is passed to the framework to initialize the process. This trigger should be some suitably chosen, previously specified gesture. On activation, the camera performs image acquisition for subsequent processing.

Before moving forward with the processing of images, we performed movement detection using the frames of the recorded gesture sequence, to discard intermediate and inappropriate frames between two successive, legitimate gestures. In this way, just those frames in which the hand is static for at least a certain amount of time are used for image processing, and all others with motion or blurring are discarded. This time interval is entirely
reliant on the convenience of the user. Only the frames of primary importance, i.e. those containing a good, held gesture, are kept as required by the system; every other frame is unneeded.
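The still-frame selection described above can be sketched with simple frame differencing; a minimal illustration assuming greyscale frames as NumPy arrays (the threshold and hold-time values are invented for illustration, not the project's actual parameters):

```python
import numpy as np

def select_static_frames(frames, diff_thresh=8.0, hold_frames=5):
    """Keep only frames where the hand is static for a minimum time.

    frames      : list of 2-D uint8 greyscale images
    diff_thresh : mean absolute inter-frame difference below which
                  the scene counts as static (illustrative value)
    hold_frames : consecutive static frames needed to call a gesture "held"
    """
    static_run = 0
    kept = []
    for prev, curr in zip(frames, frames[1:]):
        diff = np.mean(np.abs(curr.astype(np.int16) - prev.astype(np.int16)))
        if diff < diff_thresh:
            static_run += 1
            if static_run >= hold_frames:
                kept.append(curr)   # hand held still long enough
        else:
            static_run = 0          # motion or blur: restart the count
    return kept

# Example: 10 identical frames -> static from the 5th frame pair onward.
frames = [np.zeros((4, 4), dtype=np.uint8)] * 10
print(len(select_static_frames(frames)))  # 5
```

In practice the hold time would be derived from the user-chosen interval mentioned above rather than a fixed frame count.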
1.2 Work Objective and Motivation

Gesture recognition from video sequences is one of the most important challenges in computer vision and behaviour understanding, since it offers the machine the ability to identify, recognize and interpret human gestures in order to control devices, to interact with human-machine interfaces (HMI) or to monitor human activities. Generally defined as any meaningful body motion, gestures play a central role in everyday communication and often convey emotional information about the gesticulating person. During the last decades, researchers have been interested in automatically recognizing human gestures for several applications.

This dissertation project aims at building a simple application-based gesture recognition system, which does not involve the complexity of Artificial Intelligence algorithms yet gives an accuracy of 89%. Here the environment consists of a physical world in which a sensor (here a camera) interfaces with the software for the processing of images.

The main challenge of vision-based gesture recognition is to cope with the large variety of gestures, which can be generated by changes in a number of parameters (e.g. light intensity, frame sequence speed, etc.). Recognizing gestures involves handling a considerable number of degrees of freedom (DoF), huge variability of the 2D appearance depending on the camera viewpoint, different silhouette scales and many resolutions for the temporal dimension.
1.3 Organization of the Dissertation Work
Chapter 1: In this section we give an overview of the remaining contents of this dissertation, which is structured into four main chapters. The next chapter presents the state of the art of human gesture recognition. The proposed method is overviewed and detailed in chapters 3, 4 and 5. The last chapter consists of a conclusion, where a review of the contributions and an overview of perspectives are presented.

Chapter 2 recalls the previous work on gesture recognition by presenting an up-to-date state of the art, after a brief presentation of the types of gesture and of the technologies currently available for the recognition of these gestures.

Chapter 3 presents a broader view of how the work has been approached. First there is a brief discussion of the types of acquisition devices available, and then the pre-processing stages are briefly explained.

Chapter 4 presents the Result and Conclusion of this project work.
CHAPTER - 2
LITERATURE REVIEW
CHAPTER - 2
LITERATURE REVIEW

2.1 Introduction
Human gesture recognition consists of identifying and interpreting human gestures automatically using a set of sensors (e.g. cameras, gloves). Here we present a literature review of the state of the art in human gesture recognition, which includes gesture representations, recognition techniques and applications. Before we proceed with the literature on gesture recognition, it is important to understand the definition and the nature of gesture as seen by the literature.
2.2 Definition and Nature of Gesture

Generally speaking, we can define a gesture as a body movement. A gesture is a non-vocal communication, used instead of or in combination with verbal communication, intended to express meaning. Gestures constitute a major and important means of human communication. Indeed, [11] enumerates seven hundred thousand non-verbal communication signals, which include fifty thousand two hundred facial expressions [12] and five thousand hand gestures [13]. However, the significance of a gesture strongly differs from one culture to another: there is no invariable or universal meaning for a gesture, i.e. the semantic interpretation of a gesture depends strictly on the given culture. In addition, a gesture can depend on an individual's state: for example, hand gestures are synchronous and co-expressive with speech, glance and facial expressions, which reflect the individual's mood. According to [14], when two people engage in a discussion, thirty-five per cent of their communication is verbal and sixty-five per cent is
non-verbal.

Haptics: touching as non-verbal communication depends on the context of the situation, the relationship between the communicators and the manner of touch. Touching is a particular type of gesture: handshakes, holding hands, kissing (cheek, lips, hand), high fives, licking, scratching.
Gestures can be categorized with respect to different criteria. For instance, [15] distinguishes five types of gestures:

1. Emblems: an emblem (or quotable gesture, or emblematic gesture) is a gesture which can be directly translated into short verbal communication, such as a goodbye wave, in order to replace words. These gestures are very culture-specific.

2. Illustrators: an illustrator is a gesture that depicts what the communicator is saying verbally (e.g. emphasizing a key point in the speech, or illustrating a throwing action when pronouncing the words "he threw"). These gestures are inherent to the communicator's thoughts and speech. Also called gesticulations, they can be classified into five subcategories, as proposed by [16]:

• Beats: rhythmic and often repetitive flicks (short and quick) of the hand or the fingers.

• Deictic gestures: pointing gestures, which can be concrete (pointing to a real location, object or person) or abstract (pointing to an abstract location or a period of time).
• Iconic gestures: hand movements that depict a figural representation or an action (e.g. a hand moving upward with wiggling fingers to depict tree climbing).

• Metaphoric gestures: gestures depicting abstractions.

• Cohesive gestures: thematically related but temporally separated gestures, due generally to an interruption of the current communicator by another one.

3. Affect displays: an affect display is a gesture that conveys emotion or the communicator's intentions (e.g. if the communicator is embarrassed). This type of gesture is less dependent on the culture.

4. Regulators: a regulator is a gesture that controls interaction (e.g. controlling turn-taking in conversation).

5. Adaptors: an adaptor is a gesture that enables the release of body tension (e.g. head shaking, quickly moving one's leg). These gestures are not used intentionally during communication or interaction: they were at one point used for personal convenience and have turned into a habit.
Gestures can also be conscious (intentional) or non-conscious (reflexes, adaptors). In addition, a gesture can be dynamic or static; in the latter case, the gesture becomes a posture. Finally, we can classify gestures according to the body parts involved in the gesture: (1) hand gestures, (2) head/face gestures and (3) body gestures. In our work we focus on static gestures of the hand.
Figure 2.1: Taxonomy of Gesture Categories [43]

2.3 Technology Available for Recognition

There are two main kinds of devices: (1) contact-based devices and (2) vision-based devices. Hereafter we discuss the two kinds of devices.

1. Contact-based Devices: Contact-based devices are various: accelerometers, multi-touch screens and instrumented gloves are, for instance, the main technologies used. Some devices, like the Apple iPhone®, include several detectors, for instance a multi-touch screen and an accelerometer. Other devices use only one detector, e.g. the accelerometers of the
• Inertial: these devices measure the variation of the earth's magnetic field in order to detect motion. Two types of device are available: accelerometers (e.g. the Wii-mote®) and gyroscopes (e.g. the IG-500®). [1] proposes to recognize gestures with a Wii controller, independently of the target system, using Hidden Markov Models (HMM). The user can learn personalized gestures for multimodal intuitive media browsing. [17] and [18] propose to detect falls among normal gestures using accelerometers.

• Haptic: multi-touch screens are becoming more and more common in our life (e.g. tablet PCs, the Apple iPhone®). [3] proposes to recognize multi-touch gestural interactions using HMM.

• Magnetic: these devices measure the variation of an artificial magnetic field for motion detection. Unlike inertial devices, magnetic devices have some health issues due to the artificial electromagnetism.

• Ultrasonic: motion trackers from this category are composed of three kinds of device: (1) sonic emitters that send out the ultrasound, (2) sonic discs that reflect the ultrasound (worn by the person) and (3) multiple sensors that time the return pulse. The position is computed according to the time of propagation/reflection and the speed of sound. The orientation is then triangulated. These devices are not precise and have low resolution, but they are useful for environments that lack light or that have magnetic obstacles or noise.
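The position computation described for ultrasonic trackers reduces, for a single sensor, to a time-of-flight range equation; a small sketch (the speed-of-sound constant and the timing value are illustrative):

```python
# Sketch of the ultrasonic ranging computation described above:
# distance follows from the round-trip time of the pulse and the
# speed of sound (~343 m/s in air at 20 degrees C, an assumed constant).

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C

def range_from_echo(round_trip_s):
    """Distance to the reflector from emitter -> disc -> sensor timing."""
    return SPEED_OF_SOUND * round_trip_s / 2.0   # halve the round trip

# A 10 ms round trip corresponds to about 1.7 m.
print(round(range_from_echo(0.010), 3))  # 1.715
```

With several sensors, the same per-sensor ranges feed the triangulation step mentioned in the text.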
Figure 2.2: Instrumented Glove equipped with Potentiometer and Optic Fiber [44]
2. Vision-Based Technology: Vision-based gesture recognition systems rely on one or several cameras in order to analyze and interpret the motion from the captured video sequences. Similarly to contact devices, vision-based devices are varied. For instance, we can distinguish the following sensors:

• Infrared cameras: typically used for night vision, infrared cameras generally give a brittle view of the human silhouette.

• Traditional monocular cameras: the most common cameras due to their cheaper cost. Specific variants can be used, such as fish-eye cameras for wide-angle vision and time-of-flight cameras for depth (distance from the camera) information.

• Stereo cameras: stereovision directly delivers 3D world information by embedding the triangulation process.

• PTZ cameras: Pan-Tilt-Zoom cameras enable the vision system to focus on particular details in the captured scene, in order to identify their nature more precisely.

• Body markers: some vision systems require placing body markers in order to detect the human body motion. There are two types of marker: (1) passive, such as reflective markers shining when strobes hit them, and (2) active, such as markers flashing LED lights (in sequence). In such systems, each camera, lighting with strobe lights or normal lights, delivers 2D frames with the marker positions from its view. Eventually, a pre-processing step is in charge of interpreting the views and positions into 3D space.
2.4 Advantages and Disadvantages of Both Technologies

Both of the enabling technologies have their pros and cons. For instance, contact devices require the user's cooperation and can be uncomfortable to wear for a long time, but they are precise. Vision-based devices do not require user cooperation, but they are more difficult to configure and suffer from the occlusion problem; on the other hand, contact devices are more precise, except for the ultrasonic ones. Also, they generally do not have occlusion problems
except for the magnetic sensors (metal obstacles) and ultrasonic sensors (mechanical obstacles). Concerning health issues, we notice that some contact devices can raise problems: allergy to the mechanical sensor material, and cancer risk for magnetic devices.

Table 2.1: Comparison between Contact and Vision Based Devices
Criterion Contact
Here we review only the three most common approaches, on which current research focuses: (1) particle filtering and the condensation algorithm, (2) learning algorithms for statistical modelling and (3) automata-based approaches (such as Finite State Machines (FSM)).
Particle Filtering and the Condensation Algorithm for Gesture Recognition

The goal of particle filtering, also called the Sequential Monte Carlo method (SMC), is the probabilistic inference of the object motion given a sequence of measurements. Introduced by [19], condensation (i.e. Conditional Density Propagation) is an improvement of particle filtering for visual tracking, which has been extended to gesture recognition [20], [21]. The main idea behind condensation is to estimate the future probability density by sampling from the current density and weighting the samples by some measure of their likelihood. Recently, [22] extended the latter method to two-hand motion models. The author describes the state of a particle at a given time by four parameters: the integer index of the predictive model, the current position in the model, a scaling factor of amplitude and a time-dimension scale factor. The three latter parameters are duplicated to take into account the motion of each hand. The recognition of gestures is done through three filtering stages: initialization, prediction and updating. A motion model, consisting of the average horizontal and vertical projections of the object velocities, is associated with the filtering process in order to recognize gestures.
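The sample-weight-resample cycle behind condensation can be illustrated with a minimal one-dimensional particle filter; this is a generic sketch, not the cited authors' implementation, and the motion and noise parameters are invented:

```python
import math
import random

def particle_filter_step(particles, measurement, motion=1.0,
                         process_noise=0.5, meas_noise=1.0):
    """One predict-weight-resample cycle of a 1-D particle filter."""
    # Predict: push every particle through the motion model plus noise.
    predicted = [p + motion + random.gauss(0, process_noise) for p in particles]
    # Weight: Gaussian likelihood of the measurement for each particle.
    weights = [math.exp(-((p - measurement) ** 2) / (2 * meas_noise ** 2))
               for p in predicted]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # Resample: draw a new particle set proportionally to the weights.
    return random.choices(predicted, weights=weights, k=len(predicted))

random.seed(0)
particles = [random.uniform(-5.0, 5.0) for _ in range(500)]
for t in range(1, 20):                     # object truly moves +1.0 per step
    z = float(t) + random.gauss(0, 1.0)    # noisy measurement of its position
    particles = particle_filter_step(particles, z)

estimate = sum(particles) / len(particles)
print(round(estimate, 1))  # close to the true position 19.0
```

Condensation applies the same cycle to gesture-model states rather than to a scalar position, but the estimate-by-weighted-samples principle is identical.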
Learning Algorithms for Gesture Statistical Modelling

Learning algorithms are essentially used for feature-extraction-based methods. There are two main variants of learning algorithms: (1) linear learners and (2) non-linear learners. The former are suited to linearly separable data and the latter to the other cases. Another way to categorize learning algorithms is to consider their outcome. Thus, we distinguish supervised learning (i.e. matching samples to labels), unsupervised learning (i.e. only sample clusters, without labels), semi-supervised learning (i.e. a mix of labelled and unlabelled data), reinforcement learning (i.e. learning policies given observations) [23], transduction (i.e. supervised learning with
prediction) [24] and learning to learn (i.e. learning one's own inductive bias based on previous experience) [25], [26]. The choice of the learning algorithm depends mainly on the chosen gesture representation. For example, [27] proposes to recognize static hand gestures by learning the contour-line Fourier descriptors of a segmentation image obtained by the mean shift algorithm [28]. The classification is done by a support vector machine combined with the minimum enclosing ball (MEB) criterion.
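The contour-descriptor idea of [27] can be sketched as follows; for brevity, a nearest-template classifier stands in for the SVM with the MEB criterion used in the cited work, and the circle/square contours are toy shapes:

```python
import numpy as np

def fourier_descriptor(contour, n_coeffs=8):
    """Translation/scale/rotation-invariant descriptor of a closed contour.

    contour : (N, 2) array of ordered (x, y) boundary points.
    """
    z = contour[:, 0] + 1j * contour[:, 1]   # boundary as a complex signal
    mags = np.abs(np.fft.fft(z))
    mags[0] = 0.0                            # drop DC term -> translation invariant
    mags = mags / (mags[1] + 1e-12)          # normalize     -> scale invariant
    return mags[1:n_coeffs + 1]              # magnitudes    -> rotation invariant

def make_circle(radius, n=64):
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    return np.stack([radius * np.cos(t), radius * np.sin(t)], axis=1)

def make_square(side, n=64):
    t = np.linspace(0, 4, n, endpoint=False)   # walk the unit-square perimeter
    pts = [(u, 0) if u < 1 else
           (1, u - 1) if u < 2 else
           (3 - u, 1) if u < 3 else
           (0, 4 - u) for u in t]
    return side * np.array(pts)

def classify(contour, templates):
    d = fourier_descriptor(contour)
    return min(templates, key=lambda name: np.linalg.norm(templates[name] - d))

templates = {"circle": fourier_descriptor(make_circle(1.0)),
             "square": fourier_descriptor(make_square(1.0))}

# Scale invariance: a much larger circle still matches the circle template.
print(classify(make_circle(3.0), templates))  # circle
```

A hand contour would replace the toy shapes, with the descriptors of known gestures serving as templates or as SVM training data.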
Automata-Based Approaches

Along with learning algorithms, automata-based methods are the most common approaches in the literature. For instance, FSMs, HMMs, P
Figure 2.3: Tools for Gesture Recognition: Clustering & Classifying Algorithms [43]

2.6 Gesture Representation

Several gesture representations and models have been proposed to abstract and model the motion of human body parts. We distinguish two main categories of method: (1) 3D-model-based methods and (2) appearance-based methods. Moreover, we can split the proposed models into two kinds according to the spatial and temporal aspects of gestures: (1) posture automaton models, in which the spatial and the temporal aspects are modelled separately, and (2) motion models, in which there is a unique spatio-temporal model.
Figure 2.4: Different Representations of Gestures [43]

1. 3D-Model-Based Methods: A 3D model defines the 3D spatial description of the human body parts. The temporal aspect is generally handled by an automaton, which generally divides the gesture time into three phases [16]: (1) the preparation or pre-stroke phase, (2) the nucleus or stroke phase and (3) the retraction or post-stroke phase. Each phase can be represented as one or several transition(s) between the spatial states of the 3D human model. The main advantage of 3D-model-based methods is the ability to recognize gestures by synthesis: during the recognition process, one or more camera(s) look at the real target, compute the parameters of the model that matches the target spatially, and then follow the latter's motion (i.e. update the model parameters and check whether they match a transition in the temporal model). Thus, the gesture recognition is generally precise (especially the start and the end time of the gesture). However, these methods tend to be computationally expensive unless implemented directly in dedicated hardware. Some methods (e.g. [33]) combine silhouette extraction with 3D model projection fitting by finding the target's self-orientation. Generally, three kinds of model are used:
• Textured kinematic/volumetric models: these models contain very high details of the human body: skeleton and skin surface information.

• 3D geometric models: these models are less precise than the former in terms of skin information, but still contain essentially the skeleton information.

• 3D skeleton models: these are the most common 3D models due to their simplicity and higher adaptability. The skeleton contains only the information about the articulations and their 3D degrees of freedom (DoF).
2. Appearance-Based Methods: Concerning appearance-based methods, two main sub-categories exist: (1) 2D static-model-based methods and (2) motion-based methods. Each sub-category contains several variants. For instance, the most used 2D models are:

• Colour-based models: methods with this kind of model generally use body markers to track the motion of the body or the body part. For example, [34] proposes a method for hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering.

• Silhouette-geometry-based models: such models may include several geometric properties of the silhouette, such as perimeter, convexity, surface, compactness, bounding box/ellipse, elongation, rectangularity, centroid and orientation. [35] used the geometric properties of the bounding box of the hand skin to recognize hand gestures.

• Deformable-gabarit-based models: these are generally based on deformable active contours (i.e. snakes, parametrized with motion) and their variants [36]. [37] used snakes for the analysis of gestures and actions in technical talks for video indexing.
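Several of the silhouette properties listed above are straightforward to compute from a binary hand mask; a minimal sketch using a NumPy boolean array (the toy mask is illustrative):

```python
import numpy as np

def silhouette_features(mask):
    """Geometric properties of a binary silhouette (True = foreground)."""
    ys, xs = np.nonzero(mask)
    surface = int(len(xs))                      # area in pixels
    h = int(ys.max()) - int(ys.min()) + 1       # bounding-box height
    w = int(xs.max()) - int(xs.min()) + 1       # bounding-box width
    return {
        "surface": surface,
        "bbox": (w, h),
        "elongation": max(w, h) / min(w, h),    # bounding-box aspect ratio
        "rectangularity": surface / (w * h),    # fill ratio of the box
        "centroid": (float(xs.mean()), float(ys.mean())),
    }

# Toy silhouette: a filled 4x8 block inside a 10x10 frame.
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 1:9] = True
f = silhouette_features(mask)
print(f["surface"], f["bbox"], f["elongation"], f["rectangularity"])
# 32 (8, 4) 2.0 1.0
```

A feature vector of this kind, computed per frame, is exactly what the bounding-box method of [35] classifies.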
CHAPTER - 3
GESTURE MODELLING
CHAPTER - 3
GESTURE MODELLING

3.1 Introduction
The Association for Computing Machinery defines human-computer interaction as "a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them."

Gesture recognition is an important, yet difficult task. It is important because it is a versatile and intuitive way to develop new, more natural and more human-centered forms of human-machine interaction. Moreover, it is difficult because it involves the solution of many challenging subtasks, such as robust identification of hands and other body parts, motion modeling, tracking, pattern recognition and classification.

A human hand is an articulated object with 27 bones and 5 fingers. Each of these fingers consists of three joints. Human hand joints can be classified as flexion, twist, directive or spherical, depending upon the type of movement or the possible rotation axes. In total, the human hand has approximately 27 degrees of freedom. As a result, a large number of gestures can be generated [38], [39], as tabulated in Figure 2.1; the different approaches that can be used for these different kinds of gesture are tabulated in Figure 2.3.
3.2 Acquisition of Data

The first stage of any vision system is the image acquisition stage. Only after the image has been satisfactorily obtained can the different approaches be successfully applied. However, if the image has not been acquired satisfactorily, then the intended tasks may not be achievable, even with the aid of some form of image enhancement.
Figure 3.1: System Architecture

There are a number of input devices for data acquisition.

A first choice for a two-dimensional image input device may be a television camera, whose output is a video signal:

• The image is focused onto a photoconductive target.
• The target is scanned line by line horizontally by an electron beam.
• An electric current is produced as the beam passes over the target.
• The current is proportional to the intensity of light at each point.
• The current is tapped to give a video signal.

However, this form of device has several disadvantages:

• Limited resolution: a finite number of scan lines (about 625) and a limited frame rate (30 to 60 frames/sec).
• Distortion:
• Less geometric distortion.
• More linear video output.

For a 3-dimensional image, much more complex acquisition devices are used:

• Laser Ranging Systems: laser ranging works on the principle that the surface of the object reflects laser light back towards a receiver, which then measures the time (or phase difference) between transmission and reception in order to calculate the depth. Most systems work at long distances and therefore have inadequate depth resolution.

Methods based on shape from shading employ photometric stereo techniques to produce depth measurements. Using a single camera, two or more images are taken of an object in a fixed position but under different lighting conditions. By studying the changes in brightness over a surface and employing constraints on the orientation of surfaces, certain depth information may be calculated.

Stereoscopy is a technique for measuring range by triangulation to selected locations in a scene imaged by two cameras; the primary computational problem of stereoscopy is to find the correspondence of various points in the two images.
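Once the correspondence is found, the triangulation behind stereoscopy reduces, for rectified cameras, to the standard relation depth = focal length x baseline / disparity; a small sketch with made-up camera parameters:

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from a rectified stereo pair via triangulation.

    focal_px     : focal length in pixels
    baseline_m   : distance between the two camera centers (meters)
    disparity_px : horizontal shift of the matched point between the images
    """
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: 700 px focal length, 6 cm baseline.
# A point matched with a 35 px disparity lies about 1.2 m away.
print(round(stereo_depth(700, 0.06, 35), 3))  # 1.2
```

The inverse relation between disparity and depth is why stereo range resolution degrades quickly for distant points.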
%ata gloves are the devices for perfect data input with high accuracy and high speed. !t
can provide accurate data of 1oint angle, rotation, location etc. for application in different
virtual reality environments. These gloves are commercially available in the market.
These have already been discussed in chapter 2.
Coloured markers attached to the human skin are also used as an input technique, with hand localization done by colour localization.
Low cost web cameras, as preferred in this project work, can also be used as input devices.
3.3 Gesture Modelling
After acquiring the data from the input device, the next step involves the modelling of the hand, which further includes the steps described in Fig. 2.
Segmentation is the process of dividing the input image into regions separated by boundaries [42]. The segmentation process depends on the type of gesture: if it is a dynamic gesture, the hand needs to be located and tracked [42]; if it is a static gesture (posture), the input image only has to be segmented.
To locate the hand, a bounding box is generally used to specify the hand region depending on the skin colour [43]. For tracking the hand there are two main approaches: either the video is divided into frames and each frame is processed alone, or tracking information such as shape and skin colour is used with tools like the Kalman Filter [42].
In [42], the hand is segmented using skin colour; this is the easiest possible way, as skin colour is invariant to translation and rotation changes. The Gaussian Model is a parametric technique, while the histogram-based technique is non-parametric.
A drawback of skin segmentation is that it is affected by changes in illumination conditions.
In [44], segmentation is done using infrared cameras and range information generated by a Time-of-Flight (ToF) camera; these can detect different skin colours but are affected by changes in temperature.
Data gloves and coloured markers can also be used for segmentation, as they provide exact information about the orientation and position of the palm and fingers.
The colour space used in a specific application plays an essential role in the success of the segmentation process. However, colour spaces are sensitive to lighting changes; for this reason, researchers tend to use only the chrominance components and neglect the luminance component, as in the r-g and HS colour spaces. Factors that obstruct the segmentation process include complex backgrounds, illumination changes, and low video quality.
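As an illustrative sketch of skin-colour segmentation (not the thesis's actual classifier or threshold values), the following Python function labels pixels with a simple rule-of-thumb RGB test; the thresholds are assumptions chosen only to demonstrate the idea:

```python
def skin_mask(rgb_rows, r_min=95, g_min=40, b_min=20):
    """Return a binary mask (1 = skin) for an image given as rows of
    (R, G, B) tuples, using a rule-of-thumb RGB skin classifier.
    Thresholds here are illustrative, not values from the project."""
    mask = []
    for row in rgb_rows:
        mask_row = []
        for (r, g, b) in row:
            is_skin = (r > r_min and g > g_min and b > b_min
                       and r > g and r > b
                       and (max(r, g, b) - min(r, g, b)) > 15)
            mask_row.append(1 if is_skin else 0)
        mask.append(mask_row)
    return mask

# A tiny 1x2 test image: one skin-like pixel, one blue background pixel.
img = [[(200, 120, 90), (30, 60, 200)]]
print(skin_mask(img))  # -> [[1, 0]]
```

Such fixed RGB rules fail exactly where the text warns: under illumination changes, which is why chrominance-only spaces are preferred in practice.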
3.4 Image Pre-Processing
Image pre-processing is essential to enhance the image quality for better results. A coloured image consists of three planes of colours, Red-Green-Blue; this can be considered as a 3D matrix in which two dimensions give the pixel position and the third dimension is dedicated to the colour plane.
The planes of each colour can be accessed separately using the following MATLAB commands:
>> I = Image(:, :, plane)
>> % To access the Red plane:   I = Image(:,:,1)
>> % To access the Green plane: I = Image(:,:,2)
>> % To access the Blue plane:  I = Image(:,:,3)
Images consume a large amount of memory space. To save memory space, the images are first resized using downsampling. The method preferred for this process is bicubic interpolation, because in this method the yielded output pixel value is a weighted average of the pixels in the adjacent 4-by-4 neighborhood. The following MATLAB command uses bicubic interpolation by default:
>> Image = imresize(Image, [256 256]);
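The 4-by-4 weighted average can be made concrete with the one-dimensional interpolation kernel that bicubic resizing applies along each axis in turn. Below is an illustrative Python sketch (not project code) of Keys' cubic convolution kernel with the commonly used parameter a = -0.5:

```python
def cubic_kernel(x, a=-0.5):
    """Keys' cubic convolution kernel (a = -0.5): the weighting
    function behind bicubic interpolation."""
    x = abs(x)
    if x < 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def interp_1d(samples, t):
    """Interpolate between samples[1] and samples[2] (0 <= t < 1) from
    4 neighbouring samples. 2-D bicubic resizing applies this along rows
    and then columns, giving the 4-by-4 neighbourhood described above."""
    offsets = (-1 - t, -t, 1 - t, 2 - t)  # distances to the 4 neighbours
    return sum(s * cubic_kernel(d) for s, d in zip(samples, offsets))

# At t = 0 the weights collapse onto the second sample:
print(interp_1d([10, 20, 30, 40], 0.0))  # -> 20.0
```

A useful property of this kernel is that the four weights always sum to one, so flat regions of the image stay flat after resizing.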
After resizing the image, the Region of Interest (ROI) is selected in the image. In computer vision and optical character recognition (OCR), the ROI describes the borders of the object under consideration and is a subset of an image which contains the desired information. The following command was used:
>> BW = roipoly(Image(:,:,1), uint8(c), uint8(r));
roipoly creates an interactive tool for selecting a polygon: the mouse is used to identify the region by selecting the vertices of the polygon. This function returns a binary image as the output, which can be used as a mask for mask filtering if required. These functions are discussed in detail in the MathWorks documentation [42].
Figure 3.2: (a) Original Image; (b) Greyscale Image
3.5 Edge Detection
Edge detection is a very important field in image processing and image segmentation. Edges in digital images are areas with strong intensity contrasts, and a jump in intensity from one pixel to the next can create a major variation in the picture quality. For these reasons, edges form the outline of an object and also indicate the boundary between overlapping objects; these points, when joined with a line, form the edge of the image (Canny [40]). Edge detection is a very important mathematical tool for feature detection and feature extraction, as discontinuities in image brightness correspond to different aspects of the image, such as discontinuities in depth, discontinuities in surface orientation, changes in material properties, and variations in scene lighting.
Figure 3.3: Edge Detection Illustration
In the above figure, consider a subset of a pixel array with its intensity values. Observing the intensities of the pixels, we can say that an edge must exist between the 4th and the 5th pixel, as the difference between the intensity values of the 4th and 5th pixels is large. Another benefit of edge detecting an image is that it shrinks the amount of data and filters out unwanted information while simultaneously protecting the important structural properties of the image [41].
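The intensity-jump criterion just described can be sketched in Python (for illustration; the project itself works in MATLAB) as a simple threshold on adjacent pixel differences. The threshold value here is an assumption chosen for the example:

```python
def find_edges(pixels, threshold=50):
    """Return indices i where an edge lies between pixels[i] and
    pixels[i+1], i.e. where adjacent intensities differ by more than
    `threshold` (an illustrative value)."""
    return [i for i in range(len(pixels) - 1)
            if abs(pixels[i + 1] - pixels[i]) > threshold]

# A row like the figure: nearly flat, then a sharp jump between
# the 4th and 5th pixels.
row = [20, 22, 21, 23, 180, 182, 181]
print(find_edges(row))  # -> [3]  (0-based: between the 4th and 5th pixels)
```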
The gradient [45][46][47] of the image is one of the fundamental building blocks in image processing; the first-order derivative of choice in image processing is the gradient. Mathematically, the gradient of a two-variable function (here the image intensity function) at each image point is a 2D vector [47][48] with the components given by the derivatives in the horizontal and vertical directions. Several edge detector operators [49][51] exist for generating gradient images, such as Sobel, Prewitt, Laplacian and Laplacian of Gaussian (LoG). These edge detectors work better under different conditions [50][51].
Figure 3.4: Edge detection using predefined operators
Edges in digital images are areas with strong intensity contrasts, and a jump in intensity from one pixel to the next can create a major variation in the picture quality. With the help of first- and second-order derivatives, such discontinuities are detected. The first-order derivative of choice in image processing is the gradient. The magnitude of the gradient of a 2-D function f(x, y) can be approximated as

    |∇f| = sqrt(gx^2 + gy^2) ≈ |gx| + |gy|

These approximations still behave as derivatives; that is, they are zero in areas of constant intensity, and their values are related to the degree of intensity change in areas of variable intensity. It is common practice to refer to the magnitude of the gradient or its approximations simply as "the gradient".
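A minimal Python helper (illustrative only) makes the two forms of the magnitude concrete, given the horizontal and vertical gradient components at a point:

```python
import math

def gradient_magnitude(gx, gy, approx=False):
    """Magnitude of the image gradient at a point from its components:
    sqrt(gx^2 + gy^2), or the cheaper |gx| + |gy| approximation."""
    if approx:
        return abs(gx) + abs(gy)
    return math.hypot(gx, gy)

print(gradient_magnitude(3, 4))               # -> 5.0
print(gradient_magnitude(3, 4, approx=True))  # -> 7
```

The absolute-value form over-estimates diagonal edges slightly but avoids the square root, which mattered historically when gradients were computed at every pixel.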
3.5.1 Generating Gradient Images using Predefined Filters
An image gradient is a directional change in the intensity or colour in an image. Image gradients may be used to extract information from images. An example of a small image neighborhood is shown below.
Figure 3.5: Edge Mask
Prewitt Operator: Prewitt operator edge detection masks are among the oldest and best understood methods of detecting edges in images. The Prewitt edge detector uses the following masks to digitally approximate the first derivatives Gx and Gy, computing the gradient in the x (vertical) and y (horizontal) directions.
Figure 3.7: Prewitt Operator neighborhood
The working of these 3×3 masks is quite simple: the mask is slid over an area of the image, it changes that pixel's value and shifts one pixel to the right, continuing until the end of the row is reached; it then starts again from the beginning of the next row. These 3×3 masks cannot manipulate the first and last rows and columns, because the mask would move outside the image boundary if placed over a pixel in the first or last row or column [7].
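The sliding-mask procedure just described can be sketched in Python (illustrative only; the project uses MATLAB). Border pixels are left at zero, mirroring the limitation noted above:

```python
# Prewitt 3x3 masks for the x and y derivatives.
PREWITT_GX = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_GY = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

def convolve3x3(image, mask):
    """Slide a 3x3 mask over `image` (a list of rows of intensities).
    Border pixels are left as 0 because the mask cannot be centred
    on the first or last row/column."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = sum(mask[u][v] * image[i - 1 + u][j - 1 + v]
                            for u in range(3) for v in range(3))
    return out

# A vertical step edge: dark left half, bright right half.
img = [[0, 0, 10, 10]] * 4
gx = convolve3x3(img, PREWITT_GX)
print(gx[1])  # -> [0, 30, 30, 0]: strong response along the step
```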
Laplacian of Gaussian (LoG): This detector finds edges by looking for zero crossings after filtering f(x, y) with a Laplacian of Gaussian filter. In this method, Gaussian filtering is combined with the Laplacian to break down the image where the intensity varies, so that edges are detected effectively. It finds the correct place of edges by testing a wider area around the pixel. Below, a standard 5×5 Laplacian of Gaussian edge detection mask is given.
Figure 3.8: Laplacian of Gaussian 5×5 mask
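The zero-crossing idea behind the LoG detector can be illustrated in one dimension with a Python sketch (illustrative only): after a smoothed signal is filtered with a discrete Laplacian, an edge is declared where the response changes sign. The Gaussian smoothing step is assumed to have been applied already:

```python
def laplacian_1d(signal):
    """Discrete 1-D Laplacian (second derivative):
    s[i-1] - 2*s[i] + s[i+1]."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]

def zero_crossings(values):
    """Indices where consecutive values change sign --
    the LoG edge criterion."""
    return [i for i in range(len(values) - 1)
            if values[i] * values[i + 1] < 0]

# A smoothed step edge: the second derivative changes sign at its centre.
step = [0, 0, 2, 8, 10, 10]
lap = laplacian_1d(step)
print(lap)                  # -> [2, 4, -4, -2]
print(zero_crossings(lap))  # -> [1]
```

The sign change falls exactly at the middle of the ramp, which is why LoG localizes edges well even though it examines a wider neighbourhood.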
3.5.2 Approach for Gradient
The flowchart of the approach for generating gradient images is given below. At the very beginning a coloured image is chosen and processed further, here using MATLAB v2012b. In the immediate next step, the image is converted into greyscale. A greyscale image is mainly a combination of two colours, black and white. It carries intensity information, where black has the lowest or weakest intensity and white has the highest or strongest intensity. Variation of these intensity levels forms the edges of the object or objects. In the final step, different edge detection operators are applied to detect the object boundaries and gradients.
Figure 3.9: Gradient Generating Approach
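The colour-to-greyscale step of this flowchart can be sketched in Python (illustrative; MATLAB's rgb2gray performs the equivalent weighted sum of the three colour planes):

```python
def to_gray(rgb_rows):
    """Convert rows of (R, G, B) tuples to greyscale intensities using
    the standard luminance weights (the same weighting used by
    MATLAB's rgb2gray)."""
    return [[round(0.2989 * r + 0.5870 * g + 0.1140 * b)
             for (r, g, b) in row]
            for row in rgb_rows]

# White, black, and pure red pixels:
print(to_gray([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]))
# -> [[255, 0, 76]]
```

Green contributes the most because the human eye is most sensitive to it; this is why a pure red pixel maps to a fairly dark grey.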
3.5.3 Edge Operator used in the Project
In this system design, a 3×3 convolution mask was used to approximate the edges in our Region of Interest. The MATLAB command fspecial() is used to create a two-dimensional filter 'h' of a specified type and returns a correlation kernel 'h':
>> h = fspecial('gaussian');
Figure 3.10: Convolution Mask Used in the Project
Figure 3.11: Final Image after Convolution with Mask
3.6 Feature Extraction
Feature vectors of a segmented image can be extracted in different ways according to the application, and there are various methods that can be used for feature extraction. [8] used the aspect ratio of the bounding box as a feature vector; [9] used a self-growing and self-organized neural gas (SGONG) network.
>> Totalsum(i) = sum(sum(BW2(:,:,i)));
The above MATLAB command sums up all the ones in the 2D matrix of the image, giving us the approximated perimeter of the hand.
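The Python equivalent of the summation above, for a binary edge image stored as a list of rows, is a straightforward double sum:

```python
def count_ones(bw):
    """Sum all the ones in a binary image (list of lists) -- the Python
    equivalent of MATLAB's sum(sum(BW)), used here as the approximate
    perimeter feature."""
    return sum(sum(row) for row in bw)

edge_image = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
print(count_ones(edge_image))  # -> 4
```

This single scalar per image is what makes the method fast, and also what makes it application specific: gestures with similar outlines produce similar sums.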
CHAPTER - 4
RESULT & CONCLUSION
4.1 RESULT
The processed images are classified into different classes of gestures, and the total perimeter of each gesture is saved in a database under its respective class. In the succeeding step, a query image is requested from the user and its features are matched against the image features saved in the database.
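The matching step can be sketched as a nearest-perimeter lookup in Python (illustrative only). The database values below are made up for the example; the actual system matches against perimeters computed from its training images:

```python
def classify(query_perimeter, database):
    """Assign the query to the gesture class whose stored perimeter is
    nearest to the query's perimeter. `database` maps gesture value ->
    stored perimeter (hypothetical values here)."""
    return min(database, key=lambda cls: abs(database[cls] - query_perimeter))

db = {1: 120, 2: 180, 3: 240, 4: 300, 5: 360}  # hypothetical class perimeters
print(classify(250, db))  # -> 3
```

This also makes the observed failure mode visible: a gesture whose perimeter lands near a class boundary (e.g. between classes 4 and 5) is easily misclassified.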
Following are the results obtained from this work.
The following images show the successful recognition of a query image:
The following images show the unsuccessful cases of recognition:
Table 4.1: Results obtained
GESTURE VALUE   INPUT IMAGES   No. of SUCCESSFUL CASES   RATE of RECOGNITION (%)
1               20             20                        100
2               20             20                        100
3               20             20                        100
4               20             16                        80
5               20             13                        65
From the above table it is observed that the accuracy of the system is 89%, i.e. there are 11 cases where there is a mismatch between the input image and the class to which it is recognized.
4.2 CONCLUSION
From the above discussions we can conclude that the technique applied for recognition of the gestures works well for the chosen application, without using any artificial intelligence algorithms for recognition.
For 20 images of each gesture, the 1st gesture value had 100% successful recognition; the 2nd and 3rd gesture values also had 100% recognition; the 4th gesture value, however, had 4 mismatches with the 3rd gesture value and was therefore recognized with 80% accuracy; similarly, the 5th gesture value had 7 mismatches with the 4th gesture value, for an accuracy of 65%.
To increase the efficiency of the recognition algorithm used above, the orientation of the gesture images should be similar to each other, with only minute differences. Another important point that should be taken care of is that the algorithm is application specific, i.e. the threshold value for classifying the gestures needs to be revised for different applications.
BIBLIOGRAPHY
[1] Schlomer, T., Poppinga, B., Henze, N.
[3] Webel, S., Keil, J. & Zoellner, M. (2008), Multi-touch gestural interaction in x3d using hidden markov models, in 'VRST '08: Proceedings of the 2008 ACM symposium on Virtual reality software and technology', ACM,
[7] Mikolajczyk, K., Schmid, C. & Zisserman, A. (2004), Human detection based on a probabilistic assembly of robust part detectors, in 'Computer Vision - ECCV 2004', Vol. 3021, Springer Berlin / Heidelberg, pp. 69-82.
[8] Leibe, B., Seemann, E. & Schiele, B. (2005), Pedestrian detection in crowded scenes, in 'International Conference on Computer Vision and Pattern Recognition', Vol. 1, pp. 878-885.
[9] Boulay, B. (2007), Human posture recognition for behaviour understanding, PhD thesis, Universite de
[17] 'Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm', Gait & Posture 26(2), 194-199.
[19] Isard, M. & Blake, A. (1996), Contour tracking by stochastic propagation of conditional density, in 'European Conference on Computer Vision', Vol. 1064 of Lecture Notes.
[25] Thrun, S. & Pratt, L. (1998), Learning to Learn, Kluwer Academic Publishers,
[26] Ren, Y. & Zhang, F. (2009), Hand gesture recognition based on meb-svm, in 'Embedded Software and Systems, Second International Conference on', Vol. 0, IEEE Computer Society, Los Alamitos, CA, USA, pp. 344-349.
[28] Cheng, Y. (1995), 'Mean shift, mode seeking, and clustering', IEEE Trans. Pattern Analysis and Machine Intelligence 17(8), 790-799.
[29] Lee, H.-K. & Kim, J. H. (1999), 'An hmm-based threshold model approach for gesture recognition', IEEE Trans. Pattern Analysis and Machine Intelligence 21(10), 961-973.
[30] Lu, W.-L. & Little, J. J. (2006a), Simultaneous tracking and action recognition using the pca-hog descriptor, in 'Computer and Robot Vision, 2006. The 3rd Canadian Conference on', Quebec, Canada, pp. 6-13.
[31] Yamato, J., Ohya, J. & Ishii, K. (1992), Recognizing human action in time-sequential images using hidden markov model, in 'International Conference on Computer Vision and Pattern Recognition', pp. 379-385.
[32] Pinhanez, C. & Bobick, A. (1997), Human action detection using pnf propagation of temporal constraints, in 'International Conference on Computer Vision and Pattern Recognition', pp. 898-904.
[33] Boulay, B. (2007), Human posture recognition for behaviour understanding, PhD thesis, Universite de
[34] Bretzner, L., Laptev, I. & Lindeberg, T. (2002), Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering, in 'Automatic Face and Gesture Recognition, 2002. Proceedings. Fifth IEEE International Conference on', pp. 405-410.
[35] Birdal, A. & Hassanpour, R. (2008), Region based hand gesture recognition, in '16-th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision', pp. 1-7.
[36] Leitner, F. & Cinquin, P. (1993), From splines and snakes to snake splines, in 'Selected Papers from the Workshop on Geometric Reasoning for Perception and Action', Springer-Verlag, London, UK, pp. 264-281.
[37] Ju, S. X., Black, M. J., Minneman, S. & Kimber, D. (1997), Analysis of gesture and action in technical talks for video indexing, Technical report, American Association for Artificial Intelligence. AAAI Technical Report SS-97-03.
[38] Moeslund, T. B., Hilton, A. and Kruger, V. A Survey of Advances in Vision-Based Human Motion Capture and Analysis. Computer Vision and Image Understanding, 104(2) (2006), 90-126.
[39] Bhuyan, M. K.
[41] Regina Lionnie, Ivanna K. Timotius, Iwan Setyawan, 2011. An Analysis of Edge Detection as a Feature Extractor in Hand Gesture Recognition System based on
[42] Computer Vision. Upper Saddle River, 158, 215-216, 299-300.
J9>K %ubrovin, ..' .T. 4omenko, .0. , /**/.
[48] Haralick, R. M. 'Zero-crossings of second directional derivative operator'. SPIE Proc. on Robot Vision, 1982.
[49] Canny, J. F. 'A variational approach to edge detection'. Submitted to AAAI Conference, Washington, D.C., September, 1983.
[50] Marr, D., Hildreth, E. 'Theory of edge detection'. Proc. R. Soc. Lond. B, 207, 187-217, 1980.
[14] Rosenfeld, A., Thurston, M. 'Edge and curve detection for visual scene analysis'. IEEE Trans. Comput., C-20, 562-569, 1971.
[51] L. Davis, 'A survey of edge detection techniques', Computer Graphics and Image Processing, vol 4, no. 3, pp 248-260, 1975.