A PSO-SVM Lips Recognition Method Based on Active Basis Model

Chih-Yu Hsu*, Chih-Hung Yang**, Yung-Chih Chen*, Min-chian Tsai*

*: Department of Information and Communication Engineering, Chaoyang University of Technology

**: Department of Radiation Oncology, China Medical University Hospital

Corresponding Author: Chih-Hung Yang ([email protected])

Abstract

This paper proposes a novel method for lip recognition based on the Active Basis Model (ABM). The method has four stages. In the first stage, the deformable templates of lip images are obtained from images that are clearly open or closed. In the second stage, the deformed template of each testing image is obtained. In the third stage, the difference between the deformable template and the deformed template is calculated and used as a feature vector. Finally, a support vector machine (SVM) is used to classify the feature vector. The SVM is a supervised learning method that serves as the classifier for lip recognition. When building an SVM model, it is necessary to set the parameters G and C, the two factors that most affect the quality of the model. For parameter selection, we adopt the Particle Swarm Optimization (PSO) algorithm to find the best parameter combination, which is then fed into the SVM to obtain the classification results. In this paper, the method that uses the PSO algorithm to select the parameters G and C automatically is called the PSO-SVM method. The experiments use 1000 face images from the BioID face database. The experimental results show that the PSO-SVM method yields a more accurate model for recognizing lip images.

Key Words: active basis model, support vector machine, particle swarm optimization, lip images.

1. Introduction

In recent years, the detection and tracking of lips has been a popular research topic. However, detection and tracking alone are not enough: reliable recognition of lip state can significantly improve the performance of automatic speech recognition and facial recognition systems.

In 2007, He Jun and Zhang Hua [1] used the HSV color space to reduce the impact of lighting on skin color, used the color difference between lip and skin blocks to identify the lips, and then applied an R/G filter to locate them. In the same year, Sohail and Bhattacharya [2] used the rectangle features proposed by Viola and Jones [3] to isolate the mouth region and extracted the lip contour from it using the level set method of image segmentation. In 2008, Yong-Hui Huang et al. [4] used least-squares fitting with two-stage ellipses to obtain the lip contour in images.

Observing the motion of a speaker's lips can help people understand the content of a conversation, so lip recognition can be used in automatic speech recognition systems, and increasing its accuracy is important. Hao et al. [5] identify the speaker's lips and then use a text-to-speech (TTS) system to produce the corresponding sound. Liou et al. [6] fit the lip contour with a Point Distribution Model (PDM) together with a color model, track the contour across images, and then apply time-delay neural networks to identify the lips. Whether the lips are open or closed can also be applied to facial expression recognition: to use the lips for expression recognition, we must first detect their state, and the simplest approach is to classify the lips as open or closed to identify whether the subject is laughing. These studies confirm that judging whether the lips are open or closed is very important. Previous research [7] compared two machine learning methods for image representation, the AdaBoost model and the active basis model, and found that the active basis model, which combines a small number of Gabor wavelet elements, constructs a more representative model than the AdaBoost classifier. Therefore, we use the active basis model for object detection and obtain characteristic vectors, which an SVM then uses to classify the lips as open or closed. The trained SVM model can quickly make future predictions. When building an SVM model, it is necessary to set the parameters G and C, the two factors that most affect the quality of the model; in other words, the optimal parameters G and C are important for accurate and efficient classification of lip images, but they are hard to guess. To solve this problem, we use the PSO method to find the optimal parameters G and C for the SVM automatically, so that we do not need to guess them each time. We call the model that combines the SVM with the PSO method the PSO-SVM method.

2. Methods

Deformable templates are an important element for object detection and recognition. Y. N. Wu et al. [8] proposed the active basis model, a shared sketch algorithm, and a computational architecture of sum-max maps for representing, learning, and recognizing deformable templates. Our proposed method involves four stages; the first and second stages use the deformable templates proposed by Wu et al. [8]. In the first stage, the deformable templates of lip images are obtained from images that are clearly open or closed. In the second stage, the deformed template of each testing image is obtained. In the third stage, the difference between the deformable template and the deformed template is calculated and used as a feature vector. Finally, a support vector machine (SVM), a supervised learning method, classifies the feature vector. The difficulty in our study is that there is no significant difference in shape between the mouth images, so we rely on the appearance of an open versus a closed mouth rather than the shape of the object: when the mouth is open we can normally see the teeth, whereas tightly closed lips appear as a line. We therefore use active basis templates to classify these two features.

2010 Fourth International Conference on Genetic and Evolutionary Computing, 978-0-7695-4281-2/10 $26.00 © 2010 IEEE, DOI 10.1109/ICGEC.2010.188

2.1 Templates

In this section we use the methods proposed by Wu et al. [8] to construct deformable templates and deformed templates. The first stage obtains the deformable templates: we use a few lip images that are clearly open or closed to construct an open template and a closed template. Figure 1 shows a real example of the active basis model. The first block is the generative template learned from the training images, which is formed by Gabor wavelet elements; each element is represented symbolically by a bar of the same length. This template is deformable on an active basis because the Gabor wavelet elements are allowed to slightly change their locations and orientations locally to code each training image. The second stage obtains the deformed template of each testing image. The deformed template corresponds to the training or testing image and consists of the Gabor wavelet elements whose locations and orientations have actually been perturbed relative to the generative template.
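The Gabor wavelet elements that make up such a template can be sketched as follows. This is a minimal, illustrative construction in Python; the paper does not publish code, and the kernel size, wavelength, and bandwidth values here are assumptions, not parameters from the ABM implementation.

```python
import numpy as np

def gabor_kernel(size=17, wavelength=6.0, orientation=0.0, sigma=3.0):
    """Generate a single Gabor wavelet element (real part).

    The active basis model represents a template as a small set of such
    elements at different locations and orientations; the parameter
    values here are illustrative only.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates to the element's orientation.
    xr = x * np.cos(orientation) + y * np.sin(orientation)
    yr = -x * np.sin(orientation) + y * np.cos(orientation)
    envelope = np.exp(-(xr**2 + yr**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength)
    return envelope * carrier

# A small bank of orientations, since the deformable template allows
# each element to perturb its orientation slightly.
bank = [gabor_kernel(orientation=t)
        for t in np.linspace(0, np.pi, 8, endpoint=False)]
```

In a full implementation each element would also carry a location and a scale, matching the $(x, y, s, \alpha)$ parameterization used in Section 2.2.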

Fig. 1. Active basis formed by 50 basis elements. The first plot is the deformable template. For each of the other 7 pairs, the left plot is the observed image, and the right plot displays the 50 elements of the first plot deformed to fit the corresponding observed image.

2.2 Feature vectors

After the deformable templates and deformed templates are complete, we compute the log-likelihood ratio of the deformed template distribution $p$ versus the deformable template distribution $q$, as in Equation (1). In the formula, $c_{m,i}$ and $r_{m,i}$ are the responses at the element parameters $(x, y, s, \alpha)$ of the deformable template and of the deformed template of the testing image, respectively. Each deformed template of a testing image receives a score against both the open and the closed deformable template, so each image yields two scores. These two scores form a point in a two-dimensional space that represents the picture, and we use a support vector machine to classify these points. The log-likelihood function is given by Equation (1):

\[
\log \frac{p(I \mid B_m)}{q(I)} = \sum_{i=1}^{n} \log \frac{p_m(c_{m,i})}{q(r_{m,i})} \tag{1}
\]
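The two-dimensional feature construction can be sketched as follows. This is a hedged Python illustration: the score arrays are made-up numbers standing in for the Equation (1) log-likelihood ratios, and the function name is ours.

```python
import numpy as np

def feature_vector(scores_open, scores_closed):
    """Stack per-image template scores into 2-D feature points.

    scores_open[i] and scores_closed[i] are the log-likelihood ratios
    of image i under the "open" and "closed" deformable templates
    (Eq. (1)); the values below are illustrative, not real scores.
    """
    return np.column_stack([scores_open, scores_closed])

# Hypothetical scores: open-mouth images score higher under the open
# template, closed-mouth images higher under the closed template.
open_scores = np.array([12.1, 11.4, 3.2, 2.8])
closed_scores = np.array([4.0, 3.5, 10.9, 11.7])
points = feature_vector(open_scores, closed_scores)  # shape (4, 2)
```

Each row of `points` is one image's location in the two-dimensional score space that the SVM later separates.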

2.3 Support vector machine

The support vector machine (SVM) is a classification algorithm, a machine learning method based on Vapnik's statistical learning theory [9]. Simply put, an SVM finds a hyperplane that separates two different sets. Neural networks also perform well for classification, but SVM classification is generally considered to give better results [10], and SVMs can achieve small errors on small-sample, nonlinear, and high-dimensional pattern recognition problems. Therefore, we use an SVM to classify the vectors obtained from the active basis model and to train the classifier. The well-trained classifier can classify an input pattern and predict an answer, outputting the class the pattern belongs to.

Suppose first that the training data are linearly separable. The SVM finds the line or hyperplane that best separates the data, with the margin between the two sets as large as possible (Figure 2), so that the two sets of points can be distinguished clearly even in the presence of precision errors that would otherwise make the problem computationally difficult.

Suppose there is a set of points $\{x_i, y_i\}$, $i = 1, \ldots, n$, with $x_i \in R^n$ and $y_i \in \{+1, -1\}$. We hope to find a function $f(x) = w \cdot x + b$ such that all points with $y_i = -1$ fall in $f(x) < 0$ and all points with $y_i = +1$ fall in $f(x) > 0$, where $w \in R^n$ and $b \in R$ are the weight vector and the offset. We can then classify points as positive or negative according to the sign of $f(x)$. This hyperplane is called the separating hyperplane, and the maximum-margin hyperplane is called the optimal separating hyperplane. In Fig. 2, $|b| / \|w\|$ is the distance of the separating hyperplane from the origin. The distance between a support hyperplane and the separating hyperplane is given by Equation (2):

\[
d = \frac{|1+b|}{\|w\|} - \frac{|b|}{\|w\|} = \frac{1}{\|w\|}, \quad \text{if } b \notin (-1, 0)
\]
\[
d = \frac{|1+b|}{\|w\|} + \frac{|b|}{\|w\|} = \frac{1}{\|w\|}, \quad \text{if } b \in (-1, 0) \tag{2}
\]

Therefore, the margin is greater when $\|w\|$ is smaller.
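As a quick numeric sanity check, the distance from the separating hyperplane to a support hyperplane works out to $1/\|w\|$ in the cases shown in Equation (2). A small Python sketch (the helper name and test values are ours):

```python
import numpy as np

def support_distance(w, b):
    """Distance from the separating hyperplane to a support hyperplane,
    following the two branches of Eq. (2); both reduce to 1 / ||w||
    for the cases exercised below.
    """
    norm_w = np.linalg.norm(w)
    if -1.0 < b < 0.0:
        return (abs(1.0 + b) + abs(b)) / norm_w
    return (abs(1.0 + b) - abs(b)) / norm_w

w = np.array([3.0, 4.0])        # ||w|| = 5, so d should be 1/5 = 0.2
d1 = support_distance(w, 0.7)   # b outside (-1, 0)
d2 = support_distance(w, -0.5)  # b inside (-1, 0)
```

Both `d1` and `d2` evaluate to 0.2, confirming that the margin depends only on $\|w\|$, not on the offset $b$.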

In practice, training data are more complex than assumed: the data themselves contain errors, and it may be impossible to find an optimal separating hyperplane. In Figure 2, two points are classification errors because the optimal separating hyperplane cannot classify them. To deal with this situation, we introduce an error term $\xi$, which we hope is as small as possible, weighted by a tradeoff parameter $C$. That is, we slightly modify the optimization problem by adding a penalty on the slack variables $\xi_i$ for violating the classification constraints, obtaining the objective function (3) with the constraints (4) and (5).

\[
\min \; \frac{\|w\|^2}{2} + C \sum_i \xi_i \tag{3}
\]
\[
\text{subject to } x_i \cdot w + b \ge +1 - \xi_i, \quad \text{for } y_i = +1 \tag{4}
\]
\[
x_i \cdot w + b \le -1 + \xi_i, \quad \text{for } y_i = -1 \tag{5}
\]

Equations (4) and (5) can be formulated as one set of inequalities:

\[
y_i (x_i \cdot w + b) - 1 + \xi_i \ge 0 \quad \text{and} \quad \xi_i \ge 0, \; \forall i \tag{6}
\]
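The soft-margin objective and its combined constraint can be written out directly. A small Python sketch with made-up toy values; the function names and data are ours, for illustration only.

```python
import numpy as np

def objective(w, xi, C):
    """Soft-margin objective of Eq. (3): ||w||^2 / 2 + C * sum(xi)."""
    return 0.5 * float(np.dot(w, w)) + C * float(np.sum(xi))

def feasible(w, b, X, y, xi):
    """Combined constraint of Eq. (6):
    y_i (x_i . w + b) - 1 + xi_i >= 0, with xi_i >= 0 for all i.
    """
    margins = y * (X @ w + b) - 1.0 + xi
    return bool(np.all(margins >= 0)) and bool(np.all(xi >= 0))

# Toy, linearly separable data where zero slack already satisfies (6).
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([0.5, 0.5])
b = 0.0
xi = np.zeros(4)
```

For these values every margin is exactly 1 after subtracting the constant, so the constraints hold with zero slack and the objective is $\|w\|^2/2 = 0.25$.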

Fig. 2. Support vectors and the optimal classification hyperplane for the case of two classes.

The non-separable case can be extended to non-linear classification problems. We can map the training data non-linearly into a higher-dimensional space, $x_i \rightarrow \Phi(x_i)$. The non-linear input space is thus mapped to a linear feature space in which the data admit an optimal separating hyperplane. The mapping enters through the kernel function, $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$. Several types of kernel function are used with SVMs, such as the linear, polynomial, radial basis function, and sigmoid kernels. This article uses the radial basis function (RBF) as the kernel function because it gives a better detection rate.
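An RBF-kernel SVM of this kind can be sketched with scikit-learn (our choice of library; the paper does not name an implementation). The data here are synthetic 2-D points standing in for the template-score features, and the `C` and `gamma` arguments play the roles of the paper's C and G parameters.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic 2-D features: two well-separated clusters (made-up data,
# labeled 1 = open mouth, 0 = closed mouth).
rng = np.random.default_rng(0)
X_open = rng.normal(loc=[10.0, 3.0], scale=1.0, size=(20, 2))
X_closed = rng.normal(loc=[3.0, 10.0], scale=1.0, size=(20, 2))
X = np.vstack([X_open, X_closed])
y = np.array([1] * 20 + [0] * 20)

# RBF kernel: K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).
clf = SVC(kernel="rbf", C=1.0, gamma=0.1)
clf.fit(X, y)
preds = clf.predict([[11.0, 2.0], [2.0, 11.0]])
```

With clusters this well separated, the first query point falls in the "open" class and the second in the "closed" class; the point of Sections 3-4 is that accuracy degrades when C and G are chosen poorly, which motivates the PSO search.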

2.4 Particle Swarm Optimization

In this section, we describe the Particle Swarm Optimization (PSO) [11] method for function optimization. PSO is an optimization technique and a kind of evolutionary computation. It optimizes a problem by maintaining a population of candidate solutions called particles and moving these particles around the search space according to simple formulae. The movements of the particles are guided by the best positions found in the search space, which are continually updated as the particles find better positions.

Let $f: R^n \rightarrow R$ be the fitness function to be minimized. Let $S$ be the number of particles in the swarm, each having a position $x_i \in R^n$ in the search space and a velocity $v_i \in R^n$. Let $p_i$ be the best known position of particle $i$ and let $G$ be the best known position of the entire swarm. A basic PSO algorithm is as follows:

Step 1: for each particle $i = 1, \ldots, S$:
(1) initialize the particle's position with a uniformly distributed random vector, $x_i \sim U(b_{low}, b_{upp})$, where $b_{low}$ and $b_{upp}$ are the lower and upper boundaries of the search space;
(2) initialize the particle's best known position to its initial position: $p_i \leftarrow x_i$;
(3) if $f(p_i) < f(G)$, update the swarm's best known position: $G \leftarrow p_i$;
(4) initialize the particle's velocity: $v_i \sim U(-(b_{upp} - b_{low}), \, b_{upp} - b_{low})$.

Step 2: until a stopping criterion is reached (e.g., a maximum number of iterations or adequate fitness), repeat:
(1) create random vectors $r_p, r_G \sim U(0, 1)$;
(2) update the particle's velocity, $v_i \leftarrow \omega v_i + \phi_p r_p \cdot (p_i - x_i) + \phi_G r_G \cdot (G - x_i)$, where the $\cdot$ operator denotes element-by-element multiplication and the parameters $\omega$, $\phi_p$, and $\phi_G$ are selected by the practitioner;
(3) update the particle's position by adding the velocity, $x_i \leftarrow x_i + v_i$; note that this is done regardless of whether the fitness improves;
(4) if $f(x_i) < f(p_i)$:
(i) update the particle's best known position: $p_i \leftarrow x_i$;
(ii) if $f(p_i) < f(G)$, update the swarm's best known position: $G \leftarrow p_i$.

When the loop ends, $G$ is the best solution found.
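The steps above can be sketched directly in Python. This is a minimal implementation of the basic PSO just listed; the default values of $\omega$, $\phi_p$, $\phi_G$, the swarm size, and the bounds are common choices, not values taken from the paper.

```python
import numpy as np

def pso(f, dim, n_particles=20, iters=100, b_low=-5.0, b_upp=5.0,
        omega=0.7, phi_p=1.5, phi_g=1.5, seed=0):
    """Basic PSO (minimization), following Steps 1 and 2 above."""
    rng = np.random.default_rng(seed)
    # Step 1: random positions and velocities, personal and swarm bests.
    x = rng.uniform(b_low, b_upp, size=(n_particles, dim))
    v = rng.uniform(-(b_upp - b_low), b_upp - b_low,
                    size=(n_particles, dim))
    p = x.copy()
    p_val = np.array([f(xi) for xi in x])
    g = p[np.argmin(p_val)].copy()
    g_val = p_val.min()
    # Step 2: iterate until the maximum iteration count is reached.
    for _ in range(iters):
        r_p = rng.uniform(size=(n_particles, dim))
        r_g = rng.uniform(size=(n_particles, dim))
        v = omega * v + phi_p * r_p * (p - x) + phi_g * r_g * (g - x)
        x = x + v                      # move regardless of improvement
        for i in range(n_particles):
            fx = f(x[i])
            if fx < p_val[i]:          # update personal best
                p[i], p_val[i] = x[i].copy(), fx
                if fx < g_val:         # update swarm best
                    g, g_val = x[i].copy(), fx
    return g, g_val

# Minimize the sphere function; the optimum is at the origin.
best, best_val = pso(lambda z: float(np.sum(z**2)), dim=2)
```

On this simple test function the swarm converges close to the origin within the default iteration budget.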


3. PSO-SVM Method

In this paper, we use the accuracy rate as the fitness function of the PSO method, and we use PSO to find the optimal parameters G and C for the SVM automatically. We call the model that combines the SVM with the PSO method the PSO-SVM method. The flowchart of PSO-SVM is shown in Fig. 3.
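The fitness function for this search can be sketched as a cross-validated accuracy, negated because the basic PSO of Section 2.4 minimizes. The sketch below assumes scikit-learn and a log2 encoding of the particle position as (C, G); both are our assumptions rather than details given in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def make_fitness(X, y):
    """Build a PSO fitness: negative cross-validated accuracy of an
    RBF SVM. A particle's position z encodes (log2 C, log2 G); the
    log-scale encoding is our choice, not something the paper states.
    """
    def fitness(z):
        C, gamma = 2.0 ** z[0], 2.0 ** z[1]
        clf = SVC(kernel="rbf", C=C, gamma=gamma)
        return -cross_val_score(clf, X, y, cv=3).mean()
    return fitness

# Toy two-class data (made up) to evaluate one candidate position.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (15, 2)),
               rng.normal(4.0, 1.0, (15, 2))])
y = np.array([0] * 15 + [1] * 15)
fitness = make_fitness(X, y)
accuracy = -fitness(np.array([0.0, 0.0]))   # C = 1, G = 1
```

Passing `fitness` to a PSO minimizer then searches the (C, G) plane for the parameter pair with the highest cross-validated accuracy, which is the essence of the PSO-SVM method.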

Fig. 3. The flowchart of the PSO-SVM method.

4. Experimental results

Our proposed method is applied mainly to the BioID face database. Lip images were cut from the BioID face images into groups of 50, 200, 500, and 1000. The initial image size is 100 × 75. The quality of the classification is decided by the construction of the template, and template errors arise for many reasons: resolution, shooting angle, and exposure can all cause the model to be built with errors. Fig. 4 shows some lip images that were not photographed well.

In these lip images it is difficult even for human vision to detect whether the mouth is open or closed, and it is more difficult still for computer vision.

Fig. 4. First row: the shooting angle is top-down. Second row: the exposure is too high or too low.


When using the active basis model to construct templates of images, we need to pay attention to the number of basis elements and the scale size, since both affect the classification results. We let the number of basis elements n be 10, 20, 30, 40, or 50 and the scale size s be 0.5, 0.7, 1.0, or 1.3, and cross-validated over all samples. First, we varied the number of basis elements with the scaling coefficient fixed: n ∈ {10, 20, 30, 40, 50}, s = 1.0. As shown in Fig. 5(a), for our problem, n = 40 basis elements gives the best results for each sample. We then varied the scaling coefficient with the number of basis elements fixed at 40: s ∈ {0.5, 0.7, 1.0, 1.3}, n = 40. As shown in Fig. 5(b), scaling coefficients of 1.0 and 0.7 give average accuracy rates of 95.342% and 95.918%, respectively. From these experimental results we see that 40 basis elements with a scaling coefficient of 0.7 builds the best template for classification accuracy. Under this best template condition, we used the SVM to classify the 1000 samples without the PSO method; the best accuracy rate was 96.4%. We then repeated the same experiment with PSO-SVM, and all accuracy rates exceeded 99%. From these results we conclude that the PSO-SVM method has higher accuracy.

Fig. 5. (a) Detection rate when the scale coefficient is 1.0. (b) Detection rate when the number of basis elements is 40.

5. Conclusion

In this article, a support vector machine classifies testing images using the trained active basis models. The open and closed lip features receive different scores, and the support vector machine uses these scores for classification. To find the optimal parameters G and C of the SVM, we use the PSO method to obtain them automatically. Experimental results show that the proposed method classifies very accurately; however, robustness to shooting angle and exposure still needs improvement. Therefore, how to handle shooting angle and exposure level will be the main task in the future. Effectively improving the classification results will help speech recognition under high noise, and when coupled with eye and eyebrow recognition the method can be used effectively in facial recognition.

6. References

[1] He Jun and Zhang Hua, (2007), A real-time lip detection method in lipreading. Chinese Control Conference (CCC 2007), pp. 516-520.

[2] A. Sayeed Md. Sohail and P. Bhattacharya, (2007), Automated lip contour detection using the level set segmentation method. Int. Conf. on Image Analysis and Processing (ICIAP 2007), pp. 425-430.

[3] P. Viola and M. Jones, (2002), Robust real-time object detection. International Journal of Computer Vision.

[4] Y. H. Huang, B. C. Pan, S. L. Zheng, J. Pan, and Y. Tang, (2008), Lip-reading detection and localization based on two-stage ellipse fitting. International Conference on Wavelet Analysis and Pattern Recognition, 2008, pp. 168-171.

[5] T. Hao, Y. Fu, J. Tu, Thomas S. Huang, and Mark Hasegawa-Johnson, (2008), EAVA: A 3D emotive audio-visual avatar. 2008 IEEE Workshop on Applications of Computer Vision (WACV'08), pp. 1-6.

[6] C. Y. Liou, Active mesh for lip-reading. http://ntur.lib.ntu.edu.tw/bitstream/246246/8434/1/882213E002036.pdf

[7] Hui Tang, (2009), Comparison of an active basis model and an adaboost model in image representation. http://theses.stat.ucla.edu/105/Hui%20thesis.pdf

[8] Y. N. Wu, Z. Shi, C. Fleming, and S. C. Zhu, (2007), Deformable template as active basis. Proceedings of the International Conference on Computer Vision.

[9] V. N. Vapnik, (1995), The Nature of Statistical Learning Theory. Springer-Verlag, New York.

[10] F. Qi, C. Bao, and Y. Liu, (2004), A novel two-step SVM classifier for voiced/unvoiced/silence classification of speech. Proc. Int. Symp. Chinese Spoken Language, pp. 77-80.

[11] J. Kennedy and R. C. Eberhart, (1995), "Particle swarm optimization," Proc. of IEEE International Conference on Neural Networks, 4, pp. 1942-1948.


