Calculating the Weight of a Pig
through Facial Geometry using
2-Dimensional Image Processing
by
Alexander W. Clark, B.S.
A Thesis
In
Electrical Engineering
Submitted to the Graduate Faculty
of Texas Tech University in
Partial Fulfillment of
the Requirements for
the Degree of
MASTER OF SCIENCES
IN
ELECTRICAL ENGINEERING
Approved
Dr. Brian Nutter
Chair of Committee
Dr. Sunanda Mitra
Committee Member
Mark Sheridan
Dean of the Graduate School
August, 2015
Copyright 2015, Alexander W. Clark
Texas Tech University, Alexander W. Clark, August 2015
ACKNOWLEDGMENTS
I want to first thank Dr. Nutter, not only for setting me up with this project, but for
looking out for me throughout my entire education at Texas Tech. When I was taking
the challenging Electronics II with him, my parents had to remind me that Dr. Nutter
“was my friend” by pushing his students so hard. Now that I am graduating, though, I
truly can say that he is my friend. Dr. Nutter has helped me countless times and directly
enabled me to achieve my dreams. I hope he knows that the little favors and many
hours he pours into his students are remembered and appreciated forever.
I would like to acknowledge the folks at Animal Biotech, including Garrett
Thompson and Dr. John McGlone, for dreaming up this crazy idea and giving me the
thousands of pig pictures I needed to make it happen.
I also gratefully thank Dr. Mitra for being on my committee. Your willingness
to help a student you don’t know well is as impressive as it is appreciated.
I especially need to thank my family as well. My parents have always
encouraged me to aspire for impossible dreams and are the first to believe I can
accomplish anything. Thank you for helping me become an accomplished engineer by
first teaching me to be a man of character. Your love has always been known and felt,
even out here in the plains of West Texas. Don’t worry – I’m coming home.
And Rachel, thanks for putting up with the many hours of “piggie piggie crop
crop” and supporting the crazy timeline I was shooting for. It helps that you always
knew I would figure this project out – even when I didn’t think I would!
Lastly, I cannot conclude these acknowledgements without expressing my
firm belief that nothing I achieve could be possible or would have any value aside
from my faith in Christ. The education I have received and the many hours of work I
have put into this degree cannot take away from the true victor that has led me here.
“For the horse is made ready for the day of battle,
but the victory rests on the Lord.”
Proverbs 21:31
TABLE OF CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
I. INTRODUCTION
II. FEATURE DETECTION USING THE VIOLA-JONES FRAMEWORK
    Grayscale Conversion and Coordinate Plane
    Integral Images
    AdaBoost Technique
        Introduction to Algorithm
        AdaBoost Algorithm Description
    Cascade Classifiers
    Summary of Training Parameters on Classifiers Created
    Feature Detection Conclusion
III. FALSE POSITIVE REDUCTION AND VALID FEATURE SELECTION
    Valid Face Selection
    Valid Nose Selection
    Valid Eye Selection
        Properly Cropping the Eye Photo
        Limiting Eye Classification to a Dynamic Region of Interest
        Selecting Valid Set of Eyes Based on Probability
    Classification Testing Results
IV. FACIAL RECOGNITION
    Uniform Transformation of Pig Facial Features
        The Need for Common Feature Positions in Facial Recognition
        Perspective Quadrilateral Mapping
        Perspective Transformation of Pixel Position
        Bicubic Pixel Interpolation
    Local Binary Patterns
        Features of Local Binary Patterns
        Local Binary Images
        Histogram Comparison
    Unsupervised Data Clustering
    Facial Recognition Conclusion
V. REGRESSION
    Examination of Features
        The Feature Vector Sets
        Methods Attempted
        Undesirable Results
    Least Squares Method with Interdependent Predictors
        Desirable Results
        Averaging Pig Clusters
        Predictor Creation
        Primary Features Used
        Least Squares Methodology
    Regression Conclusion
VI. CLUSTER ADJUSTMENTS
    Outlier Detection
        Cluster Minimization
        Nose Angle Limitation
        Grubbs’ T-test
    Cluster Regrouping
        Algorithm for Regrouping Fractured Clusters
        Final Cluster Regrouping Results
    Cluster Adjustments Conclusion
VII. CONCLUSION
    Accomplishments
    Future Work
    Closing Remarks
BIBLIOGRAPHY
APPENDICES
    A. CONSOLE LOG DURING FEATURE DETECTION
    B. EXAMPLES OF PIGS CLASSIFIED USING PROGRAM
    C. CONSOLE LOG DURING TRAINING MODE
    D. CONSOLE LOG AT END OF TRAINING MODE
    E. CONSOLE LOG DURING FACE RECOGNIZER MODE
    F. CONSOLE LOG AFTER FACE RECOGNIZER MODE
    G. HOW-TO GUIDE ON RUNNING PIG ESTIMATION PROGRAMS
    H. MATLAB CODE FOR LEAST SQUARES REGRESSION
    I. OPENCV C++ SOURCE CODE
ABSTRACT
This thesis outlines the groundwork of facial detection and recognition
software to be used with pigs in order to estimate their weight from a digital image.
Facial detection of the pig is achieved through identification of its features using
the Viola-Jones method for cascade classifiers and basic likelihood functions. The
document covers both the general theory behind these concepts and the actual
implementation as used in the software. Next, the need to transform the newly
detected pig face for use in facial recognition is addressed through perspective
transformation and bicubic pixel interpolation of the facial geometries. After this, the
thesis discusses the use of local binary patterns to sort the photos of the pigs with an
unsupervised clustering technique. Next, the implementation of least squares
regression to predict the weight of a pig from its facial features is covered. Finally,
the thesis concludes with a discussion of the multiple error-checking and outlier
correction techniques used to make the software more robust.
LIST OF TABLES
2.1 Summary of Training Parameters for Final Cascade Classifiers
6.1 Tcrit Values for Grubbs’ T-test for Outliers [“Outlier”]
LIST OF FIGURES
1.1 Program Flow Chart
2.1 Transformation of Source into Equalized Grayscale Image
2.2 Project Coordinate System
2.3 Integral Image Example
2.4 Example Rectangular Features Enclosed in Detection Window
2.5 Examples of Haar Features in a Pig Photo
2.6 Cascade Classifier Flow Chart [Viola]
2.7 Cascade Classifier Optimization Algorithm [Viola]
2.8 Transformations Performed on Positive Image Set for Robustness
2.9 Examples of Positive Images Used for Pig Features
3.1 Selection of Most Valid Face
3.2 Selection of Most Valid Nose
3.3 Early Eye Classifiers’ Prolific False Positives
3.4 Cropped Images for Eye Classifiers
3.5 Classifying Entire Image for Eyes
3.6 Classifying Only the Face Region of Interest for Eyes
3.7 Eye False Positives Correctly Rejected
3.8 Examples of Rejected Images Due to Lack of Adequate Features
4.1 Four Different Pictures of the Same Pig
4.2 Perspective Transformation of the Pig Face
4.3 Perspective Transformation of a Warped Quadrilateral
4.4 The Basic Pixel Interpolation Model
4.5 Sixteen Neighbors Used for Bicubic Interpolation
4.6 A 3x3 Pixel LBP Example
4.7 LBP Feature Examples [Wagner]
4.8 Local Binary Pattern Image and Histogram Concatenation
4.9 Supervised LBP Facial Recognition Flowchart
4.10 Unsupervised LBP Facial Recognition Flowchart
5.1 Pig Face Feature Vectors
5.2 Example of Unsuccessful Regression with High Bias and Low Variance
5.3 Example of Overfitting the Training Set
5.4 High Variance of a Testing Set after Overfitting the Training Set
5.5 Regression Results of Features for Training Set
5.6 Regression Results of Features for Testing Set
5.7 Averaging Results of Training Set
5.8 Averaging Results of Testing Set
6.1 Example of Misclassification Error in the Pig Photo
6.2 Example of Cluster 6 Being Discarded Due to Insufficient Size
6.3 Picture of Pig Face Exhibiting Desirable Eye and Nose Angles
6.4 Picture of Pig Exhibiting Undesirable Eye and Nose Angles
6.5 Example of Pig Misidentification
6.6 Example of Cluster Fracturing With Unintentionally Split Clusters Highlighted
6.7 Cluster Regrouping Flowchart
6.8 Console Log of Clusters Being Combined Using the Regrouping Method of Supervised Facial Recognition
6.9 Console Log of Cluster Being Rejected for Combination Using the Regrouping Method of Supervised Facial Recognition
6.10 Cluster Regrouping Final Results
7.1 Necessary Standard Metrics for Camera Set-Up
7.2 Devices Used in Setting Up the Standard Metrics of the Camera System
CHAPTER I
INTRODUCTION
Knowing the weight of a pig is vital in the agricultural meat-processing
market. There is a target weight that an ideal pig should reach when the farmer sells
it to make maximum profit. If a pig is below this target weight, profits are lost
because the pig does not meet the standard required size. If the pig is above the target
weight, the excess is unused weight, and the resources spent growing the pig to a
mass larger than the standardized size are ultimately wasted.
Weighing pigs is not a trivial task, though. Getting an individual pig to
cooperate and stand still on a scale long enough to be measured can be very difficult,
especially when using a balance scale rather than a digital one.
The pig’s weight does not necessarily have to be measured directly, though.
Pigs have a unique characteristic: the distance between their eyes grows linearly with
their weight. If a person knew the exact distance between a pig’s eyes, then they
could easily predict its weight. Using this fact, one could presumably create software
that uses a digital image of a pig to calculate the distance between its eyes
and ultimately predict its weight.
Applications in the agriculture industry would include day-by-day tracking of
pig weights in order to calculate the ideal time to sell a pig to the market. Anything
underneath the ideal weight isn’t worth the full market price; anything above that ideal
weight is wasted resources. Farmers can increase profit by optimizing their selling
procedures to match the growth rate of the pigs in their care. Measuring the weight of
a pig automatically from an image also requires significantly less manpower than
weighing each pig individually on a mechanical scale.
The software designed in the course of this project seeks to meet this need by
providing the means of determining the weight of a pig using nothing more than a
relatively low-resolution camera in the pig pen.
The goals of the project were to develop a low-cost solution for capturing
pictures of pigs and to use image processing to estimate the weight of each pig. The
low-cost solution, a miniature computer called the Raspberry Pi fitted with a camera,
was placed in several pig pens with the help of Animal Biotech. Water flow from the
spigot where the pigs drink was monitored by a sensor connected to the computer.
Whenever a pig drank from the spigot, a digital image of the pig was taken and stored
on the device. This document does not go into great depth on the hardware of the
project. Rather, it covers how the design goals of the software were met and the
processes used in the image processing.
The software being designed must be self-contained within the box that houses
the computer and camera. It would not be possible on most farms to take pictures at
the box and transmit them wirelessly; the bandwidth and connection simply do not
exist at typical farms. Instead, the picture must be processed and the data stored within
the pig pen. This requires that the software be fast so that digital images can be
efficiently processed and discarded, keeping the limited memory space on the tiny
computer module free. The design goal on speed is that an image can be captured and
processed in under a second.
The image processing software that manages the images, which through the
course of this thesis will simply be referred to as the program, must also run on a tiny
computer module, such as the Raspberry Pi. For this reason, the C++ language
was chosen in conjunction with the open-source library OpenCV. With this
library and language, an executable can be compiled and placed on the device without
installing any other advanced programs with image processing capabilities.
As for the processing itself, the software has two main goals: predict the
weight of the pig and identify the pig that the face and weight belong to. The first step
is an object detection problem, while the second is facial recognition. This document
will fully cover how the features of the pigs are detected, how they correlate to the
estimated weight of the pig, and finally how unsupervised facial recognition
technology is used on pigs to sort the data.
A full program flow chart is shown below in Figure 1.1. The thesis will follow
the outline and processes of this flow chart. All operations will be described in the
same sequential order as the program flow.
Figure 1.1: Program Flow Chart
CHAPTER II
FEATURE DETECTION USING THE VIOLA-JONES
FRAMEWORK
Calculating the facial geometries begins with the creation of
classifiers that can identify the three main features of a pig head: the face, the eyes,
and the nose. To serve the needs of the project, the Viola-Jones framework is
followed to create three cascade classifiers. This section presents a summary of the
Viola-Jones framework for facial detection, as presented in the original paper [Viola]. It is a
robust and rapid method that utilizes three important tools: integral images for quick
feature evaluation, AdaBoost to construct classifiers, and cascade classifiers to further
reduce the operating time.
Grayscale Conversion and Coordinate Plane
Before covering the Viola-Jones framework, it is worth mentioning that all
image processing in this project was performed on monochromatic images. Although
many of the images shown in this document and in the output of the program are
represented in color, the actual calculations were performed on grayscale images
transformed by Equation 2.1.
RGB to Gray:  Y ← 0.299 · R + 0.587 · G + 0.114 · B    (2.1)
After the grayscale conversion, we also equalize the histogram of every
normalized image used. This means we calculate the histogram H with 256 bins
(the number of possible pixel values in the grayscale image), normalize it, and
then calculate the cumulative distribution function of the histogram using Equation 2.2
[“Histogram”].
H'_i = \sum_{0 \le j < i} H(j)    (2.2)
The image can then be transformed to increase the contrast and normalize the
brightness by using H' as a look-up table (Equation 2.3) [“Histogram”].

dst(x, y) = H'(src(x, y))    (2.3)
The transformation of the source image is shown below in Figure 2.1.
Figure 2.1: Transformation of Source into Equalized Grayscale Image
Finally, it is also worth noting that we use the coordinate directions depicted in
Figure 2.2 throughout the project.
Figure 2.2: Project Coordinate System
Integral Images
The first key concept in the Viola-Jones Framework is forming an integral
image. An integral image is a representation of an image that allows the sum of all the
pixels in a rectangular region of the image to be computed in constant time,
independent of the size of the rectangle. Each element in the integral image is the
inclusive sum of all the pixels of the original image that are above and to the left of the
pixel in the original image. To demonstrate the use of the integral image, consider
Figure 2.3.
Figure 2.3: Integral Image Example
The sum of the pixels in region D in the original image is equal to the value of
element 1 minus the values of elements 2 and 3, plus the value in element 4. This
integral image greatly simplifies the calculation of the Haar-like features that are used
for facial detection. Three types of features are used in Viola-Jones: two rectangle
features, which require six array references; three rectangle features, which require
eight; and four rectangle features, which require nine (Figure 2.4).
Figure 2.4: Example Rectangular Features Enclosed in Detection Window
The features shown in squares A and B of the figure are two-rectangle features, C is a
three-rectangle feature, and D is a four-rectangle feature.
These features are calculated as the sum of the pixels in the white rectangles
minus the sum of the pixels in the grey rectangles. Examples of how these features
correlate to the actual images of the pigs can be seen below in Figure 2.5.
Figure 2.5: Examples of Haar Features in a Pig Picture
The two pictures on the left focus on features associated with the eyes. The
upper-left one correlates with the face as a whole and the dark horizontal area
associated with the eyes, while the lower-left one looks at the light space associated
with the bridge of the nose.
The two pictures on the right focus on features associated with the sides of the
pig’s head. The upper-right picture exhibits a feature that frames the face of the pig,
while the lower-right picture shows the feature associated with the angle along the
edge of the pig head and the background.
All of the features shown in the above image could be considered weak
classifiers by themselves, each roughly defining one aspect of the desired object.
AdaBoost Technique
Introduction to Algorithm
For the 24 x 24 pixel windows used in the Viola-Jones paper and for the eye
and face classifiers, there are approximately 180,000 possible features. Rather than
creating a classifier using all of the features, it would be helpful to only use a select
subset of feature vectors that have the greatest effect on detecting the desired object in
the window. This is where the AdaBoost technique is useful. “Adaptive Boosting” is
used to create a strong classifier by selecting and combining several weak classifiers.
Essentially, the technique iterates and selects the classifier with the lowest
classification error and an associated weight. After this classifier is combined with the
others, the algorithm continues until the desired total number of weak classifiers is
reached. The AdaBoost algorithm as used by Viola-Jones is described in the section
below.
AdaBoost Algorithm Description
Start with example images (x1, y1), …, (xn, yn), where yi = 0 for negative
examples (lacking the desired object) and yi = 1 for positive examples (containing
the desired object).
The initial weight for each training example, w1,i, is determined by Equation 2.4
[Viola] below, where m is the number of negative images and l is the number
of positive images.
w_{1,i} = \begin{cases} \frac{1}{2m}, & \text{for } y_i = 0 \\ \frac{1}{2l}, & \text{for } y_i = 1 \end{cases}    (2.4)
Next, AdaBoost iterates the variable t from 1, …, T, where T is the desired
total number of weak classifiers in the final strong classifier. At the beginning of each
iteration, the weight of each training example is normalized into a probability
distribution using Equation 2.5 [Viola].

w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}    (2.5)
Now, for every feature j, the algorithm trains a weak classifier h_j that is
limited to just this one feature. The error of each classifier is calculated with respect
to w_t (Equation 2.6) [Viola].

\epsilon_j = \sum_i w_i \left| h_j(x_i) - y_i \right|    (2.6)
Next, the classifier h_t with the smallest error \epsilon_t is chosen, and the weights are
updated (Equation 2.7) [Viola].

w_{t+1,i} = \begin{cases} w_{t,i} \cdot \frac{\epsilon_t}{1 - \epsilon_t}, & \text{if } x_i \text{ is classified correctly} \\ w_{t,i}, & \text{otherwise} \end{cases}    (2.7)
After all of the weights have been fully updated, the final strong classifier
is expressed by Equation 2.8.

h(x) = \begin{cases} 1, & \sum_{t=1}^{T} h_t(x) \log\left(\frac{1-\epsilon_t}{\epsilon_t}\right) \ge \frac{1}{2} \sum_{t=1}^{T} \log\left(\frac{1-\epsilon_t}{\epsilon_t}\right) \\ 0, & \text{otherwise} \end{cases}    (2.8)
Cascade Classifiers
The simplest way to improve the performance of the AdaBoost classifier is to
increase the number of features used, but this directly increases the computation time
required. Hundreds of windows, or sections of the digital image at different scales and
positions, must be tested for the object as well. To improve the performance of the
detection system while keeping the computation time low, a cascade of classifiers was
used, as seen in Figure 2.6, where failing one stage immediately discards that window,
and passing the stage allows the next classifier to be applied.
Figure 2.6: Cascade Classifier Flow Chart [Viola]
The cascade assumes that most windows will not have a face, eye,
or nose present in them, so the first classifiers in the cascade reject the windows that
obviously lack the particular object. This eliminates most of the windows in the image
using a computationally inexpensive classifier. The later stages increase in complexity
to ensure that a face, eye, or nose is truly present, but since most windows never
reach these stages, they do not significantly affect the computation time for the image
as a whole. The algorithm used to optimize the Viola-Jones cascade classifiers is given
in Figure 2.7.
Figure 2.7: Cascade Classifier Optimization Algorithm [Viola]
Given the desired overall detection rate, false positive rate, and the number of
stages, the necessary performance of the individual stages can be found. Then, using
the AdaBoost technique, a classifier can be trained for each stage that meets those
specifications. The cascade classifier, when combined with the other techniques
mentioned, allows faces in an image to be detected accurately and efficiently.
Summary of Training Parameters on Classifiers Created
Now that the algorithms and theory behind the classifiers has been discussed,
this section will cover a few specifics on the classifiers created for this project. In
total, three classifiers were created: a pig face, a pig eye, and a pig nose. All were
created using AdaBoost and Haar features and trained using OpenCV’s cascade
classifier training programs.
Below in Table 2.1, the parameters for each classifier are displayed.
Table 2.1: Summary of Training Parameters for Final Cascade Classifiers

                      | Face Classifier | Eye Classifier | Nose Classifier
Positive Set          | 2906            | 3986           | 4040
Negative Set          | 2542            | 2542           | 2542
Dimensions (pixels)   | 24x24           | 24x24          | 32x16
Stages                | 20              | 32             | 33
Minimum Hit Rate      | 99.9%           | 99.9%          | 99.9%
Max False Hit Rate    | 50%             | 50%            | 50%
The positive set parameter designates the size of the set of images that have
been marked as positive examples of the object being trained. For instance, in the case
of the nose classifier, a set of 4040 cropped images of actual pig noses was loaded
into the training program. This set of positive images was carefully chosen to include
noses of all kinds, shapes, and positions; to make sure that all pig noses can be
classified, it is important to include noses with a wide variety of markings. In order to
simulate drastic changes in environmental lighting, a good portion of the positive
samples were repeated with differing contrast levels and exposure adjustments applied
in a basic image editing program. This should enable the classifier to work in a variety
of lighting conditions. Lastly, in order to increase the size of the positive sample set
and ensure a symmetric classifier, every positive image that was cropped and doctored
was also duplicated once more, flipped horizontally. This way, every positive image
in the set has a matching mirrored image to accompany it. This process was repeated
for both the face
and the eye classifiers too. An example of the transformations made on a positive
image of a pig eye can be seen below in Figure 2.8.
Figure 2.8: Transformations Performed on Positive Image Set for Robustness
The negative image set is a large set of background images that do not contain
the desired objects. For this application, the negative set consisted mainly of thousands
of stock images of dirt, rocks, mud, iron bars, and wood: backgrounds typical of a pig pen.
It is also worth noting that some cropped images of pig features other than the one
being trained (for example, cropped pig noses were used as negatives for the eye
classifier) were included to produce a more robust classifier. Fewer negative images
than positive images were needed because the training program takes random cropped
windows from the negative images and reuses each image by selecting different portions of it.
The dimensions designate the size of the positive images trained. For both the
face and the eye classifier, 24 by 24 pixel images were used. The nose classifier used a
32 by 16 pixel image to accommodate the long horizontal nature of a pig nose.
Positive images were all cropped to include as much of the desired feature (face, eye,
or nose) as possible without including too much of the background. Examples of
actual cropped images can be seen below in Figure 2.9.
Figure 2.9: Examples of Positive Images Used for Pig Features
A positive image of a pig face is shown on the left side of the figure, a positive image
for an eye is in the middle, and a positive image for a nose is on the right.
The stages designate the number of levels a window must pass through in the
cascade classifier to be marked as valid. The face classifier uses 20 stages. The eye
and nose classifiers are stricter, with 32 and 33 stages, respectively. Pig faces as
whole images have significantly less variation than images of eyes or noses,
meaning it is comparatively easier to determine whether a window contains a pig face
than whether it contains a nose or eye.
The minimum hit rate is the percentage of positive images that must be
correctly classified as a valid feature by a single stage. The larger the hit rate, the
better the quality of the classifier, but the longer it takes to train. Here, 99.9% is
used to create high-quality classifiers, and all three classifiers took well over 24 hours
to train.
Finally, the maximum false hit rate is the fraction of negative windows that a
single stage may incorrectly classify as positive. This seemingly high rate of false
positives is not an issue because the per-stage rates compound multiplicatively across
the cascade of weak classifiers: with 20 stages, the worst-case overall rate is 0.5^20,
or roughly one in a million.
Feature Detection Conclusion
This project begins with the detection of facial features. Without the ability to
detect the location and size of those features in an image, estimating the pig's
weight would be impossible. As covered in this chapter, we used the Viola-Jones
framework to create three cascade classifiers: for pig faces, pig eyes, and pig noses. These
classifiers, each composed of many weak classifiers combined together, can determine with
reasonable certainty whether or not a digital image contains all three of those features. The
remaining chapters cover what to do with the detected features in order to reach a
method for estimating the pig's weight.
CHAPTER III
FALSE POSITIVE REDUCTION AND VALID FEATURE
SELECTION
One of the distinct advantages of cascade classifiers is that the number of
objects they report is dynamic. The classifier can therefore determine when no objects
are present in a picture, avoiding the problem of assigning a location and size
to a pig face that does not exist in the digital image. While the ability not to force a
classification is desirable, it also introduces the difficulty of false positives. To
select a valid feature from the detected objects, likelihood functions must be created
to ensure that the selected object is a genuine feature. In the end, only a picture
with a valid face, nose, and set of eyes will have its weight calculated.
Valid Face Selection
The most valid face is the classified object with the shortest Euclidean
distance to the center of the picture, like the frame selected for the pig face
in the example below in Figure 3.1.
Figure 3.1: Selection of Most Valid Face
Valid Nose Selection
The selection of the nose is similar to the selection of the face, except that for
all images taken of the pigs, the noses are located near the bottom of the picture.
Given this, the most valid nose is selected to be the one with the shortest Euclidean
distance to the bottom-center of the picture, like the nose selected in Figure 3.2 below.
Figure 3.2: Selection of Most Valid Nose
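Both the face rule and the nose rule reduce to the same operation: keep the detection whose center lies nearest an anchor point. A minimal Python sketch, where the detection rectangles and image size are hypothetical stand-ins:

```python
import math

def nearest_detection(detections, anchor):
    """Pick the detection (x, y, w, h) whose center is closest to anchor.

    Returns None when no detections exist, matching the cascade's
    ability to report zero objects.
    """
    best, best_d = None, float("inf")
    for (x, y, w, h) in detections:
        cx, cy = x + w / 2, y + h / 2
        d = math.hypot(cx - anchor[0], cy - anchor[1])
        if d < best_d:
            best, best_d = (x, y, w, h), d
    return best

img_w, img_h = 640, 480
faces = [(40, 60, 100, 100), (250, 170, 140, 140)]
valid_face = nearest_detection(faces, (img_w / 2, img_h / 2))  # anchor: image center
noses = [(280, 400, 80, 40), (30, 30, 80, 40)]
valid_nose = nearest_detection(noses, (img_w / 2, img_h))      # anchor: bottom-center
```

Only the anchor point changes between the two rules.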
Valid Eye Selection
Correct selection of the eyes is more complicated than that of the face and the nose.
Because pig eyes vary widely over the growth cycle and the camera has relatively
low resolution, the eye classifier had to be made more accepting, or generic, than
the nose and face classifiers. As a result, there are many more false positives for
eyes than for any other feature.
Properly Cropping the Eye Image
Early classifiers for eyes had very poor results. Consider the output of the early
eye classifier shown in Figure 3.3; the image is riddled with false positives.
Figure 3.3: Early Eye Classifiers' Prolific False Positives
A large part of this is due to the cropping of the source images used to train the
classifier. Initial images were cropped very closely to the eyeball of the pig,
accidentally omitting the informative features of the eyelid and the folds around the eye.
Later images were cropped like the eyes shown below in Figure 3.4, which finally
yielded at most one false positive per pig face.
Figure 3.4: Cropped Images for Eye Classifiers
Limiting Eye Classification to a Dynamic Region of Interest
The next improvement to make the eye classifier stricter is to limit the
region of interest to the face. Take, for instance, the image below in Figure
3.5, which does not use a limited region of interest.
Figure 3.5: Classifying Entire Image for Eyes
The only way to correctly classify the eyes when searching the whole picture is to
use a more general classifier with fewer features, which not only increases the number
of false positives but also lengthens the processing time.
However, if we take the same image and limit it to a region of interest such as
just the face, we can require more features in the cascade classifier and reduce the
false positives classified in the picture to zero, such as shown below in Figure 3.6.
Figure 3.6: Classifying Only the Face Region of Interest for Eyes
Selecting Valid Set of Eyes Based on Probability
Even though the number of false positives is greatly reduced, the valid set of
eyes must still be selected from all of those classified, in case misclassified
objects remain.
The first step in weeding out false positives is to ensure that there is at least
one eye on each side of the face. The pig face is divided into two regions, the left
and right halves, and the positions of all classified eyes are checked to ensure
that at least one eye falls into each region. If this check fails, the current face
is considered to have an invalid set of eyes.
After that first check, the valid set is selected by comparing all combinations
of left and right eyes to determine which pair is most likely to be the valid set.
The most valid set of eyes is the one that:
1. Has the smallest least-squares relative error between 50x50 pixel scaled images of the eyes (with the right side flipped).
2. Has the greatest similarity in size.
3. Is closest to the horizon line of the face.
4. Is farthest apart.
5. Has the smallest angle between the two.
We then mathematically determine the likelihood per pair of eyes so that the
pair with the highest likelihood is the valid pair of eyes.
The first criterion is measured using Equation 3.1, where i represents an eye on
the left side of the face and j represents an eye on the right. The ratio is arranged
so that a smaller error yields a higher score, consistent with criteria 2 through 5:

P1(i, j) = (L2max − L2ij) / (L2max − L2min),  (3.1)
where L2ij represents the least-squares relative error between the two eye images. To
calculate the error, the image of the right eye is compared to the mirrored image of
the left eye after both have been resized to the same dimensions. Then, pixel by pixel, the two
images are compared in value. The smaller the error value, the more similar the
images are.
Next, the sizes of the two eyes are compared using Equation 3.2, where wi and
wj are the widths of the left and right eyes being compared:

P2(i, j) = 1 − |wi − wj| / wmax  (3.2)
The smaller the difference in the eye size is, the greater the probability that the
two eyes are a valid pair.
Next, the pair of eyes is measured for proximity to the middle of the face
using Equation 3.3.
P3(i, j) = [hface/2 − |hface/2 − yi|] / (hface/2) + [hface/2 − |hface/2 − yj|] / (hface/2),  (3.3)
where hface is the height of the detected face in pixels, and yi and yj are the
y-coordinates of the left and right eyes being compared. The closer the pair is to
the middle of the face, the higher the chance that it is a valid pair of eyes.
After that, Equation 3.4 looks at the distance between the eyes and favors the set
that is farthest apart; xi and xj are the x-coordinates of the eyes, and wface is the
width of the face:

P4(i, j) = |xi − xj| / wface  (3.4)
The final condition, in Equation 3.5, favors the pair of eyes that is closer
together vertically:

P5(i, j) = (hface − |yi − yj|) / hface  (3.5)
After all of these parameters are calculated, they are combined using the
likelihood function in Equation 3.6, with weighting factors φn used to emphasize
the features that matter most.
Ptot(i, j) = Σ (n=1 to 5) φn · Pn(i, j)  (3.6)
Finally, using this likelihood function, the indices of the pair of eyes with the
highest likelihood can be found, as shown in Equation 3.7, where m is the number of
left-eye candidates and n is the number of right-eye candidates.

xML = argmax (0 ≤ i < m, 0 ≤ j < n) Ptot(i, j)  (3.7)
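The five criteria and the maximization of Equation 3.7 can be sketched in Python. The eye coordinates, face dimensions, per-pair L2 errors, and unit weights below are all hypothetical stand-ins, and the L2 term is scored so that a smaller error yields a higher value, consistent with criteria 2 through 5:

```python
import itertools

def pair_likelihood(left, right, w_face, h_face, l2_err, l2_min, l2_max, w_max,
                    weights=(1, 1, 1, 1, 1)):
    """Score one left/right eye pairing with the five criteria (Eqs 3.1-3.6).

    Each eye is (x, y, w); l2_err is the least-squares error between the two
    mirrored, rescaled eye images.
    """
    xi, yi, wi = left
    xj, yj, wj = right
    p1 = (l2_max - l2_err) / (l2_max - l2_min) if l2_max > l2_min else 1.0
    p2 = 1 - abs(wi - wj) / w_max                              # similar size
    p3 = ((h_face / 2 - abs(h_face / 2 - yi)) / (h_face / 2)
          + (h_face / 2 - abs(h_face / 2 - yj)) / (h_face / 2))  # near midline
    p4 = abs(xi - xj) / w_face                                 # far apart
    p5 = (h_face - abs(yi - yj)) / h_face                      # level pair
    return sum(phi * p for phi, p in zip(weights, (p1, p2, p3, p4, p5)))

lefts = [(60, 100, 24), (80, 150, 20)]     # (x, y, w) left-side candidates
rights = [(180, 104, 25)]                  # right-side candidates
errs = {(0, 0): 3.0, (1, 0): 9.0}          # hypothetical per-pair L2 errors
w_face, h_face = 240, 200
best = max(itertools.product(range(len(lefts)), range(len(rights))),
           key=lambda ij: pair_likelihood(lefts[ij[0]], rights[ij[1]],
                                          w_face, h_face, errs[ij], 3.0, 9.0, 25))
```

Here the first left-side candidate wins: it has the lower L2 error and sits closer to the face midline.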
Now the program has the capability to reject eyes that were classified
incorrectly. Shown below in Figure 3.7 are two examples of images where a false
positive (shown in red) is successfully rejected and the correct pair of eyes validated
(shown in blue).
Figure 3.7: Eye False Positives Correctly Rejected
Classification Testing Results
In order to test that the classifiers and the valid feature selection code work,
the program was run on 705 images taken over a single day, each captured when a pig
drank from the water spigot. A screenshot of the console log during this test run can
be seen in Appendix A.
The classifiers performed well in rejecting the many pictures inadequate
for facial geometry. Images such as the ones shown below in Figure 3.8 are rejected
for reasons such as the pig's head being turned, eyelids being closed, an eye being
covered by an ear, or general blurriness.
Figure 3.8: Examples of Rejected Images Due to Lack of Adequate Features
In the test of 705 images, 283 images were processed as valid, yielding a
rejection rate of 59.86%. While this number may seem high, it is important to
remember that many images can be taken and processed every second while the pig
drinks. Rejecting a high number of invalid images, like the ones shown above, is not a
problem when there are a multitude of images to choose from.
Furthermore, for this same set, there were only 5 misclassifications, that is,
5 features that were incorrectly classified. In every case, a shadow or tear
stain was misclassified as an eye. Even so, that yields a misclassification rate of
only 0.71%, and the data obtained erroneously can easily be discarded as an outlier,
as discussed in Chapter 6.
Overall, the classification is very robust and surprisingly accurate. Captured
screenshots of the program positively classifying pig images can be found in
Appendix B.
CHAPTER IV
FACIAL RECOGNITION
Detecting the features of the pig to calculate its weight is only useful if that
weight can be assigned to a specific pig. There must be a methodology for sorting the
pig faces and assigning each weight to the correct animal.
Uniform Transformation of Pig Facial Features
The Need for Common Feature Positions in Facial Recognition
The first step in facial recognition is to transform the face of the pig using its
facial geometries. Take for instance the four pictures below in Figure 4.1.
Figure 4.1: Four Different Pictures of the Same Pig
While these pictures may look similar to the human eye because they all show
the same pig, the computer has a harder time determining that they share the same
subject. Subtle changes in the head's rotation and angle create small variations in
feature locations, making the images difficult for a computer to analyze. Therefore, it is
important to map all of the main features in a uniform fashion.
The chosen method for flattening the image to a normalized coordinate
system, making it less susceptible to pig movement, is to map a quadrilateral based
on the eyes’ location and size difference as well as the nose’s location and size. An
example of the transformation we want to perform is shown below (Figure 4.2).
Figure 4.2: Perspective Transformation of the Pig Face
Perspective Quadrilateral Mapping
Before the corners of the quadrilateral can be positioned, a few metrics must
be computed. The first are the angle between the eyes, shown below in Equation 4.1,
and the Euclidean distance between them, shown in Equation 4.2. After that, a metric,
Δeye, is designated to mark the difference in width between one eye and the other
(Equation 4.3).
θeye = tan⁻¹[(yright_eye − yleft_eye) / (xright_eye − xleft_eye)]  (4.1)

deye = √[(xright_eye − xleft_eye)² + (yright_eye − yleft_eye)²]  (4.2)

Δeye = |wleft_eye − wright_eye|  (4.3)
After that, we can calculate the bisecting point between the eyes. This
point represents the location in the picture that is directly between the two eyes, as
shown in Equation 4.4.
bisector = ((xright_eye + xleft_eye) / 2, (yright_eye + yleft_eye) / 2)  (4.4)
This bisector point is used in most of the nose calculations. Note the "T"
mark between the eyes and the nose in Figure 4.2 above; the bisector point is the
center of that intersection.
The angle of the nose can now be calculated using the bisector as the anchor
for the angle, as shown in Equation 4.5, along with the distance to the
nose (Equation 4.6).
θnose = tan⁻¹[(ynose − ybisector) / (xnose − xbisector)]  (4.5)

dnose = √[(xbisector − xnose)² + (ybisector − ynose)²]  (4.6)
The last metric that is useful in determining the transformation quadrilateral is
a point in the middle of the forehead determined by the angle of the nose as shown in
Equation 4.7.
forehead = (xbisector − sin(θnose) · dnose · (2/5), ybisector − cos(θnose) · dnose · (2/5))  (4.7)
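Equations 4.1 through 4.7 translate directly into code. A Python sketch with hypothetical feature coordinates, using atan2 as a stand-in for tan⁻¹ to avoid division by zero:

```python
import math

# Hypothetical feature positions: (x, y) centers and widths in pixels.
left_eye, right_eye = (100.0, 200.0), (220.0, 210.0)
w_left, w_right = 30.0, 26.0
nose = (200.0, 305.0)

theta_eye = math.atan2(right_eye[1] - left_eye[1],
                       right_eye[0] - left_eye[0])                       # Eq 4.1
d_eye = math.hypot(right_eye[0] - left_eye[0],
                   right_eye[1] - left_eye[1])                           # Eq 4.2
delta_eye = abs(w_left - w_right)                                        # Eq 4.3
bisector = ((right_eye[0] + left_eye[0]) / 2,
            (right_eye[1] + left_eye[1]) / 2)                            # Eq 4.4
theta_nose = math.atan2(nose[1] - bisector[1], nose[0] - bisector[0])    # Eq 4.5
d_nose = math.hypot(bisector[0] - nose[0], bisector[1] - nose[1])        # Eq 4.6
forehead = (bisector[0] - math.sin(theta_nose) * d_nose * 2 / 5,         # Eq 4.7
            bisector[1] - math.cos(theta_nose) * d_nose * 2 / 5)
```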
Now that all of those metrics are calculated, it is straightforward to place the
quadrilateral to be transformed. The top-right and top-left points of the quadrilateral,
given by Equations 4.8 and 4.9, respectively, are functions of the forehead position
and of the distance, angle, and size difference between the eyes.
TR = (xforehead + cos(θeye) · [deye · (3/4) + Δeye/2], yforehead + sin(θeye) · [deye · (3/4) + Δeye/2])  (4.8)

TL = (xforehead − cos(θeye) · [deye · (3/4) − Δeye/2], yforehead − sin(θeye) · [deye · (3/4) − Δeye/2])  (4.9)
The bottom-left and bottom-right points of the quadrilateral, marked by
Equations 4.10 and 4.11, respectively, are functions of the size, angle, and position of
the nose.
BL = (xnose − cos(θnose) · wnose · (3/8), ynose + sin(θnose) · wnose/2)  (4.10)

BR = (xnose + cos(θnose) · wnose · (3/8), ynose − sin(θnose) · wnose/2)  (4.11)
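Equations 4.8 through 4.11 can be sketched compactly by factoring out the sign that distinguishes left from right. The metric values below are hypothetical placeholders:

```python
import math

# Hypothetical metric values of the kind produced by Eqs 4.1-4.7.
forehead = (120.0, 189.0)
theta_eye, d_eye, delta_eye = 0.0832, 120.42, 4.0
nose, theta_nose, w_nose = (200.0, 305.0), 1.1903, 64.0

def top_corner(sign):
    # Eqs 4.8 / 4.9: sign = +1 for top-right, -1 for top-left.
    reach = d_eye * 3 / 4 + sign * delta_eye / 2
    return (forehead[0] + sign * math.cos(theta_eye) * reach,
            forehead[1] + sign * math.sin(theta_eye) * reach)

def bottom_corner(sign):
    # Eqs 4.10 / 4.11: sign = +1 for bottom-right, -1 for bottom-left.
    return (nose[0] + sign * math.cos(theta_nose) * w_nose * 3 / 8,
            nose[1] - sign * math.sin(theta_nose) * w_nose / 2)

TR, TL = top_corner(+1), top_corner(-1)
BR, BL = bottom_corner(+1), bottom_corner(-1)
```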
Finally, these four points can be used in a perspective transformation to flatten
the image into a rectangle of any size, using the perspective transformation technique
outlined in the next two sections.
Perspective Transformation of Pixel Position
In the last section, the four points of the perspective quadrilateral were found
based on the eye and nose locations and sizes. If these four points are known, along
with the size of the desired rectangle, it is easy to find a transformation matrix
representing the relationship between the two, as modeled in Figure 4.3 below.
Figure 4.3: Perspective Transformation of a Warped Quadrilateral
In this image, the coordinates (𝑥𝑖, 𝑦𝑖) represent the four corners of the warped
quadrilateral. In this application, these would be the four points found in the previous
section. The coordinates (xi′, yi′) represent the four corners of the spatially
normalized image.
Finally, the relationship between the two is shown in Equation 4.12, an
equation obtained from OpenCV’s documentation on geometric transformations
[“Geometric Transformations”].
[ti·xi′,  ti·yi′,  ti]ᵀ = M · [xi,  yi,  1]ᵀ  (4.12)
M is a 3x3 transformation matrix, and ti is the scaling factor for each point of
the new rectangle. The new rectangle can be any size, but for this application, it was
chosen to be 300 pixels wide and 600 pixels high.
Now that the transformation matrix has been found, it is easy to find, for each
coordinate of the new rectangle, the corresponding coordinate on the warped
quadrilateral. Iterating across the width and height of the rectangle, each point is
found with Equation 4.13.
dst(x, y) = src((M11·x + M12·y + M13) / (M31·x + M32·y + M33), (M21·x + M22·y + M23) / (M31·x + M32·y + M33))  (4.13)
In this equation, 𝑑𝑠𝑡 represents the new rectangle, and 𝑠𝑟𝑐 represents the
warped quadrilateral.
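The matrix M of Equation 4.12 can be solved from the four corner correspondences with plain linear algebra, which mirrors what OpenCV's getPerspectiveTransform does internally. A sketch with hypothetical corner coordinates:

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve the 3x3 matrix M of Eq 4.12 from four point correspondences.

    src: four (x, y) corners of the warped quadrilateral;
    dst: four (x', y') corners of the target rectangle. M33 is fixed to 1,
    leaving 8 unknowns for the 8 equations the four pairs provide.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    m = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(m, 1.0).reshape(3, 3)

def map_point(M, x, y):
    # Eq 4.13: project through M and divide out the scale term.
    denom = M[2, 0] * x + M[2, 1] * y + M[2, 2]
    return ((M[0, 0] * x + M[0, 1] * y + M[0, 2]) / denom,
            (M[1, 0] * x + M[1, 1] * y + M[1, 2]) / denom)

quad = [(40, 10), (260, 30), (220, 420), (70, 400)]   # TL, TR, BR, BL (hypothetical)
rect = [(0, 0), (300, 0), (300, 600), (0, 600)]       # the 300x600 target
M = perspective_matrix(quad, rect)
```

Mapping each quadrilateral corner through M should land exactly on the corresponding rectangle corner.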
Bicubic Pixel Interpolation
The section above covers calculation of the coordinates on the warped
quadrilateral but does not specify how to determine the pixel value there. The actual
value of each pixel in the new rectangle is determined by bicubic pixel interpolation.
For instance, in Figure 4.4, the points p(0,0), p(0,1), p(1,0), and p(1,1) all
have known pixel values; the value at the new coordinate, new p(x, y), must be
determined from these four neighbors.
Figure 4.4: The Basic Pixel Interpolation Model
One of the simpler and more commonly used interpolation algorithms is
bilinear interpolation, which weights the resulting value of the new pixel by the
relative distances between the four neighboring pixel points and the new coordinate.
Bicubic interpolation takes this a step further by attempting to recreate the surface
between the four points. Bilinear interpolation needs only the positions of the four
pixels and their values. Bicubic interpolation, however, needs to know:
1. The values of the pixels.
2. The partial derivatives of those values with respect to x.
3. The partial derivatives of those values with respect to y.
4. The mixed x-y derivatives of those values (the "cross product of the slopes" in [Lancaster]).
It is also worth mentioning that while the four neighboring pixels matter most for
the interior of the surface, where the new point lies, a full 16 points surrounding
the new coordinate are necessary to calculate all of the derivative information
needed (Figure 4.5).
Figure 4.5: Sixteen Neighbors Used for a Bicubic Interpolation
With that information, one can form a bicubic equation that outputs the pixel
value at the given coordinates, such as the one shown below in Equation 4.14
[Lancaster].
p(x, y) = Σ (i=0 to 3) Σ (j=0 to 3) aij · x^i · y^j  (4.14)
For this equation to work, though, the coefficients a00 through a33 must be
solved for. To begin, all four pieces of information required at each of the four
points must be expressed in terms of the bicubic equation's coefficients.
First, the values of the pixels are determined in Equations 4.15 – 4.18
[Lancaster].

w0 = p(0,0) = a00  (4.15)
w1 = p(1,0) = a00 + a10 + a20 + a30  (4.16)
w2 = p(0,1) = a00 + a01 + a02 + a03  (4.17)
w3 = p(1,1) = a00 + a10 + a20 + a30 + a01 + a11 + a21 + a31 + a02 + a12 + a22 + a32 + a03 + a13 + a23 + a33  (4.18)
Then the partial derivatives of the values with respect to x are determined in
Equations 4.19 – 4.22 [Lancaster].

x0 = ∂p/∂x (0,0) = a10  (4.19)
x1 = ∂p/∂x (1,0) = a10 + 2a20 + 3a30  (4.20)
x2 = ∂p/∂x (0,1) = a10 + a11 + a12 + a13  (4.21)
x3 = ∂p/∂x (1,1) = 1(a10 + a11 + a12 + a13) + 2(a20 + a21 + a22 + a23) + 3(a30 + a31 + a32 + a33)  (4.22)
Following that, the partial derivatives of the values with respect to y are
shown in Equations 4.23 – 4.26 [Lancaster].

y0 = ∂p/∂y (0,0) = a01  (4.23)
y1 = ∂p/∂y (1,0) = a01 + a11 + a21 + a31  (4.24)
y2 = ∂p/∂y (0,1) = a01 + 2a02 + 3a03  (4.25)
y3 = ∂p/∂y (1,1) = 1(a01 + a11 + a21 + a31) + 2(a02 + a12 + a22 + a32) + 3(a03 + a13 + a23 + a33)  (4.26)
Next, the mixed x-y derivatives at all four points are calculated as
shown in Equations 4.27 – 4.30 [Lancaster].

z0 = ∂²p/∂x∂y (0,0) = a11  (4.27)
z1 = ∂²p/∂x∂y (1,0) = a11 + 2a21 + 3a31  (4.28)
z2 = ∂²p/∂x∂y (0,1) = a11 + 2a12 + 3a13  (4.29)
z3 = ∂²p/∂x∂y (1,1) = 1a11 + 2a12 + 3a13 + 2a21 + 4a22 + 6a23 + 3a31 + 6a32 + 9a33  (4.30)
Finally, now that all of those relationships are established, linear algebra is used
to solve for the coefficients a00 through a33, as shown in Equations 4.31 –
4.46 [Lancaster].
a00 = w0  (4.31)
a01 = y0  (4.32)
a02 = −3w0 + 3w2 − 2y0 − y2  (4.33)
a03 = 2w0 − 2w2 + y0 + y2  (4.34)
a10 = x0  (4.35)
a11 = z0  (4.36)
a12 = −3x0 + 3x2 − 2z0 − z2  (4.37)
a13 = 2x0 − 2x2 + z0 + z2  (4.38)
a20 = −3w0 + 3w1 − 2x0 − x1  (4.39)
a21 = −3y0 + 3y1 − 2z0 − z1  (4.40)
a22 = 9w0 − 9w1 − 9w2 + 9w3 + 6x0 + 3x1 − 6x2 − 3x3 + 6y0 − 6y1 + 3y2 − 3y3 + 4z0 + 2z1 + 2z2 + z3  (4.41)
a23 = −6w0 + 6w1 + 6w2 − 6w3 − 4x0 − 2x1 + 4x2 + 2x3 − 3y0 + 3y1 − 3y2 + 3y3 − 2z0 − z1 − 2z2 − z3  (4.42)
a30 = 2w0 − 2w1 + x0 + x1  (4.43)
a31 = 2y0 − 2y1 + z0 + z1  (4.44)
a32 = −6w0 + 6w1 + 6w2 − 6w3 − 3x0 − 3x1 + 3x2 + 3x3 − 4y0 + 4y1 − 2y2 + 2y3 − 2z0 − 2z1 − z2 − z3  (4.45)
a33 = 4w0 − 4w1 − 4w2 + 4w3 + 2x0 + 2x1 − 2x2 − 2x3 + 2y0 − 2y1 + 2y2 − 2y3 + z0 + z1 + z2 + z3  (4.46)
Now that all of the coefficients have values, the bicubic equation described in
Equation 4.14 can be used to determine the pixel values of the flattened image via
bicubic pixel interpolation.
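Equations 4.31 through 4.46 transcribe directly into code. The sketch below solves for the coefficients and evaluates Equation 4.14; the sample derivative values are hypothetical, chosen only to check that the surface reproduces the four corner values:

```python
def bicubic_coeffs(w, x, y, z):
    """Solve a00..a33 (Eqs 4.31-4.46) from corner values w, x-slopes x,
    y-slopes y, and mixed derivatives z, each ordered p(0,0), p(1,0),
    p(0,1), p(1,1)."""
    w0, w1, w2, w3 = w; x0, x1, x2, x3 = x
    y0, y1, y2, y3 = y; z0, z1, z2, z3 = z
    a = [[0.0] * 4 for _ in range(4)]
    a[0][0] = w0
    a[0][1] = y0
    a[0][2] = -3*w0 + 3*w2 - 2*y0 - y2
    a[0][3] = 2*w0 - 2*w2 + y0 + y2
    a[1][0] = x0
    a[1][1] = z0
    a[1][2] = -3*x0 + 3*x2 - 2*z0 - z2
    a[1][3] = 2*x0 - 2*x2 + z0 + z2
    a[2][0] = -3*w0 + 3*w1 - 2*x0 - x1
    a[2][1] = -3*y0 + 3*y1 - 2*z0 - z1
    a[2][2] = (9*w0 - 9*w1 - 9*w2 + 9*w3 + 6*x0 + 3*x1 - 6*x2 - 3*x3
               + 6*y0 - 6*y1 + 3*y2 - 3*y3 + 4*z0 + 2*z1 + 2*z2 + z3)
    a[2][3] = (-6*w0 + 6*w1 + 6*w2 - 6*w3 - 4*x0 - 2*x1 + 4*x2 + 2*x3
               - 3*y0 + 3*y1 - 3*y2 + 3*y3 - 2*z0 - z1 - 2*z2 - z3)
    a[3][0] = 2*w0 - 2*w1 + x0 + x1
    a[3][1] = 2*y0 - 2*y1 + z0 + z1
    a[3][2] = (-6*w0 + 6*w1 + 6*w2 - 6*w3 - 3*x0 - 3*x1 + 3*x2 + 3*x3
               - 4*y0 + 4*y1 - 2*y2 + 2*y3 - 2*z0 - 2*z1 - z2 - z3)
    a[3][3] = (4*w0 - 4*w1 - 4*w2 + 4*w3 + 2*x0 + 2*x1 - 2*x2 - 2*x3
               + 2*y0 - 2*y1 + 2*y2 - 2*y3 + z0 + z1 + z2 + z3)
    return a

def bicubic(a, u, v):
    # Eq 4.14: p(u, v) = sum over i, j of a[i][j] * u^i * v^j
    return sum(a[i][j] * u**i * v**j for i in range(4) for j in range(4))

# Hypothetical corner data: values, x-slopes, y-slopes, mixed derivatives.
a = bicubic_coeffs(w=(10, 20, 30, 40), x=(1, 2, 3, 4),
                   y=(0.5, 1.0, 1.5, 2.0), z=(0.0, 0.0, 0.0, 0.0))
```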
Local Binary Patterns
Features of Local Binary Patterns
Following the perspective transformation and bicubic interpolation, the next
step is facial recognition. This software uses local binary patterns
(LBP). Local binary patterns are useful in this design because they do not
necessarily require a supervised training set of sample images in order to classify.
Eigenfaces and Fisherfaces, two other popular facial recognition methods, typically
require 9-10 valid samples of a face before they can recognize it in unknown images.
Unfortunately, due to the environment of a pig pen, our application does not have the
luxury of taking supervised images of the pigs in the pen every time the
software needs to be used. Instead, an unsupervised facial recognition technique is
used to circumvent this issue.
The image below in Figure 4.6 demonstrates how a local binary pattern is
formed.
Figure 4.6: A 3x3 Pixel LBP Example
The center pixel of this 3x3 pixel section of an image is compared against its
neighbors by value. The value of the center pixel becomes the threshold: a
neighboring pixel is labeled 1 if its value is greater than or equal to the center
pixel's value, and 0 otherwise. The binary value formed by these 1's and 0's
represents the unique pattern, or feature, that the pixel forms with its neighbors.
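The thresholding step is a few lines of code. A minimal Python sketch; the clockwise bit ordering here is one common convention, not necessarily the exact one used in this project:

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch (a list of three rows).

    Neighbors are read clockwise from the top-left; a bit is 1 when the
    neighbor is greater than or equal to the center value.
    """
    center = patch[1][1]
    ring = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for v in ring:
        code = (code << 1) | (1 if v >= center else 0)
    return code
```

A flat patch, where every neighbor equals the center, produces the all-ones code 255.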
This same concept can be applied in a more scalable form with Extended (or
Circular) LBP so that the neighborhood is variable by a radius around the center pixel,
rather than just the immediate neighboring pixels. The result can then designate a
variety of features, like those shown in Figure 4.7.
Figure 4.7: LBP Feature Examples [Wagner]
Local Binary Images
After each pixel has been assigned a binary value, a local binary image is
formed from all of these codes. If the local binary image is then divided into a
uniform grid of windows, like the picture below in Figure 4.8, a histogram can be
assigned to the binary values found in each window.
Figure 4.8: Local Binary Pattern Image and Histogram Concatenation
Finally, the histograms are combined by concatenation rather than merging, to
maintain the spatial information of the features. This concatenated histogram is
unique to the face, yet similar to the histograms of other images taken of the
same pig. To compare any picture against a sample and determine similarity,
one calculates the local binary pattern histogram of the new image and compares it to
the sample one.
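The windowing and concatenation step can be sketched as follows; the grid size and bin count are hypothetical parameters:

```python
def lbp_histogram_grid(lbp_image, grid=(4, 4), bins=256):
    """Concatenate per-window histograms of an LBP image (Figure 4.8 style).

    Windows form a uniform grid; concatenating rather than merging the
    histograms preserves the spatial layout of the patterns.
    """
    h, w = len(lbp_image), len(lbp_image[0])
    gy, gx = grid
    feature = []
    for by in range(gy):
        for bx in range(gx):
            hist = [0] * bins
            for r in range(by * h // gy, (by + 1) * h // gy):
                for c in range(bx * w // gx, (bx + 1) * w // gx):
                    hist[lbp_image[r][c]] += 1
            feature.extend(hist)   # concatenate, keeping window order
    return feature
```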
Histogram Comparison
The comparison performed by OpenCV is implemented through a correlation
equation shown below, where 𝑁 is the total number of histogram bins (Equation 4.47).
d(H1, H2) = Σ_I (H1(I) − H̄1)(H2(I) − H̄2) / √[Σ_I (H1(I) − H̄1)² · Σ_I (H2(I) − H̄2)²]  (4.47)
The term H̄k in the equation is described in Equation 4.48.

H̄k = (1/N) Σ_J Hk(J)  (4.48)
The metric 𝑑(𝐻1, 𝐻2) measures the similarity between the two histograms.
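Equation 4.47 is the Pearson correlation of the two histograms, which is what OpenCV exposes as its correlation comparison method (HISTCMP_CORREL). A direct transcription:

```python
import math

def histogram_correlation(h1, h2):
    """Correlation metric of Eq 4.47: +1 for identical shapes, -1 for
    perfectly anti-correlated ones."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    den = math.sqrt(sum((a - m1) ** 2 for a in h1)
                    * sum((b - m2) ** 2 for b in h2))
    return num / den if den else 1.0
```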
Unsupervised Data Clustering
Finally, we use these LBP facial recognition techniques to cluster the images
by similarity. Before the unsupervised technique is explained, the methodology of
supervised data clustering must first be covered. If sample images of all of the
pigs in a pen were available, those images could be used to cluster any unknown
image by assigning labels based on the similarity of the unknown image's local
binary image to those of the known samples. The process would follow the
outline below in Figure 4.9.
Figure 4.9: Supervised LBP Facial Recognition Flowchart
For any given image, one computes the local binary image and assigns it the
label of whichever sample image it most closely resembles in terms of the
histogram. With that label comes a confidence value that is directly correlated
to the similarity of the images: if the confidence value is high, the recognizer
is more certain the two images represent the same pig; if it is low, the match is
less likely.
The unsupervised mode follows a similar flowchart but lacks the initial
sample data that the supervised technique has. Instead, it dynamically adds to the
sample training set whenever it finds an image whose best-match confidence
falls below the confidence threshold for every image in the currently trained sample set, as
shown below in Figure 4.10.
Figure 4.10: Unsupervised LBP Facial Recognition Flowchart
The process begins with the very first image of a pig face. This is known to be
a pig face, and it cannot match any other face yet, because it is in fact the first;
thus begins the sample set. Each subsequent picture put through the recognizer
follows one of two paths: either it matches an image in the sample set with a
confidence value over the specified threshold, or it matches no existing image in
the sample set because its associated confidence value is too low. This threshold
value becomes the basis of the clustering technique. If a face is matched, the image
is assigned the label of that cluster. If the face has no match, it is added to the
sample set with a new label, and the recognizer is retrained to include that picture
as a new face to be matched against, starting a new cluster. The process repeats for
every pig face to be labeled. An example of the program processing this information
can be found in Appendix E. In the end, all of the faces have been
gathered into clusters of similar images to be further processed.
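The clustering loop of Figure 4.10 can be sketched as follows. The similarity callback and the threshold value are hypothetical stand-ins for the LBP histogram comparison and its tuned cutoff:

```python
def cluster_faces(histograms, similarity, threshold=0.8):
    """Greedy unsupervised clustering sketch.

    similarity(h1, h2) returns a value in [-1, 1]; each face joins the
    best-matching existing cluster, or starts a new one when no match
    clears the threshold.
    """
    samples, labels = [], []          # one representative per cluster
    for h in histograms:
        best_label, best_sim = None, threshold
        for label, s in enumerate(samples):
            sim = similarity(h, s)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is None:        # no confident match: new cluster
            best_label = len(samples)
            samples.append(h)
        labels.append(best_label)
    return labels
```

In the real program each cluster keeps every face added to it and the recognizer is retrained on insertion; this sketch keeps only one representative per cluster for brevity.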
Facial Recognition Conclusion
The advantage of finding all of the facial features using the cascade classifiers
is that we can flatten the image out to a normalized coordinate system for facial
recognition. The program successfully uses perspective transformation and bicubic
pixel interpolation to create a spatially normalized image of the pig’s face. After this
normalization, the local binary patterns of the image are compared to other images and
grouped into clusters of based on similarity. These clusters will be vital in determining
the weight of the pig represented by each cluster.
CHAPTER V
REGRESSION
Examination of Features
The Feature Vector Sets
In order to find a mathematical relationship between the features detected in
the picture and the overall weight of the pig, all of the features shown below in Figure
5.1 were analyzed and passed through multiple kinds of regression.
Figure 5.1: Pig Face Feature Vectors
In total, the program outputs 16 different features associated with the pig face
in the picture, all measured in pixels. The program has a special mode for training the
regression data that allows the user to check each image and confirm that it was
classified correctly before outputting the feature data. Examples of this process
are shown in console screenshots in Appendices C and D. After that, all of the
features are stored in a file for regression analysis.
The goal of the regression work is to find a mathematical relationship between
the 16 features and the known weight of the pig in the digital image (measured and
recorded the same day as the pictures being used). A weight vector representing the
coefficients of an equation formed by the features creates a mathematical formula for
calculating the weight of a pig.
Many different methods, not all of which will be discussed in full detail, were
run on the feature vectors output from a test run of approximately 200 valid
images. It is unwise to use all of the feature vectors to calculate the weight vector,
since that leaves no means of evaluating the accuracy of the regression. Therefore,
the roughly 200 images and associated feature vectors were divided into two sets:
70% for training and 30% for testing. Whenever a regression technique was run, its
best iteration was selected using the results on the testing set.
Methods Attempted
A variety of regression techniques were applied to the data, including, but
not limited to, ridge, lasso, and elastic net regression. All three generally assume
that the predictors are independent variables. Although elastic net is designed to
handle groups of highly correlated variables, it still performed poorly (as discussed
in the next section). Even lasso, generally a strong regularization technique, failed
to identify the most important predictors in the data set. All three of these
techniques were implemented successfully in MATLAB with synthetically generated
data but still failed to give desirable results with the real data from the pig
features.
Undesirable Results
All unsuccessful regression techniques tested exhibited one of two characteristic failure modes.
The first common trend in an unsuccessful fitting was a very flat line close to the x-
axis, such as the example shown below in Figure 5.2.
Figure 5.2: Example of Unsuccessful Regression with High Bias and Low Variance
When the data produces this type of result, it can be assumed that the
predictors chosen do not actually correlate with the final function value. More
specifically, a trend like this implies that the features used do not form a
linear combination that outputs the weight of the pig. To minimize error in such a
set, the regression flattens the line, trading low variance for a
very high bias.
The second unsuccessful case often seen is overfitting of the training data. In
these cases, the training data appears to have yielded a successful regression, judging from
the marginal error between the known weight of the pig and the calculated weight,
such as the graph shown below in Figure 5.3.
Figure 5.3: Example of Overfitting the Training Set
While this sort of result may initially look good, it is deceptive.
The very nature of overfitting is that the training set is matched too closely, producing an
erroneous final equation that has absorbed the noise in the training
data. When this occurs, the training data will appear exceptionally
well-fitted, but the test data will suffer. For instance, the graph below in Figure 5.4 is
the output of the testing samples used by the equation formed with the training set in
Figure 5.3.
Figure 5.4: High Variance of a Testing Set after Overfitting the Training Set
As the graph indicates, the testing results then have extremely high variance
that produce unrealistic weight calculations.
Least Squares Method with Interdependent Predictors
Desirable Results
The most successful methodology found was a form of least squares
regression combined with making several of the features strongly dependent on
each other. Before the specifics of this technique are discussed, the results will be
shown first so that it is clear what makes the technique desirable.
The goal of the training regression results is to form a calculated weight line
that trends with the actual weight line, without being influenced too much by the
noise of the data. A good example of this would be the result shown below in Figure
5.5.
Figure 5.5: Regression Results of Features for Training Set
It can clearly be seen that the data trends properly with the training set. More
importantly, when the same equation formed by the training set above is applied to the
testing data, the results there also trend (Figure 5.6).
Figure 5.6: Regression Results of Features for Testing Set
Averaging Pig Clusters
Obviously, the regression does not produce exact results; the training set error
suggests that there is still some variance. However, if the different weights calculated
for each pig are averaged together, the resulting averaged weights are reasonably close
to the actual weights. It is practical to use the average weight since the pictures have
already been clustered together during the facial recognition phase.
The accuracy of the averaged weights is shown in Figures 5.7 and 5.8 for the
training and testing sets, respectively.
Figure 5.7: Averaging Results of Training Set
Figure 5.8: Averaging Results of Testing Set
Predictor Creation
A typical application of least squares regression would use each of the
features, or predictors, only once but at different polynomial degrees. For instance,
one could develop an equation with 16 variables (features/predictors), 16
corresponding coefficients, and a constant. Another choice is one with 32 variables, 32
coefficients, and a constant, where the additional 16 variables and coefficients just
come from the previous 16 features squared. This trend could continue for equations
of multiple orders. Notice that in this methodology, features are completely
independent of each other, an assumption made by most regression techniques.
In the case of the final equation used, a similar concept is applied, but we will
use a set of new feature vectors formed by concatenating three vectors. The first vector
is simply the original feature vector, all predictors to the first power. The second
vector is the first set multiplied by all combinations of the first set. This means the
second set will contain single predictors to the second degree and products of all
combinations of the individual, original predictors. The third set repeats this operation
but on all combinations of the first set multiplied with the second set. As a result, some
new predictors are original predictors to the third degree, some are an
original predictor times the square of another original predictor, and some are
three different predictors multiplied together.
More specifically, the final equation for the original feature set $\hat{x} = \{x_1, x_2, \ldots, x_n\}$ and the weight vector of coefficients $\hat{w} = \{w_0, w_1, w_2, \ldots, w_{1+2n+2n^2+n^3}\}$ is Equation 5.1:

$$f(\hat{x}) = w_0 + \sum_{1 \le i \le n} w_i x_i + \sum_{\substack{1 \le i \le n \\ 1 \le j \le n}} w_{1+n+ni+j}\, x_i x_j + \sum_{\substack{1 \le i \le n \\ 1 \le j \le n \\ 1 \le k \le n}} w_{1+n+n^2+n^2 i+nj+k}\, x_i x_j x_k \tag{5.1}$$

Note that the weight estimate is simply these coefficients multiplied by the terms of the new feature vector, plus the constant $w_0$.
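The concatenation described above can be sketched as follows. This NumPy illustration builds the raw list of products (original predictors, all pairwise products, all triple products) and deliberately ignores the gaps in the thesis's index bookkeeping; the function name is an assumption:

```python
import numpy as np

def expand_features(x):
    """Concatenate the original predictors with the second set (all
    products of two original predictors) and the third set (all products
    of an original predictor with a second-set term)."""
    x = np.asarray(x, dtype=float)
    first = x                              # x_i
    second = np.outer(x, x).ravel()        # x_i * x_j
    third = np.outer(x, second).ravel()    # x_i * x_j * x_k
    # A leading 1 pairs with the constant coefficient w0.
    return np.concatenate(([1.0], first, second, third))

# For n original features this produces 1 + n + n^2 + n^3 terms;
# with n = 4, for example, that is 85 predictors.
print(len(expand_features([1.0, 2.0, 3.0, 4.0])))  # prints 85
```

Because the expansion grows cubically in n, this construction is only practical when the original feature set is small, as it is here.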
Primary Features Used
Finally, now that the creation of the large, new feature vector is explained, it is
important to state which features were actually used in the original feature vector
before it is expanded.
As it turns out, the regression returned the least error with a reduced feature
set. The only pig-geometry features used are the difference between the
eye sizes, the average eye size, the Euclidean distance between the centers of the eyes,
and the coordinate position of the nose.
Least Squares Methodology
After showing the successful results and explaining which features were
actually used, the specifics on the regression form chosen will be explained.
Henceforth, $\hat{x}$ will be reassigned to express the new set of predictors, so that

$$\hat{x} = \{x_1, x_2, \ldots, x_N\},$$

where $N$ represents the length of the new vector, determined by

$$N = 1 + 2n + 2n^2 + n^3,$$

with $n$ being the number of original features (difference in eye sizes, average eye size, Euclidean distance, etc.).
The weight vector, or the set of coefficients of the final equation, is still referred to as $\hat{w} = \{w_0, w_1, w_2, \ldots, w_N\}$. The least squares method is used to minimize the cost function shown in Equation 5.2:

$$\min_{\hat{w}} J(\hat{w}) = \sum_{i=1}^{M_{train}} \left( y_i - \hat{w}\, \hat{x}_i^T \right)^2 \tag{5.2}$$
The variable $\hat{w}$ is the weight vector, $\hat{x}_i$ is the feature vector of the $i$th image out of $M_{train}$ images in the training set, $\hat{w}\,\hat{x}_i^T$ is the estimate of the weight, and $y_i$ is the actual weight of the pig in the $i$th image. This equation can be rewritten with the variables

$$X = [\hat{x}_1; \hat{x}_2; \ldots; \hat{x}_{M_{train}}]$$

and with

$$Y = [y_1; y_2; \ldots; y_{M_{train}}]$$

as the minimization function shown below in Equation 5.3:

$$\min_{\hat{w}} J(\hat{w}) = \left\| X \cdot \hat{w} - Y \right\|^2. \tag{5.3}$$
To find the minimum of the cost function, the gradient is set equal to zero, which gives Equation 5.4:

$$\nabla J(\hat{w}) = 2\hat{w}^T X^T X - 2 Y^T X = 0. \tag{5.4}$$

Solving for $\hat{w}$, one finds the solution in Equation 5.5:

$$\hat{w} = (X^T X)^{-1} X^T Y. \tag{5.5}$$

The term $(X^T X)^{-1} X^T$ is known as the pseudoinverse of $X$.
Then, weight estimates for the test set can be calculated and compared to the
actual weights by computing the test set error. Ideally, this error is representative of
the out-of-sample error, which demonstrates the general accuracy of the weight
equation in practice.
It was observed that variations in the accuracy of the calculated weights
existed, based on the various assignments of feature vectors to the training and testing
sets. Thus, the regression was performed multiple times, with the feature vectors
randomly reassigned to each set upon every iteration. The final weight vector used in
the program was the one associated with the sets of smallest test error, found using
Equation 5.6.
$$\frac{1}{M_{test}} \sum_{i=1}^{M_{test}} \left( y_i - \hat{w}\, \hat{x}_i^T \right)^2 \tag{5.6}$$

$M_{test}$ is the number of feature vectors in the test set. As explained in the section
on overfitting, it is necessary to measure the success of the regression by the results of
the testing set, not the training set.
Regression Conclusion
As long as the face, eyes, and nose of the pig in a picture can be classified
correctly, then it is feasible to use those features to formulate a mathematical
prediction of that pig’s weight. As discussed in this chapter, the least squares
regression with the expanded interdependent feature vector will return the coefficients
to the necessary equation, as long as training data is provided. The training data for
every picture classified needs to include the average eye size, the difference in eye
size, the Euclidean two-dimensional distance between the eyes, the coordinate position
of the nose, and the pig’s known weight. With that data and the regression MATLAB
code (Appendix H), any fixed camera position can be trained to return an estimation of
a pig’s weight from a captured image.
CHAPTER VI
CLUSTER ADJUSTMENTS
As with any system, some compensation for error is needed. The system is built
to handle four classes of error. To begin, two forms of outlier detection are
implemented. One is used to discard pictures with misclassified features, while the
other is used to discard numerical data that could skew the distribution of estimated
weights in a cluster. There is also a limitation placed on the orientation of a pig face
for it to be considered for facial recognition. Finally, there is regrouping of clusters
that represent the same pig but have been split apart by the unsupervised clustering
method.
Outlier Detection
Cluster Minimization
The first step in getting rid of misclassified images is to require a minimum
number of pictures in a cluster for it to be valid. For instance, in the example below
shown in Figure 6.1, it can be seen that the eye on the left side of the picture has been
misclassified.
Figure 6.1: Example of Misclassification Error in the Pig Image
A shadow on the pig’s face has been identified as the pig’s eye. While normally
a misclassification of this kind could be detrimental to the overall predicted weight of
the pig, erroneous results are minimized by discarding any clusters that do not reach a
certain minimum size. After running hundreds of iterations of the program with different
parameters, the final minimal cluster count chosen was five. This means that for this
image to be marked as part of a valid cluster and its weight estimation to be recorded,
the unsupervised facial clustering has to find four other images with similar local
binary patterns. The chance of generating a misclassification that produces the same
incorrect transformation of an image five times, to the extent the program identifies all
five images as the same face, is extremely slim. Indeed, in all of the tests run during
the course of this project, such a case was never seen.
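The minimum-size rule can be sketched as follows; this is a Python illustration with a hypothetical dictionary layout for the clusters, not the program's actual C++ data structures:

```python
MIN_CLUSTER_SIZE = 5  # final value chosen after hundreds of trial runs

def discard_small_clusters(clusters):
    """Drop any cluster with fewer than five images, since the same
    misclassification is very unlikely to recur five times."""
    return {label: images for label, images in clusters.items()
            if len(images) >= MIN_CLUSTER_SIZE}

# Hypothetical clusters: only clusters 1 and 3 survive the size check.
clusters = {1: ["a.jpg"] * 7, 2: ["b.jpg"] * 2, 3: ["c.jpg"] * 5}
valid = discard_small_clusters(clusters)
```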
As can be seen in the console log of the program below in Figure 6.2, smaller
clusters have just been omitted from the output of the program.
Figure 6.2: Example of Cluster 6 Being Discarded Due to Insufficient Size
Nose Angle Limitation
There are many cases where pictures of the same pig, normally grouped
together in one cluster, will fracture into separate clusters. While a special regrouping
method is implemented and discussed later in the chapter, there is one limitation
placed on valid images to prevent this from happening.
After all of the facial features are detected, the difference in the angle of the
eyes and the angle of the nose to the bisector between the eyes is checked. Normally, a
perpendicular angle, such as the one in Figure 6.3 below, is desired to ensure proper
facial recognition.
Figure 6.3: Picture of Pig Face Exhibiting Desirable Eye and Nose Angles
A face at such an angle is facing the camera and squared away enough that the
transformed image is easily computed by the unsupervised local binary patterns and
grouped with images from the same pig.
However, if the angle between the eye line and the nose is too far from this
perpendicular relationship, then some features of the pig face will be lost because of
its orientation and rotation. Figure 6.4 is an example of such an image.
Figure 6.4: Picture of Pig Exhibiting Undesirable Eye and Nose Angles
The dark background space to the left of the pig can erroneously
be interpreted as part of the pig’s face, leading to this picture being
classified as an entirely different pig from the other pictures it should be clustered with.
The solution to this problem is as simple as requiring that only images with
angle differences below a certain threshold be analyzed and counted toward a pig’s weight.
The threshold was chosen to filter images as indicated by Equation 6.1.
$$\text{photo is } \begin{cases} \text{valid}, & \left| \theta_{eye} - \theta_{nose} \right| < 0.04\ \text{radians} \\ \text{invalid}, & \text{otherwise} \end{cases} \tag{6.1}$$
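A sketch of the resulting check (Python; the angle arguments are assumed to be in radians, as in Equation 6.1):

```python
ANGLE_THRESHOLD = 0.04  # radians, from Equation 6.1

def photo_is_valid(theta_eye, theta_nose):
    """Accept a photo only when the eye-line angle and the nose-to-bisector
    angle are close enough to the perpendicular relationship."""
    return abs(theta_eye - theta_nose) < ANGLE_THRESHOLD
```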
Grubbs’ T-test
The second threat to the accuracy of the weight estimation is the possibility of
statistical outliers. The greatest chance of causing this would be the misidentification
of one pig as another. For instance, a case of this occurring is shown in a very early
run of the program, as seen below in Figure 6.5.
Figure 6.5: Example of Pig Misidentification
In this example, the known IDs of the pigs (i.e., “14”, “15”, “17”, etc.) are
displayed underneath the cluster name to show whether or not a cluster was filled with
images of the same pig. Here, Cluster 8 correctly grouped four
pictures of Pig 17 but erred by including Pig 15 as well. If the weights of these
two pigs vary greatly, then this one outlier could consequently skew the total
calculated weight for Pig 17.
In order to detect and discard outliers, Grubbs’ T-test is implemented in
the program. This test was chosen because it is designed for cases with multiple
possible outliers, rather than just one, since there is no way of knowing the number of
misidentified data points that may exist.
To begin, the mean of the clustered data set is calculated using Equation 6.2.
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{6.2}$$
Next, the standard deviation of the set is calculated (Equation 6.3).
$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}} \tag{6.3}$$
After those two metrics are computed, we must be able to select the data point
that is farthest from the mean and is most likely to be an outlier. This data point is
going to be the one that fits the following parameter, or has the maximum distance
from the mean (Equation 6.4).
$$\left| x_i - \bar{x} \right|_{max} \tag{6.4}$$
Next, the T metric is computed using Equation 6.5.
$$T = \frac{\left| x_i - \bar{x} \right|}{s} \tag{6.5}$$
This metric is used to determine if the data point is an outlier or not. The value
obtained is compared to a table of Tcrit values, like the one shown below in Table 6.1.
Table 6.1: Tcrit Values for Grubbs’ T-test for Outliers [“Outlier”]
Risk of false rejection (%)
N (data points) 0.1 0.5 1 5 10
3 1.155 1.155 1.155 1.153 1.148
4 1.496 1.496 1.492 1.463 1.425
5 1.780 1.764 1.749 1.672 1.602
6 2.011 1.973 1.944 1.822 1.729
7 2.201 2.139 2.097 1.938 1.828
8 2.358 2.274 2.221 2.032 1.909
9 2.492 2.387 2.323 2.110 1.977
10 2.606 2.482 2.410 2.176 2.036
15 2.997 2.806 2.705 2.409 2.247
20 3.230 3.001 2.884 2.557 2.385
25 3.389 3.135 3.009 2.663 2.486
50 3.789 3.483 3.336 2.956 2.768
100 4.084 3.754 3.600 3.207 3.017
If the T value is greater than or equal to Tcrit, it is an outlier and can be
discarded from the data set. If the T value is less than the Tcrit value, then it is not an
outlier and can be kept as part of the data set.
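The steps above, Equations 6.2 through 6.5 plus the Tcrit comparison, can be sketched as follows; a Python illustration whose Tcrit lookup covers only a few N values from Table 6.1 at the 5% risk level:

```python
import math

# Tcrit at 5% risk of false rejection, from Table 6.1 (partial)
T_CRIT_5PCT = {3: 1.153, 4: 1.463, 5: 1.672, 6: 1.822, 7: 1.938, 8: 2.032}

def grubbs_outlier(data):
    """Return (index, value) of the point farthest from the mean if its
    T statistic reaches Tcrit, else None.  Assumes 3 <= len(data) <= 8."""
    n = len(data)
    mean = sum(data) / n                                         # Eq. 6.2
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # Eq. 6.3
    i = max(range(n), key=lambda k: abs(data[k] - mean))         # Eq. 6.4
    t = abs(data[i] - mean) / s                                  # Eq. 6.5
    return (i, data[i]) if t >= T_CRIT_5PCT[n] else None

# A misidentified pig's weight stands out against the rest of the cluster.
print(grubbs_outlier([210.0, 212.0, 208.0, 211.0, 160.0]))  # prints (4, 160.0)
```

In practice the test would be reapplied after each removal, since discarding one outlier changes the mean and standard deviation of the remaining set.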
Finally, it is worth noting that this case of an outlier is rather rare. The
parameters of the facial recognition were tuned by running hundreds of
iterations of the program and set to favor smaller clusters of correctly identified pigs
rather than large clusters with misidentified pigs. Due to this bias, it is generally
unlikely that a data point will be misidentified and thus discarded as an outlier.
Nevertheless, it is still good practice to have the assurance of reliability built in for
unforeseen anomalies.
Cluster Regrouping
Algorithm for Regrouping Fractured Clusters
Smaller groups of more accurately identified pigs are favored over large
clusters with misidentified pigs. It is thus possible that during the process of the
clusters being formed, some of the clusters fractured and split pictures of the same pig
into separate groups. An example of this can be seen in Figure 6.6 below.
Figure 6.6: Example of Cluster Fracturing With Unintentionally Split Clusters
Highlighted
It can be seen above by the spray-painted markings on the pigs that there are
seven pigs output, when there are only five in the pen. Two of the pigs have been
reported twice. Here we come to a trade-off. If the standards are lower for the facial
recognition, then there will be many misidentified pig faces. If the standards are too
strict, then the clusters can fracture into smaller groups of the same pigs. The
compromise is to do both, but in separate steps.
The full flow chart is depicted below in Figure 6.7 and will follow a process
that is described through the rest of this section.
Figure 6.7: Cluster Regrouping Flowchart
To begin, we run the unsupervised clustering based on local binary
patterns with the strictest parameters. This yields many different clusters but with
very low probability that any cluster contains a misidentified pig. As discussed in the
section on unsupervised clustering, any time the program comes across a face that is
not previously trained or recognized, the program creates a new label entirely. At this
point, we may have as many as three to four times the number of clusters as we
actually have pigs.
The very next step is the same process discussed above for thinning outliers from the
data. The smallest clusters, of four or fewer images, are deleted and discarded as
insufficient data. This leaves a small number of clusters but still most likely more
clusters than there are pigs.
Now, if the supervised facial recognition begins with N clusters, it
iterates N times. For every cluster that exists, the label
and associated sample image are removed from the training set. The images that were
originally labeled as that cluster are retested. This time, a label other than the
original one will be forced on each image by the next closest fit. If 75% or more of the
images in the cluster can be relabeled with the same label of another cluster, it is
considered safe to assume that the two different clusters actually represent the same pig.
All of the images in the cluster being retested are labeled as the new cluster, while the
previously existing cluster label and sample image are deleted. As this process repeats
for every cluster that exists, the number of total clusters decreases, but we keep the
same number of total pictures. Different pictures are just regrouped together into
larger clusters of the same face.
To show some of this technique in action, take a look at the console log below
in Figure 6.8.
Figure 6.8: Console Log of Clusters Being Combined Using the Regrouping Method
of Supervised Facial Recognition
The first label applied to Cluster 1 was, of course, Cluster 1 itself.
When the Cluster 1 label and sample image were omitted from the training set, however,
90% of the data set could be relabeled as Cluster 9 (as designated by
“…2nd: 9…”). Since the modal label of the retested cluster is 9, with 90% of the data,
Cluster 1 meets the 75% minimum stated above and can be combined with Cluster 9, as it is
now considered safe to assume these are the same face.
In order to show both sides, take a look at a case where the cluster is not
moved, such as shown below in the console log of Figure 6.9.
Figure 6.9: Console Log of Cluster Being Rejected for Combination Using the
Regrouping Method of Supervised Facial Recognition
In this case, Cluster 4 was not combined into any new cluster but was instead
kept as a separate entity. Although the mode of the newly
labeled pictures was Cluster 13, it applied to only 58.333% of the images, not
meeting the 75% requirement. Because of this, Cluster 4 is kept as it was
originally.
The last thing worth mentioning about the regrouping is a final countermeasure
built in to prevent unintended data grouping. In addition to the new labels having to be
over 75% in agreement, the weights of the two clusters being combined must also be
within 10% of each other to be a valid combination. While there were no examples of
this being vital in the test runs, it was implemented anyway just to ensure another level
of protection against unintentional combining.
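The regrouping rules, 75% relabel agreement plus average weights within 10%, can be sketched as follows; a Python illustration in which `relabel` stands in for the supervised recognizer queried with the cluster's own label withheld, and all names and data are hypothetical:

```python
from collections import Counter

def try_merge(cluster_images, relabel, cluster_weights, own_label):
    """Return the label of the cluster to merge into, or None to keep the
    cluster as-is.  Requires >= 75% agreement on the new label and the two
    clusters' average weights to be within 10% of each other."""
    new_labels = [relabel(img) for img in cluster_images]
    target, count = Counter(new_labels).most_common(1)[0]
    if count / len(cluster_images) < 0.75:
        return None  # modal label not dominant enough
    if abs(cluster_weights[own_label] - cluster_weights[target]) \
            / cluster_weights[target] > 0.10:
        return None  # weights disagree too much to be the same pig
    return target

# Hypothetical case mirroring the console log: 90% of Cluster 1's images
# relabel as Cluster 9, and the average weights are close, so they merge.
images = ["p%d.jpg" % i for i in range(10)]
relabel = lambda img: 9 if img != "p0.jpg" else 13
weights = {1: 205.0, 9: 210.0, 13: 150.0}
print(try_merge(images, relabel, weights, own_label=1))  # prints 9
```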
Final Cluster Regrouping Results
After this algorithm was implemented following the facial recognition, the
output of the program with the testing set looks like the window below in Figure 6.10.
Figure 6.10: Cluster Regrouping Final Results
It is relatively easy to see that the program differentiated between five different
pigs (proven by the spray-paint markings on their heads). The error for the weight
calculation is remarkably low. The largest difference between the calculated and
recorded weights occurs on the fourth pig (bottom-left). With a calculated weight of
207 pounds and a recorded weight of 212 pounds, the percentage error is just 2.358%,
as indicated by Equation 6.6.
$$\rho_4 = \left| \frac{212 - 207}{212} \right| \cdot 100 = 2.358\% \tag{6.6}$$
A screenshot of the resulting console output can be found in Appendix F.
Cluster Adjustments Conclusion
No system is perfect, but the measures in this chapter have been implemented
to ensure that the system can at least handle some degree of expected error. The
program can now account for errant data values through Grubbs’ T-test, irregular pig
orientation through angle limitation, misclassifications by discarding small clusters,
and clusters of the same pig fracturing. All of this error protection is vital in delivering
consistent, valid estimations of the pigs’ weights.
CHAPTER VII
CONCLUSION
Accomplishments
The purpose of this project was to construct software that can analyze the faces
of pigs in digital images taken from a pig pen, ultimately to predict each pig’s weight. The
work done on this project met all design goals.
The image processing classification is robust enough to detect, in a matter of
milliseconds, whether a picture contains a pig’s face and whether all of its features,
such as the nose and both eyes, can be located. Pig movement does not affect the outcome of
the program’s calculation. If only one eye is visible or if one is closed, the picture is
rejected, assuming that a valid one can be taken eventually.
The program represents the first time unsupervised facial recognition
technology has been implemented on pigs. While other facial recognition research has
been done on the animal, this is the first program to operate on untrained data.
The program even implements multiple algorithms to handle its own errors.
Statistical outliers are discarded via Grubbs’ T-test, and misclassified pictures are
quickly weeded out of the valid images. The program is even designed to double-check
its facial recognition, combining clusters that fractured during the
process and falsely marked one pig as two clusters.
Future Work
The project as it stands now is designed to run at the end of a 24-hour day.
While the weight of the pigs that drank that day is accurately recorded, there will still
need to be further software development to track the clusters and weights of the pigs
over time for the information to be useful.
In addition, if this prototype were to be commercialized, it is highly advised that
the regression be retrained on new data, this time carefully measuring both
the angle of the camera and its x-y distance from the water spigot end. The desired
metrics are shown below in Figure 7.1.
Figure 7.1 Necessary Standard Metrics for Camera Set-Up
Originally, multi-angle compensation was thought to be necessary in
this project. Early designs were prepared to account for the camera being at
different angles and distances from the spigot. However, due to the lack of granularity
in the pixel data and the features ultimately used in the weight formula, this
compensation turned out to be unnecessary.
For instance, the only metrics being used in calculating the weight of the pigs
are the average eye size, the difference in eye size, the 2D Euclidean distance between
the eyes, and the coordinate position of the nose. Also, the calculated weight of a pig
before cluster averaging can be off by as much as 10 pounds. The calculation performed
on a single picture will hardly ever determine the pig’s weight
exactly; it will merely get close. It is the weight averaged across a cluster of
images of the pig at multiple angles and positions that yields the more accurate
estimate.
In the end, because many pictures must be taken and each
individual image yields only a loose weight estimate, the problem of
compensating for multiple camera angles and positions is not as important as
originally supposed.
The one item that is important is keeping the camera in roughly the same
position as when the training data was captured. In order for the nose coordinate position to
have any meaning, the position and angle of the camera need to be closely matched in
every set-up of every pen. While this may seem like a strict and hard-to-achieve
design goal, the set-up can be managed simply with a bubble level and some string.
The camera box, if commercialized, would be supplied with a triangle of three strings
and three washers, with the sides corresponding to the distance of the camera to the
water spigot (Figure 7.2). The camera, set at its trained angle in the box, could then be
outfitted with a bubble level to guarantee it is set at roughly the same angle every
time.
Figure 7.2: Devices Used in Setting Up the Standard Metrics of the Camera System
Closing Remarks
As with any engineering product, the system will definitely need to go through
the usual process of refining, testing, and debugging. The software should be
expanded to include cluster tracking so that the pigs can be monitored over time. Far
more data on the pigs will also need to be collected to provide longer-term and more
varied training sets.
Nevertheless, the project as a whole serves as a valid proof of concept in that a
system can indeed be successfully made to calculate the weight of a pig from a mere
two-dimensional image.
BIBLIOGRAPHY
“Geometric Transformations.” OpenCV 2.4.11.0 Documentation. OpenCV Dev Team, 25 Feb. 2015. Web. 29 May 2015. <http://docs.opencv.org/modules/imgproc/doc/geometric_transformations.html>.

“Histogram Comparison.” OpenCV 2.4.11.0 Documentation. OpenCV Dev Team, 25 Feb. 2015. Web. 29 May 2015. <http://docs.opencv.org/doc/tutorials/imgproc/histograms/histogram_comparison/histogram_comparison.html>.

“Histogram Equalization.” OpenCV 2.4.11.0 Documentation. OpenCV Dev Team, 25 Feb. 2015. Web. 29 May 2015. <http://docs.opencv.org/doc/tutorials/imgproc/histograms/histogram_equalization/histogram_equalization.html>.

Lancaster, Don. “A Review of Some Pixel Image Interpolation Algorithms.” GuruGram, 2007. Web. <http://www.tinaja.com/glib/pixintpl.pdf>.

“Outlier Handout.” Statistical Treatment of Analytical Data, 2004. Web. <http://education.mrsec.wisc.edu/research/topic_guides/outlier_handout.pdf>.

Viola, Paul, and Michael Jones. “Rapid Object Detection Using a Boosted Cascade of Simple Features.” Computer Vision and Pattern Recognition, 2001.

Freund, Yoav, and Robert E. Schapire. “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.” Journal of Computer and System Sciences 55.1 (1997): 119–139.

Wagner, Philipp. “Local Binary Patterns.” bytefish.de, 8 Nov. 2011. Web. 30 Apr. 2015. <http://www.bytefish.de/blog/local_binary_patterns/>.

Watkins, Christopher, Alberto Sadun, and Stephen Marenka. Modern Image Processing: Warping, Morphing, and Classical Techniques. Boston: Academic Press, 1993. Print.
APPENDIX A
CONSOLE LOG DURING FEATURE DETECTION
APPENDIX B
EXAMPLES OF PIGS CLASSIFIED USING PROGRAM
APPENDIX C
CONSOLE LOG DURING TRAINING MODE
APPENDIX D
CONSOLE LOG AT END OF TRAINING MODE
APPENDIX E
CONSOLE LOG DURING FACE RECOGNIZER MODE
APPENDIX F
CONSOLE LOG AFTER FACE RECOGNIZER MODE
APPENDIX G
HOW-TO GUIDE ON RUNNING PIG ESTIMATION PROGRAMS
INTRODUCTION
The intent of this document is to loosely guide the user through the four modes of
the program I made, as well as how to run the MATLAB script for training the data.
It does not include information on the source code itself, as the source is well
documented and extensively commented and should answer most questions that arise.
Note that all of the programs require the provided DLL files from OpenCV, the three
XML classifier files, a folder of images, the CSV files designating those images
(covered later), and the CSV file of regression coefficients (its generation is
explained later).
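Assuming the default file names used throughout this guide, a working folder might look something like the listing below before any mode is run. The exact set of OpenCV DLLs depends on the OpenCV version and build, and the image names are placeholders:

```text
training_mode.exe
cluster_mode_supervised.exe
cluster_mode_unsupervised.exe
sample_mode.exe
opencv_*.dll                      (DLLs provided with the OpenCV build)
cascade_eyes_32stage_999.xml
cascade_face_20stage_999.xml
cascade_nose_33stage_999.xml
supervised_pigs.csv
unsupervised_pigs.csv
coefficients.csv
pig_0001.jpg, pig_0002.jpg, ...   (the folder of pig images)
```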
TRAINING MODE
The Training Mode is used to generate the feature vectors that train the regression
data. It can be run either by running (double-clicking) training_mode.exe from
within the folder of files or by setting the value of PROGRAM_MODE in Visual
Studio to 1.
In order for this program to run properly, make sure that the image files to be
trained are specified in a comma-separated values (CSV) file named
supervised_pigs.csv. The first value of each row is the filename and path of the
image (which needs to be in the same folder as the executable), the second is the
known ID of the pig (an integer), and the third is the known weight of the pig
(also an integer). An example of this file is shown in Notepad and in Excel.
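For reference, each row of supervised_pigs.csv follows the shape below. The filenames, IDs, and weights here are made-up placeholders, not data from the study:

```text
pig_0001.jpg,3,215
pig_0002.jpg,7,198
pig_0003.jpg,3,217
```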
As each valid pig shows up, you will be given the option to press "N" on the
keyboard to mark a misclassified pig, or to hit any other key to mark it as
correctly classified.
After every pig has been sorted through, a CSV file of the feature vectors for
only the correctly classified pigs is created. It is titled feat_vect.csv. This
data will be used to train the regression coefficients.
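Each row of feat_vect.csv holds the five facial-geometry features followed by the pig's known weight; the regression script reads the first five columns as predictors and the sixth as the target. The numeric values below are made-up placeholders to illustrate the layout only:

```text
0.42,0.07,118.3,512,640,215
0.39,0.05,121.6,498,655,198
```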
GENERATING COEFFICIENTS
In order to predict the weight of a pig in a pen, regression coefficients must
first be trained from data on pigs in that pen using the code provided in
regression.m.
Before running the script in MATLAB, make sure that the file feat_vect.csv is in
the same folder as the script. If it is, and you have a correctly generated
feature vector file, then you can run the MATLAB script.
Running this script outputs the associated graphs generated in MATLAB (the
"Training Sample Regression Results" and "Training Set" plots, which chart Actual
Weight and Calculated Weight against Sample Number) and, most importantly, the
coefficients file coefficients.csv.
This file can be placed in the same folder as cluster_mode_supervised.exe,
cluster_mode_unsupervised.exe, sample_mode.exe, and training_mode.exe to update
the regression coefficients the programs run on. In this way, better coefficients
can be trained over time.
CLUSTER MODE (SUPERVISED)
The Cluster Mode (Supervised) is used to cluster and predict the weights of
multiple pigs in a pen and output the estimates alongside known data on the pigs.
It can be run either by running (double-clicking) cluster_mode_supervised.exe
from within the folder of files or by setting the value of PROGRAM_MODE in
Visual Studio to 2.
In order for this program to run properly, make sure that the image files to be
processed are specified in a comma-separated values (CSV) file named
supervised_pigs.csv. The first value of each row is the filename and path of the
image (which needs to be in the same folder as the executable), the second is the
known ID of the pig (an integer), and the third is the known weight of the pig
(also an integer). An example is shown in Notepad and in Excel.
The results of the run will be output onto a dynamic screen showing each pig and its
weight.
CLUSTER MODE (UNSUPERVISED)
The Cluster Mode (Unsupervised) is used to cluster and predict the weights of
multiple pigs in a pen without known data on the pigs. It can be run either by
running (double-clicking) cluster_mode_unsupervised.exe from within the folder
of files or by setting the value of PROGRAM_MODE in Visual Studio to 3.
In order for this program to run properly, make sure the images being tested are
specified in a comma-separated values (CSV) file named unsupervised_pigs.csv.
Every line is just the filename and path of the image (which needs to be in the
same folder as the executable). An example of this file is shown in Notepad and
in Excel.
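For reference, unsupervised_pigs.csv is simply one image per line. The filenames below are made-up placeholders:

```text
pig_0101.jpg
pig_0102.jpg
pig_0103.jpg
```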
The results of the run will be output onto a dynamic screen showing each pig and its
weight.
SAMPLE MODE (PREDICTING FROM A SINGLE IMAGE)
This mode serves as an example of operating the program when you want to see the
weight of just a single pig. It only works if Cluster Mode has already been run
on the batch of pigs, preferably over a full day of images, so that the batch of
pigs' faces is clustered properly.
If the program has already been run in Cluster Mode, then there should be two
generated files: cluster_data.csv and face_rec_model.xml. Make sure both of these
are available in the folder.
Then, provide a text file called single_pig.txt. The only line in this file
should be the name and path of the image being tested, as shown in Notepad.
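In other words, single_pig.txt contains exactly one line. The filename below is a made-up placeholder:

```text
pig_0042.jpg
```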
After that, the program can be executed either by running (double-clicking)
sample_mode.exe from within the folder of files or by setting the value of
PROGRAM_MODE in Visual Studio to 0.
The output of the program will show the picture of the pig and designate its estimated
weight.
APPENDIX H
MATLAB CODE FOR LEAST SQUARES REGRESSION
close all; clear all; clc;

%% -- Import Data and Create Predictor Vector ----------------------------
% Read the data from the .csv file created by the Face Training program
data = csvread('feat_vect.csv');
% All columns except the last are features; the last is the weight
X = data(:,1:5);
Y = data(:,6);

% Expand the predictors by multiplying the original predictors times
% themselves twice
m = size(X,1);
X2 = reshape(bsxfun(@times,reshape(X,m,1,[]),X),m,[]);
X = [X,X2,reshape(bsxfun(@times,reshape(X,m,1,[]),X2),m,[])];
[N,L] = size(X);
X = [ones(N,1),X];
L = L + 1;

%% -- Pre-Loop Definitions -----------------------------------------------
% Defines how many iterations to run varying random sets of training
% and testing data
k_max = 3000;
% Define a vector that will store the testing error for each random seed
E_test = zeros(1,k_max);
% Define the weight vector
W = zeros(L,k_max);

% Loop through values of k
for k = 1:k_max
    %% -- Creation of Random Training/Testing Sets -----------------------
    % Declare the sizes of the training and testing sets
    p_test = 0.3;
    n_test = round(N*p_test);
    n_train = N-n_test;

    % Create the vectors for the training and testing sets based on size
    X_train = X;
    X_test = zeros(n_test,L);
    Y_train = Y;
    Y_test = zeros(n_test,1);

    % Set the random seed based on the value of k
    rng(k-1);

    % Divvy up the training/testing sets
    for i = 1:n_test
        j = round(rand()*(N-i))+1;
        X_test(i,:) = X_train(j,:);
        X_train = [X_train(1:j-1,:); X_train(j+1:N-i+1,:)];
        Y_test(i) = Y_train(j);
        Y_train = [Y_train(1:j-1); Y_train(j+1:N-i+1)];
    end

    %% -- Solving for Weight Vector --------------------------------------
    % Solve for the weight vector based on pseudo-inverse and training set
    W(:,k) = (pinv(X_train)*Y_train);

    % Report the corresponding error value of the weight vector created
    for i = 1:n_test
        E_test(k) = E_test(k) + (Y_test(i)-dot(W(1:L,k),X_test(i,:)))^2;
    end
    E_test(k) = E_test(k)/n_test;
end

%% -- Evaluation of Random Samples ---------------------------------------
% Find the index of the set with the lowest error
ind_j = find(E_test == min(E_test(:)));
% Select the weight vector of that set
W_final = W(:,ind_j);

% Recreate the training and testing sets based on the random seed used
rng(ind_j-1);
X_train = X;
X_test = zeros(n_test,L);
Y_train = Y;
Y_test = zeros(n_test,1);
for i = 1:n_test
    j = round(rand()*(N-i))+1;
    X_test(i,:) = X_train(j,:);
    X_train = [X_train(1:j-1,:); X_train(j+1:N-i+1,:)];
    Y_test(i) = Y_train(j);
    Y_train = [Y_train(1:j-1); Y_train(j+1:N-i+1)];
end

%% -- Find Training Sample Error -----------------------------------------
Y_hat_trn = zeros(n_train,1);
e_train = zeros(n_train,1);
for i = 1:n_train
    Y_hat_trn(i) = dot(W_final,X_train(i,:));
    e_train(i) = (Y_train(i)-Y_hat_trn(i))^2;
end
e_train = sum(e_train)/n_train;

%% -- Find the Testing Sample Error --------------------------------------
Y_hat_tst = zeros(n_test,1);
e_test = zeros(n_test,1);
for i = 1:n_test
    Y_hat_tst(i) = dot(W_final,X_test(i,:));
    e_test(i) = (Y_test(i)-Y_hat_tst(i))^2;
end
e_test = sum(e_test)/n_test;

%% -- Sort the Data in Ascending Order -----------------------------------
flag = 1;
while flag
    flag = 0;
    for i = 1:n_test-1
        if Y_test(i+1) < Y_test(i)
            temp = Y_test(i+1);
            Y_test(i+1) = Y_test(i);
            Y_test(i) = temp;
            temp = Y_hat_tst(i+1);
            Y_hat_tst(i+1) = Y_hat_tst(i);
            Y_hat_tst(i) = temp;
            flag = 1;
        end
    end
end

flag = 1;
temp = 0;
while flag
    flag = 0;
    for i = 1:n_train-1
        if Y_train(i+1) < Y_train(i)
            temp = Y_train(i+1);
            Y_train(i+1) = Y_train(i);
            Y_train(i) = temp;
            temp = Y_hat_trn(i+1);
            Y_hat_trn(i+1) = Y_hat_trn(i);
            Y_hat_trn(i) = temp;
            flag = 1;
        end
    end
end

%% -- Plot Results of All Samples ----------------------------------------
% Plot the training sample of predicted value versus actual
figure
plot(1:n_train,Y_train,1:n_train,Y_hat_trn)
legend('Actual Weight','Calculated Weight','Location','southeast')
title('Training Sample Regression Results')
xlabel('Sample Number')
ylabel('Weight')
grid on

% Plot the testing sample of predicted value versus actual
figure
plot(1:n_test,Y_test,1:n_test,Y_hat_tst)
legend('Actual Weight','Calculated Weight','Location','southeast')
title('Test Sample Regression Results')
xlabel('Sample Number')
ylabel('Weight')
grid on

%% -- Average Calculated Weight for Given Weight -------------------------
% Calculate the average values of calculated weight for given weight
% on training
i = 1;
avg_trn = [];
while i < n_train
    curr_w = Y_train(i);
    ind = find(Y_train == curr_w);
    avg_trn = [avg_trn;mean(Y_hat_trn(ind)),curr_w];
    i = max(ind)+1;
end

% Calculate the average values of calculated weight for given weight
% on testing
i = 1;
avg_tst = [];
while i < n_test
    curr_w = Y_test(i);
    ind = find(Y_test == curr_w);
    avg_tst = [avg_tst;mean(Y_hat_tst(ind)),curr_w];
    i = max(ind)+1;
end

% Plot the results for the training set
figure
plot(1:length(avg_trn),avg_trn(:,1),'b*',1:length(avg_trn),avg_trn(:,2),'r*')
legend('Average Calculated Weight','Actual Weight','Location','southeast')
title('Training Set')
xlabel('Sample Number')
ylabel('Weight')
set(gca,'XTick', 1:1:length(avg_trn));
grid on

% Plot the results for the testing set
figure
plot(1:length(avg_tst),avg_tst(:,1),'b*',1:length(avg_tst),avg_tst(:,2),'r*')
legend('Average Calculated Weight','Actual Weight','Location','southeast')
title('Testing Set')
xlabel('Sample Number')
ylabel('Weight')
set(gca,'XTick', 1:1:length(avg_tst));
grid on

%% -- Write the Coefficients CSV File ------------------------------------
csvwrite('coefficients.csv',W_final');
APPENDIX I
OPENCV C++ SOURCE CODE
PigWeight.cpp
#include <opencv2/core/core.hpp>
#include <opencv2/objdetect/objdetect.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <fstream>
#include "PigClassifier.h"
#include "PigFace.h"

using namespace std;
using namespace cv;

/* PROGRAM MODES:
   0 -> Sample Mode (predicts a weight for a single picture)
   1 -> Training Mode (outputs feature vectors)
   2 -> Cluster Mode - Supervised (clusters faces into groups with predicted
        weights and shows known data alongside it)
   3 -> Cluster Mode - Unsupervised (clusters faces into groups with predicted
        weights) */
#define PROGRAM_MODE 2

/* RECOGNIZER MODES:
   0 -> Execution Mode (the values I've found to yield the smallest error)
   1 -> Training Mode (outputs mega CSV file for results of all different values)
   2 -> Prompt Mode (asks for values manually) */
#define RECOGNIZER_MODE 0

// Global variables for window names
char* image_window = "Source Image";
char* trans_window = "Transformed Image";
char* final_faces = "Final detected faces";

/** @function main */
int main( int argc, char** argv )
{
    // Set the random seed
    srand(0);

    // Declare input file and associated vectors
    ifstream in_file;
    vector<string> files;
    vector<int> weights;
    vector<int> piggies;

    // Open the appropriate file depending on the program mode
    string line;
    if(PROGRAM_MODE == 1 || PROGRAM_MODE == 2) {
        in_file.open("supervised_pigs.csv");
    } else if (PROGRAM_MODE == 3) {
        in_file.open("unsupervised_pigs.csv");
    } else {
        in_file.open("single_pig.txt");
    }
    if(in_file.is_open()) {
        if(PROGRAM_MODE == 1 || PROGRAM_MODE == 2) {
            while(in_file.good()) {
                // Get the file name
                getline(in_file, line, ',');
                if(line == "") { break; }
                files.push_back(line);
                // Get the next item
                getline(in_file, line, ',');
                if(line == "") { files.pop_back(); break; }
                // Store the ID and weight if applicable
                if(PROGRAM_MODE == 1 || PROGRAM_MODE == 2) {
                    piggies.push_back(stoi(line));
                    getline(in_file, line);
                    weights.push_back(stoi(line));
                }
            }
        } else {
            while(in_file.good()) {
                // Get the file name
                getline(in_file, line);
                if(line == "") { break; }
                files.push_back(line);
                piggies.push_back(-1);
                weights.push_back(-1);
            }
        }
    } else {
        cout << "Failed to open input file." << endl;
        cout << "Press any key to terminate program." << endl;
        waitKey(0);
        return 0;
    }

    // Allow window to be resized
    namedWindow( image_window, CV_WINDOW_NORMAL );
    namedWindow( trans_window, CV_WINDOW_AUTOSIZE );
    // Create file to write feature vectors to
    ofstream feat_vect_file;
    feat_vect_file.open("feat_vect.csv");

    // Instantiate pig class
    PigClassifier pig;
    vector<PigFace> pig_faces;

    // Create vector for the file names of misclassified images
    vector<string> mis_class;

    // Iterate through all of the image files listed
    for(size_t j = 0; j < files.size(); j++) {
        // Load images
        Mat img = imread(files[j]);
        if(PROGRAM_MODE == 1 || PROGRAM_MODE == 2) {
            cout << j << ":" << files[j] << " (";
            cout << "Piggie #" << piggies[j];
            cout << " weighs " << weights[j] << " lbs";
            cout << ") : " << endl;
        } else {
            cout << "Piggie # " << j << ":" << endl;
        }

        // Set the known weight for supervised data
        if(weights.size() > j) { pig.weight_known = weights[j]; }
        // Set the known piggie ID for supervised data
        if(piggies.size() > j) { pig.piggie_known = piggies[j]; }

        // Check for invalid input
        if(!img.data) {
            cout << "Could not open or find the image" << endl << endl;
            continue;
        }

        // Classify the facial geometry of the pig
        bool valid_pig = pig.classify(img);

        // Border color
        Scalar value;

        // Skip the invalid pigs and check valid ones with the user
        if(valid_pig) {
            // Green border for accepted images
            value = Scalar(0,200,0);
            // Calculate the facial geometries
            pig.calcMetrics();
            // Mark the image with all geometries (minus transformation)
            pig.markImage();
            // Calculate the transformation of a valid face
            pig.calcTransformation();
            // Calculate weight of pig
            pig.calcWeight();
            // Display picture
            imshow(image_window, pig.getImg());
            // Display the cropped transformation
            imshow(trans_window, pig.getCroppedFace());

            // Prompt user to mark misclassified if training facial geometry
            if(PROGRAM_MODE == 1) {
                char key;
                cout << "Validate with any button, reject with 'n'" << endl;
                // 0 = Wait indefinitely for keypress, k*1000 = Wait k seconds or for keypress
                key = waitKey(0)%255;
                // If 'n' is pressed, classification was not valid
                if(key == 27) {
                    // Escape key pressed, end training program
                    break;
                } else if(key == 'n') {
                    cout << "Classification marked as INVALID." << endl << endl;
                    mis_class.push_back(files[j]+" ("+to_string((long long)j)+")");
                    pig.misclassified++;
                    // Red border for rejected images
                    value = Scalar(0,0,200);
                    continue;
                } else if(key == ' ') {
                    cout << "Classification marked as VALID." << endl;
                } else {
                    cout << "Classification assumed to be VALID." << endl;
                }

                // Write feature vector to file to input to regression script
                vector<double> feat_vect = pig.getFeatVect();
                for(size_t i = 0; i < feat_vect.size(); i++) {
                    feat_vect_file << feat_vect[i] << ",";
                }
                feat_vect_file << endl;
            }

            // Push valid faces onto vector
            if(PROGRAM_MODE != 1) {
                // Only run facial recognition on pictures of proper orientation
                // (or short-circuit if in sample mode)
                if(abs(pig.eye_angle + pig.nose_angle) < 0.04 || PROGRAM_MODE == 0) {
                    pig_faces.push_back(PigFace(pig.getCroppedFace(), pig.piggie_known,
                        files[j], pig.weight_known, pig.weight_est));
                }
            }
        } else {
            // Red border for rejected images
            value = Scalar(0,0,200);
        }
        // Output image with border on it
        float BORD = 0.03;
        img = pig.getImg();
        img = img(Rect(img.cols*BORD/2, img.rows*BORD/2,
                       img.cols*(1-BORD), img.rows*(1-BORD)));
        copyMakeBorder( img, img, img.rows*BORD, img.rows*BORD,
                        img.cols*BORD, img.cols*BORD, BORDER_CONSTANT, value );
        imshow(image_window, img);
        waitKey(1);
        cout << endl;
    }

    // Close output file
    feat_vect_file.close();

    // If in training mode, print out the misclassified images manually marked
    if(PROGRAM_MODE == 1) {
        if(!mis_class.empty()) {
            cout << endl;
            cout << "*************************************************************" << endl << endl;
            cout << "Misclassified images include: " << endl;
            for(size_t i = 0; i < mis_class.size(); i++) {
                cout << " - " << mis_class[i] << endl;
            }
            cout << endl;
        }
    }

    // Print out all of the data from the classification
    pig.getResults();

    if(PROGRAM_MODE == 2 || PROGRAM_MODE == 3) {
        // Destroy the windows for the last pig image and the transformed image
        destroyWindow(image_window);
        destroyWindow(trans_window);
    }

    // Delete any previous Recognizer log files
    ofstream temp_file;
    temp_file.open("recognizer_output.txt");
    temp_file.close();

    // If in Cluster Mode, cluster all of the valid faces
    if(PROGRAM_MODE != 1) {
        cout << "RECOGNIZER PROGRAM_MODE!!" << endl;
        cout << " Let's look at " << pig_faces.size() << " faces." << endl;

        // If we aren't in Execution Mode, no need to output title for parameters
        if(RECOGNIZER_MODE != 0) {
            feat_vect_file.open("recognizer_training.csv", ios::app);
feat_vect_file << "threshold, min_cluster_count, radius, neighbors, grid_x, grid_y, " << endl; feat_vect_file.close(); } // Declare all the different parameters used in the facial LBP facial recognition int threshold, min_cluster_count, radius, neighbors, grid_x, grid_y, iterations; bool training = false; if(PROGRAM_MODE != 0) { // Use the facial recognition to create clusters if(RECOGNIZER_MODE == 0) { // Runtime mode threshold = 80; min_cluster_count = 5; radius = 4; neighbors = 12; grid_x = 4; grid_y = 8; iterations = 1; pig.createClusters(pig_faces, threshold, min_cluster_count, radius, neighbors, grid_x, grid_y, iterations,false); } else if (RECOGNIZER_MODE == 1) { // Recognizer training mode (runs the recognition multiple times in all kinds of configurations to give a good idea where to start with the values) vector<int> training_data; min_cluster_count = 0; iterations = 1; for(threshold = 20; threshold <= 60; threshold+=10) { for(radius = 1; radius <= 4; radius++) { for(neighbors = 8; neighbors <= 12; neighbors+=2) { for(grid_x = 4; grid_x <= 12; grid_x+=4) { for(grid_y = 4; grid_y <= 16; grid_y+=4) { training_data = pig.createClusters(pig_faces, threshold, min_cluster_count, radius, neighbors, grid_x, grid_y, iterations, true); feat_vect_file.open ("recognizer_training.csv",ios::app); cout << "File output: "; for(size_t i = 0; i < training_data.size(); i++) { feat_vect_file << training_data[i] << ","; cout << training_data[i] << ","; }
                                    feat_vect_file << endl;
                                    cout << endl;
                                    feat_vect_file.close();
                                }
                            }
                        }
                    }
                }
            } else {
                // Prompt mode (ask for the parameters of the values to manually be
                // typed in). Note that some pictures will be deleted every run
                do {
                    cout << "Select a threshold for recognition confidence (default = 80): " << endl;
                    cin >> threshold;
                    cout << "Select a minimum cluster count (default = 5): " << endl;
                    cin >> min_cluster_count;
                    cout << "Select a radius (default = 4): " << endl;
                    cin >> radius;
                    cout << "Select how many neighbors (default = 12): " << endl;
                    cin >> neighbors;
                    cout << "Select grid_x (default = 4): " << endl;
                    cin >> grid_x;
                    cout << "Select grid_y (default = 8): " << endl;
                    cin >> grid_y;
                    cout << "Select the number of iterations (default = 1): " << endl;
                    cin >> iterations;
                    pig.createClusters(pig_faces, threshold, min_cluster_count, radius,
                                       neighbors, grid_x, grid_y, iterations, false);
                } while(threshold != 0 && RECOGNIZER_MODE != 0);
            }
        } else {
            // Run recognizer on just a single image
            pig.samplePig(pig_faces[0]);
        }
    }

    // Wait for user to view results
    waitKey(0);
    return 0;
}
PigClassifier.h
#ifndef PIGCLASSIFIER_DEF
#define PIGCLASSIFIER_DEF

#include <opencv2/core/core.hpp>
#include <opencv2/objdetect/objdetect.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/contrib/contrib.hpp>
#include <iostream>
#include <stdio.h>
#include <string>
#include <math.h>
#include <fstream>
#include <algorithm>
#include "utilities.h"
#include "PigFace.h"

using namespace std;
using namespace cv;

class PigClassifier {
public:
    static int images_classified;
    static int misclassified;
    static int failed_from_face;
    static int failed_from_eyes;
    static int failed_from_nose;

    // Known weight for supervised data
    double weight_known;
    double weight_est;
    int piggie_known;
    double eye_angle;
    double nose_angle;

    // No-arg constructor has all empty variables
    PigClassifier();
    // Weight function, calculates weight of pig given the other feature vectors
    double calcWeight();
    // Classification function, classifies a face and features for an image
    bool classify(Mat img);
    // Return the Mat stored with the pig
    Mat getImg();
    // Choose one of the faces as valid
    Rect chooseFace(vector<Rect> faces);
    // Choose two eyes as valid, false if no valid eyes
    bool chooseEyes(vector<Rect> eyes);
    // Choose one of the noses as valid
    Rect chooseNose(vector<Rect> noses);
    // Calculate the distance between the eyes, angle, distance to nose, and angle
    void calcMetrics();
    // Mark image with facial geometries
    void markImage();
    // Calculate the perspective transformation of the face
    void calcTransformation();
    // Return the appropriate feature vector for the facial geometry
    vector<double> getFeatVect();
    // Return static variables summary
    void getResults();
    // Return cropped face
    Mat getCroppedFace();
    // Prints the cluster data on a vector of pig faces
    void getClusters(vector<PigFace> & pig_faces);
    // Creates clusters out of the pig faces using LBP facial recognition
    vector<int> createClusters(vector<PigFace> & pig_faces, int confidence_threshold,
        int min_cluster_count, int radius, int neighbors, int grid_x, int grid_y,
        int iterations, bool training);
    // Prints one window with all of the final pig images
    void displayClusters(vector<String> & file_names, vector<int> weight_known,
        vector<int> weight_est);
    // Takes a single pig face and returns its weight based on the facial
    // recognition model
    void samplePig(PigFace piggie);

private:
    // Timer variables
    double t1, t2;
    Rect face;
    Rect lt_eye;
    Rect rt_eye;
    Rect nose;
    Point bisector;
    double eye_avg;
    double eye_del;
    double eye_dist;
    double nose_dist;
    Mat img;
    Mat mark_img;
    Mat face_img;
};

#endif // if not defined
PigClassifier.cpp
#include "PigClassifier.h" #define SHOW_MISCLASS 0 #define PRINT_TIMES 0 // declare and instatiate the global variables int PigClassifier::images_classified = 0; int PigClassifier::misclassified = 0; int PigClassifier::failed_from_face = 0; int PigClassifier::failed_from_eyes = 0; int PigClassifier::failed_from_nose = 0; // Define the constructor PigClassifier::PigClassifier() { this->rt_eye = Rect(0,0,0,0); this->lt_eye = Rect(0,0,0,0); this->face = Rect(0,0,0,0); this->weight_known = -1; this->weight_est = 0; } // Weight function, calculates weight of pig given the other feature vectors double PigClassifier::calcWeight() { ifstream coefficients_file("coefficients.csv"); string line; vector<float> coefficient; vector<float> predictor; // Get all of the coefficients from file if(coefficients_file.is_open()) { while(coefficients_file.good()) { getline(coefficients_file, line, ','); coefficient.push_back(stof(line)); } } else { cout << "Failed to open coefficients file." << endl; return 0; } // Create all of the predictors (everything multiplied times everything, plus a 1 for constant double first_set[] = {this->eye_avg, this->eye_del, this->eye_dist, nose.x, nose.y}; double set_size = 5; for(int i = 0; i < set_size; i++) { predictor.push_back(first_set[i]); } for(int i = 0; i < set_size; i++) { for(int j = 0; j < set_size; j++) { predictor.push_back(first_set[i]*first_set[j]); } }
    int mid_set_size = predictor.size();
    for(int i = 0; i < set_size; i++) {
        for( int j = set_size; j < mid_set_size; j++) {
            predictor.push_back(first_set[i]*predictor[j]);
        }
    }
    predictor.insert(predictor.begin(),1);

    // Calculate the weight
    this->weight_est = 0;
    for(int i = 0; i < predictor.size(); i++) {
        this->weight_est += coefficient[i]*predictor[i];
    }
    cout << "Estimated weight is " << this->weight_est << endl;
    return this->weight_est;
}

// Classification function, classifies a face and features for an image
bool PigClassifier::classify(Mat img) {
    // Increment global number of images classified
    images_classified++;

    // Create copies of the image specified in the parameters
    this->img = img.clone();
    this->mark_img = img.clone();

    // Define and load all of the cascade classifiers
    CascadeClassifier eyes_cascade;
    eyes_cascade.load("cascade_eyes_32stage_999.xml");
    CascadeClassifier pigface_cascade;
    pigface_cascade.load("cascade_face_20stage_999.xml");
    CascadeClassifier nose_cascade;
    nose_cascade.load("cascade_nose_33stage_999.xml");

    // Convert to grayscale
    Mat gray;
    cvtColor(this->img, gray, CV_BGR2GRAY);
    equalizeHist( gray, gray );

    // Detect all possible faces
    cout << " Classifying faces..." << endl;

    // Optionally start timers
    t1 = get_wall_time();
    t2 = get_cpu_time();

    // Create vector of rectangles and pass to classifier
    std::vector<Rect> faces;
    pigface_cascade.detectMultiScale( gray, faces, 1.15, 3, 0|CV_HAAR_SCALE_IMAGE,
        Size((int)img.rows*0.31,(int)img.rows*0.31));

    // Print classification time if desired
    if(PRINT_TIMES) {
        cout << "\tWall Clock Time: \t" << get_wall_time()-t1 << " sec " << endl;
        cout << "\tCPU Clock Time: \t" << get_cpu_time()-t2 << " sec " << endl;
    }

    // Deal with rectangles depending on how many objects were detected.
    if(faces.size() == 1) {
        // If there's only one face, it must be the valid one.
        face = faces[0];
        cout << "\tOnly one valid face detected." << endl;
    } else if(faces.empty()) {
        // If there are no faces, picture cannot be valid.
        cout << "\tCould not detect any faces in picture." << endl;
        failed_from_face++;
        return false;
    } else if(faces.size() > 1) {
        // If there are multiple faces, then the valid one needs to be selected.
        cout << "\tMore than one face detected, selecting most probable object." << endl;
        chooseFace(faces);
        cout << "\tOne single face selected out of " << faces.size() << "." << endl;
    }

    // Mark all detected faces
    if(SHOW_MISCLASS) {
        for(size_t i = 0; i < faces.size(); i++) {
            // Mark detected faces onto image
            rectangle( this->mark_img, Point(faces[i].x,faces[i].y),
                Point(faces[i].x+faces[i].width,faces[i].y+faces[i].height),
                Scalar( 0,0,150 ), 5, 8, 0 );
        }
        // Mark detected face onto image early if debugging
        rectangle( this->mark_img, Point(face.x,face.y),
            Point(face.x+face.width,face.y+face.height), Scalar(150,0,0), 5, 8, 0 );
    }

    // Resize the face with a buffer for eye detection
    double buffer = 0.05;
    face.x = ((int) (face.x - face.width*buffer)) < 0 ? 0 :
        ((int) (face.x - face.width*buffer));
    face.y = ((int) (face.y - face.height*buffer)) < 0 ? 0 :
        ((int) (face.y - face.height*buffer));
    face.width = ((int) (face.width + 2*face.width*buffer)) + face.x > gray.cols ?
        gray.cols - face.x : ((int) (face.width + 2*face.width*buffer));
    face.height = ((int) (face.height + 2*face.height*buffer)) + face.y > gray.rows ?
        gray.rows - face.y : ((int) (face.height + 2*face.height*buffer));
    Mat faceROI = gray( face );

    // Detect all possible eyes in ROI
    cout << " Classifying eyes:" << endl;
    t1 = get_wall_time();
    t2 = get_cpu_time();

    // Create vector of rectangles and pass to eye classifier
    vector<Rect> eyes;
    eyes_cascade.detectMultiScale( faceROI, eyes, 1.35, 30, 0|CV_HAAR_SCALE_IMAGE,
        Size((int)img.rows*0.039,(int)img.rows*0.039),
        Size((int)img.rows*0.154,(int)img.rows*0.154));

    // Print classification time if desired
    if(PRINT_TIMES) {
        cout << "\tWall Clock Time: \t" << get_wall_time()-t1 << " sec " << endl;
        cout << "\tCPU Clock Time: \t" << get_cpu_time()-t2 << " sec " << endl;
    }

    // Check if any eyes are detected
    if(eyes.empty()) {
        cout << "\tNo eyes detected on face." << endl;
        failed_from_eyes++;
        return false;
    }

    // Show misclassified eyes if debugging
    if(SHOW_MISCLASS) {
        for( size_t j = 0; j < eyes.size(); j++ ) {
            rectangle( this->mark_img, Point(face.x + eyes[j].x,face.y + eyes[j].y),
                Point(face.x + eyes[j].x + eyes[j].width,
                      face.y + eyes[j].y + eyes[j].height),
                Scalar( 0,0,150 ), 5, 8, 0 );
        }
    }

    // Choose the eyes that we will actually use
    if(!chooseEyes(eyes)) {
        cout << "\tInvalid set of eyes." << endl;
        failed_from_eyes++;
        return false;
    }

    // Mark valid detected eyes onto image early if debugging
    if(SHOW_MISCLASS) {
        rectangle( this->mark_img, Point(face.x + lt_eye.x,face.y + lt_eye.y),
            Point(face.x + lt_eye.x + lt_eye.width,
                  face.y + lt_eye.y + lt_eye.height),
            Scalar( 200,0,0 ), 5, 8, 0 );
        rectangle( this->mark_img, Point(face.x + rt_eye.x,face.y + rt_eye.y),
            Point(face.x + rt_eye.x + rt_eye.width,
                  face.y + rt_eye.y + rt_eye.height),
            Scalar( 200,0,0 ), 5, 8, 0 );
    }

    // Detect all possible noses
    cout << " Classifying noses..." << endl;

    // Pad the bottom with a 30% border to make sure the full nose is classified
    int top=0, bottom=0, left=0, right=0;
    bottom = (int) (0.3*gray.rows);
    int borderType = BORDER_CONSTANT;
    copyMakeBorder( gray, gray, top, bottom, left, right, borderType, Scalar(0,0,0) );

    // Start timers
    t1 = get_wall_time();
    t2 = get_cpu_time();

    // Define vector of rectangles to pass to nose classifier
    std::vector<Rect> noses;

    // Create a nose ROI that's below the face
    Mat nose_ROI = gray(Rect(1,face.y+face.height-1,gray.cols-1,
        gray.rows-(face.y+face.height)));

    // best parameters ...
    nose_cascade.detectMultiScale( nose_ROI, noses, 1.05, 3, 0|CV_HAAR_SCALE_IMAGE,
        Size((int)img.rows*0.154,(int)img.rows*0.154));

    // Print classification times if desired
    if(PRINT_TIMES) {
        cout << "\tWall Clock Time: \t" << get_wall_time()-t1 << " sec " << endl;
        cout << "\tCPU Clock Time: \t" << get_cpu_time()-t2 << " sec " << endl;
    }

    // Adjust noses for ROI
    for(size_t i = 0; i < noses.size(); i++) {
        noses[i].y = noses[i].y + face.y+face.height;
    }

    // Deal with nose rectangles depending on how many objects are detected
    if(noses.size() == 1) {
        // If only one nose is found, then it must be the valid nose.
        nose = noses[0];
        cout << "\tOnly one valid nose detected." << endl;
    } else if(noses.empty()) {
        // If no noses are found, the picture cannot be a valid image of a pig.
        cout << "\tCould not detect any noses in picture." << endl;
        failed_from_nose++;
        return false;
    } else if(noses.size() > 1) {
        // If more than one nose is found, then we must select the valid one.
        cout << "\tMore than one nose detected, selecting most probable object." << endl;
        chooseNose(noses);
        cout << "\tOne valid nose selected out of " << noses.size() << "." << endl;
    }

    // Mark all detected noses
    if(SHOW_MISCLASS) {
        for(size_t i = 0; i < noses.size(); i++) {
            // Mark detected noses onto image
            rectangle( this->mark_img, Point((int) noses[i].x,(int) noses[i].y),
                Point((int)noses[i].x+noses[i].width,noses[i].y+noses[i].height),
                Scalar( 0,0,150 ), 5, 8, 0 );
        }
        // Mark valid detected nose early if debugging
        rectangle( this->mark_img, Point((int) nose.x,(int) nose.y),
            Point((int)nose.x+nose.width,nose.y+nose.height),
            Scalar(0,0,200), 5, 8, 0 );
    }
    return true;
}

// Choose a valid face based on shortest Euclidean distance to the center of the image
Rect PigClassifier::chooseFace(vector<Rect> faces) {
    double shortest_distance = 999999999;
    Point img_center = Point(this->img.cols/2, this->img.rows/2);
    for(size_t i = 0; i < faces.size(); i++) {
Texas Tech University, Alexander W. Clark, August 2015
105
        Point face_center = Point(faces[i].x + faces[i].width/2, faces[i].y + faces[i].height/2);
        double distance = sqrt(pow((double) img_center.x - face_center.x, 2.0)
                             + pow((double) img_center.y - face_center.y, 2.0));
        if(distance < shortest_distance) {
            this->face = faces[i];
            shortest_distance = distance;
        }
    }
    return face;
}

// Choose eyes, false if no valid eyes
bool PigClassifier::chooseEyes(vector<Rect> eyes) {
    // Divide eyes into right and left
    vector<Rect> rt_eyes;
    vector<Rect> lt_eyes;
    for(size_t i = 0; i < eyes.size(); i++) {
        if(eyes[i].x < (int) (face.width/2)) {
            lt_eyes.push_back(eyes[i]);
        }
        else {
            rt_eyes.push_back(eyes[i]);
        }
    }
    if(lt_eyes.empty()) {
        cout << "\tFace does not contain any eyes on the left side." << endl;
        return false;
    }
    if(rt_eyes.empty()) {
        cout << "\tFace does not contain any eyes on the right side." << endl;
        return false;
    }
    cout << "\tThere are " << lt_eyes.size() << " eyes on the left." << endl;
    cout << "\tThere are " << rt_eyes.size() << " eyes on the right." << endl;

    // Break out of the eye selector if we already have a valid pair of eyes
    if(lt_eyes.size() == 1 && rt_eyes.size() == 1) {
        this->lt_eye = lt_eyes[0];
        this->rt_eye = rt_eyes[0];
        return true;
    }

    // Allocate a dynamic 2-dimensional array to predict the probability of any two given eyes
    double** E = new double*[lt_eyes.size()];
    for(size_t i = 0; i < lt_eyes.size(); i++) {
        E[i] = new double[rt_eyes.size()];
    }

    // Fill the array of expected values, all initially with 1
    for(size_t i = 0; i < lt_eyes.size(); i++) {
        for( size_t j = 0; j < rt_eyes.size(); j++) {
            E[i][j] = 1;
        }
    }

    // Determine max and min width of the detected eyes
    int max_width = 0;
    int min_width = 999999999;
    for(size_t i = 0; i < eyes.size(); i++) {   // search across all detected eyes
        if(eyes[i].width > max_width) {
            max_width = eyes[i].width;
        }
        if(eyes[i].width < min_width) {
            min_width = eyes[i].width;
        }
    }

    // Create temporary images of the eyes, all resized
    vector<Mat> lt_eye_images;
    for(size_t i = 0; i < lt_eyes.size(); i++) {
        lt_eye_images.push_back(this->img(lt_eyes[i]));
        resize(lt_eye_images[i], lt_eye_images[i], Size(50,50));
    }
    vector<Mat> rt_eye_images;
    for(size_t i = 0; i < rt_eyes.size(); i++) {
        rt_eye_images.push_back(this->img(rt_eyes[i]));
        resize(rt_eye_images[i], rt_eye_images[i], Size(50,50));
        flip(rt_eye_images[i], rt_eye_images[i], 1);
    }

    double max_similarity = 0;
    double min_similarity = 99999;
    for(size_t i = 0; i < lt_eyes.size(); i++) {
        for( size_t j = 0; j < rt_eyes.size(); j++) {
            // Calculate the L2 relative error between images.
            double errorL2 = norm( lt_eye_images[i], rt_eye_images[j], CV_L2 );
            // Convert to a reasonable scale, since L2 error is summed across all pixels of the image.
            double similarity = errorL2 / (double)( lt_eye_images[i].rows * lt_eye_images[i].cols );
            E[i][j] = similarity;
            cout << "\tsimilarity result = " << E[i][j] << endl;
            if(E[i][j] < min_similarity) {
                min_similarity = E[i][j];
            }
            if(E[i][j] > max_similarity) {
                max_similarity = E[i][j];
            }
        }
    }

    // Check for potential division by zero
    if(max_width == min_width) {
        max_width++;
    }

    // Compute probabilities
    cout << "\tExamining probabilities..." << endl;
    for(size_t i = 0; i < lt_eyes.size(); i++) {
        for( size_t j = 0; j < rt_eyes.size(); j++) {
            // Probability increases with similarity in template matching
            E[i][j] = (double) 0.5*(E[i][j]-min_similarity)/(max_similarity-min_similarity);
            // Probability increases with similarity in eye size between the two
            E[i][j] += (double) 0.5*(1 - abs(lt_eyes[i].width-rt_eyes[j].width)/max_width);
            // Probability increases with closeness to the horizon
            E[i][j] += (double) 2*(face.height/2 - abs(face.height/2 - (lt_eyes[i].y+lt_eyes[i].height/2)))/(face.height/2);
            E[i][j] += (double) 2*(face.height/2 - abs(face.height/2 - (rt_eyes[j].y+rt_eyes[j].height/2)))/(face.height/2);
            // Probability increases the farther apart the eyes are
            E[i][j] += (double) abs((lt_eyes[i].x + lt_eyes[i].width/2) - (rt_eyes[j].x + rt_eyes[j].width/2))/(face.width-max_width);
            // Probability increases the less angle there is between the eyes
            E[i][j] += (double) 4*(face.height - abs(lt_eyes[i].y - rt_eyes[j].y))/(face.height);
            cout << "\tprobability = " << E[i][j] << " at (" << lt_eyes[i].x << "," << lt_eyes[i].y
                 << ") and (" << rt_eyes[j].x << "," << rt_eyes[j].y << ")" << endl;
        }
    }

    // If a best pairwise probability is found, set that as the valid set of eyes
    int best_i = 0;
    int best_j = 0;
    double best_E = 0;
    for(size_t i = 0; i < lt_eyes.size(); i++) {
        for( size_t j = 0; j < rt_eyes.size(); j++) {
            if(E[i][j] > best_E) {
                best_i = i;
                best_j = j;
                best_E = E[i][j];
            }
        }
    }
    cout << "\tSelected valid left and right eyes." << endl;
    this->lt_eye = lt_eyes[best_i];
    //cout << "\tThe left eye is at " << lt_eye.x << " and " << lt_eye.y << endl;
    this->rt_eye = rt_eyes[best_j];
    //cout << "\tThe right eye is at " << rt_eye.x << " and " << rt_eye.y << endl;

    // Free each row before freeing the array of row pointers
    for(size_t i = 0; i < lt_eyes.size(); i++) {
        delete[] E[i];
    }
    delete[] E;
    return true;
}

// Choose a valid nose based on shortest Euclidean distance to the bottom-center of the picture
Rect PigClassifier::chooseNose(vector<Rect> noses) {
    double shortest_distance = 999999999;
    Point bottom_center = Point(this->img.cols/2, this->img.rows);
    for(size_t i = 0; i < noses.size(); i++) {
        Point nose_center = Point(noses[i].x + noses[i].width/2, noses[i].y + noses[i].height/2);
        double distance = sqrt(pow((double) bottom_center.x - nose_center.x, 2.0)
                             + pow((double) bottom_center.y - nose_center.y, 2.0));
        if(distance < shortest_distance) {
            this->nose = noses[i];
            shortest_distance = distance;
        }
    }
    return nose;
}

// Return the Mat stored with the pig
Mat PigClassifier::getImg() {
    return this->mark_img;
}

// Calculate the various feature metrics
void PigClassifier::calcMetrics() {
    cout << " Calculating metrics..." << endl;

    // Make temporary points for eyes
    Point rightEye = Point((int)face.x+rt_eye.x+rt_eye.width/2, (int)face.y+rt_eye.y+rt_eye.height/2);
    Point leftEye  = Point((int)face.x+lt_eye.x+lt_eye.width/2, (int)face.y+lt_eye.y+lt_eye.height/2);
    Point nosePt   = Point((int)nose.x+nose.width/2, (int)nose.y+nose.height*3/4);
    this->bisector = Point((int)(rightEye.x+leftEye.x)/2, (int)(rightEye.y+leftEye.y)/2);

    // Calculate eye and nose angle from bisector
    this->eye_angle  = (double) atan((double)(rightEye.y-this->bisector.y)/(rightEye.x-this->bisector.x));
    this->nose_angle = (double) atan((double)(nosePt.x-this->bisector.x)/(nosePt.y-this->bisector.y));

    // Calculate distance from eye to eye and bisector to nose
    this->eye_dist  = (double) sqrt(pow((double)rightEye.y-leftEye.y,2) + pow((double)rightEye.x-leftEye.x,2));
    this->nose_dist = (double) sqrt(pow((double)nosePt.x-this->bisector.x,2) + pow((double)nosePt.y-this->bisector.y,2));

    // Calculate eye size average
    this->eye_avg = (lt_eye.width + rt_eye.width)/2;

    // Calculate eye size delta
    this->eye_del = abs(lt_eye.width - rt_eye.width);
}

// Mark image with facial geometries
void PigClassifier::markImage() {
    // Mark valid detected face
    rectangle( this->mark_img, Point(face.x,face.y),
               Point(face.x+face.width,face.y+face.height), Scalar(200,0,0), 5, 8, 0 );

    // Mark valid detected eyes
    rectangle( this->mark_img, Point(face.x+lt_eye.x,face.y+lt_eye.y),
               Point(face.x+lt_eye.x + lt_eye.width,face.y+lt_eye.y + lt_eye.height),
               Scalar( 200,0,0 ), 5, 8, 0 );
    rectangle( this->mark_img, Point(face.x+rt_eye.x,face.y+rt_eye.y),
               Point(face.x+rt_eye.x + rt_eye.width,face.y+rt_eye.y + rt_eye.height),
               Scalar( 200,0,0 ), 5, 8, 0 );

    // Mark valid detected nose
    rectangle( this->mark_img, Point(nose.x,nose.y),
               Point(nose.x+nose.width,nose.y+nose.height), Scalar(200,0,0), 5, 8, 0 );

    // Make temporary points for eyes
    Point rightEye = Point((int)face.x+rt_eye.x+rt_eye.width/2, (int)face.y+rt_eye.y+rt_eye.height/2);
    Point leftEye  = Point((int)face.x+lt_eye.x+lt_eye.width/2, (int)face.y+lt_eye.y+lt_eye.height/2);
    Point nosePt   = Point((int)nose.x+nose.width/2, (int)nose.y+nose.height/2);

    // Draw circles pinpointing the eyes
    circle( this->mark_img, leftEye, 4, Scalar(100,0,0), 25, 8);
    circle( this->mark_img, rightEye, 4, Scalar(100,0,0), 25, 8);

    // Draw line between eyes
    line( this->mark_img, leftEye, rightEye, Scalar(100,0,0), 8, 8);

    // Draw circle pinpointing the nose
    circle( this->mark_img, nosePt, 4, Scalar(100,0,0), 25, 8);

    // Draw line between bisector and nose
    line( this->mark_img, nosePt, this->bisector, Scalar(100,0,0), 8, 8);
}

// Calculate the perspective transformation of the face
void PigClassifier::calcTransformation() {
    Point rightEye = Point((int)face.x+rt_eye.x+rt_eye.width/2, (int)face.y+rt_eye.y+rt_eye.height/2);
    Point leftEye  = Point((int)face.x+lt_eye.x+lt_eye.width/2, (int)face.y+lt_eye.y+lt_eye.height/2);
    Point forehead = Point(this->bisector.x - sin(this->nose_angle)*this->nose_dist*0.4,
                           this->bisector.y - cos(this->nose_angle)*nose_dist*0.4);
    Point ltFace, rtFace, rtNose, ltNose;
    //rtFace = Point((int) face.x + rt_eye.x + rt_eye.width*3/2, (int) face.y + rt_eye.y - rt_eye.height*2);
    //ltFace = Point((int) face.x + lt_eye.x - lt_eye.width/2, (int) face.y + lt_eye.y - lt_eye.height*2);
    double eye_delta = (rt_eye.width - lt_eye.width)/2;
    rtFace = Point((int) forehead.x + cos(this->eye_angle)*(this->eye_dist*3/4+eye_delta),
                   (int) forehead.y + sin(this->eye_angle)*(this->eye_dist*3/4+eye_delta));
    ltFace = Point((int) forehead.x - cos(this->eye_angle)*(this->eye_dist*3/4-eye_delta),
                   (int) forehead.y - sin(this->eye_angle)*(this->eye_dist*3/4-eye_delta));
    ltNose = Point((int) nose.x+nose.width/2-cos(this->eye_angle)*(nose.width*3/8),
                   (int) nose.y + nose.height/2 - sin(this->eye_angle)*(nose.width/2));
    rtNose = Point((int) nose.x+nose.width/2+cos(this->eye_angle)*(nose.width*3/8),
                   (int) nose.y + nose.height/2 + sin(this->eye_angle)*(nose.width/2));

    // Draw quadrilateral around plane to be transformed
    circle( this->mark_img, rtFace, 4, Scalar(100,0,100), 25, 8);
    circle( this->mark_img, ltFace, 4, Scalar(100,0,100), 25, 8);
    circle( this->mark_img, ltNose, 4, Scalar(100,0,100), 25, 8);
    circle( this->mark_img, rtNose, 4, Scalar(100,0,100), 25, 8);
    line( this->mark_img, rtFace, ltFace, Scalar(100,0,100), 8, 8);
    line( this->mark_img, ltFace, ltNose, Scalar(100,0,100), 8, 8);
    line( this->mark_img, ltNose, rtNose, Scalar(100,0,100), 8, 8);
    line( this->mark_img, rtNose, rtFace, Scalar(100,0,100), 8, 8);

    Point2f src[4], dst[4];
    src[0] = Point2f(rtFace);
    src[1] = Point2f(ltFace);
    src[2] = Point2f(ltNose);
    src[3] = Point2f(rtNose);
    dst[0] = Point2f(Point(300,0));
    dst[1] = Point2f(Point(0,0));
    dst[2] = Point2f(Point(0,600));
    dst[3] = Point2f(Point(300,600));

    // Take perspective transformation with bicubic interpolation
    // (getPerspectiveTransform returns a 3x3 matrix)
    Mat M = getPerspectiveTransform( src, dst );
    warpPerspective( this->img, this->face_img, M, Size(300,600), INTER_CUBIC);
    cvtColor(this->face_img, this->face_img, CV_BGR2GRAY);
    equalizeHist(this->face_img, this->face_img);
}

// Return the appropriate feature vector for the facial geometry
vector<double> PigClassifier::getFeatVect() {
    // Note: all features are in relation to the center of the face as origin and normalized to face.width
    vector<double> feat_vect;

    // Store the average eye size
    feat_vect.push_back(this->eye_avg);

    // Store the difference in eye size
    feat_vect.push_back(this->eye_del);

    // Store the distance between the eyes
    feat_vect.push_back(this->eye_dist);

    // Position of nose
    feat_vect.push_back(this->nose.x);
    feat_vect.push_back(this->nose.y);

    // Store the known weight of the pig for supervised data
    feat_vect.push_back(weight_known);
    return feat_vect;
}

// Return a summary of the static result variables
void PigClassifier::getResults() {
    cout << endl;
    cout << "*************************************************************" << endl;
    cout << " Results" << endl;
    cout << "*************************************************************" << endl;
    cout << "Misclassifications = " << misclassified << " / " << images_classified << endl;
    cout << " Misclassification error = " << (double) misclassified/images_classified << endl;
    cout << "Total rejected images = " << failed_from_face + failed_from_eyes + failed_from_nose
         << " / " << images_classified << endl;
    cout << " Failures from faces = " << failed_from_face << " / " << images_classified << endl;
    cout << " Failures from eyes = " << failed_from_eyes << " / " << images_classified << endl;
    cout << " Failures from noses = " << failed_from_nose << " / " << images_classified << endl;
    cout << "*************************************************************" << endl;
}

// Return cropped face
Mat PigClassifier::getCroppedFace() {
    return this->face_img;
}

// Prints the cluster data on a vector of pig faces
void PigClassifier::getClusters(vector<PigFace> & pig_faces) {
    // Check if the vector has elements
    if(pig_faces.empty()) {
        return;
    }
    cout << endl << "Pig facial cluster information:" << endl;

    // Loop through all pig faces
    for(size_t i = 0; i < pig_faces.size(); i++) {
        cout << "Image " << i << endl;
        cout << " ID known: " << pig_faces[i].ID_known << endl;
        cout << " ID prediction: " << pig_faces[i].ID_prediction << endl;
        cout << " Prediction confidence: " << pig_faces[i].cluster_confidence << endl;
        cout << endl;
    }
}

// Creates clusters out of the pig faces
vector<int> PigClassifier::createClusters(vector<PigFace> & pig_faces, int confidence_threshold,
                                          int min_cluster_count, int radius, int neighbors,
                                          int grid_x, int grid_y, int iterations, bool training) {
    ofstream out_file;
    out_file.open("recognizer_output.txt", ios::app);
    out_file << "**************************************************" << endl;
    out_file << "Recognizer Run" << endl;
    out_file << "**************************************************" << endl;
    out_file << "threshold = " << confidence_threshold << endl;
    out_file << "min_cluster_count = " << min_cluster_count << endl;
    out_file << "radius = " << radius << endl;
    out_file << "neighbors = " << neighbors << endl;
    out_file << "grid_x = " << grid_x << endl;
    out_file << "grid_y = " << grid_y << endl;
    out_file << "iterations = " << iterations << endl;
    out_file << "training = " << training << endl;

    namedWindow( "Transformed Image", CV_WINDOW_AUTOSIZE );

    // Check if the vector has elements
    vector<int> training_data;
    if(pig_faces.empty()) {
        return training_data;
    }

    double threshold = DBL_MAX;
    Ptr<FaceRecognizer> model = createLBPHFaceRecognizer(radius, neighbors, grid_x, grid_y, threshold);
    vector<Mat> images;
    vector<int> labels;
    cout << "Initializing FaceRecognizer..." << endl;
    int predicted_label = -1;
    double predicted_confidence = 0.0;
    for(int k = 0; k < iterations; k++) {
        cout << "Iteration #" << k << endl;
        out_file << "Iteration #" << k << endl;
        images.clear();
        labels.clear();
        //random_shuffle(pig_faces.begin(),pig_faces.end());

        // Initialize model with first face image
        images.push_back(pig_faces[0].getFaceImg());
        labels.push_back(0);
        model->train(images, labels);
        for(size_t i = 0; i < pig_faces.size(); i++) {
            if (!training) {
                cout << "Updating pig # " << i << endl;
            }
            model->predict(pig_faces[i].getFaceImg(), predicted_label, predicted_confidence);
            imshow("Transformed Image", pig_faces[i].getFaceImg());
            waitKey(1);
            predicted_confidence = 100 - predicted_confidence;
            pig_faces[i].cluster_confidence = predicted_confidence;
            if(predicted_confidence > confidence_threshold) {
                // Confidence is good enough for a facial match
                pig_faces[i].ID_prediction = predicted_label;
                pig_faces[i].cluster_confidence = predicted_confidence;
                pig_faces[i].membership.push_back(predicted_label);
                if(!training) {
                    cout << "We'll call it a match." << endl;
                }
            }
            else {
                // Create a new label
                images.push_back(pig_faces[i].getFaceImg());
                labels.push_back(labels.size());
                model->update(images, labels);
                if(!training) {
                    cout << "Let's make a new label." << endl;
                }
                i--;
                if(labels.size() > 20 && training) {
                    training_data.push_back(confidence_threshold);
                    training_data.push_back(min_cluster_count);
                    training_data.push_back(radius);
                    training_data.push_back(neighbors);
                    training_data.push_back(grid_x);
                    training_data.push_back(grid_y);
                    training_data.push_back(-1);
                    return training_data;
                }
            }
        }

        // Sort all of the faces based on pig ID and weight
        sort(pig_faces.begin(), pig_faces.end());
        cout << endl;
        out_file << endl;
        for(size_t j = 0; j < pig_faces.size(); j++) {
            if(pig_faces[j].ID_prediction != -1) {
                cout << pig_faces[j].ID_prediction << ": " << "ID_known = " << pig_faces[j].ID_known << endl;
                out_file << pig_faces[j].ID_prediction << ": " << "ID_known = " << pig_faces[j].ID_known << endl;
            }
            else {
                cout << j << ": " << pig_faces[j].ID_prediction << ": " << endl;
                out_file << j << ": " << pig_faces[j].ID_prediction << ": " << endl;
            }
        }
        cout << endl;
        out_file << endl;
    }

    // Create dynamic array of cluster counts
    int *cluster_count = new int[labels.size()];
    for(size_t i = 0; i < labels.size(); i++) {
        cluster_count[i] = 0;
    }

    // Count the number of faces in each cluster
    for(size_t i = 0; i < labels.size(); i++) {
        for(size_t j = 0; j < pig_faces.size(); j++) {
            if(pig_faces[j].ID_prediction == (int) i) {
                cluster_count[i]++;
            }
        }
    }

    // Delete all clusters that are smaller than the given threshold
    //int offset = 0;
    for(size_t i = 0; i < labels.size(); i++) {
        for(size_t j = 0; j < pig_faces.size(); j++) {
            if(pig_faces[j].ID_prediction == (int) i) {
                if(cluster_count[i] < min_cluster_count) {
                    pig_faces.erase(pig_faces.begin() + j);
                    j--;
                    cluster_count[i] = 0;
                }
            }
        }
    }

    // Store only clusters that are big enough
    vector<Mat> temp_images;
    vector<int> temp_labels;
    for(size_t i = 0; i < labels.size(); i++) {
        if(cluster_count[i] >= min_cluster_count) {
            temp_images.push_back(images[i]);
            temp_labels.push_back(labels[i]);
        }
    }
    images.clear();
    labels.clear();
    for(size_t i = 0; i < temp_labels.size(); i++) {
        images.push_back(temp_images[i]);
        labels.push_back(temp_labels[i]);
    }

    // Recount the number of faces in each cluster
    delete[] cluster_count;   // free the old array before reallocating
    cluster_count = new int[labels.size()];
    for(size_t i = 0; i < labels.size(); i++) {
        cluster_count[i] = 0;
    }
    for(size_t i = 0; i < labels.size(); i++) {
        for(size_t j = 0; j < pig_faces.size(); j++) {
            if(pig_faces[j].ID_prediction == (int) i) {
                cluster_count[i]++;
            }
        }
    }
    // Cycle through, taking away one cluster at a time
    Mat temp_image;
    int temp_label;
    for(size_t i = 0; i < labels.size(); i++) {
        bool moving_flag = false;
        vector<int> mode_list;
        double average;
        temp_image = images[0];
        images.erase(images.begin());
        temp_label = labels[0];
        labels.erase(labels.begin());
        model->train(images, labels);
        cout << "Omitting Cluster " << temp_label << "..." << endl;
        out_file << "Omitting Cluster " << temp_label << "..." << endl;
        int last_ID = -1;
        for(size_t j = 0; j < pig_faces.size(); j++) {
            if(pig_faces[j].ID_prediction != last_ID) {
                imshow("Transformed Image", pig_faces[j].getFaceImg());
                waitKey(1);
                if(last_ID != -1) {
                    average = average/mode_list.size();
                    cout << "Average = " << average << endl;
                    out_file << "Average = " << average << endl;
                    sort(mode_list.begin(), mode_list.end());
                    int mode = 0;
                    int mode_count = 0;
                    int largest_mode_count = 0;
                    int last_ID_value = -1;
                    for(size_t k = 0; k < mode_list.size(); k++) {
                        if(mode_list[k] == last_ID_value) {
                            mode_count++;
                        }
                        else {
                            mode_count = 1;
                        }
                        if(mode_count > largest_mode_count) {
                            mode = mode_list[k];
                            largest_mode_count = mode_count;
                        }
                        last_ID_value = mode_list[k];
                    }
                    cout << "Mode = " << mode << " with " << largest_mode_count << " instances." << endl;
                    out_file << "Mode = " << mode << " with " << largest_mode_count << " instances." << endl;
                    if(last_ID == temp_label) {
                        // Determine if we should move the omitted cluster to the target cluster (75%)
                        if((double) largest_mode_count/mode_list.size() > .75) {
                            // Calculate target cluster average
                            double target_average = 0;
                            int target_size = 0;
                            for(size_t t = 0; t < pig_faces.size(); t++) {
                                if(pig_faces[t].ID_prediction == mode) {
                                    target_average += pig_faces[t].weight_est;
                                    target_size++;
                                }
                            }
                            target_average = target_average/target_size;

                            // Move cluster to target cluster if within 10% of estimated weight
                            if(abs(target_average - average)/target_average < 0.1
                               && abs(target_average - average)/average < 0.1) {
                                moving_flag = true;
                                for(size_t t = 0; t < pig_faces.size(); t++) {
                                    if(pig_faces[t].ID_prediction == temp_label) {
                                        pig_faces[t].ID_prediction = mode;
                                    }
                                }
                                sort(pig_faces.begin(), pig_faces.end());
                                cout << "Moved Cluster " << temp_label << " to be joined with Cluster " << mode << endl;
                                out_file << "Moved Cluster " << temp_label << " to be joined with Cluster " << mode << endl;
                            }
                            else {
                                cout << "Not moving Cluster " << temp_label << " because its estimated weight is "
                                     << average << " but the target cluster is " << target_average << endl;
                                out_file << "Not moving Cluster " << temp_label << " because its estimated weight is "
                                         << average << " but the target cluster is " << target_average << endl;
                            }
                        }
                        else {
                            cout << "Not moving Cluster " << temp_label << " because only "
                                 << ((double) largest_mode_count/mode_list.size()*100) << "% of cluster is the mode." << endl;
                            out_file << "Not moving Cluster " << temp_label << " because only "
                                     << ((double) largest_mode_count/mode_list.size()*100) << "% of cluster is the mode." << endl;
                        }
                    }
                }

                // Start looking at a new cluster
                mode_list.clear();
                average = 0;
                cout << "Cluster " << pig_faces[j].ID_prediction << endl;
                out_file << "Cluster " << pig_faces[j].ID_prediction << endl;
            }
            if(pig_faces[j].ID_prediction == temp_label) {
                // FIXME: could choose to run this only on the cluster being omitted (i.e., ID_prediction == temp_label)
                model->predict(pig_faces[j].getFaceImg(), predicted_label, predicted_confidence);
                predicted_confidence = 100 - predicted_confidence;
                if(pig_faces[j].ID_known != -1) {
                    cout << "1st: " << pig_faces[j].ID_prediction << ", 2nd: " << predicted_label
                         << " (" << predicted_confidence << "%)" << ", Actual = " << pig_faces[j].ID_known
                         << ", Weight = " << pig_faces[j].weight_est << endl;
                    out_file << "1st: " << pig_faces[j].ID_prediction << ", 2nd: " << predicted_label
                             << " (" << predicted_confidence << "%)" << ", Actual = " << pig_faces[j].ID_known
                             << ", Weight = " << pig_faces[j].weight_est << endl;
                }
                else {
                    cout << "1st: " << pig_faces[j].ID_prediction << ", 2nd: " << predicted_label
                         << " (" << predicted_confidence << "%)" << ", Weight = " << pig_faces[j].weight_est << endl;
                    out_file << "1st: " << pig_faces[j].ID_prediction << ", 2nd: " << predicted_label
                             << " (" << predicted_confidence << "%)" << ", Weight = " << pig_faces[j].weight_est << endl;
                }
                mode_list.push_back(predicted_label);
                average += pig_faces[j].weight_est;
            }
            last_ID = pig_faces[j].ID_prediction;
        }
        cout << endl;
        out_file << endl;
        if(moving_flag == false) {
            images.push_back(temp_image);
            labels.push_back(temp_label);
        }
        cout << endl;
        out_file << endl;
    }

    // For every cluster
    vector<double> temp_cluster;
    for(size_t i = 0; i < labels.size(); i++) {
        cout << "Cluster " << i << endl;

        // Dump data into vector
        temp_cluster.clear();
        for(size_t j = 0; j < pig_faces.size(); j++) {
            if(pig_faces[j].ID_prediction == labels[i]) {
                temp_cluster.push_back(pig_faces[j].weight_est);
            }
        }
        while(cluster_count[i] >= 3) {
            // Calculate mean and standard deviation
            double mean = 0, s = 0;
            for(size_t j = 0; j < temp_cluster.size(); j++) {
                mean += temp_cluster[j];
            }
            mean = mean/temp_cluster.size();
            for(size_t j = 0; j < temp_cluster.size(); j++) {
                s += (temp_cluster[j] - mean)*(temp_cluster[j] - mean);
            }
            s = sqrt(s/(temp_cluster.size()-1));

            // Iterate through and find the index of the farthest outlier
            int farthest_index = 0;
            double farthest_value = 0;
            for(size_t j = 0; j < temp_cluster.size(); j++) {
                if(temp_cluster[j] > farthest_value) {
                    farthest_index = j;
                    farthest_value = temp_cluster[j];
                }
            }

            // Using that index, check if it is an outlier
            double t = abs(temp_cluster[farthest_index] - mean)/s;
            cout << "mean = " << mean << ", s = " << s << ", t = " << t << endl;

            // If it is an outlier, delete it from the cluster and repeat
            if((temp_cluster.size() == 3 && t > 1.153) ||
               (temp_cluster.size() == 4 && t > 1.463) ||
               (temp_cluster.size() == 5 && t > 1.672) ||
               (temp_cluster.size() == 6 && t > 1.822) ||
               (temp_cluster.size() == 7 && t > 1.938) ||
               (temp_cluster.size() == 8 && t > 2.032) ||
               (temp_cluster.size() == 9 && t > 2.110) ||
               (temp_cluster.size() >= 10 && temp_cluster.size() < 15 && t > 2.176) ||
               (temp_cluster.size() >= 15 && temp_cluster.size() < 20 && t > 2.409) ||
               (temp_cluster.size() >= 20 && temp_cluster.size() < 25 && t > 2.557) ||
               (temp_cluster.size() >= 25 && temp_cluster.size() < 50 && t > 2.663) ||
               (temp_cluster.size() >= 50 && temp_cluster.size() < 100 && t > 2.956) ||
               (temp_cluster.size() >= 100 && t > 3.207)) {
                cout << "Erasing outlier" << endl;
                temp_cluster.erase(temp_cluster.begin() + farthest_index);
            }
            else {
                // If not an outlier, we can move on to the next cluster
                cout << (cluster_count[i] - temp_cluster.size()) << " outliers deleted." << endl;
                cout << "Cluster average without outliers = " << mean << endl << endl;
                break;
            }
        }
    }

    // Create text file of cluster info for the sampling program
    ofstream cluster_file;
    cluster_file.open("cluster_info.csv");

    // Print clusters and calculate average weight
    vector<String> file_names;
    vector<int> weight_known;
    vector<int> weight_est;
    for(size_t i = 0; i < labels.size(); i++) {
        cluster_count[i] = 0;   // need to recalculate now that outliers are gone
        double average_weight = 0;
        double average_confidence = 0;
        int ID_average = 0;
        cout << "Cluster " << i << " contains:" << endl;
        out_file << "Cluster " << i << " contains:" << endl;
        for(size_t j = 0; j < pig_faces.size(); j++) {
            if(pig_faces[j].ID_prediction == labels[i]) {
                if(cluster_count[i] == 0) {
                    // If we're looking at the first image, push onto the stack
                    file_names.push_back(pig_faces[j].file_name);
                    weight_known.push_back(pig_faces[j].weight_known);
                }
                if(!training) {
                    if(pig_faces[j].weight_known != -1 && pig_faces[j].ID_known != -1) {
                        cout << " " << pig_faces[j].ID_known << ": known=" << pig_faces[j].weight_known
                             << " lbs, est=" << pig_faces[j].weight_est << " lbs, confidence = "
                             << pig_faces[j].cluster_confidence << endl;
                        out_file << " " << pig_faces[j].ID_known << ": known=" << pig_faces[j].weight_known
                                 << " lbs, est=" << pig_faces[j].weight_est << " lbs, confidence = "
                                 << pig_faces[j].cluster_confidence << endl;
                    }
                    else {
                        cout << j << ": est=" << pig_faces[j].weight_est << " lbs, confidence = "
                             << pig_faces[j].cluster_confidence << endl;
                        out_file << j << ": est=" << pig_faces[j].weight_est << " lbs, confidence = "
                                 << pig_faces[j].cluster_confidence << endl;
                    }
                }
                average_weight += pig_faces[j].weight_est;
                average_confidence += pig_faces[j].cluster_confidence;
                ID_average += pig_faces[j].ID_known;
                cluster_count[i]++;
            }
        }
        cout << "Average weight of cluster = " << average_weight/cluster_count[i] << " lbs" << endl;
        average_confidence = (average_confidence-100)/(cluster_count[i]-1);
        cout << "Average cluster confidence = " << average_confidence << "%" << endl;
        weight_est.push_back((int) (average_weight/cluster_count[i]+0.5));
        cout << endl;
        cluster_file << labels[i] << "," << (int) (average_weight/cluster_count[i]+0.5) << endl;
    }
    delete[] cluster_count;

    // Push the various parameters if we're training the recognizer
    if(training) {
        training_data.push_back(confidence_threshold);
        training_data.push_back(min_cluster_count);
        training_data.push_back(radius);
        training_data.push_back(neighbors);
        training_data.push_back(grid_x);
        training_data.push_back(grid_y);
        training_data.push_back(labels.size());
    }
    out_file.close();
    cluster_file.close();
    model->save("face_rec_model.xml");

    // Display all of the clusters in the dynamically sizing window
    displayClusters(file_names, weight_known, weight_est);
    return training_data;
}

void PigClassifier::displayClusters(vector<String> & file_names, vector<int> weight_known,
                                    vector<int> weight_est) {
    vector<Mat> images;
    if(file_names.size() != weight_known.size() || weight_known.size() != weight_est.size()) {
        cout << "Error in displayClusters: Vectors not all the same size." << endl << endl;
        return;
    }
    for(size_t i = 0; i < file_names.size(); i++) {
        cout << "Reading from file: " << file_names[i] << endl;
        Mat img = imread(file_names[i]);
        // Check for invalid input
        if(!img.data) {
            cout << "Error in displayClusters: Could not open or find the image of pig face." << endl << endl;
            return;
        }
        else {
            images.push_back(img);
        }
    }

    // Set up the multi-image window
    size_t x_max = 3;
    int x_count, y_count;
    if(images.size() < x_max) {
        x_count = images.size();
    }
    else {
        x_count = x_max;
    }
    y_count = (images.size()/x_max) + 1;
    int dstWidth = images[0].cols * x_count;
    int dstHeight = images[0].rows * y_count;
    Mat dst = Mat(dstHeight, dstWidth, CV_8UC3, cv::Scalar(0,0,0));

    // Draw text on each image and output to the large window
    for(int i = 0; i < (int) images.size(); i++) {
        // Only output the known weight if it was specified
        String text;
        if(weight_known[i] != -1) {
            text = "Est: " + to_string((long long) weight_est[i]) + " lbs, Known: "
                 + to_string((long long) weight_known[i]) + " lbs";
        }
        else {
            text = "Est: " + to_string((long long) weight_est[i]) + " lbs";
        }
        putText(images[i], text, Point(images[0].cols/15, images[0].rows*7/8),
                FONT_HERSHEY_SIMPLEX, 5, Scalar(0,200,200), 10);
        Rect roi(Rect((i%x_count)*images[0].cols, (i/x_count)*images[0].rows,
                      images[0].cols, images[0].rows));
        Mat targetROI = dst(roi);
        images[i].copyTo(targetROI);
    }

    // Create window and show image on it
    namedWindow( "Clustering Final Results", CV_WINDOW_NORMAL );
    imshow("Clustering Final Results", dst);
    waitKey(0);
}

void PigClassifier::samplePig(PigFace piggie) {
    // Read data from CSV file on previously clustered pigs
    ifstream cluster_file;
    cluster_file.open("cluster_info.csv");
    vector<int> labels;
    vector<int> weights;
    string line;
    if(cluster_file.is_open()) {
        while(cluster_file.good()) {
            // Get the cluster label, then the cluster weight
            getline(cluster_file, line, ',');
            if(line == "") {
                break;
            }
            labels.push_back(stoi(line));
            getline(cluster_file, line);
            weights.push_back(stoi(line));
        }
    }
    else {
        cout << "Failed to open cluster_info.csv" << endl;
        cout << "Press any key to terminate program." << endl;
        waitKey(0);
        return;
    }

    // Load the Recognizer model from previously clustered data
    Ptr<FaceRecognizer> model = createLBPHFaceRecognizer();
    model->load("face_rec_model.xml");

    // Predict the label of the photograph
    int predicted_label = -1;
    double predicted_confidence = -1;
    model->predict(piggie.getFaceImg(), predicted_label, predicted_confidence);

    // Open the image being tested
    Mat img = imread(piggie.file_name);

    // Output the results
    for(size_t i = 0; i < labels.size(); i++) {
        if(labels[i] == predicted_label) {
            String text = "Est: " + to_string((long long) weights[i]) + " lbs";
            putText(img, text, Point(img.cols/15, img.rows*7/8),
                    FONT_HERSHEY_SIMPLEX, 5, Scalar(0,200,200), 10);
            cout << endl << "The pig most likely weighs " << weights[i] << " lbs" << endl;

            // Create window and show image on it
            namedWindow( "Clustering Final Results", CV_WINDOW_NORMAL );
            imshow("Clustering Final Results", img);
            waitKey(0);
        }
    }
}
PigFace.h
#ifndef PIGFACE_DEF #define PIGFACE_DEF #include <opencv2/core/core.hpp> #include <opencv2/objdetect/objdetect.hpp> #include <opencv2/highgui/highgui.hpp> #include <opencv2/imgproc/imgproc.hpp> #include <iostream> #include <stdio.h> #include <string> #include <math.h> #include "utilities.h" using namespace std; using namespace cv; class PigFace { public: double weight_known; double weight_est; int ID_known; int ID_prediction; double cluster_confidence; String file_name; vector<int> membership; PigFace(); PigFace(Mat face_img); PigFace(Mat face_img, int ID_known, String file_name, double weight_known, double weight_est); int setIDPrediction(int ID_prediction); Mat getFaceImg(); private: Mat face_img; }; bool operator<(const PigFace &face1, const PigFace &face2); #endif //if not defined
PigFace.cpp
#include "PigFace.h"

// Define all constructors
PigFace::PigFace()
{
    this->ID_known = -1;
    this->weight_known = 0;
    this->weight_est = 0;
    this->ID_prediction = -1;
    cluster_confidence = 0;
    file_name = "";
}

PigFace::PigFace(Mat face_img)
{
    this->face_img = face_img;
    this->ID_known = -1;
    this->weight_known = 0;
    this->weight_est = 0;
    this->ID_prediction = -1;
    cluster_confidence = 0;
    file_name = "";
}

PigFace::PigFace(Mat face_img, int ID_known, String file_name, double weight_known, double weight_est)
{
    this->face_img = face_img;
    this->ID_known = ID_known;
    this->weight_known = weight_known;
    this->weight_est = weight_est;
    this->ID_prediction = -1;
    cluster_confidence = 0;
    this->file_name = file_name;
}

Mat PigFace::getFaceImg()
{
    return this->face_img;
}

// Need to define this operator so that the faces can be sorted
bool operator<(const PigFace &face1, const PigFace &face2)
{
    if(face1.ID_prediction == face2.ID_prediction)
    {
        if(face1.weight_est < face2.weight_est)
        {
            return true;
        }
        else
        {
            return false;
        }
    }
    else
    {
        if(face1.ID_prediction < face2.ID_prediction)
        {
            return true;
        }
        else
        {
            return false;
        }
    }
}
utilities.h
#ifndef UTILITIES_DEF
#define UTILITIES_DEF

#include <string>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <stdio.h>
#include <stdarg.h>

using namespace cv;

// ################################################################################
// # Timer Functions
// ################################################################################

// Windows
#ifdef _WIN32
#include <Windows.h>

inline double get_wall_time(){
    LARGE_INTEGER time, freq;
    if (!QueryPerformanceFrequency(&freq)){
        // Handle error
        return 0;
    }
    if (!QueryPerformanceCounter(&time)){
        // Handle error
        return 0;
    }
    return (double)time.QuadPart / freq.QuadPart;
}

inline double get_cpu_time(){
    FILETIME a, b, c, d;
    if (GetProcessTimes(GetCurrentProcess(), &a, &b, &c, &d) != 0){
        // Returns total user time.
        // Can be tweaked to include kernel times as well.
        return (double)(d.dwLowDateTime | ((unsigned long long)d.dwHighDateTime << 32)) * 0.0000001;
    }else{
        // Handle error
        return 0;
    }
}

// Posix/Linux
#else
#include <sys/time.h>
#include <time.h> // for clock() and CLOCKS_PER_SEC

inline double get_wall_time(){
    struct timeval time;
    if (gettimeofday(&time, NULL)){
        // Handle error
        return 0;
    }
    return (double)time.tv_sec + (double)time.tv_usec * .000001;
}

inline double get_cpu_time(){
    return (double)clock() / CLOCKS_PER_SEC;
}

#endif
#endif //if not defined