Date post: | 20-Dec-2015 |
INFORMATION REPRESENTATION AND
COMPRESSION
Our approach in TUT: We do not know how to describe the locations of blocks, so...
Let’s first think about a GLOBAL content description in which locations are not considered!
That is, look first at the problem in which only block STATISTICS are considered
(we illustrated with CAMSHIFT that color statistics give good results)
Impact of Quantization
Distribution of DCT coefficients for a typical 8x8 DCT block
We can see that the higher-frequency coefficients are small.
If we use strong quantization they will be quantized to zero.
Under strong quantization only the first 4x4 block of
coefficients will be nonzero. This is equivalent to
a 4x4 DCT transform.
There is another effect too:
The greater the quantization, the smaller the number
of DIFFERENT blocks.
In fact, with no quantization, every block is different.
Quantization rounds the coefficients to a limited
number of values.
Coefficients of the 4x4 blocks
DC AC ..... ...
AC .... ..... .....
DC – zero frequency, average light level in the block
AC – correspond to different frequencies
Quantization by QP
[DC] = round[DC/QP]
[AC] = round[AC/QP]
Higher QP -> more zeros in the block
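The effect of QP can be tried on a single block. The sketch below is only an illustration: the 4x4 coefficient values are made up, and `quantize_block` and `count_zeros` are hypothetical helper names. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding used in a real codec.

```python
def quantize_block(block, qp):
    """Quantize every coefficient of a DCT block as round(coefficient / QP)."""
    return [[round(c / qp) for c in row] for row in block]


def count_zeros(block):
    """Count the coefficients that are exactly zero."""
    return sum(1 for row in block for c in row if c == 0)


# Made-up 4x4 block of DCT coefficients, DC in the top-left corner.
block = [
    [620.0, 45.0, -12.0, 3.0],
    [-38.0, 20.0, 6.0, -2.0],
    [10.0, -7.0, 3.0, 1.0],
    [4.0, 2.0, -1.0, 0.0],
]

low_qp = quantize_block(block, 4)    # weak quantization
high_qp = quantize_block(block, 32)  # strong quantization

print(count_zeros(low_qp), count_zeros(high_qp))
```

With the larger QP only the DC and its three nearest AC coefficients stay nonzero in this example.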
Here is an illustration for a picture
QP is the quantization parameter; we see that as it increases, the number of DCT patterns is strongly reduced
Now we use the following idea:
Let’s see how the histogram of the quantized DCT blocks looks!
For example, let’s find which blocks appear most often in a picture and create a histogram of, e.g., the first 40 patterns
The shape of this histogram obviously depends on
the quantization. If the quantization is low, the
histogram will tend to be flat. If the quantization
is high it will tend to have a peak.
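A histogram of the most frequent quantized blocks could be built roughly as follows. This is a sketch under assumptions: blocks are given as small nested lists of already quantized coefficients, the counts are normalized by the number of blocks, and `pattern_histogram` is a hypothetical name. The example uses 2x2 blocks only to keep it short.

```python
from collections import Counter


def pattern_histogram(quantized_blocks, size):
    """Return the `size` most frequent block patterns with their relative frequency."""
    # A pattern is the block flattened into a tuple, so it can be counted.
    patterns = [tuple(c for row in block for c in row) for block in quantized_blocks]
    counts = Counter(patterns)
    total = len(patterns)
    return [(pattern, n / total) for pattern, n in counts.most_common(size)]


# Made-up quantized blocks (2x2 for brevity).
blocks = [
    [[5, 1], [0, 0]],
    [[5, 1], [0, 0]],
    [[5, 0], [0, 0]],
    [[5, 1], [0, 0]],
    [[4, 0], [0, 0]],
    [[5, 0], [0, 0]],
]

hist = pattern_histogram(blocks, 2)
for pattern, freq in hist:
    print(pattern, freq)
```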
Let us see example of histograms for two pictures
Histograms of two face images
The database retrieval problem based on block histograms
Assume we have a database D of pictures 1, 2, ..., i, ..., j, ..., m.
We take a picture and want to check whether it is in the database or whether there are similar pictures there. Example: a database of passport photographs.
In our approach we use a similarity measure between pictures based on their histograms of quantized blocks. Histograms are treated as vectors and similarity is based on the following formula:
$$B_{i,j} = \sum_{k=1}^{m} \left| H_i(k) - H_j(k) \right|, \quad i, j \in D$$
The measure is the city-block measure (the sum of absolute differences between histogram values). It achieves its minimum value of 0 when the two histogram vectors are identical. The closer the value is to zero, the more similar the pictures should be. Remember that the blocks are quantized, so noise and irrelevant features are removed.
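The city-block measure and a nearest-histogram lookup can be sketched in a few lines. The names `city_block` and `retrieve`, the dictionary layout of the database, and the histogram values are assumptions made for illustration.

```python
def city_block(h_i, h_j):
    """Sum of absolute differences between two histograms of equal length."""
    if len(h_i) != len(h_j):
        raise ValueError("histograms must have the same length")
    return sum(abs(a - b) for a, b in zip(h_i, h_j))


def retrieve(key_hist, database):
    """Return the name of the stored picture whose histogram is closest to the key."""
    return min(database, key=lambda name: city_block(key_hist, database[name]))


# Made-up histograms (counts of the same four patterns in each picture).
database = {
    "picture_1": [5, 3, 2, 0],
    "picture_2": [1, 1, 4, 4],
}
key = [4, 3, 3, 0]

print(city_block(key, database["picture_1"]))  # distance to picture_1
print(retrieve(key, database))
```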
The question is what the performance of such a scheme is, but before we can check this we need to look into the light normalization problem.
Light normalization problem
The values of the DCT transform coefficients depend on the light level. If the light level is higher, the values are higher. If we use the same quantization for two identical pictures with different light levels, the quantized blocks will be different.
Light level can be normalized. First, let’s calculate the average light level of a picture. For this we use the values of the DC coefficients in its blocks:
$$DC_{mean}(j) = \frac{1}{N} \sum_{i=1}^{N} DC_i(j)$$
Here we get average light level for a picture
The average light level DC_all of the database is calculated in the same way from the values of DC_mean of each picture. Next, the light-level values of each picture are rescaled by the factor
$$R = \frac{DC_{all}}{DC_{mean}(j)}$$
Rescaling makes the values of the coefficients in the quantized blocks similar:
$$DCT'_{i,j} = R \cdot DCT_{i,j}, \quad 1 \le i \le N, \; 1 \le j \le M$$
$$DCT''_{i,j} = \frac{DCT'_{i,j}}{QP}, \quad 1 \le i \le N, \; 1 \le j \le M$$
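The normalization can be checked on a toy case: two pictures with the same content, one twice as bright. The sketch assumes that each block is a flat list with the DC coefficient first, that every coefficient of a picture is multiplied by the factor DC_all / DC_mean before quantization, and that quantization rounds as in round[·/QP]. The function name and the data are made up.

```python
def normalize_and_quantize(pictures, qp):
    """Rescale each picture to the database light level, then quantize by QP."""
    # Average DC per picture, then the average over the whole database.
    dc_mean = [sum(block[0] for block in pic) / len(pic) for pic in pictures]
    dc_all = sum(dc_mean) / len(dc_mean)
    result = []
    for pic, mean in zip(pictures, dc_mean):
        r = dc_all / mean  # rescaling factor for this picture
        result.append([[round(c * r / qp) for c in block] for block in pic])
    return result


# Picture b is picture a under doubled light level (all coefficients doubled).
pic_a = [[100.0, 20.0, -10.0, 4.0], [120.0, -16.0, 8.0, 0.0]]
pic_b = [[2 * c for c in block] for block in pic_a]

norm_a, norm_b = normalize_and_quantize([pic_a, pic_b], qp=10)
print(norm_a == norm_b)
```

Without the rescaling step the DC of the first block would quantize to 10 in one picture and 20 in the other.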
The DC coefficients problem
At high quantization levels very many blocks will have only DC coefficients. The only information about these blocks will be the DC, that is, the average light level in the block.
But what is of interest is how the average light level changes between blocks. We want to use this information.
What we do is account for the information in the differences between DC values in neighbouring blocks.
DC differences between blocks
In a) we see a fragment of a picture in which the DC values of the blocks are shown. Each block has 8 neighbours, as shown in b). We calculate 9 differences between the neighbours (8 for the directions and 1 for the average over all directions), as shown in c). Now we order the differences and form a vector from the first k coefficients, as shown in d) for k=4.
Combined histogram
A combined histogram for AC blocks and DC vectors is now formed
H = [H_AC, α × H_DC]
where α is a numerical parameter which will be optimized later.
Combined histogram means that the measure to be minimized has two parts, one for each histogram, and they are summed with the weight α:
$$B_{i,j} = \sum_{k=1}^{m} \left| H_i(k) - H_j(k) \right|, \quad i, j \in D$$
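That the city-block measure on the combined vector is the AC distance plus α times the DC distance can be seen in a short sketch. It assumes α is non-negative; the function names and histogram values are invented for the example.

```python
def city_block(h_i, h_j):
    """Sum of absolute differences between two equally long histograms."""
    return sum(abs(a - b) for a, b in zip(h_i, h_j))


def combine(h_ac, h_dc, alpha):
    """Concatenate the AC histogram with the DC histogram scaled by alpha."""
    return list(h_ac) + [alpha * v for v in h_dc]


alpha = 0.5
h_ac_i, h_dc_i = [4, 2], [6, 2]
h_ac_j, h_dc_j = [3, 3], [2, 4]

combined = city_block(combine(h_ac_i, h_dc_i, alpha), combine(h_ac_j, h_dc_j, alpha))
separate = city_block(h_ac_i, h_ac_j) + alpha * city_block(h_dc_i, h_dc_j)
print(combined, separate)
```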
Optimization of database retrieval
The question is: how good can database retrieval based on the combined histogram be? This means, e.g., how many errors will be made.
But we can also ask another question: what is the best achievable performance of this approach?
Remember that we use only statistical information, but we have several parameters which can be selected:
- quantization level
- size of histograms
- parameter α for combining histograms
- size of DC difference vectors
Optimization procedure
We can check this by taking some databases and optimizing the parameters for best retrieval. This will show us what the maximum performance is. We did this for face databases using the following scheme:
EVALUATION OF RESULTS
Given a certain classification threshold, an input face image of person A may be falsely classified as person B. Suppose the target person is person A.
The fraction of images of person A that have been classified as other persons is called the False Rejection Rate, FRR.
The fraction of images of other persons that have been classified as person A is called the False Acceptance Rate, FAR.
Equal Error Rate
From the FAR and FRR, an Equal Error Rate (EER) is obtained at the point where both measures take equal values. The lower the EER, the better the system's performance, because the total error rate at that point, which is the sum of the FAR and the FRR, decreases.
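One simple way to locate the EER is to scan thresholds and pick the one where FAR and FRR are closest. The sketch below is a simplification under stated assumptions: it works on two lists of distances (same-person comparisons and different-person comparisons), accepts a comparison when its distance does not exceed the threshold, and reports the mean of FAR and FRR at the best threshold. The names and the distance values are illustrative only.

```python
def far_frr(genuine, impostor, threshold):
    """FAR and FRR when a comparison is accepted if distance <= threshold."""
    frr = sum(d > threshold for d in genuine) / len(genuine)
    far = sum(d <= threshold for d in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Return (threshold, EER) at the threshold where |FAR - FRR| is smallest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Made-up distances between key images and stored images.
genuine = [0.10, 0.20, 0.25, 0.60]   # same person
impostor = [0.30, 0.55, 0.70, 0.90]  # different persons

print(equal_error_rate(genuine, impostor))
```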
Typical performance of EER histogram for two face databases
DATABASE SELECTION
There are two cases:
1. Database in which there is only one (standard) picture of each person
2. Database in which there are multiple pictures of each person (and they might be very different)
In case 2 the same person should be retrieved for any of his or her pictures, which can be difficult.
DATABASES SELECTED
The GTF (Georgia Tech Face) database contains the face images of 50 people, both male and female, each with 15 images. Most of the images were taken in two different sessions to account for variations in illumination conditions, facial expression, appearance, scale and orientation. For the test, we store the first 11 images of each person in the database and the remaining 4 images serve as key images for retrieval. Therefore, the total number of stored images is 550 and the total number of key images is 200.
DATABASES SELECTED
The ORL (Olivetti Research Laboratory) database contains 10 different images of each of 40 persons. Images were taken at different times, with slightly varying lighting, various facial expressions (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses). The ORL thus has more variation among the images taken of one person. For the experiment, we store the first 6 images of each person in the database and the remaining 4 images serve as key images. Therefore, the total number of stored images is 240 and the total number of key images is 160.
RESULTS
We present results for AC only, for DC only and for the combined histogram
            AC-Patterns    Direction-Vectors    Combined
            Histograms     Histograms           Histogram
EER - ORL   1.25%          3.125%               0.625%
EER - GTF   7%             7%                   4.5%
The best result of ORL is obtained when: QP_AC=36, number of AC patterns=80, QP_DC=75, number of Direction-Vector patterns = 300 and α=0.7, γ=7. The best result of GTF is obtained when: QP_AC=10, number of AC patterns = 250, QP_DC = 20, number of Direction-Vector patterns = 400 and α=0.9, γ=5.
ANOTHER DATABASE
The FERET database contains more than 10,000 images of more than 1000 individuals, taken in widely varying circumstances. The FERET images are divided into several sets which are formed to match its evaluation methodology. Here we made a test based on the sets fa and fb. In both of them each face has one picture, with the picture in fb taken seconds after the corresponding picture in fa. The fa set, which contains 994 images, serves as the database, and the fb set, which contains 992 images, is used as key images for retrieval from fa.
EVALUATION OF RESULTS
FERET is considered a difficult database and is used in the evaluation of professional applications:
        AC-Patterns    Direction-Vectors    Combined Histogram
EER     4.6371%        7.06%                3.43%
The best EER result is obtained when: QP_AC = 12, number of AC patterns = 400, QP_DC=12, number of Direction-Vector patterns = 400 and α=0.5, γ=4.
FERET METHODOLOGY OF EVALUATION
For FERET there is another methodology based on calculating how many correct retrievals are obtained among n trials, n=1,2,…,3.
FERET EVALUATION
The FERET evaluation is called the cumulative match score. Results are shown for the histogram method (red), overlaid with other known good methods. Rank means how many retrievals are made; one retrieval is the most demanding.
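A cumulative match score can be computed from ranked retrieval lists as sketched below. The assumption is that for each key image we already have the database identities sorted from most to least similar; the function name and the toy data are hypothetical.

```python
def cumulative_match_scores(ranked_ids, true_ids, max_rank):
    """Fraction of key images whose true identity is among the first n retrievals."""
    scores = []
    for n in range(1, max_rank + 1):
        hits = sum(true in ranked[:n] for ranked, true in zip(ranked_ids, true_ids))
        scores.append(hits / len(true_ids))
    return scores


# Four key images; each list is the retrieval order for one key image.
ranked_ids = [
    ["a", "b", "c"],
    ["c", "b", "a"],
    ["a", "c", "b"],
    ["b", "a", "c"],
]
true_ids = ["a", "b", "c", "c"]

print(cumulative_match_scores(ranked_ids, true_ids, 3))  # → [0.25, 0.75, 1.0]
```

The score at rank 1 is the hardest to achieve, since only the single best match counts.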
• Features based on Binary Feature Vectors
For each non-border 4x4 image block, there are eight blocks surrounding it. Such a 3x3 block matrix is used here to generate a Binary Feature Vector (BFV). Taking the DC coefficients as an example: the nine DC coefficients within this area form a 3x3 DC coefficient matrix. By measuring and thresholding the magnitude of the differences between the non-center DC coefficients and the central DC coefficient, a binary vector of length 8 is formed.
Two different cases are considered here:
Case 1:
0 – current coefficient ≤ threshold
1 – current coefficient > threshold
Case 2:
0 – current coefficient < threshold
1 – current coefficient ≥ threshold
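The construction of a BFV from a 3x3 matrix of DC coefficients might look like this. It is a sketch under assumptions: the thresholded quantity is the magnitude of the difference to the central coefficient, the neighbours are visited in row-major order, and the `inclusive` flag switches between Case 1 (strictly greater) and Case 2 (greater or equal). The function name and the coefficient values are made up.

```python
def binary_feature_vector(dc, threshold, inclusive=False):
    """Build the 8-bit vector for a 3x3 matrix of DC coefficients."""
    center = dc[1][1]
    bits = []
    for r in range(3):
        for c in range(3):
            if (r, c) == (1, 1):
                continue  # the central block is the reference, not a neighbour
            diff = abs(dc[r][c] - center)
            if inclusive:
                bits.append(int(diff >= threshold))  # Case 2
            else:
                bits.append(int(diff > threshold))   # Case 1
    return bits


dc = [
    [50, 52, 70],
    [48, 50, 50],
    [30, 55, 61],
]

print(binary_feature_vector(dc, 10))                  # Case 1
print(binary_feature_vector(dc, 20))                  # Case 1, larger threshold
print(binary_feature_vector(dc, 20, inclusive=True))  # Case 2, same threshold
```

The two cases differ only when a difference is exactly equal to the threshold, as in the last two calls.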
Example
• DC-BFV Histogram (based on DC coeff.)
• AC-BFV Histogram (based on AC coeff.)
Example of DC-BFV histogram
• Performance results for the Feret database
Result is quite good if we take into account that the method uses statistical information only
WHICH IS THE BEST METHOD?
On the FERET plot we see that the best performance is 95%. Which method is it?
It is called EIGENFACES and it is based on the calculation of eigenvectors and eigenvalues of matrices.
EIGENFACES
1. Construction of Face Space
Suppose a face image consists of N pixels, so it can be represented by a vector of dimension N. Let be the training set of face images. The average face of these M images is given by
Then each face differs from the average face by :
EIGENFACES
Now the covariance matrix of the training images can be constructed:
where
The basis vectors of the face space, i.e., the eigenfaces, are then the orthogonal eigenvectors of the covariance matrix .
Because the number of training images M is usually smaller than the number of pixels N in an image, there will be only M-1, instead of N, meaningful eigenvectors
Eigenvalues, eigenvectors
A x = λ x: x is an eigenvector of matrix A and λ is the corresponding eigenvalue
B = S A S^{-1}
If S is a nonsingular n×n matrix, then matrix B has the same eigenvalues as A
An n×n matrix has n eigenvalues (counted with multiplicity)
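A small worked example may make the statement about S A S^{-1} concrete. The matrices A and S are arbitrary illustrative choices, not taken from the lecture.

```latex
A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}, \quad
S = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad
S^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}
\;\Longrightarrow\;
B = S A S^{-1} = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}
```

Both A and B are triangular, so their eigenvalues can be read off the diagonal: 2 and 3 in both cases. In general, if A x = λ x, then B (S x) = S A S^{-1} S x = λ (S x), so S x is an eigenvector of B with the same eigenvalue.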
EIGENFACES
Therefore, the eigenfaces are computed by first finding the eigenvectors, , of the M by M matrix L:
The eigenvectors, , of the matrix are then expressed by a linear combination of the difference face images, , weighted by :
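The reason an M by M matrix is sufficient can be written out in a few lines. The symbols below follow the usual eigenfaces notation and are assumptions here: Φ_k for the difference face images, A for the N by M matrix whose columns are the Φ_k, v_l and μ_l for the eigenvectors and eigenvalues of L, and u_l for the eigenfaces. The covariance matrix is taken as A A^T, up to a constant factor that does not change the eigenvectors.

```latex
\begin{aligned}
A &= [\Phi_1, \Phi_2, \dots, \Phi_M] \in \mathbb{R}^{N \times M}, \qquad
C = A A^{T} \in \mathbb{R}^{N \times N}, \qquad
L = A^{T} A \in \mathbb{R}^{M \times M} \\
L v_l &= \mu_l v_l
\;\Longrightarrow\; A A^{T} (A v_l) = \mu_l (A v_l)
\;\Longrightarrow\; C u_l = \mu_l u_l, \qquad
u_l = A v_l = \sum_{k=1}^{M} v_{lk} \Phi_k
\end{aligned}
```

So each eigenvector of the small matrix L gives an eigenvector of the large matrix C after multiplication by A, which is exactly a linear combination of the difference face images.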
In practice, a smaller set of M'(M'<M) eigenfaces is sufficient for face identification. Hence, only M' significant eigenvectors of L, corresponding to the largest M' eigenvalues, are selected for the eigenface computation
Thus further data compression can be obtained. M' is determined by a threshold, , of the ratio of the eigenvalue summation:
In the training stage, the face of each known individual, , is projected into the face space and an M'-dimensional vector, , is obtained:
where is the number of face classes
A distance threshold, , that defines the maximum allowable distance from a face class as well as from the face space, is set up by computing half the largest distance between any two face classes:
In the recognition stage, a new image, , is projected into the face space to obtain a vector, :
The distance of to each face class is defined by
For the purpose of discriminating between face images and non-face like images, the distance, , between the original image, , and its reconstructed image from the eigenface space, , is also computed:
where
These distances are compared with the threshold given in equation (8) and the input image is classified by the following rules:
• IF THEN input image is not a face image;
• IF AND THEN input image contains an unknown face;
• IF AND THEN input image contains the face of individual .
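A decision rule of this three-way kind could be sketched as follows. Everything concrete here is an assumption: the comparisons used (at or above the threshold means reject), the Euclidean distance between class vectors, the function names, and the numbers. Only the threshold definition, half the largest distance between any two face classes, is taken from the text. `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def distance_threshold(class_vectors):
    """Half the largest Euclidean distance between any two face classes."""
    pairs = combinations(class_vectors.values(), 2)
    return 0.5 * max(math.dist(a, b) for a, b in pairs)


def classify(face_space_distance, class_distances, threshold):
    """Three-way decision: not a face, unknown face, or a known individual."""
    if face_space_distance >= threshold:
        return "not a face"
    best = min(class_distances, key=class_distances.get)
    if class_distances[best] >= threshold:
        return "unknown face"
    return best


# Made-up 2-dimensional class vectors for three individuals.
class_vectors = {"A": (0.0, 0.0), "B": (6.0, 8.0), "C": (3.0, 0.0)}
theta = distance_threshold(class_vectors)

print(theta)
print(classify(6.0, {"A": 2.0, "B": 9.0, "C": 4.0}, theta))
print(classify(1.0, {"A": 7.0, "B": 9.0, "C": 5.5}, theta))
print(classify(1.0, {"A": 2.0, "B": 9.0, "C": 4.0}, theta))
```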
EXPERIMENTAL RESULTS
The eigenface-based face recognition method was tested on the ORL face database. 150 images of 15 individuals were selected for the experiments.
EXPERIMENTAL RESULTS
In the training stage, three images of each individual were used as training samples, forming a training set totalling 45 images.
The average face of the training set
EXPERIMENTAL RESULTS
The first 15 eigenfaces corresponding to the 15 largest eigenvalues.
EXPERIMENTAL RESULTS
Recognition rate depends on the training images: when single-view images are used for training, recognition is much worse
Recognition rate
EXPERIMENTAL RESULTS
Faces with calm expressions in the training stage and faces of the same individual but with various expressions in the testing stage
Training images
Test images
lower images are projections in the face space
CONCLUSIONS
The eigenfaces method treats images globally; no local information is used. Compression is done at the global level. The method requires a lot of computation, but the results are good. Explanation of the good results: images are represented as combinations of ”simple” images and the system is trained on them.