Video-Based Face Spoofing Detection through Visual Rhythm Analysis
Allan S. Pinto 1, Helio Pedrini 1, William Schwartz 2, Anderson Rocha 1
1 Institute of Computing, University of Campinas
2 Department of Computer Science, Universidade Federal de Minas Gerais
XXV SIBGRAPI - Conference on Graphics, Patterns and Images
Summary
1 Introduction and Motivation
2 Contributions
3 Related Work
4 Proposed Method
5 Experiments
6 Results
7 Conclusion and Future Work
8 Acknowledgment
Introduction and Motivation
What is biometrics?
Technology to recognize humans in an automatic and unique manner
Fingerprint, hand geometry and veins, face, iris, voice, etc.
Recent advances in pattern recognition have been applied to face recognition
Access control, surveillance, criminal identification, etc.
Introduction and Motivation
However, several attack techniques have been developed to deceive biometric systems
Attacks can occur:
By manipulating the scores of the recognition system
When a person tries to masquerade as someone else, falsifying the biometric data captured by the acquisition sensor
Spoofing Attack
Introduction and Motivation
But in practice, which is easier: manipulating the scores or presenting fake biometric data to the acquisition sensor?
Showing a photograph of a valid user
Showing a video of a valid user
Showing a 3D facial model of a valid user
Our face is the most exposed biometric trait
Downloadable from Facebook (photos), YouTube (videos), personal websites (photos)
Contributions
First method proposed for video-based spoofing attack detection
Creation of a dataset (available upon acceptance a) composed of 700 videos
100 videos of valid access
600 videos of fake access attempts
All videos with 640 × 480 pixel resolution and 25 fps
a http://www.ic.unicamp.br/~rocha/pub/communications.html
Contributions
Creation of a robust and simple method that can be easily embedded in an operational biometric system
Can execute in parallel with the recognition system, requiring less time to validate access
Related Work
There are many works addressing photo-based spoofing attack detection
The methods seek to find differences between real and fake biometric data
Based on image attributes such as texture, color, light reflection, and optical flow analysis, among others
A well-explored topic
Competition on Counter Measures to 2-D Facial Spoofing Attacks
In this competition, we were the second-best group of researchers in the world, with only one misclassification a
a W. R. Schwartz, A. Rocha, and H. Pedrini, "Face Spoofing Detection through Partial Least Squares and Low-Level Descriptors," in Intl. Joint Conference on Biometrics, Oct. 2011, pp. 1-8.
Related Work
We can categorize current anti-spoofing methods into four non-disjoint groups
Data-driven characterization
User behavior modeling
Need for user interaction
Presence of additional devices
Non-intrusive methods without extra devices and human involvement may be preferable
Could be easily integrated into an existing biometric system, where usually only a generic webcam is deployed
Proposed Method
Motivation
There are artifacts added to the biometric samples when the videos are shown on display devices
Distortion, flickering, moiré, among others
There are noise signatures added during the recapture process
Our hypothesis is that both the noise and the artifacts are sufficient to detect face liveness
Overview
Step One
First, we calculate the noise residual video (V_noise) for every video in the training set
Filtering Process
V_noise^(t) = V^(t) − f(V^(t))   ∀ t ∈ T = {1, 2, . . . , T},   (1)
where V^(t) ∈ N² is the t-th frame of V and f is a filtering operation.
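Eq. (1) can be sketched as follows, assuming NumPy and taking f to be a 3 × 3 median filter (one of the two filters used later in the experiments); the function names are illustrative, not part of the method:

```python
import numpy as np

def median_filter_3x3(frame):
    """3x3 median filter with reflect padding (one possible choice of f)."""
    padded = np.pad(frame, 1, mode="reflect")
    rows, cols = frame.shape
    # Gather the nine shifted views of the padded frame and take the
    # per-pixel median across them.
    windows = np.stack([padded[i:i + rows, j:j + cols]
                        for i in range(3) for j in range(3)])
    return np.median(windows, axis=0)

def noise_residual_video(frames):
    """Eq. (1): V_noise^(t) = V^(t) - f(V^(t)) for every frame t."""
    return [f.astype(float) - median_filter_3x3(f.astype(float))
            for f in frames]
```

A perfectly smooth frame yields a zero residual, so whatever survives the subtraction is, by construction, the high-frequency noise the method feeds forward.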
Step Two
Next, we calculate the Fourier spectrum, on a logarithmic scale and with origin at the center of the frame, for each frame of the noise residual video (V_noise)
2D Discrete Fourier Transform
F(u, v) = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} V_noise(x, y) e^{−j2π(ux/M + vy/N)}   (2)
Fourier Spectrum
|F(u, v)| = √( R(u, v)² + I(u, v)² )
S(u, v) = log(1 + |F(u, v)|)   (3)
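A sketch of Eqs. (2)-(3) using NumPy's FFT, with fftshift moving the zero-frequency origin to the centre of the frame; the function name is illustrative:

```python
import numpy as np

def log_fourier_spectrum(frame):
    """Eqs. (2)-(3): 2D DFT magnitude on a log scale, with the
    zero-frequency term shifted to the centre of the frame."""
    F = np.fft.fftshift(np.fft.fft2(frame))  # |F| combines real and imaginary parts
    return np.log(1.0 + np.abs(F))           # S = log(1 + |F|)
```

The log(1 + ·) compression keeps the huge DC term from swamping the faint periodic peaks that the recapture artifacts introduce.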
Step Two
Example of Fourier spectrum video frame
(a) Valid video
(b) Attack video considering a Gaussian filter
(c) Attack video considering a Median filter
Step Three
We calculate visual rhythms for each Fourier spectrum video
Visual Rhythm is a technique that captures temporal information and summarizes the video content in a single image
Considering a video V in the 2D + t domain, with T frames of dimension M × N pixels, the visual rhythm is a simplification of the video V
Rows or columns of each frame t are sampled and concatenated to form a new image, called the visual rhythm
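The sampling-and-concatenation step can be sketched as follows, assuming NumPy. For simplicity this takes a single central row or column per frame; the experiments described later use the 30 central lines, which is a straightforward extension:

```python
import numpy as np

def visual_rhythm(frames, orientation="horizontal"):
    """Concatenate the central row (horizontal) or central column
    (vertical) of every frame into one 2D image."""
    if orientation == "horizontal":
        lines = [f[f.shape[0] // 2, :] for f in frames]  # central row per frame
        return np.stack(lines, axis=0)                   # one row per frame: (T, N)
    lines = [f[:, f.shape[1] // 2] for f in frames]      # central column per frame
    return np.stack(lines, axis=1)                       # one column per frame: (M, T)
```

Time runs along one axis of the resulting image, so temporal flicker in the spectrum becomes a spatial texture that a 2D descriptor can capture.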
Step Three
Example of a visual rhythm
Step Three
Visual Rhythm
Two types of visual rhythm are generated for each video
Vertical visual rhythm, formed by the central vertical lines
Horizontal visual rhythm, formed by the central horizontal lines
Step Three
Example of horizontal visual rhythms (rotated by 90 degrees)
(d) Valid video (e) Attack attempt video
Step Three
Example of vertical visual rhythms
(f) Valid video (g) Attack attempt video
Step Four
Visual Rhythm as a Texture Map
Gray-level co-occurrence matrices (GLCM) to extract textural information from the visual rhythm
A GLCM is a structure that describes the frequency of occurrence of gray-level pairs of pixels at a distance d = 1 in a given orientation θ ∈ {0°, 45°, 90°, 135°}
We extract 12 measures summarizing textural information from the four matrices
angular second moment, contrast, correlation, sum of squares, inverse difference moment, ...
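A minimal NumPy sketch of the co-occurrence counting for the 0° orientation at distance d = 1 (the other three orientations are analogous, with different neighbour offsets); the function name is illustrative:

```python
import numpy as np

def glcm(image, levels, offset=(0, 1)):
    """Grey-level co-occurrence matrix at distance d=1. The offset (0, 1)
    is the 0-degree (right-hand neighbour) orientation; (1, 0) would be
    the 90-degree one. Returns the matrix normalised to probabilities."""
    di, dj = offset
    h = np.zeros((levels, levels))
    rows, cols = image.shape
    for i in range(rows - di):
        for j in range(cols - dj):
            h[image[i, j], image[i + di, j + dj]] += 1  # count the pair
    return h / h.sum()
```

In practice the visual rhythm's grey levels are first quantised to a small number of levels so the matrix stays compact.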
Step Four
Step Four
angular second moment: Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} p(i, j)²
correlation: [ Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} i j p(i, j) − μ_x μ_y ] / (σ_x σ_y)
contrast: Σ_{i=0}^{G−1} Σ_{j=0}^{G−1} (i − j)² p(i, j)
...
where p is the normalized h_{d,θ} matrix
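The three measures shown can be computed directly from a normalised co-occurrence matrix p; a NumPy sketch with an illustrative function name:

```python
import numpy as np

def texture_measures(p):
    """Angular second moment, contrast and correlation of a normalised
    co-occurrence matrix p, following the formulas above."""
    G = p.shape[0]
    i, j = np.meshgrid(np.arange(G), np.arange(G), indexing="ij")
    asm = np.sum(p ** 2)                         # angular second moment
    contrast = np.sum((i - j) ** 2 * p)          # contrast
    mu_x, mu_y = np.sum(i * p), np.sum(j * p)    # marginal means
    sigma_x = np.sqrt(np.sum((i - mu_x) ** 2 * p))
    sigma_y = np.sqrt(np.sum((j - mu_y) ** 2 * p))
    correlation = (np.sum(i * j * p) - mu_x * mu_y) / (sigma_x * sigma_y)
    return asm, contrast, correlation
```

With 12 such measures per matrix and 4 orientations, each visual rhythm yields the 48-dimensional GLCM descriptor reported in Table 1.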
Step Five
Machine Learning
We use two machine learning techniques to classify the patterns extracted from the visual rhythms with the GLCM texture descriptor
Partial Least Squares (PLS)
Support Vector Machine (SVM)
Dataset Creation
Extension of the Print-Attack Dataset
200 videos of valid access
200 videos of spoof attacks using printed photographs
All videos with 320 × 240 pixel resolution
Creation of attack attempt videos
All videos representing valid access were upsampled to 640 × 480 pixel resolution
Shown on 6 monitors and captured with a Sony CyberShot digital camera
Dataset Partitioning
What is the Influence of the Monitors?
To verify the influence of the monitors on our method, we performed the experiments as follows:
Train with the Real 1 and Fake 1 groups and test with the Real 2 and Fake 2 groups
Train with the Real 2 and Fake 2 groups and test with the Real 1 and Fake 1 groups
Finally, we calculate the average and standard deviation
Analysis of the Filtering Process and Visual Rhythm
Filtering process analysis
We use either a Gaussian or a Median filter (a linear and a non-linear filter, respectively) in the filtering process
Median filter of size 3 × 3
Gaussian filter with σ = 2 and size 3 × 3
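A minimal sketch of the Gaussian kernel construction with the stated parameters (3 × 3 support, σ = 2), assuming NumPy; normalising the kernel to unit sum is our assumption, so that filtering preserves the local mean:

```python
import numpy as np

def gaussian_kernel_3x3(sigma=2.0):
    """3x3 Gaussian kernel with the given sigma, normalised to sum to 1."""
    ax = np.array([-1.0, 0.0, 1.0])
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()  # unit sum so the filter preserves the local mean
```

Convolving each frame with this kernel gives the linear choice of f in Eq. (1); the median filter is the non-linear alternative.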
Analysis of the Filtering Process and Visual Rhythm
Visual rhythm analysis
The visual rhythms were calculated using the first 2 seconds (50 frames)
Vertical visual rhythm: 30 columns of pixels
Horizontal visual rhythm: 30 rows of pixels
We performed experiments using the horizontal and vertical visual rhythms separately and combined
Table 1: Number of features (dimensions) using either the direct pixel intensities as features or the GLCM-based texture information features.

Descriptor        Horizontal   Vertical   Horizontal + Vertical
Pixel Intensity      960,000    720,000               1,680,000
GLCM                      48         48                      96
Classification Techniques
Partial Least Squares (PLS)
We performed experiments considering different numbers of factors (the only parameter of this method)
Support Vector Machine (SVM)
K(x_i, x_j) = x_i^T x_j (Linear kernel)
K(x_i, x_j) = e^{−γ ||x_i − x_j||²}, γ > 0 (RBF kernel)
Grid search for tuning the parameters C and γ
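The two kernels above can be written down directly; a NumPy sketch with illustrative function names (the actual training would use an SVM library on top of these, which is not shown here):

```python
import numpy as np

def linear_kernel(xi, xj):
    """K(x_i, x_j) = x_i^T x_j"""
    return float(np.dot(xi, xj))

def rbf_kernel(xi, xj, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), gamma > 0"""
    diff = np.asarray(xi) - np.asarray(xj)
    return float(np.exp(-gamma * np.sum(diff ** 2)))
```

Grid search then simply trains and validates over a grid of (C, γ) pairs and keeps the combination with the best validation score.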
Results
Table 2: Obtained results in terms of area under the receiver operating characteristic curve (AUC), reported as mean (x̄) ± standard deviation (σ), considering the SVM classification technique and the Gaussian filter. SVM was not able to calculate a classification hyperplane when using direct pixel intensities as features.

Type of Visual          SVM Linear                     SVM RBF
Rhythm               Intensity   GLCM              Intensity   GLCM
Vertical                 –       98.4% ± 1.60%        –        99.9% ± 0.10%
Horizontal               –       99.6% ± 0.50%        –        99.7% ± 0.10%
Horizontal + Vertical    –       100.0% ± 0.0%        –        100.0% ± 0.0%
Results
Table 3: Obtained results in terms of AUC (x̄ ± σ) considering the SVM classification technique and the Median filter. SVM was not able to calculate a classification hyperplane when using direct pixel intensities as features.

Type of Visual          SVM Linear                     SVM RBF
Rhythm               Intensity   GLCM              Intensity   GLCM
Vertical                 –       99.7% ± 0.20%        –        99.6% ± 0.10%
Horizontal               –       99.9% ± 0.10%        –        100.0% ± 0.0%
Horizontal + Vertical    –       100.0% ± 0.0%        –        100.0% ± 0.0%
Results
Table 4: Obtained results in terms of AUC (x̄ ± σ) considering the PLS classification technique and the Gaussian filter.

Type of Visual          PLS
Rhythm               Intensity        GLCM
Vertical             99.9% ± 0.20%    98.2% ± 0.40%
Horizontal           100.0% ± 0.0%    98.9% ± 1.50%
Horizontal + Vertical 100.0% ± 0.0%   99.9% ± 0.10%
Results
Table 5: Obtained results in terms of AUC (x̄ ± σ) considering the PLS classification technique and the Median filter.

Type of Visual          PLS
Rhythm               Intensity        GLCM
Vertical             100.0% ± 0.0%    99.5% ± 0.70%
Horizontal           100.0% ± 0.0%    99.9% ± 0.10%
Horizontal + Vertical 100.0% ± 0.0%   100.0% ± 0.0%
Summary
The visual rhythm is calculated on the logarithmic-scale Fourier spectrum
An effective alternative to summarize videos and an important forensic signature for detecting video-based spoofing
The filtering process does not influence our method
The results obtained using the Median and Gaussian filters are statistically comparable
Summary
The monitors do not influence our method
Although the standard deviations shown in Table 2 are 1.60% and 0.50% using the vertical and horizontal visual rhythms, respectively
The combination of these features resulted in a perfect classification (100.0% ± 0.0%)
Summary
The monitors do not influence our method
Although the standard deviations shown in Table 4 are 1.50% and 0.40% using the horizontal and vertical visual rhythms, respectively
The combination of these features resulted in a nearly perfect classification (99.9% ± 0.10%)
Conclusion and Future Work
Fourier spectrum of video noise signatures and the use of visual rhythms
Able to properly capture discriminative information to distinguish between valid and fake users in video-based spoofing
The extraction of feature descriptors with GLCM provided a compact representation while preserving the method's discriminability
Many classification techniques have memory allocation problems when dealing with high-dimensional feature spaces
Conclusion and Future Work
Finally, directions for future work include
The exploration of new video summarization approaches, as well as the use of more monitors and real videos
Additional tests considering tablets and smartphones
The investigation of illumination influences on the proposed method
New experiments on a new dataset (videos in Full High Definition quality)
Acknowledgment