
Selected aspects of training networks for face identification WERONIKA GUTFETER

BIOMETRICS AND MACHINE LEARNING GROUP

10.04.2018

Agenda:
1) Face recognition with convolutional neural networks – a short introduction

2) Evaluating recognition systems

3) Data selection for training

4) Visualization of the model

Recognizing faces (images) with CNN: introduction

Definition of CNN:

Convolutional networks (…) or CNNs, are a specialized kind of neural network for processing data that has a known grid-like topology.

Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. The MIT Press.
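A minimal NumPy sketch (not from the slides) of the idea in the quote: a convolutional layer slides one small kernel over the input and reuses its weights at every position, instead of multiplying the whole input by a dense weight matrix. The array sizes and the kernel below are illustrative.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 2-D "valid" convolution (stride 1, no padding). In deep learning
    libraries this operation is technically cross-correlation, but it is what
    convolutional layers compute: the same small kernel is reused at every
    spatial position, unlike a fully connected layer, where each output unit
    has its own row of a large weight matrix."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Example: a 3x3 vertical-edge kernel applied to a random 8x8 "image".
edges = conv2d_valid(np.random.rand(8, 8), np.array([[1.0, 0.0, -1.0]] * 3))
```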

Recognizing faces (images) with CNN: introduction

Nice 3D visualization of a CNN recognizing digits: http://scs.ryerson.ca/~aharley/vis/conv/

First excitement about the new technology
Cumulative match score curves for 4 different algorithms on testing data from ICB-RW (frames from a surveillance system with detection coordinates):

a) OpenCV – a classic eigenfaces-style algorithm (trained on the estimation set from ICB-RW)

b) Neurotechnology VeriLook – a commercial toolkit with an embedded face detector (external detection cannot be forced, so FTE errors are treated as no identification)

c) VGG-Face – a typical deep convolutional model trained to recognize faces (16 layers)

d) my algorithm – an 11-layer network (trained on the estimation set from ICB-RW)

Face recognition is strongly dependent on the detection result (sometimes it's hard to prepare comparable results for different algorithms)!

Common problems in real-world face recognition systems
•Comparing templates generated from different sources (devices), such as:
• static images (photographs, database entries – often messy)

• surveillance cameras (subject seen from top or at large distance)

• mobile phones (compression, image enhancements)

• body worn cameras

•Various resolutions and levels of noise in images, occlusions, etc.

•No cooperation with subject (data in-the-wild)

What do people expect from face recognition systems?

Footage from body-worn cameras that can be used by Polish law enforcement officers. Source: materials prepared by the Polish Platform for Homeland Security

Evaluation of recognition systems

Example 1: Disguised faces - verification

The data come from the challenge „Disguised Faces in the Wild”

It consists of 3 subsets:

1. Main database – mixed photos of celebrities and non-celebrities (students/employees of the university)

2. Disguised (persons from the main database with heavy make-up or clothes and masks occluding the face)

3. Persons similar to or pretending to be the ones from the main database – impersonators

Example 1: Verification with CNN (main dataset)

False match rate – percent of invalid inputs (pairs from different classes – 1,670,871 scores) that are incorrectly accepted

False non-match rate – percent of valid inputs (same class label – 835 scores) that are incorrectly rejected

Descriptors were generated using the VGG-Face descriptor and the DPM face detector. Each descriptor is the output of the last fully connected layer of the network, normalized by L2, and has 4096 elements. The code can be downloaded from: http://www.robots.ox.ac.uk/~vgg/software/vgg_face/.
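As a rough illustration of how such descriptors are compared in verification, here is a minimal sketch assuming the 4096-element descriptors have already been extracted with the code linked above; the threshold value below is purely a placeholder, not the one used in these experiments.

```python
import numpy as np

def l2_normalize(v, eps=1e-10):
    """L2-normalize a descriptor vector."""
    v = np.asarray(v, dtype=np.float64)
    return v / (np.linalg.norm(v) + eps)

def verification_score(desc_a, desc_b):
    """Cosine similarity between two L2-normalized 4096-element descriptors."""
    return float(np.dot(l2_normalize(desc_a), l2_normalize(desc_b)))

def same_person(desc_a, desc_b, threshold=0.5):
    """Accept the pair as a match if the score exceeds a threshold chosen
    on a development set (0.5 here is only an illustrative placeholder)."""
    return verification_score(desc_a, desc_b) >= threshold
```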

Example 1: Verification with CNN (all subsets)

False match rate for impersonators – percent of impersonators that are accepted as genuine (7931 genuine–impersonator pairs with the same class label)

False non-match rate for impersonators – not computable

False non-match rate for disguised – percent of probes with a disguised person that are rejected (8057 genuine–disguised pairs with the same class label)

False match rate for disguised – percent of disguised persons that are accepted as a different class from the genuine set (8,169,402 genuine–impostor pairs with distinct class labels)


The equal error rate increased from 1.8% to 11.4% after adding the disguised set, and to 12.6% after adding impersonators.
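For reference, a minimal sketch of how FMR, FNMR and the equal error rate can be computed from arrays of genuine and impostor similarity scores; this is generic evaluation code, not the exact script behind these numbers.

```python
import numpy as np

def fmr_fnmr(genuine_scores, impostor_scores, threshold):
    """False match rate and false non-match rate at a given threshold
    (higher score = more similar)."""
    fmr = float(np.mean(np.asarray(impostor_scores) >= threshold))   # impostor pairs accepted
    fnmr = float(np.mean(np.asarray(genuine_scores) < threshold))    # genuine pairs rejected
    return fmr, fnmr

def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep the threshold over all observed scores and return the error
    value at the point where FMR and FNMR are (approximately) equal."""
    thresholds = np.unique(np.concatenate([genuine_scores, impostor_scores]))
    rates = [fmr_fnmr(genuine_scores, impostor_scores, t) for t in thresholds]
    gap, eer = min((abs(f - n), (f + n) / 2) for f, n in rates)
    return eer
```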

Example 1: Conclusions

1. Samples with occlusions and make-up drastically reduce the quality of verification (obvious)

2. They introduce concepts which are very important when training neural networks (see the mining sketch after this list):
◦ Hard positives: in this case, faces which belong to the same class but have extremely different appearance (disguised set)
◦ Hard negatives: negative examples which are very similar to the selected class (impersonators)

3. Incrementally increasing the difficulty of probes given to the classifier is a typical procedure during training (from easier to harder)
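The hard-positive / hard-negative idea is often implemented as mining within a batch of embeddings. Below is a minimal, generic sketch of such mining (a hypothetical helper, not the procedure actually used in these experiments): for each anchor it returns the most dissimilar sample of the same class and the most similar sample of a different class.

```python
import numpy as np

def mine_hard_examples(embeddings, labels):
    """embeddings: (N, D) array of face embeddings,
    labels: length-N array of identity labels.
    Returns (anchor, hard_positive, hard_negative) index triplets."""
    embeddings = np.asarray(embeddings)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all embeddings in the batch.
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    triplets = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False                      # the anchor itself is not a positive
        diff = labels != labels[i]
        if not same.any() or not diff.any():
            continue
        hard_pos = int(np.argmax(np.where(same, dists[i], -np.inf)))  # e.g. a disguised sample
        hard_neg = int(np.argmin(np.where(diff, dists[i], np.inf)))   # e.g. an impersonator
        triplets.append((i, hard_pos, hard_neg))
    return triplets
```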

To what extent is the process of training data-driven?

„Many machine learning novices are tempted to make improvements by trying out many different algorithms. Yet, it is often much better to gather more data than to improve the learning algorithm.”

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. The MIT Press.

Example 2: Validation regarding distinct training datasets

We selected 3 subsets from the face database collected at NASK (the full database contains 7 different subsets, 500,000 probes, 1395 classes):

1) FaceScrub – static images - celebrities

2) Quiscampi – static frames from a surveillance system, low quality

3) Quiscampi-Video – frames from videos showing the rotating heads of subjects from the Quiscampi dataset; good quality, many correlated images

Example 2: Validation regarding distinct training datasets

Training a ResNet-50 (residual 50-layer network) model on the full NASK dataset – the result is the average classification accuracy on all 7 subsets.

Training curve (top-1 accuracy in K epochs of training)
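A minimal PyTorch sketch of this kind of training setup, assuming the dataset is stored as an image folder with one subdirectory per identity; the path, preprocessing, and hyperparameters below are illustrative placeholders, not the configuration actually used for the NASK experiments.

```python
import torch
import torch.nn as nn
from torchvision import models, datasets, transforms

NUM_CLASSES = 1395   # number of identities in the full NASK dataset (per the slides)

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# "path/to/train" is a hypothetical folder with one subdirectory per identity.
train_set = datasets.ImageFolder("path/to/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = models.resnet50(weights=None)                  # 50-layer residual network
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):                                # number of epochs is illustrative
    for images, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
```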

Example 2: Validation regarding distinct training datasets

Training the ResNet-50 model on the full NASK dataset – the results are accuracy values on 3 selected validation subsets.

Training curves (top-1 accuracies in K epochs of training)

Example 2: Validation regarding distinct training datasets

What if we transfer our model to some external data (also faces, but unknown to the system)?

Selected testing scenario: recognition of stadium hooligans (matching between photos from a football match and images taken at a police station).

42 subjects in the gallery

Example 2: Validation regarding distinct training datasets

Testing the model trained on the full NASK dataset. Results are compared with random identification and with identification using a randomly initialized model.

Cumulative match curve (percentage of correct identifications for a given number of guesses)
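For completeness, a small sketch of how a cumulative match curve can be computed from a probe-by-gallery similarity matrix; this is generic closed-set evaluation code, not the exact script behind the plotted curves.

```python
import numpy as np

def cumulative_match_curve(scores, probe_labels, gallery_labels, max_rank=10):
    """scores[i, j] = similarity of probe i to gallery subject j.
    Assumes a closed set: every probe identity is present in the gallery.
    Returns, for each rank r = 1..max_rank, the fraction of probes whose
    correct identity appears among the top-r guesses."""
    probe_labels = np.asarray(probe_labels)
    gallery_labels = np.asarray(gallery_labels)
    ranking = np.argsort(-scores, axis=1)             # best guesses first
    hits = gallery_labels[ranking] == probe_labels[:, None]
    first_hit = hits.argmax(axis=1)                   # 0-based rank of the correct match
    return [float(np.mean(first_hit < r)) for r in range(1, max_rank + 1)]
```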

Example 2: Validation regarding distinct training datasets

Training 3 ResNet-50 models on 3 selected distinct subsets of the NASK database – the result is the accuracy of each model on the validation set

Training curves (top-1 accuracies in K epochs of training)

Example 2: Validation regarding distinct training datasets

Testing models trained on 3 subsets of the NASK-DB database.

The model trained on Quiscampi-Video (highly correlated data) has very high accuracy on the validation set from the same dataset, but performs practically at random on data from the testing dataset.

Cumulative match curve (percentage of correct identifications for a given number of guesses)

Conclusions
Nice advice from the book:

„If large models and carefully tuned optimization algorithms do not work well, then the problem might be the quality of the training data. The data may be too noisy or may not include the right inputs needed to predict the desired outputs. This suggests starting over (!), collecting cleaner data, or collecting a richer set of features.”

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. The MIT Press.

Is a neural network a do-some-magic black box?

Visualizing what neural networks have learned

Methods of visualization:

1) kernel visualization (raw weights)

2) dimensionality reduction of the penultimate layer (nearest neighbors) – see the t-SNE sketch below:
◦ PCA
◦ t-SNE

3) activation maps:
◦ Maximally activated patches
◦ Experiments with occlusions
◦ Guided backpropagation
◦ Gradient ascent

Classification prepared based on materials from the Stanford University course CS231n: Convolutional Neural Networks for Visual Recognition, Lecture 12 | Visualizing and Understanding (instructors: Fei-Fei Li, Justin Johnson, Serena Yeung) https://www.youtube.com/watch?v=6wcs6szJWMY&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv&index=12&t=0s
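A minimal sketch of method 2 above (dimensionality reduction of penultimate-layer descriptors), assuming the descriptors and integer identity labels have already been extracted; the library choices (scikit-learn, matplotlib) and parameters are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def plot_embedding_space(descriptors, labels):
    """descriptors: (N, 4096) penultimate-layer outputs,
    labels: length-N array of integer identity labels."""
    reduced = PCA(n_components=50).fit_transform(descriptors)     # coarse reduction first
    embedded = TSNE(n_components=2, perplexity=30).fit_transform(reduced)
    plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=5, cmap="tab20")
    plt.title("t-SNE of penultimate-layer face descriptors")
    plt.show()
```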

Slide from the Stanford University course CS231n: Convolutional Neural Networks for Visual Recognition, Lecture 12 | Visualizing and Understanding (instructors: Fei-Fei Li, Justin Johnson, Serena Yeung) https://www.youtube.com/watch?v=6wcs6szJWMY&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv&index=12&t=0s

Example 3: Experiment with occluding faces of subject

Goal of the test: what width of the occluding rectangle makes a face unrecognizable to the algorithm?

Preliminary results: a database of 4 subjects (5 samples each). The occlusion is a grey rectangle hiding the eyes, with width varying from 1 pixel to 101 pixels.

If we take the threshold from the previous experiments (see: Disguised Faces in the Wild), faces are classified as negatives starting from a 53-pixel-wide rectangle.
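A minimal sketch of this kind of occlusion test; `get_descriptor` is a hypothetical wrapper around the CNN, the eye-row position is illustrative, and the rectangle's "width" is interpreted here as the vertical extent of a horizontal grey band over the eyes.

```python
import numpy as np
from PIL import Image

def occlude_eyes(image, width, eye_row=100, value=128):
    """Return a copy of the face image with a grey horizontal band of the
    given width (in pixels) covering the eye region."""
    arr = np.array(image)
    half = width // 2
    arr[max(0, eye_row - half):eye_row + half + 1, :] = value
    return Image.fromarray(arr)

def smallest_breaking_occlusion(image, reference_descriptor, get_descriptor,
                                threshold, widths=range(1, 102, 4)):
    """Find the smallest occlusion width at which the similarity to the
    unoccluded reference descriptor drops below the verification threshold.
    'get_descriptor' is a hypothetical function returning an L2-normalized
    descriptor for an image."""
    for w in widths:
        desc = get_descriptor(occlude_eyes(image, w))
        score = float(np.dot(desc, reference_descriptor))
        if score < threshold:
            return w
    return None
```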

If we can visualize what a neural network has learned, can we reverse-engineer it?

Example 4: Fooling convolutional neural networks

Adversarial examples – some machine learning models misclassify examples that are only slightly different from correctly classified examples (in a way imperceptible to the human eye).
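As a simple, generic illustration of the adversarial-example idea, here is a minimal PyTorch sketch of the fast gradient sign method (FGSM, Goodfellow et al.); note this is not the glasses-based attack described on the following slide.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """image: (1, C, H, W) tensor with values in [0, 1]; label: tensor with
    the true class index. Adds a small perturbation in the direction that
    increases the loss, leaving the image visually almost unchanged while
    potentially changing the model's prediction."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```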

Example 4: Fooling face recognition system with glasses

Copied from paper:

Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. 2016. Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition

Examples of successful impersonation and dodging attacks.

Fig. (a) shows SA (top) and SB (bottom) dodging against the DNN.

Figs. (b)–(d) show impersonations. Impersonators carrying out the attack are shown in the top row and the corresponding impersonation targets in the bottom row.

Weronika Gutfeter [email protected]

www.zbum.ia.pw.edu.pl

