The Impact of Visual Saliency Prediction in Image Classification
Eric Arazo Sánchez
Advisors: Kevin McGuinness, Eva Mohedano, Xavier Giró-i-Nieto
Introduction - Computer vision
Classical computer vision: handcrafted descriptors → classifier (trainable) → “guitar”
Deep Learning: learned descriptors (trainable) → classifier (trainable) → “guitar”
Introduction - Imagenet
Russakovsky, Olga, et al. “Imagenet large scale visual recognition challenge”. International Journal of Computer Vision (2015).
Imagenet
Images:
● 1.2 M train
● 50,000 test
● 1,000 categories
Evaluation dataset unpublished before the competition
Imagenet
Metrics:
● Top-1 accuracy
● Top-5 accuracy
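A minimal sketch of how the two metrics above could be computed, assuming each prediction is a list of per-class scores. The function name `top_k_accuracy` and the toy scores are illustrative and are not taken from the ILSVRC evaluation code.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted by descending score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 3 samples, 6 classes (hypothetical scores).
scores = [
    [0.10, 0.50, 0.20, 0.10, 0.05, 0.05],  # highest score: class 1
    [0.30, 0.10, 0.40, 0.10, 0.05, 0.05],  # highest score: class 2
    [0.01, 0.09, 0.10, 0.10, 0.30, 0.40],  # class 0 has the lowest score
]
labels = [1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top5 = top_k_accuracy(scores, labels, k=5)
print(top1, top5)
```

Top-5 accuracy is never lower than Top-1 accuracy, because a correct first guess also counts as a hit among the first five.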
Introduction - Imagenet
ILSVRC - Evolution since 2010
Slide credit: Kaiming He (FAIR)
Some models have already reached human-level performance.
Is it still the Olympic Games of computer vision?
2012 (-9.4%): introduction of Convolutional Neural Networks (CNNs) in the competition with AlexNet
Introduction - AlexNet
Ref: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. NIPS 2012.
5 convolutional layers
3 fully connected layers
1000-way softmax: object class
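As an illustration of the last stage, the sketch below turns raw class scores into probabilities with a softmax and picks the most likely class. The four class names and scores are hypothetical stand-ins for the 1000 ImageNet categories.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability and leaves the result unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for 4 classes (the network in the slides outputs 1000).
class_names = ["guitar", "volleyball", "tow car", "piano"]
probs = softmax([2.0, 0.5, -1.0, 0.1])
predicted = class_names[probs.index(max(probs))]
print(predicted)
```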
Introduction - CNN
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
CNNs are very useful in computer vision:
● Reduction of parameters (shared filters)
● Spatial coherence
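The parameter reduction from shared filters can be made concrete with a rough count. The sketch below compares one convolutional layer with a fully connected layer that maps the same input to the same number of outputs. The layer sizes (227 x 227 RGB input, 96 filters of 11 x 11, 55 x 55 output) are assumptions taken from the original AlexNet paper, not from these slides.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a convolutional layer with square kernels."""
    return out_channels * (in_channels * kernel_size * kernel_size) + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units


# Assumed sizes: 227 x 227 RGB input, 96 filters of 11 x 11, 55 x 55 output maps.
conv = conv_params(in_channels=3, out_channels=96, kernel_size=11)
dense = dense_params(in_units=227 * 227 * 3, out_units=55 * 55 * 96)
print(conv, dense, dense // conv)
```

Each filter is reused at every image position, so the convolutional count does not depend on the image size, while the fully connected count grows with both input and output resolution.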
CNNs are also applied to:
● Image captioning
● Image segmentation
● Saliency prediction
Introduction - Saliency prediction
Images → CNN model → Saliency maps
CNN for image classification
Objective
● Explore if saliency maps could improve other computer vision tasks
Outline
● Introduction
● Objective
● State-of-the-art
● Methodology
● Conclusions
● Future work
State-of-the-art - Saliency prediction
SalNet
Pan, Junting, et al. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.
Trained on SALICON
Saliency prediction
Application of saliency:
● In image retrieval
○ Finding the last appearance of an object.
Ref: Reyes, Cristian et al. Where is my Phone? Personal Object Retrieval from Egocentric Images (2016)
● Object recognition
○ Health care
Ref: Pérez de San Roman, Philippe et al. Saliency Driven Object recognition in egocentric videos with deep CNN. 2016
Saliency prediction - our approach
AlexNet*SalNet
Methodology
RGB - The Baseline
Input: RGB images
● 1.2 M images
● 227 x 227
9 days to train on the computation cluster
Training times shown (figure): 9 days, 5 days, 1.5 days
How to introduce saliency predictions?
Three options, all based on AlexNet:
● Multiplication
● Fan-in Network (AlexNet plus a second CNN)
● Concatenation
Where should the saliency map be introduced?
It makes sense to use the baseline, which is already trained (pre-trained CNN).
Multiplication vs. Concatenation
Three strategies for each of them:
● RGBS
● RGB-1S-2S
● RGBS-1S-2S
Network diagram, repeated for each strategy: inputs RGB and Saliency; layers Conv 1 to Conv 5, Batch Norm. (x2), Max-Pooling (x3), FC 1 to FC 3 (output), Drop Out (x2)
The best option is concatenation:
● RGBS
● RGB-1S-2S
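The difference between the two operations can be sketched on a toy image. This example assumes the saliency map is combined with the RGB image at the input only; the strategy names on the slides are not defined in the text, so the sketch does not claim to reproduce any of them. The helper names `multiply` and `concatenate` are illustrative.

```python
def multiply(rgb, saliency):
    """Weight every RGB channel pixel-wise by the saliency map (output: 3 channels)."""
    return [
        [[pixel * s for pixel, s in zip(row, s_row)]
         for row, s_row in zip(channel, saliency)]
        for channel in rgb
    ]


def concatenate(rgb, saliency):
    """Append the saliency map as an extra channel (output: 4 channels)."""
    return rgb + [saliency]


# Toy 2 x 2 image: three channels (R, G, B) and one saliency map with values in [0, 1].
rgb = [
    [[10, 20], [30, 40]],
    [[50, 60], [70, 80]],
    [[90, 100], [110, 120]],
]
saliency = [[1.0, 0.5], [0.0, 1.0]]

multiplied = multiply(rgb, saliency)
stacked = concatenate(rgb, saliency)
print(len(multiplied), len(stacked))
```

Multiplication keeps three channels but discards pixel values wherever the saliency is zero. Concatenation keeps all the original pixel values and adds a fourth channel, so the layer that receives it needs filters with one more input channel.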
How to introduce saliency predictions?
● Multiplication
● Fan-in Network (Alexnet, CNN): where?
● Concatenation: RGBS, RGB-1S-2S
Fan-in architecture
Three strategies:
● Fan-in C1.1 (diagram): inputs RGB and Saliency; layers Conv 1 to Conv 5, Batch Norm. (x2), Max-Pooling (x3), FC 1 to FC 3 (output), Drop Out (x2), plus a second Conv 1 + Batch Norm. + Max-Pooling block
● Fan-in C2.1 (diagram): same layers, plus a second Conv 1 + Batch Norm. + Max-Pooling block and a second Conv 2 + Batch Norm. + Max-Pooling block
● Fan-in C2 (diagram): Conv 1, Conv 3 to Conv 5, FC 1 to FC 3 (output), Drop Out (x2), Batch Norm. + Max-Pooling, Max-Pooling, plus a second Conv 1 + Batch Norm. + Max-Pooling block (no Conv 2)
The best options are:
● Fan-in C2.1
● Fan-in C2
Surprising result for Fan-in C2, since it has fewer parameters than the baseline
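A fan-in network processes the RGB image and the saliency map in separate branches and then merges them, which requires both branches to produce feature maps of the same size. The sketch below checks this with assumed hyperparameters from the original AlexNet paper (Conv 1 of 11 x 11 with stride 4, max-pooling of 3 x 3 with stride 2). It also assumes the merge is an element-wise multiplication, as suggested by the "concatenating instead of multiplying" item under future work. None of these values are confirmed by the slides.

```python
def out_size(size, kernel, stride):
    """Spatial output size of a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1


def branch_size(input_size):
    """Conv 1 (11 x 11, stride 4) followed by max-pooling (3 x 3, stride 2)."""
    after_conv = out_size(input_size, kernel=11, stride=4)
    return out_size(after_conv, kernel=3, stride=2)


def merge_multiply(rgb_features, saliency_features):
    """Element-wise product of two feature maps with identical height and width."""
    assert len(rgb_features) == len(saliency_features)
    return [[a * b for a, b in zip(r_row, s_row)]
            for r_row, s_row in zip(rgb_features, saliency_features)]


# Both branches use the same layer hyperparameters, so their outputs line up.
size_227 = branch_size(227)
size_128 = branch_size(128)

# Toy 2 x 2 feature maps standing in for the outputs of the two branches.
merged = merge_multiply([[1.0, 2.0], [3.0, 4.0]], [[0.5, 1.0], [0.0, 2.0]])
print(size_227, size_128, merged)
```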
More experiments
12.4%
RGB-C2 (128x128)
Comparison (figure labels): RGB (baseline), RGB-C2, Fan-in C2 (Fan-in Network)
How to introduce saliency predictions?
● Multiplication
● Fan-in Network: Fan-in C2.1, Fan-in C2
● Concatenation: RGBS, RGB-1S-2S
Analysis of per-class improvements
Class                   Increase of accuracy
Acoustic guitar         25 %
Volleyball              23 %
Wrecker, tow car        -23 %
Entertainment center    -18 %
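A per-class comparison of this kind could be computed as sketched below. The sketch assumes the change is measured in percentage points between a baseline model and a model that uses saliency; the slides do not state which models are compared or whether the figures are absolute or relative. All predictions are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(predictions, labels):
    """Accuracy computed separately for every class that appears in labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    return {c: correct[c] / total[c] for c in total}


def accuracy_change(baseline_preds, new_preds, labels):
    """Per-class difference (new minus baseline), in percentage points."""
    base = per_class_accuracy(baseline_preds, labels)
    new = per_class_accuracy(new_preds, labels)
    return {c: 100 * (new[c] - base[c]) for c in base}


# Hypothetical predictions for two classes, four images each.
labels = ["guitar"] * 4 + ["tow car"] * 4
baseline = ["guitar", "guitar", "piano", "piano",
            "tow car", "tow car", "tow car", "van"]
with_saliency = ["guitar", "guitar", "guitar", "piano",
                 "tow car", "tow car", "van", "van"]

change = accuracy_change(baseline, with_saliency, labels)
print(change)
```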
Conclusions
● CNNs trained to predict saliency maps can be used to improve other computer vision tasks, such as image classification
● The best way to introduce saliency maps into a CNN is a Fan-in architecture, which gives the network freedom to decide how to use the saliency maps
Figure: Fan-in Network (Fan-in C2.1) next to Concatenation (RGBS)
● Experiments on downsampled images (128 x 128) accurately reflect the improvements the CNN obtains on larger images (227 x 227)
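The slides do not say how the images were downsampled. As a purely illustrative sketch, the code below uses nearest-neighbour sampling on a toy image and computes the share of pixels that remains when going from 227 x 227 to 128 x 128. The function name `downsample_nearest` is hypothetical.

```python
def downsample_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


# Toy 4 x 4 image reduced to 2 x 2.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
small = downsample_nearest(image, 2, 2)

# Share of pixels kept when going from 227 x 227 to 128 x 128.
pixel_ratio = (128 * 128) / (227 * 227)
print(small, round(pixel_ratio, 3))
```

With less than a third of the pixels per image, each experiment is cheaper to run, which is what makes testing many architectures practical.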
Future work
● Several experiments:
○ Fan-in:
■ Fan-in C2 without saliency maps
■ Concatenating instead of multiplying
○ Concatenation only in the first convolutional layer
○ Multiplication and training from scratch
● Once we have a reasonable model, try other saliency models
Thank you