Contrastive Relevance Propagation for Interpreting Predictions
by a Single-Shot Object Detector
Hideomi Tsunakawa1, Yoshitaka Kameya1,
Hanju Lee2, Yosuke Shinya2, and Naoki Mitsumoto2
1 Department of Information Engineering, Meijo University; 2 DENSO CORPORATION
IJCNN-19
Outline
• Background
• Proposed method: CRP
• Experiments
Background: SSD (1)
• Object detection is a well-known task in computer vision
• SSD (Single-Shot MultiBox Detector) [Liu+ ECCV-16]:
– Known for its high speed and accuracy
– Outputs:
• Confidences for classes
• Location offsets (center on x-axis, center on y-axis, width, height)
[Figure: an input image and SSD's two outputs, classification and localization]
Background: SSD (2)
• SSD:
– Based on a (large) single convolutional network
– Classification layers and localization layers are connected to several convolutional layers
→ Different resolutions
[Figure: SSD architecture. A 300×300 input image passes through VGG-16 (until the Pool5 layer) and extra convolutional layers; classification (Cls) and localization (Loc) layers are attached to Conv4_3 (38×38×512), Conv7 (19×19×1024), Conv8_2 (10×10×512), Conv9_2 (5×5×256), Conv10_2 (3×3×256), and Conv11_2 (1×1×256), followed by non-maximum suppression]
Background: LRP (1)
• LRP (Layer-wise Relevance Propagation) [Bach+ 15]:
– Often used for interpreting predictions of DNNs
– Propagates relevance backward from the output to the input features
– Creates a heatmap using the relevance at the input features
[Figure: relevance to "dog" propagated backward through the SSD network, from the output to a heatmap over the input image]
Background: LRP (2)
• LRP is equipped with several propagation rules:
– Common form: the relevance Rj(l+1) of a unit j in layer l+1 is distributed to the units i of layer l as messages Rij passed through the connections, and collected as Ri(l) := Σj Rij
– Simple LRP: Rij = (zij / Σi' zi'j) Rj(l+1), where zij = xi wij
– ε-LRP: Rij = (zij / (Σi' zi'j + ε · sign(Σi' zi'j))) Rj(l+1)
– αβ-LRP: Rij = (α · zij+ / Σi' zi'j+ + β · zij− / Σi' zi'j−) Rj(l+1), where zij+ = max(zij, 0) and zij− = min(zij, 0)
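The w+-rule (αβ-LRP with α = 1, β = 0) that appears throughout this deck can be sketched for a single fully-connected layer as follows. This is an illustrative NumPy sketch, not the authors' code; the name `lrp_w_plus` is hypothetical:

```python
import numpy as np

def lrp_w_plus(x, W, R_upper):
    """One backward step of the w+-rule (alpha-beta LRP, alpha=1, beta=0)
    for a fully-connected layer: only the positive contributions
    z_ij+ = x_i * max(w_ij, 0) pass relevance down.

    x:       activations of layer l,            shape (I,)
    W:       weights from layer l to layer l+1, shape (I, J)
    R_upper: relevance R_j(l+1) of layer l+1,   shape (J,)
    returns: relevance R_i(l) of layer l,       shape (I,)
    """
    z_plus = x[:, None] * np.clip(W, 0.0, None)  # z_ij+ per connection
    denom = z_plus.sum(axis=0) + 1e-12           # sum_i' z_i'j+ (stabilized)
    # message R_ij = (z_ij+ / sum_i' z_i'j+) * R_j(l+1); then R_i(l) = sum_j R_ij
    return (z_plus / denom) @ R_upper
```

When every upper unit receives some positive contribution, the total relevance is conserved across the layer, which is what lets a one-hot initial relevance be traced back to the input.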
Background: Indistinguishable Heatmaps (1)
• Heatmaps are almost invariant even when the target class is changed
• Heatmaps obtained with αβ-LRP (α = 1, β = 0):
[Figure: two nearly identical heatmaps. Left: target class "dog" (actually predicted); right: target class "cat" ("what-if" analysis)]
Background: Indistinguishable Heatmaps (2)
• Relevance propagated in each layer:
[Plot: per-layer relevance; relevance decreases exponentially]
Background: Indistinguishable Heatmaps (3)
• Recent works that seem to support our observation:
– [Adebayo+ NeurIPS-18]:
• Uses Inception v3 (a large network)
• If relevance = gradient × input, the input part dominates
→ Heatmaps will be invariant (since the input is of course fixed)
– [Ancona+ ICLR-18]:
• Several methods tend to return similar heatmaps (theoretically or empirically):
– Gradient × input
– DeepLIFT (Rescale)
– Integrated Gradients
– Simple LRP
Background: Our Motivation
• We introduce contrastive relevance, which highlights the parts more important to the target class
• We design the meaning of relevance to be consistent across SSD's two heterogeneous tasks:
– Classification
– Localization (regression)
[Figure: contrasting heatmaps for target class "dog" vs. target class "cat"]
Outline
✓ Background
• Proposed method: CRP
• Experiments
Contrastive Relevance Propagation (CRP)
• CRP: LRP tailored for SSD
– Classifies SSD's layers into 4 types
– Applies semantically appropriate propagation rules to each layer type
– In both classification and localization, the meaning of "relevance" is the same
[Figure (three build frames over the SSD architecture): the 4 layer types are the classification layers, the localization layers, the high-level feature layer, and the low-level feature layers. For a detected box, relevance to a class k of interest is propagated from a classification layer; relevance to an offset (e.g., shifting to the right) is propagated from a localization layer; for another detected box, relevance to a class k' of interest is propagated likewise]
CRP: Propagation Rules in Classification
[Diagram: a classification layer (units for class 1, …, class k, …, class K, with class k* as the target), a high-level feature layer, and low-level feature layers]
• Initial relevance: 1 at the target class k*, 0 at all other classes
• From the classification layer, we use the w+-rule (αβ-LRP with α = 1, β = 0) to find units that positively contribute to class k*
• At this moment, we can compute a class-specific relevance Ri[k*] for the target class k* by summing up the passed relevance
• We compute contrastive relevance, comparing Ri[k*] against the "average relevance" over the other classes, to find units that make a significantly positive or a significantly negative contribution to the target class k*
• Until the input layer, we use the w+-rule to distribute the positivity or the negativity of the contrastive relevance (activations xi are non-negative due to ReLU)
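The contrastive step above can be sketched as follows. This is our simplified reading (class-specific relevance minus the average over the other classes); see the paper for the exact definition, and the name `contrastive_relevance` is hypothetical:

```python
import numpy as np

def contrastive_relevance(R_per_class, k_star):
    """Contrastive relevance at each feature unit: the class-specific
    relevance Ri[k*] minus the average of Ri[k] over the other classes.
    (A simplified reading of the slide, not the authors' exact formula.)

    R_per_class: array of shape (K, I); R_per_class[k, i] = Ri[k]
    k_star:      index of the target class k*
    returns:     contrastive relevance, shape (I,); can be positive
                 (significantly for k*) or negative (significantly against)
    """
    others = np.delete(R_per_class, k_star, axis=0)  # the K-1 other classes
    return R_per_class[k_star] - others.mean(axis=0)
```

Unlike plain class-specific relevance, this quantity is signed, which is what makes the resulting heatmaps differ between target classes.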
CRP: Propagation Rules in Localization
[Diagram: a localization layer (units for center on x-axis, center on y-axis (target), width, and height), a high-level feature layer, and low-level feature layers]
• Initial relevance: 1 at the target offset (here, center on y-axis), 0 at the other offsets
• Sign-based rule switching: we switch between two rules according to the sign of the activation xj:
– If xj is positive, we use the w+-rule (αβ-LRP with α = 1, β = 0) to find units that positively contribute to the target offset
– If xj is negative, we use the w–-rule (αβ-LRP with α = 0, β = 1) to find units that negatively contribute to the target offset
• We compute contrastive relevance from the relevance from the localization layer and the "overall average" of the class-specific relevance
• Until the input layer, we use the w+-rule, as in classification
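The sign-based rule switching above can be sketched for one fully-connected backward step. This is an illustrative NumPy sketch under the deck's conventions (w– distributes relevance in proportion to the negative contributions), not the authors' implementation:

```python
import numpy as np

def crp_localization_step(x, W, R_upper):
    """Sign-based rule switching: for each upper unit j, if its
    pre-activation x_j = sum_i x_i w_ij is positive, propagate R_j with
    the w+-rule (shares of the positive z_ij); if negative, with the
    w--rule (shares of the negative z_ij).

    x: activations of layer l, shape (I,); W: weights, shape (I, J);
    R_upper: relevance of layer l+1, shape (J,)
    """
    z = x[:, None] * W                  # per-connection contributions z_ij
    x_upper = z.sum(axis=0)             # pre-activation x_j of layer l+1
    z_plus = np.clip(z, 0.0, None)
    z_minus = np.clip(z, None, 0.0)
    frac = np.where(x_upper > 0,
                    z_plus / (z_plus.sum(axis=0) + 1e-12),    # w+ shares
                    z_minus / (z_minus.sum(axis=0) - 1e-12))  # w- shares
    return frac @ R_upper
```

Both branches produce non-negative shares that sum to one per upper unit, so relevance is conserved while its sign carries the meaning of the contribution.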
Outline
✓ Background
✓ Proposed method: CRP
• Experiments
Experimental Settings
• Dataset: Pascal VOC 2012
• We ported the TensorFlow implementation of LRP (https://github.com/VigneshSrinivasan10/interprettensor) into a TensorFlow implementation of SSD (https://github.com/balancap/SSD-Tensorflow)
• The SSD implementation includes a learned model (we conducted no learning)
• We added CRP-specific routines
• Relevance was normalized before creating heatmaps
(See the paper for details)
Numerical Example
• Relevance is almost symmetrically distributed around zero
[Figure: histogram of relevance centered at 0; positives and negatives get different colors in the heatmap. Target class: "dog"]
Error Analysis (1)–(3)
• A dog was misclassified as a sheep
[Figures: heatmaps for target class "dog" and target class "sheep"; in the last frame, values below the 85th percentile are masked]
Error Analysis (4)–(6)
• Unwanted localizations:
– Horizontal shift to the left with widening
– Vertical shift to the top with heightening
[Figures: the detected box before and after localization, and heatmaps for the target offsets: center on x-axis, center on y-axis, width, and height]
Summary
• CRP (Contrastive Relevance Propagation), an LRP method tailored for SSD:
– Highlights only the significantly important features for a target class
– Deals with SSD's heterogeneous outputs (classification and localization)
• We conducted some error analyses using CRP
Future work
• Applying CRP to other object detectors such as YOLO
• Applying CRP (retrospectively) to standard CNNs

Thank you for your attention!