Contrastive Relevance Propagation for Interpreting Predictions
by a Single-Shot Object Detector
Hideomi Tsunakawa1, Yoshitaka Kameya1,
Hanju Lee2, Yosuke Shinya2, and Naoki Mitsumoto2
1 Department of Information Engineering, Meijo University; 2 DENSO CORPORATION
IJCNN-19
Outline
• Background
• Proposed method: CRP
• Experiments
Background: SSD (1)
• Object detection is a well-known task in computer vision
• SSD (Single-Shot MultiBox Detector) [Liu+ ECCV-16]:
– Known for its high speed and accuracy
– Outputs:
• Confidences for classes
• Location offsets (center on x-axis, center on y-axis, width, height)
[Figure: an input image and SSD's two outputs, classification and localization]
Background: SSD (2)
• SSD:
– Based on a (large) single convolutional network
– Classification layers and localization layers are connected to several convolutional layers
→ Different resolutions
[Figure: SSD architecture. A 300×300 input image passes through VGG-16 (until the Pool5 layer) and extra convolutional layers; classification (Cls) and localization (Loc) layers are attached to Conv4_3 (38×38×512), Conv7 (19×19×1024), Conv8_2 (10×10×512), Conv9_2 (5×5×256), Conv10_2 (3×3×256), and Conv11_2 (1×1×256), followed by non-maximum suppression]
Background: LRP (1)
• LRP (Layer-wise Relevance Propagation) [Bach+ 15]:
– Often used for interpreting predictions of DNNs
– Propagates relevance backward from the output to the input features
– Creates a heatmap using the relevance at the input features
[Figure: relevance to "dog" propagated backward through the SSD network, from the output to a heatmap over the input image]
Background: LRP (2)
• LRP is equipped with several propagation rules:
– Common form: the relevance Rj(l+1) of a unit j in layer l+1 is distributed to the units i of layer l as messages Rij passed through the connections, and collected as Ri(l) := Σj Rij
– Simple LRP: Rij = (zij / Σi' zi'j) Rj(l+1), where zij = xi wij
– ε-LRP: Rij = (zij / (Σi' zi'j + ε · sign(Σi' zi'j))) Rj(l+1)
– αβ-LRP: Rij = (α · zij+ / Σi' zi'j+ + β · zij− / Σi' zi'j−) Rj(l+1), where zij+ = max(zij, 0) and zij− = min(zij, 0)
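The w+-rule (αβ-LRP with α = 1, β = 0) that appears throughout this deck can be sketched for a single fully-connected layer as follows. This is an illustrative NumPy sketch, not the authors' code; the name `lrp_w_plus` is hypothetical:

```python
import numpy as np

def lrp_w_plus(x, W, R_upper):
    """One backward step of the w+-rule (alpha-beta LRP, alpha=1, beta=0)
    for a fully-connected layer: only the positive contributions
    z_ij+ = x_i * max(w_ij, 0) pass relevance down.

    x:       activations of layer l,            shape (I,)
    W:       weights from layer l to layer l+1, shape (I, J)
    R_upper: relevance R_j(l+1) of layer l+1,   shape (J,)
    returns: relevance R_i(l) of layer l,       shape (I,)
    """
    z_plus = x[:, None] * np.clip(W, 0.0, None)  # z_ij+ per connection
    denom = z_plus.sum(axis=0) + 1e-12           # sum_i' z_i'j+ (stabilized)
    # message R_ij = (z_ij+ / sum_i' z_i'j+) * R_j(l+1); then R_i(l) = sum_j R_ij
    return (z_plus / denom) @ R_upper
```

When every upper unit receives some positive contribution, the total relevance is conserved across the layer, which is what lets a one-hot initial relevance be traced back to the input.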
Background: Indistinguishable Heatmaps (1)
• Heatmaps are almost invariant even when the target class is changed
• Heatmaps obtained with αβ-LRP (α = 1, β = 0):
[Figure: two nearly identical heatmaps. Left: target class "dog" (actually predicted); right: target class "cat" ("what-if" analysis)]
Background: Indistinguishable Heatmaps (2)
• Relevance propagated in each layer:
[Plot: per-layer relevance; relevance decreases exponentially]
Background: Indistinguishable Heatmaps (3)
• Recent works that seem to support our observation:
– [Adebayo+ NeurIPS-18]:
• Uses Inception v3 (a large network)
• If relevance = gradient × input, the input part dominates
→ Heatmaps will be invariant (since the input is of course fixed)
– [Ancona+ ICLR-18]:
• Several methods tend to return similar heatmaps (theoretically or empirically):
– Gradient × input
– DeepLIFT (Rescale)
– Integrated Gradients
– Simple LRP
Background: Our Motivation
• We introduce contrastive relevance, which highlights the parts more important to the target class
• We design the meaning of relevance to be consistent across SSD's two heterogeneous tasks:
– Classification
– Localization (regression)
[Figure: contrasting heatmaps for target class "dog" vs. target class "cat"]
Outline
✓ Background
• Proposed method: CRP
• Experiments
Contrastive Relevance Propagation (CRP)
• CRP: LRP tailored for SSD
– Classifies SSD's layers into 4 types
– Applies semantically appropriate propagation rules to each layer type
– In both classification and localization, the meaning of "relevance" is the same
[Figure (three build frames over the SSD architecture): the 4 layer types are the classification layers, the localization layers, the high-level feature layer, and the low-level feature layers. For a detected box, relevance to a class k of interest is propagated from a classification layer; relevance to an offset (e.g., shifting to the right) is propagated from a localization layer; for another detected box, relevance to a class k' of interest is propagated likewise]
CRP: Propagation Rules in Classification
[Diagram: a classification layer (units for class 1, …, class k, …, class K, with class k* as the target), a high-level feature layer, and low-level feature layers]
• Initial relevance: 1 at the target class k*, 0 at all other classes
• From the classification layer, we use the w+-rule (αβ-LRP with α = 1, β = 0) to find units that positively contribute to class k*
• At this moment, we can compute a class-specific relevance Ri[k*] for the target class k* by summing up the passed relevance
• We compute contrastive relevance, comparing Ri[k*] against the "average relevance" over the other classes, to find units that make a significantly positive or a significantly negative contribution to the target class k*
• Until the input layer, we use the w+-rule to distribute the positivity or the negativity of the contrastive relevance (activations xi are non-negative due to ReLU)
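The contrastive step above can be sketched as follows. This is our simplified reading (class-specific relevance minus the average over the other classes); see the paper for the exact definition, and the name `contrastive_relevance` is hypothetical:

```python
import numpy as np

def contrastive_relevance(R_per_class, k_star):
    """Contrastive relevance at each feature unit: the class-specific
    relevance Ri[k*] minus the average of Ri[k] over the other classes.
    (A simplified reading of the slide, not the authors' exact formula.)

    R_per_class: array of shape (K, I); R_per_class[k, i] = Ri[k]
    k_star:      index of the target class k*
    returns:     contrastive relevance, shape (I,); can be positive
                 (significantly for k*) or negative (significantly against)
    """
    others = np.delete(R_per_class, k_star, axis=0)  # the K-1 other classes
    return R_per_class[k_star] - others.mean(axis=0)
```

Unlike plain class-specific relevance, this quantity is signed, which is what makes the resulting heatmaps differ between target classes.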
CRP: Propagation Rules in Localization
[Diagram: a localization layer (units for center on x-axis, center on y-axis (target), width, and height), a high-level feature layer, and low-level feature layers]
• Initial relevance: 1 at the target offset (here, center on y-axis), 0 at the other offsets
• Sign-based rule switching: we switch between two rules according to the sign of the activation xj:
– If xj is positive, we use the w+-rule (αβ-LRP with α = 1, β = 0) to find units that positively contribute to the target offset
– If xj is negative, we use the w–-rule (αβ-LRP with α = 0, β = 1) to find units that negatively contribute to the target offset
• We compute contrastive relevance from the relevance from the localization layer and the "overall average" of the class-specific relevance
• Until the input layer, we use the w+-rule, as in classification
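The sign-based rule switching above can be sketched for one fully-connected backward step. This is an illustrative NumPy sketch under the deck's conventions (w– distributes relevance in proportion to the negative contributions), not the authors' implementation:

```python
import numpy as np

def crp_localization_step(x, W, R_upper):
    """Sign-based rule switching: for each upper unit j, if its
    pre-activation x_j = sum_i x_i w_ij is positive, propagate R_j with
    the w+-rule (shares of the positive z_ij); if negative, with the
    w--rule (shares of the negative z_ij).

    x: activations of layer l, shape (I,); W: weights, shape (I, J);
    R_upper: relevance of layer l+1, shape (J,)
    """
    z = x[:, None] * W                  # per-connection contributions z_ij
    x_upper = z.sum(axis=0)             # pre-activation x_j of layer l+1
    z_plus = np.clip(z, 0.0, None)
    z_minus = np.clip(z, None, 0.0)
    frac = np.where(x_upper > 0,
                    z_plus / (z_plus.sum(axis=0) + 1e-12),    # w+ shares
                    z_minus / (z_minus.sum(axis=0) - 1e-12))  # w- shares
    return frac @ R_upper
```

Both branches produce non-negative shares that sum to one per upper unit, so relevance is conserved while its sign carries the meaning of the contribution.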
Outline
✓ Background
✓ Proposed method: CRP
• Experiments
Experimental Settings
• Dataset: Pascal VOC 2012
• We ported the TensorFlow implementation of LRP (https://github.com/VigneshSrinivasan10/interprettensor) into a TensorFlow implementation of SSD (https://github.com/balancap/SSD-Tensorflow)
• The SSD implementation includes a learned model (we conducted no learning)
• We added CRP-specific routines
• Relevance was normalized before creating heatmaps
(See the paper for details)
Numerical Example
• Relevance is almost symmetrically distributed around zero
[Figure: histogram of relevance centered at 0; positives and negatives get different colors in the heatmap. Target class: "dog"]
Error Analysis (1)–(3)
• A dog was misclassified as a sheep
[Figures: heatmaps for target class "dog" and target class "sheep"; in the last frame, values below the 85th percentile are masked]
Error Analysis (4)–(6)
• Unwanted localizations:
– Horizontal shift to the left with widening
– Vertical shift to the top with heightening
[Figures: the detected box before and after localization, and heatmaps for the target offsets: center on x-axis, center on y-axis, width, and height]
Summary
• CRP (Contrastive Relevance Propagation), an LRP method tailored for SSD:
– Highlights only the significantly important features for a target class
– Deals with SSD's heterogeneous outputs (classification and localization)
• We conducted some error analyses using CRP
Future work
• Applying CRP to other object detectors such as YOLO
• Applying CRP (retrospectively) to standard CNNs

Thank you for your attention!