Transcript
Page 1

www.nr.no

Explaining model predictions with Shapley values + conditional inference trees R package

September 3rd, 2020

Annabelle Redelmeier & Martin Jullum

Page 2

Explanation problem

► Suppose you have a black-box model that predicts the price of car insurance based on some features.

► How can we explain the prediction of a black-box model to a customer?

[Diagram: Janet's features (Age, Gender, Type of car, # accidents, Time since car registered) → black-box model → Prediction: $123 / month]

Page 3

Explanation problem

► One way to explain a black-box model is to show how the features contribute to the overall prediction (i.e. $123).

[Figure: bar chart of the contributions of Features 1-5, which sum to the prediction of $123 on a scale from 0 to 150.]

► To calculate these contributions, we can use Shapley values.

Page 4

Why are explanations important?

► Engineer/scientist making the model: Are there problems with the model? Edge cases? Biases?

► Society: Is the model fair? Legality?

► Individual: Do I trust the model prediction/outcome?

► Company/group using the model: Do customers trust me? Can I back the model up?

Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI) (Adadi, 2018)

Page 5

To put explanations in context

                     Model agnostic                Model specific
Local explanation    LIME, Shapley values,         Saliency map
                     Explanation Vectors,
                     Counterfactual explanations
Global explanation   Partial dependence plots      Activation maximization,
                                                   Model distillation,
                                                   Decision trees, Rule lists

Local explanation = explain a specific prediction; global explanation = understand the whole logic of the model.
Model agnostic = can be used with any ML model; model specific = tied to a particular model, like xgboost or regression.

→ Shapley values: a local, model-agnostic explanation.

Page 6

Shapley values

► Shapley values originate from economic game theory (1953).

► Setting: A game with 𝑀 players cooperating to maximize the total gains of the game.

► Goal: Distribute the total gains in a “fair” way:

► Axiom 1: Players that contribute nothing get payout = 0.

► Axiom 2: Two players that contribute the same get equal payout.

► Axiom 3: The sum of the payouts = total gains.

► Lloyd Shapley (1953) found the unique way to distribute the total gains that obeys these three axioms.

Page 7

Shapley values

► We assume player 𝑗 may contribute differently depending on who he or she is cooperating with.

► Suppose player 𝑗 is cooperating with the players in group 𝑆.

► Then, we define the marginal contribution of player 𝑗 to group 𝑆 as

  𝑣(𝑆 ∪ {𝑗}) − 𝑣(𝑆)

where 𝑣 is the contribution function: 𝑣(𝑆 ∪ {𝑗}) is the contribution with player 𝑗, and 𝑣(𝑆) is the contribution without player 𝑗.

Page 8

Shapley values

► Shapley value of player 𝑗 = payout for player 𝑗, defined as

  𝜙𝑗 = Σ_{𝑆 ⊆ 𝑀\{𝑗}} 𝑤(𝑆) [𝑣(𝑆 ∪ {𝑗}) − 𝑣(𝑆)]

where 𝑀 is the set of all players in the game and 𝑤(𝑆) is the weight function.
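The formula above can be computed exactly for small games. Below is a minimal Python sketch (not from the slides); it assumes the standard Shapley weight 𝑤(𝑆) = |𝑆|! (𝑀 − |𝑆| − 1)! / 𝑀!, which is the unique weighting that satisfies the three axioms:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: phi_j = sum over subsets S of the other players
    of w(S) * [v(S u {j}) - v(S)], with w(S) = |S|! (M - |S| - 1)! / M!."""
    M = len(players)
    phi = {}
    for j in players:
        others = [p for p in players if p != j]
        total = 0.0
        for size in range(M):
            for S in combinations(others, size):
                w = factorial(size) * factorial(M - size - 1) / factorial(M)
                total += w * (v(frozenset(S) | {j}) - v(frozenset(S)))
        phi[j] = total
    return phi

# Toy two-player game: x1 alone earns 10, x2 alone earns 20, together 50.
gains = {frozenset(): 0.0, frozenset({"x1"}): 10.0,
         frozenset({"x2"}): 20.0, frozenset({"x1", "x2"}): 50.0}
phi = shapley_values(["x1", "x2"], lambda S: gains[frozenset(S)])
# phi["x1"] = 20, phi["x2"] = 30; they sum to the total gains of 50 (Axiom 3).
```

The exponential loop over subsets is exactly why large 𝑀 is a problem, as discussed later in the deck.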

Page 9

Example

► 2 players: 𝑥1 and 𝑥2.

► Then, 𝑥1’s Shapley value is:

  𝜙𝑥1 = Σ_{𝑆 ⊆ 𝑀\{𝑥1}} 𝑤(𝑆) [𝑣(𝑆 ∪ {𝑥1}) − 𝑣(𝑆)]

► The power set of 𝑀\{𝑥1} is 𝑃(𝑀\{𝑥1}) = {{𝑥2}, ∅}, so

  𝜙𝑥1 = 𝑤1 [𝑣({𝑥2, 𝑥1}) − 𝑣({𝑥2})] + 𝑤2 [𝑣({𝑥1}) − 𝑣(∅)]

Page 10

How does this translate to ML?

► Given individual = Janet.

► Feature values are:
  ▪ Age = 55 years
  ▪ Gender = woman
  ▪ Type of car = Volvo
  ▪ # Accidents = 3
  ▪ Time since car registered = 3.2 years

► Predicted value = $123.

► The cooperative game becomes the prediction for the given individual.

► The players become the feature values of the given individual.

► The total gains become the predicted value of the given individual.

Page 11

How does this translate to ML?

  𝜙𝑗 = Σ_{𝑆 ⊆ 𝑀\{𝑗}} 𝑤(𝑆) [𝑣(𝑆 ∪ {𝑗}) − 𝑣(𝑆)]

where now 𝜙𝑗 is the contribution of feature 𝑗, 𝑀 is the set of all features in the ML model, 𝑆 is a subset of the features, and 𝑣(𝑆) is the contribution of feature set 𝑆.

We have access to almost all we need. We’re only missing 𝑣(𝑆)…

𝑣(𝑆) is the expected prediction of the ML model 𝑓, where we condition on the features in 𝑆 being equal to the feature values of the individual (Janet):

  𝑣(𝑆) = E[𝑓(𝑥𝑆̄, 𝑥𝑆) | 𝑥𝑆 = 𝑥𝑆*]

Example: 𝑆 = {Age, gender}, 𝑆̄ = {Type of car, # Accidents, Time since registration}.

Page 12

Problems with Shapley values in ML

  𝜙𝑗 = Σ_{𝑆 ⊆ 𝑀\{𝑗}} 𝑤(𝑆) [𝑣(𝑆 ∪ {𝑗}) − 𝑣(𝑆)]

1. Calculating 𝜙𝑗 when 𝑀 is large.
   • If 𝑀 = 10, there are 2^10 = 1024 subsets.
   • If 𝑀 = 30, there are 2^30 ≈ 1.1 billion!

2. Estimating 𝑣(𝑆) when the features are dependent.

Page 13

2) Computing 𝑣(𝑆)

► 𝑣(𝑆) is rarely known.

  ▪ (Lundberg & Lee, 2017a) assume the features are independent.

  ▪ 𝑣(𝑆) can then be estimated by sampling from the full data set and calculating an average.

  ▪ (Aas et al., 2019) estimate 𝑣(𝑆) parametrically and non-parametrically with various methods.
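The independence-based estimate of (Lundberg & Lee, 2017a) can be sketched as follows. This is an illustrative Python sketch, not the shapr or shap API; `v_independent`, `f`, and `x_star` are made-up names:

```python
import numpy as np

def v_independent(f, x_star, S, X_train, n_samples=100, rng=None):
    """Estimate v(S) = E[f(x) | x_S = x_S*] under feature independence:
    keep the features in S fixed at the individual's values and fill in
    the remaining features by sampling rows from the training data."""
    gen = np.random.default_rng(rng)
    rows = X_train[gen.integers(0, len(X_train), size=n_samples)].copy()
    rows[:, S] = x_star[S]          # condition: x_S = x_S*
    return f(rows).mean()
```

When the features are dependent, these marginally sampled rows can be very unrealistic, which is the motivation for the conditional approaches below.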

Page 14

2) Computing 𝑣(𝑆)

► (Aas et al., 2019)’s methods to estimate 𝑣(𝑆):

1. Empirical method:

   1. Calculate a distance 𝐷𝑆(𝑥*, 𝑥^𝑖) between the set of features being explained and every training instance.

   2. Use this distance to calculate a weight 𝑤𝑆(𝑥*, 𝑥^𝑖) = exp(−𝐷𝑆(𝑥*, 𝑥^𝑖)² / (2𝜎²)) for each training instance.

   3. Estimate

      𝑣(𝑆) ≈ Σ𝑘 𝑤𝑆(𝑥*, 𝑥^𝑘) 𝑓(𝑥𝑆̄^𝑘, 𝑥𝑆*) / Σ𝑘 𝑤𝑆(𝑥*, 𝑥^𝑘)

   where 𝑓 is the ML model and the sum runs over either samples or all training instances.
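A rough Python sketch of this weighted-average recipe (illustrative names; the distance here is plain Euclidean on the 𝑆 features, a simplification of the Mahalanobis-type distance used by Aas et al.):

```python
import numpy as np

def v_empirical(f, x_star, S, X_train, sigma=0.4):
    """Empirical estimate of v(S): weight every training instance by a
    Gaussian kernel in its distance to x_star on the S features, then take
    the weighted average of the model's predictions with x_S fixed."""
    # Distance between x_star and each training row on the S features.
    # (Plain Euclidean here; Aas et al. use a Mahalanobis-type distance.)
    D = np.linalg.norm(X_train[:, S] - x_star[S], axis=1)
    w = np.exp(-D**2 / (2 * sigma**2))
    rows = X_train.copy()
    rows[:, S] = x_star[S]          # condition: x_S = x_S*
    return np.sum(w * f(rows)) / np.sum(w)
```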

Page 15

2) Computing 𝑣(𝑆)

► (Aas et al., 2019)’s methods to estimate 𝑣(𝑆):

2. Gaussian method:

   1. Assume the features are jointly Gaussian. This means we have an explicit form for the conditional distribution 𝑝(𝑥𝑆̄ | 𝑥𝑆 = 𝑥𝑆*).

   2. Estimate the conditional mean and covariance matrix.

   3. Sample 𝐾 times from this conditional Gaussian distribution with the estimated mean and covariance matrix.

   4. Estimate 𝑣(𝑆) by averaging the ML model 𝑓 over the samples 𝑥𝑆̄^𝑘 from the conditional Gaussian:

      𝑣(𝑆) ≈ (1/𝐾) Σ𝑘 𝑓(𝑥𝑆̄^𝑘, 𝑥𝑆*)
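The conditional-Gaussian recipe can be sketched in Python using the standard formulas for the conditional mean and covariance of a multivariate Gaussian; all names are illustrative, not the shapr API:

```python
import numpy as np

def v_gaussian(f, x_star, S, X_train, K=1000, rng=0):
    """Gaussian estimate of v(S): fit a joint Gaussian to the training data,
    form the conditional distribution of the remaining features given
    x_S = x_S*, sample from it, and average the model's predictions."""
    d = X_train.shape[1]
    Sbar = [j for j in range(d) if j not in S]
    mu, Sigma = X_train.mean(axis=0), np.cov(X_train, rowvar=False)
    # Standard conditional mean/covariance of a multivariate Gaussian.
    A = Sigma[np.ix_(Sbar, S)] @ np.linalg.inv(Sigma[np.ix_(S, S)])
    cond_mu = mu[Sbar] + A @ (x_star[S] - mu[S])
    cond_cov = Sigma[np.ix_(Sbar, Sbar)] - A @ Sigma[np.ix_(S, Sbar)]
    samples = np.random.default_rng(rng).multivariate_normal(cond_mu, cond_cov, size=K)
    rows = np.empty((K, d))
    rows[:, S] = x_star[S]          # condition: x_S = x_S*
    rows[:, Sbar] = samples         # x_Sbar ~ p(x_Sbar | x_S = x_S*)
    return f(rows).mean()
```

The matrix inversion and sampling in this sketch are exactly the operations blamed for the Gaussian method's slowness later in the deck.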

Page 16

2) Computing 𝑣(𝑆)

► The problem is that (Aas et al., 2019)’s methods assume the features are continuously distributed.

We extend (Aas et al., 2019)’s methods to handle mixed (i.e. continuous, categorical, ordinal) features using conditional inference trees.

Page 17

► ctree (Hothorn et al., 2006) is a tree-fitting statistical model like CART and C4.5.

  ▪ What is so great about a tree?

► Difference: ctree solves for the splitting feature and the split point using hypothesis tests.

Page 18

How do we build a ctree?

1. Test each feature 𝑗 with the partial hypothesis test:

   𝐻0𝑗: 𝐹(𝒀 | 𝑋𝑗) = 𝐹(𝒀)

2. If the global 𝑝-value is < 𝛼, choose the feature that is most strongly associated with 𝒀 (the smallest 𝑝-value).

3. Find a split point based on this feature (and your favourite splitting algorithm).

Can handle multivariate responses!
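A toy sketch of the feature-selection step above. The real ctree uses permutation-test statistics that also cover categorical and multivariate responses; the Pearson correlation test below is only a stand-in for continuous data, and `select_split_feature` is an illustrative name:

```python
import numpy as np
from scipy import stats

def select_split_feature(X, y, alpha=0.05):
    """ctree-style split-feature selection sketch: test each feature's
    association with the response and pick the most significant one."""
    pvals = [stats.pearsonr(X[:, j], y)[1] for j in range(X.shape[1])]
    # Bonferroni-corrected global test: stop splitting if nothing is significant.
    if min(pvals) * X.shape[1] >= alpha:
        return None
    return int(np.argmin(pvals))    # feature most strongly associated with y
```

Returning `None` when no test rejects is what gives ctree its built-in stopping rule.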

Page 19

► To estimate 𝑣(𝑆):

1. We fit a ctree where the tree features are the features in 𝑆 and the tree response is the set of features not in 𝑆:

   𝑥𝑆̄ ~ 𝑥𝑆   (response ~ features)

Example: 𝑆 = {Age, gender}, 𝑆̄ = {Type of car, # Accidents, Time since registration}.

Page 20

2. Given the tree, we find the leaf node based on Janet’s feature values: 𝑆 = {Age = 55, gender = woman}.

Other training observations in the leaf:

gender | age | car    | accidents | time since registration
woman  | 55  | Volvo  | 0         | 1
woman  | 54  | Tesla  | 2         | 1
woman  | 52  | BMW    | 1         | 0
woman  | 60  | BMW    | 3         | 2
woman  | 55  | Nissan | 0         | 1

Page 21

3. We sample from the leaf and use these samples to estimate 𝑣(𝑆): combine each sampled row’s 𝑆̄ values with Janet’s 𝑆 values, run them through the ML model 𝑓, and average:

   𝑣(𝑆) ≈ (1/𝐾) Σ𝑘 𝑓(𝑥𝑆̄^𝑘, 𝑥𝑆*)
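Steps 2-3 can be sketched as follows, assuming the rows of Janet's leaf have already been collected and the features are numerically encoded; `v_ctree_leaf` is an illustrative name, not part of the shapr package:

```python
import numpy as np

def v_ctree_leaf(f, x_star, S, leaf_rows, K=100, rng=0):
    """ctree estimate of v(S), given the training rows that landed in the
    same leaf as the individual: sample rows from the leaf, combine their
    S-bar values with the individual's S values, and average f."""
    gen = np.random.default_rng(rng)
    picks = leaf_rows[gen.integers(0, len(leaf_rows), size=K)].copy()
    picks[:, S] = x_star[S]         # x_S stays fixed at Janet's values
    return f(picks).mean()
```

Because the leaf is selected by conditioning on 𝑥𝑆, the sampled 𝑆̄ values respect the dependence between the features, unlike the independence-based estimate.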

Page 22

Simulation studies

1. Simulate dependent categorical data to act as our features.

2. Define a linear response model (fixed coefficients, indicator functions of the categorical features, and Normal(0, 1) noise).

3. Convert the categorical data to numerical data using one-hot encoding so that we can use the methods in (Aas et al., 2019).

Page 23

Simulation studies

4. Calculate the true Shapley values using the true conditional expectation.

5. Evaluate the methods using the mean absolute error (MAE), averaged over the 𝑀 features and the 𝑇 test observations:

   MAE = (1/𝑇) Σᵢ (1/𝑀) Σⱼ |𝜙𝑗(𝑥ᵢ) − 𝜙̂𝑗(𝑥ᵢ)|

where 𝜙𝑗 is the true and 𝜙̂𝑗 the estimated Shapley value of feature 𝑗.
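The MAE criterion is simple to compute; a minimal sketch with rows = test observations and columns = features (`mae_shapley` is an illustrative name):

```python
import numpy as np

def mae_shapley(phi_true, phi_est):
    """MAE between true and estimated Shapley values, averaged over both
    the features (columns) and the test observations (rows)."""
    return float(np.mean(np.abs(np.asarray(phi_true) - np.asarray(phi_est))))

# Two test observations, two features; only the last value is off, by 2:
# the MAE is mean(0, 0, 0, 2) = 0.5.
```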

Page 24

Simulation study 1

[Figure: mean absolute error vs. degree of feature dependence, comparing ctree with the methods of (Lundberg & Lee, 2017a) and (Aas et al., 2019).]

Page 25

Computation time

[Figure: computation time, over all dependence levels 𝜌 and test observations.]

Note: the Gaussian method is slow because
- it has to call the “predict” function more often than the empirical and ctree methods, and
- it requires matrix inversions and sampling.

Page 26

Simulation study 2

[Figure: mean absolute error vs. degree of feature dependence, comparing ctree with the methods of (Lundberg & Lee, 2017a) and (Aas et al., 2019).]

Page 27

Computation time

[Figure: computation time, over all dependence levels 𝜌 and test observations.]

Page 28

Limitations

► The ctree/Gaussian/empirical methods cannot be used for more than 23 (30?) features due to computational problems (regardless of one-hot encoding…). What can we do to improve this?

  ▪ GroupSHAP?

  ▪ A new approach to sampling: (Grah and Thouvenot, 2020)¹?

¹ A Projected Stochastic Gradient Algorithm for Estimating Shapley Value Applied in Attribute Importance

