Interpretable Machine Learning for Human Decision Making

Himabindu Lakkaraju, Stanford University

Contact: Himabindu Lakkaraju ([email protected])

•  Interpretable Decision Sets vs. Bayesian Decision Lists (Letham et al.)

•  Each user is randomly assigned one of the two models

•  10 objective and 2 descriptive questions, e.g.: "Given a patient with the following attributes, Respiratory-Illness = Yes and Smoker = Yes, can you be absolutely sure that this patient suffers from Lung Cancer?"

•  What kinds of mistakes are evaluators making?

•  Can we identify patterns of mistakes in the aggregate decision making process?

•  How do these patterns change over time?


Modeling Human Evaluations

Methodology

Interpretable Decision Sets

References

Decision Sets

Criteria for Interpretability

•  Distinctness: minimal overlap of rules w.r.t. the data points they cover

•  Parsimony: fewer rules with fewer predicates

•  Class Coverage: explain as many classes as possible

Solution

•  Non-negative, non-normal, non-monotone, submodular objective

•  Smooth Local Search [Feige et al.] provides a 2/5-approximation
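The three interpretability criteria can be made concrete with a small sketch. This is an illustrative scoring function only, not the actual submodular objective optimized in the Interpretable Decision Sets paper; the rule representation and the penalty weights below are assumptions made for the example.

```python
# Illustrative scoring of a candidate decision set on the three
# interpretability criteria (distinctness, parsimony, class coverage).
# A simplified sketch, NOT the exact objective from the paper.

def covers(rule, point):
    """A rule is a dict with an 'if' part of {attribute: required_value}."""
    return all(point.get(a) == v for a, v in rule["if"].items())

def score(decision_set, data):
    # Distinctness: penalize points covered by more than one rule.
    overlap = sum(
        1 for x in data
        if sum(covers(r, x) for r in decision_set) > 1
    )
    # Parsimony: penalize the total number of predicates across rules.
    n_predicates = sum(len(r["if"]) for r in decision_set)
    # Class coverage: reward the number of distinct predicted classes.
    n_classes = len({r["then"] for r in decision_set})
    return n_classes - 0.1 * n_predicates - 0.5 * overlap

rules = [
    {"if": {"smoker": "yes", "resp_illness": "yes"}, "then": "high-risk"},
    {"if": {"smoker": "no"}, "then": "low-risk"},
]
patients = [
    {"smoker": "yes", "resp_illness": "yes"},
    {"smoker": "no", "resp_illness": "no"},
]
print(score(rules, patients))  # prints 1.7
```

A higher score means a set that is more distinct, more parsimonious, and covers more classes; a real objective would also trade these off against predictive accuracy.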

Ongoing Research

User study results (Our Approach vs. Bayesian Decision Lists):

Task         Metric                     Our Approach    Bayesian Decision Lists
Descriptive  Human Accuracy             0.81            0.17
Descriptive  Avg. Time Spent (secs.)    113.4           396.86
Descriptive  Avg. # of Words            31.11           120.57
Objective    Human Accuracy             0.97            0.82
Objective    Avg. Time Spent (secs.)    28.18           36.34

Cost-Effective Treatment Regimes

Modeling Evaluator Confusions

[Plate diagram: variables r_{i,j} (decision of evaluator j on item i), z_i (item cluster), c_j (evaluator cluster), d_i (true label), a^(j), b^(i); plates over |I| items, |J| evaluators, and |J| x |I| decisions; labeled components include item attributes, prototypes and feature indicators of clusters, and the set of confusion matrices.]
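As a rough illustration of the idea behind the diagram (not the Bayesian model itself), one can estimate a confusion matrix per evaluator cluster from observed decisions and true labels. The actual framework infers cluster assignments and confusion matrices jointly; here the cluster assignments are assumed given, and all data below are invented.

```python
import numpy as np

# Illustrative: estimate one confusion matrix per evaluator cluster
# from observed decisions r[i, j] and true labels d[i]. Cluster
# assignments are assumed known here; the real model infers them.

def cluster_confusions(decisions, true_labels, evaluator_cluster,
                       n_clusters, n_labels):
    """decisions: (n_items, n_evaluators) array of predicted labels."""
    mats = np.zeros((n_clusters, n_labels, n_labels))
    n_items, n_evals = decisions.shape
    for j in range(n_evals):
        c = evaluator_cluster[j]
        for i in range(n_items):
            mats[c, true_labels[i], decisions[i, j]] += 1
    # Row-normalize so each row is P(decision | true label).
    row_sums = mats.sum(axis=2, keepdims=True)
    return np.divide(mats, row_sums,
                     out=np.zeros_like(mats), where=row_sums > 0)

decisions = np.array([[0, 1], [1, 1], [0, 0]])   # 3 items, 2 evaluators
true_labels = np.array([0, 1, 0])
cluster = np.array([0, 1])                        # evaluator -> cluster
mats = cluster_confusions(decisions, true_labels, cluster,
                          n_clusters=2, n_labels=2)
print(mats[0])  # cluster 0 decides perfectly here: identity matrix
```

Off-diagonal mass in a cluster's matrix indicates systematic confusions, e.g. one label consistently mistaken for another.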

Qualitative Insights

Text Labeling Task

•  Evaluators are often confused between atheism and Christianity when documents are short.

•  Female evaluators with low self-reported confidence scores are highly accurate!

Methodology

User Study

•  Learning unsupervised feature representations for decision making

•  Can we attribute interpretability to these representations?

•  Designing algorithmic frameworks which can intelligently incorporate human feedback in debugging machine learning models

•  How do we discover unknown unknowns of complex models?

•  Human judgements vs. Machine Predictions

•  A case study on bail decisions

•  How can machine learning algorithms help in critical decisions such as bail?
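One simple way to frame the search for a model's unknown unknowns is to query the points the model is most confident about, since confident mistakes are the surprising, costly ones. This is a naive sketch, not the exploration policy from the actual work; the model, oracle, and budget below are all hypothetical.

```python
# Sketch: discovering "unknown unknowns" -- points where a model is
# confidently wrong. A naive confidence-ranked query loop, not the
# guided-exploration policy of the actual paper.

def find_unknown_unknowns(points, model_confidence, model_prediction,
                          oracle, budget):
    # Query the points the model is most confident about first.
    ranked = sorted(points, key=lambda x: -model_confidence(x))
    mistakes = []
    for x in ranked[:budget]:
        if oracle(x) != model_prediction(x):
            mistakes.append(x)  # confident mistake found
    return mistakes

# Toy example with a hypothetical model and oracle.
points = list(range(10))
conf = lambda x: 0.99 if x < 5 else 0.6   # model is confident on 0..4
pred = lambda x: 0                         # model always predicts class 0
oracle = lambda x: 1 if x == 3 else 0      # true label differs at x = 3
print(find_unknown_unknowns(points, conf, pred, oracle, budget=5))
# prints [3]
```

A real policy would also balance exploration across regions of the feature space instead of spending the whole budget on the highest-confidence points.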

H. Lakkaraju, J. Leskovec. Confusions over Time: An Interpretable Framework for Characterizing Trends in Decision Making. NIPS, 2016.

H. Lakkaraju, S. H. Bach, J. Leskovec. Interpretable Decision Sets: A Joint Framework for Prediction and Explanation. KDD, 2016.

H. Lakkaraju, C. Rudin. Learning Cost-Effective Treatment Regimes. Manuscript, 2016.

H. Lakkaraju, J. Leskovec, J. Kleinberg, S. Mullainathan. A Bayesian Framework for Modeling Human Evaluation. SIAM International Conference on Data Mining, 2015.

H. Lakkaraju, J. Kleinberg, J. Leskovec, J. Ludwig, S. Mullainathan. Human Judgements vs. Machine Predictions. Manuscript, 2016.

H. Lakkaraju, E. Kamar, R. Caruana, E. Horvitz. Identifying Unknown Unknowns in the Open World: Policies and Representations for Guided Exploration. Manuscript, 2016.

Methodology

Experimental Results

Can we learn cost-effective and interpretable treatment regimes from observational data?

[Diagram: patient attributes (age, gender, health records, family history, etc.), assessments (symptoms, test results), treatment, and outcome.]

Criteria for Cost-Effective Treatment Regimes

•  Maximal outcomes

•  Minimal assessment costs

•  Minimal treatment costs

Solution

•  Objective function is NP-hard

•  Formulate as a Markov Decision Process

•  UCT algorithm with customized search-space pruning

Quantitative Analysis

•  Experiments with bail decisions and asthma treatment recommendations

•  Outcomes better than human experts in 29% of the cases

•  Outcomes match state-of-the-art algorithms with 34% lower assessment costs and 14% lower treatment costs
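The MDP formulation can be illustrated with a toy example: treatment choice as a sequential decision trading off expected outcome against assessment and treatment costs. This sketch uses exhaustive recursion on a tiny hand-made MDP rather than the UCT search with pruning used in the actual work; all states, costs, and probabilities below are invented.

```python
# Toy illustration of treatment choice as an MDP trading off outcome
# against assessment and treatment costs. All numbers are made up;
# the actual work uses UCT with search-space pruning, not enumeration.

# States: "start" (nothing known), "mild"/"severe" (after a test),
# "done" (terminal). Actions: run a test, or pick a treatment.
ACTIONS = {
    "start": ["test", "treat_generic"],
    "mild": ["treat_light"],
    "severe": ["treat_aggressive"],
}
# (next_state, probability, reward = outcome - cost) per action.
TRANSITIONS = {
    ("start", "test"): [("mild", 0.7, -1.0), ("severe", 0.3, -1.0)],
    ("start", "treat_generic"): [("done", 1.0, 5.0 - 3.0)],
    ("mild", "treat_light"): [("done", 1.0, 8.0 - 1.0)],
    ("severe", "treat_aggressive"): [("done", 1.0, 6.0 - 4.0)],
}

def value(state):
    """Expected total reward of the best action sequence from `state`."""
    if state == "done":
        return 0.0
    return max(
        sum(p * (r + value(s2)) for s2, p, r in TRANSITIONS[(state, a)])
        for a in ACTIONS[state]
    )

def best_action(state):
    return max(
        ACTIONS[state],
        key=lambda a: sum(p * (r + value(s2))
                          for s2, p, r in TRANSITIONS[(state, a)]),
    )

# Paying for the test is worth it here: 0.7*7 + 0.3*2 - 1 = 4.5 > 2.
print(best_action("start"), value("start"))
```

The same trade-off drives the real formulation: an extra assessment is chosen only when the information it yields improves the expected outcome by more than its cost.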
