
Explainable AI CS891 - Introduction to Emerging Technologies

Old Dominion University, Department of Computer Science
Kritika Garg, kgarg001@odu.edu

Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations

Upol Ehsan, Georgia Institute of Technology, Atlanta, GA, USA
Brent Harrison, University of Kentucky, Lexington, KY, USA

Larry Chan, Georgia Institute of Technology, Atlanta, GA, USA

Mark O. Riedl, Georgia Institute of Technology, Atlanta, GA, USA

Basic Terminologies

I propose to consider the question, ‘Can machines think?’— Alan Turing, 1950

● Turing's paper "Computing Machinery and Intelligence" (1950)

● Turing Test

What is AI?

The concept of AI is based on the idea of building machines that can think and learn like humans.

What is ML?

● An approach to AI where a machine learns by itself to solve a given problem.
● Algorithms that learn to make predictions based on input data.

[Figure: the machine learning pipeline. Input data is used for training; the trained model is then tested to make predictions.]

https://christophm.github.io/interpretable-ml-book/images/programing-ml.png
https://image.slidesharecdn.com/dataphillycoreychiversintromlv2-160219135616/95/intro-to-machine-learning-7-638.jpg?cb=1455890361
https://www.google.com/about/main/machine-learning-qa/img/cat-dog-flow-horizontal.gif

"Black box" in machine learning

https://christophm.github.io/interpretable-ml-book/images/programing-ml.png

https://www.sciencemag.org/news/2017/07/how-ai-detectives-are-cracking-open-black-box-deep-learning

Neural Network

https://miro.medium.com/max/500/1*ZhYNqU2y96_f3QkWq9oiWQ.jpeg

https://artificialintelligence-news.com/2018/02/21/experts-ai-clear-present-danger/

A.I. Is Everywhere

https://d3lkc3n5th01x7.cloudfront.net/wp-content/uploads/2019/08/01213414/AI_app_architecture.jpg

Sometimes an AI system can behave differently than expected.

https://www.nytimes.com/interactive/2018/03/20/us/self-driving-uber-pedestrian-killed.html

Accountability (who is to blame?) and safety become factors when more critical decisions are involved. For example:

● Driving an autonomous car

● Making medical diagnostics

You need to know "what the AI is thinking," not just "what results it is giving."

https://www.darpa.mil/ddm_gallery/xai-figure1-inline-graphic.png

https://www.darpa.mil/ddm_gallery/xai-figure2-inline-graphic.png

Explainable AI (XAI)

Why is explainable AI hard?

[Figure: data flows into a neural net, which makes decisions; the decision making inside the neural net is opaque.]

https://www.sciencemag.org/news/2017/07/how-ai-detectives-are-cracking-open-black-box-deep-learning

Rationalization

● A form of explainable AI.
● Acts as a proxy for a real explanation of what the AI is actually thinking.
● Provides human-like explanations.

Other Forms of Explanation

● Accurate account of each step in the decision-making process.
● Requires an expert to interpret the decisions.

Rationalization

● Insights into the agent's behaviour, explained in a human-like manner.

● Can be easily understood by a non-expert.

Benefits over other explainability techniques

● Can be easily understood by non-experts.
● Establishes trust between machine and human.
● Aids human-machine collaboration, as it conveys explanations faster.
● Useful for time-critical decision making.

https://hbr.org/2018/07/collaborative-intelligence-humans-and-ai-are-joining-forces

https://www.sciencemag.org/news/2017/07/how-ai-detectives-are-cracking-open-black-box-deep-learning

Related Work

Approach

Used a game-playing environment to train the AI.

● Experimental domain: Frogger

Generated rationalizations for AI behaviour in the Frogger game environment.

The agent had to reach the top while avoiding the obstacles in the environment.

Issues

● Sequential decision making

● Dynamic environment


Hypotheses

1. An encoder-decoder network can accurately generate rationalizations.

2. Humans find rationalizations more satisfying than other forms of explanation.

Methodology for Generation of Rationalizations

1. Training corpus
2. Grammar creation
3. Training and testing data sets
4. Training and testing the network
5. Results
6. Discussion

Training Corpus

● Videos of 12 humans playing Frogger while explaining their actions out loud.
● Recorded the state of the frog after every action.
● Used an online speech transcription service.
● Self-validated the data by assigning actions to their utterances.
● 225 action-rationalization pairs.

Fig1: AI Rationalization

The training corpus consisted of state-action pairs annotated with natural language.

● State: coordinates of the agent on the grid layout, (x, y).

● Action
○ Movements

○ Standing still

● Natural Language

Fig1: AI Rationalization

Grammar Creation

Action-rationalization annotations, combined with a set of rules based on the behaviour of the Frogger agent, were used to create the grammar.

State-action representation: a triple

(s1, a, s2)

where s1 is the initial state, a is the action performed, and s2 is the state after the action is performed.

Ground-Truth Rationalizations

The grammar was used to construct ground-truth rationalizations.

● Semi-synthetic language
● State-action representation internal to the system
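To make the pipeline concrete, here is a minimal, hypothetical sketch in Python of a template grammar that maps a state-action triple to a ground-truth rationalization. The rule conditions, coordinates, and wording are illustrative inventions, not the paper's actual grammar.

# Minimal sketch of a rule-based grammar mapping state-action triples
# (s1, a, s2) to templated rationalizations. The rules and wording here
# are invented for illustration, not the paper's actual grammar.

def rationalize(s1, a, s2):
    """Return a templated, human-like rationalization for one triple."""
    (x1, y1), (x2, y2) = s1, s2
    if a == "up" and y2 > y1:
        return "I moved up because the lane ahead looked clear."
    if a in ("left", "right"):
        return f"I dodged {a} to avoid an oncoming obstacle."
    if a == "stand":
        return "I waited for the traffic to pass before moving."
    return "I kept going toward the goal."

# One ground-truth pair: the triple is the "semi-synthetic" input language;
# the templated sentence is the natural-language target.
print(rationalize((3, 1), "up", (3, 2)))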

Test Set

● For every triple, a rationalization is generated.
● These examples were grouped according to their associated grammar rule.

[Figure: examples E1–E9 clustered by their associated grammar rules (Rule 1, Rule 2, Rule 3).]

Test Set


20% of the examples from each cluster were taken as testing data.

This ensured that every grammar rule had an associated example in the testing dataset.

Training Set


The remaining 80% from each cluster was taken as training data.

To help training, these examples were duplicated until there were 1000 of them.
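A small sketch of this split, under the assumption that a "cluster" means a group of examples sharing a grammar rule; the function and variable names are invented for illustration.

import random
from collections import defaultdict

# Sketch of the described split: 20% of each grammar-rule cluster is held
# out for testing; the remaining 80% is duplicated up to ~1000 examples.
# `examples` pairs each training example with its grammar rule.
def split_and_upsample(examples, test_frac=0.2, target_size=1000):
    clusters = defaultdict(list)
    for ex, rule in examples:
        clusters[rule].append(ex)

    train, test = [], []
    for rule, exs in clusters.items():
        random.shuffle(exs)
        k = max(1, int(len(exs) * test_frac))  # at least one test example per rule
        test.extend(exs[:k])
        train.extend(exs[k:])

    # Duplicate training examples until the set reaches the target size.
    while len(train) < target_size:
        train.append(random.choice(train))
    return train, test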

Training the network

Approach

1. Used a neural network to translate between the two languages:
a. The code of the game
b. Natural language

2. Then imported it back into the game-playing network.

Encoder-Decoder Network

Fig1: AI Rationalization

● 2-layered LSTM
● Node size of 300
● Trained the network for 50 epochs
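As a rough illustration of such a setup, here is a minimal PyTorch sketch of a 2-layer, 300-unit encoder-decoder. The vocabulary and embedding sizes are placeholder assumptions, and the paper's exact architecture and training details may differ.

import torch
import torch.nn as nn

# Sketch of the described encoder-decoder: 2-layer LSTMs, hidden size 300.
# Vocabulary and embedding sizes below are invented placeholders.
class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=128, hidden=300, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the state-action sequence; its final hidden state
        # initializes the decoder that emits rationalization tokens.
        _, state = self.encoder(self.src_emb(src))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)

model = Seq2Seq(src_vocab=50, tgt_vocab=500)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()
# ...train with teacher forcing for 50 epochs, as the slides describe.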

Evaluating the Accuracy

● Requires associating each generated sentence with a grammar rule.
● To determine which rule is most associated with an output sentence:
○ Calculated the sentence similarity between the output sentence and the sentences generated by the grammar.
○ The sentences with the highest similarity were recorded.
○ Similarity scores below 0.7 BLEU were not considered.
○ The rules of the output and recorded sentences were compared to check whether they match.

Accuracy was calculated as the percentage of output sentences that matched their associated test examples.
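A minimal sketch of that matching step, assuming NLTK's sentence-level BLEU as the similarity measure (the paper may compute similarity differently); grammar_sentences and the helper names are invented for illustration.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Match each generated sentence to the grammar rule whose sentences it is
# most similar to (by BLEU), discarding matches that score below 0.7.
smooth = SmoothingFunction().method1

def best_matching_rule(output, grammar_sentences, threshold=0.7):
    # grammar_sentences: dict mapping rule id -> list of grammar sentences
    best_rule, best_score = None, 0.0
    hyp = output.split()
    for rule, sentences in grammar_sentences.items():
        for sent in sentences:
            score = sentence_bleu([sent.split()], hyp, smoothing_function=smooth)
            if score > best_score:
                best_rule, best_score = rule, score
    return best_rule if best_score >= threshold else None

def accuracy(outputs, true_rules, grammar_sentences):
    # Fraction of outputs whose best-matching rule equals the rule
    # of their associated test example.
    hits = sum(best_matching_rule(o, grammar_sentences) == r
               for o, r in zip(outputs, true_rules))
    return hits / len(outputs)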

Three Different Environments

Implemented the generated rationalizations on three different maps, with 25%, 50%, and 75% obstacle density, to evaluate the approach.

Comparing Results in Different Environments

Baseline Models

1. Random Model: selected a random sentence as the rationalization.

2. Majority-Vote Model: selected sentences associated with the most common rule in a particular map.
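For concreteness, a small hypothetical sketch of these two baselines; corpus and the function names are invented, and the authors' implementations may differ.

import random
from collections import Counter

# corpus: list of (sentence, rule) pairs available for a given map.
def random_baseline(corpus):
    # Pick any sentence at random.
    return random.choice(corpus)[0]

def majority_vote_baseline(corpus):
    # Pick a sentence drawn from the most common rule on this map.
    top_rule, _ = Counter(rule for _, rule in corpus).most_common(1)[0]
    candidates = [s for s, rule in corpus if rule == top_rule]
    return random.choice(candidates)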

Comparing the models using a chi-square test
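As an illustration of such a comparison (the counts below are made up, not the paper's results), a chi-square test on a correct/incorrect contingency table looks like this with SciPy:

from scipy.stats import chi2_contingency

# Contingency table of correct vs. incorrect rationalizations per model.
table = [[80, 20],   # model A: correct, incorrect
         [35, 65]]   # model B: correct, incorrect
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}")  # a small p means the accuracies differ significantly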

Figure 4. The Frogger agent rationalizing its actions.

Methodology for Rationalization Satisfaction

1. Participants
2. Procedure
3. Results
4. Analysis
5. Findings and Discussion

Participants

● Fifty-three adults (age range = 22–64 years, M = 34.1, SD = 9.38) were recruited from Amazon Mechanical Turk (AMT).
● Women: 21%
● Participants were from three countries.
● 91% reported living in the United States.

Procedure

● Participants were introduced to a hypothetical high-stakes scenario.
● The scenario was just a "re-skinned" version of Frogger.
● To avoid effects of preconceived notions, the bots were named A, B, and C: "Robot A" for the rationalizing robot, "Robot B" for the action-declaring robot, and "Robot C" for the numerical robot.

● Participants were randomly assigned to watch 6 videos: three depicting the agents succeeding and three showing them failing.

● Participants ranked their satisfaction and justified their choices.

Thematic Analysis

● Analyzed the justifications that participants provided for their rankings.
● Developed a set of codes that represented the different justifications.
● Clustered these codes under emergent themes.
● Distilled the most relevant themes into insights that can be used to understand the "whys" behind satisfaction with explanations.

Finding

The rationalizing agent’s explanations were rated higher than those of the other two agents.

Attributes

Four attributes emerged from the thematic analysis that distinguish the rationalizing robot from the action-declaring robot.

1. Explanatory power
2. Relatability
3. Ludic nature
4. Adequate detail

Explanatory Power

Ability to explain its actions.

“. . . what it’s doing and why” (P6) enabled them to “. . . get into [the rationalizing robot’s] mind” (P17)

“[The action-declaring robot] explained almost nothing. . .which was disappointing.” (P38)

Relatability

Personality expressed through Robot’s explanation.

“[The rationalizing robot] was relatable. He felt like a friend rather than a robot. I had a connection with [it] that would not be possible with the other 2 robots because of his built-in personality.” (P21)

Ludic Quality

Perceived Playfulness.

“[The rationalizing robot] was fun and entertaining. I couldn’t wait to see what he would say next!” (P2)

Adequate Detail

“[the rationalizing robot] talks too much” (P47)

The action-declaring robot is “nice and simple.” (P48)

“would like to experience a combination of [the action-declaring robot] and [the rationalizing robot]” (P41).

Future Work

● To investigate different types of rationalizations.
● To judge the extent of confidence people place in the rationalizing robot.

Conclusions

● Used crowd-sourced data to construct action-rationalization pairs.
● Constructed ground-truth rationalizations using the grammar.
● Generated rationalizations through neural machine translation techniques.
● Evaluated the model over three different maps.
● The model on the 75% obstacle map performed with 80% accuracy.
● Found that humans find rationalization more satisfying than other forms of explanation.