Chair of Software Engineering for Business Information Systems (sebis)
Faculty of Informatics
Technische Universität München
wwwmatthes.in.tum.de

Using Multitask Deep Learning for Question Answering
A use case on the InsuranceQA dataset
Iman Jundi, 09.11.2018, Master Thesis Final Presentation
Advisor: M.Sc. Ahmed Elnaggar
Supervisor: Prof. Dr. Florian Matthes


© sebis181109 Jundi Multitask Deep Learning for Question Answering 2

Outline

● Introduction & Motivation
● Research Questions
● Approach
● Models & Results
● Analysis & Samples
● Conclusion


Introduction & Motivation

complex: ultimately pass the Turing Test

useful: many applications in many domains

Advances in recent years, e.g. IBM’s Watson winning Jeopardy against all-time champions

Introduction & Motivation | Research Questions | Approach | Models & Results | Analysis & Samples | Conclusion

Using Multitask Deep Learning for Question Answering

Question → System → Answer

Involves varying tasks in IR and NLP:

Text Retrieval
Answer Selection
Reading Comprehension
Natural Language Generation


Introduction & Motivation

Using Multitask Deep Learning for Question Answering

Question → System → Answer

Text Retrieval
Answer Selection
Reading Comprehension
Natural Language Generation

Deep Learning: go-to tool in recent years
- many layers → deep
- learn complex representations
- no feature engineering
- train with huge amounts of data: hard for some domains or tasks, e.g. insurance
- varying architecture: architecture engineering per task

Multitask Learning: task + related task
- benefit from related tasks with more data
- share layers between tasks: better/generic representation
- promote generic architectures


Use Case: Answer Selection - InsuranceQA Dataset [Feng15]

Question

instant response

less resources

17,000 questions + 27,500 answers

http://www.insurancelibrary.com

[Feng15] Feng, M., Xiang, B., Glass, M.R., Wang, L. and Zhou, B., 2015, December. Applying deep learning to answer selection: A study and an open task. In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on (pp. 813-820). IEEE.

Expert Answers

Question + Pool of Answers → System → Answer

Similar Question

Select the correct answer to a question from an answer pool

InsuranceQA Dataset:

refer

client company


Related Task: Reading Comprehension - SQuAD Dataset [Rajpurkar16]

100,000+ Q&A

[Rajpurkar16] Rajpurkar, P., Zhang, J., Lopyrev, K. and Liang, P., 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2383-2392).

Paragraph + Question → System → Answer Span

paragraph: 500+ Wikipedia articles
question: crowd workers
answer span: selected from paragraph, crowd workers

vs. InsuranceQA 17,000 Questions

Answer comprehension questions on a paragraph (similar to humans)

● related task → multitask works
● active research → benefit from
● bigger dataset: ~6x questions
● clean dataset: manually created

Stanford Question Answering Dataset (SQuAD):

SQuAD?


Research Questions

● Which Deep Neural Network model can be used for Multitask Learning of both Reading Comprehension and Answer Selection?

● How well does this network perform on the Answer Selection task on the InsuranceQA dataset if it is trained only on this task?

● Would Answer Selection performance improve when the model is trained jointly on both this task and Reading Comprehension?

Answer Selection: Question + Pool of Answers → Model → Answer
Reading Comprehension: Paragraph + Question → Model → Answer Span
Multitask: one Model (?) for both tasks, Question + Pool of Answers → Answer and Paragraph + Question → Answer Span

performance with single-task training vs. performance with multitask training


Research Approach

[Diagram, as on the previous slide: separate Answer Selection and Reading Comprehension models vs. one multitask model; performance with single-task training vs. performance with multitask training]

idea: related work or observations from previous iterations

implementation: online implementations of related work

evaluation, performance metrics:
InsuranceQA: Accuracy & Top-5 Accuracy
SQuAD: F1 & EM (Exact Match) on words

analyze

improve
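The InsuranceQA metrics named above can be illustrated with a small sketch. It assumes that the model assigns one score per candidate in the answer pool and that a prediction counts as a hit when a correct answer is among the k highest-scored candidates. The function names and the toy scores are hypothetical and not taken from the thesis code.

```python
from typing import Sequence, Set


def top_k_hit(scores: Sequence[float], correct: Set[int], k: int) -> bool:
    """True if any correct answer index is among the k highest-scored candidates."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return any(i in correct for i in ranked[:k])


def top_k_accuracy(all_scores, all_correct, k: int) -> float:
    """Fraction of questions with at least one correct answer in the top k."""
    hits = sum(top_k_hit(s, c, k) for s, c in zip(all_scores, all_correct))
    return hits / len(all_scores)


# Two hypothetical questions, each with a pool of six candidate answers.
pool_scores = [
    [0.1, 0.9, 0.3, 0.2, 0.05, 0.4],  # correct answer (index 1) is ranked first
    [0.8, 0.2, 0.7, 0.6, 0.5, 0.1],   # correct answer (index 1) is ranked fifth
]
correct_sets = [{1}, {1}]

accuracy = top_k_accuracy(pool_scores, correct_sets, k=1)
top5_accuracy = top_k_accuracy(pool_scores, correct_sets, k=5)
```

With k=1 this is plain accuracy; with k=5 it is the Top-5 Accuracy reported for InsuranceQA.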


Multitask Deep Learning Approach

Reading Comprehension: Paragraph + Question → shared layers → RC layers → Answer Span
Answer Selection: Question + Pool of Answers → shared layers → AS layers → Answer

multitask training: iterate between tasks per batch
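A minimal sketch of what iterating between tasks per batch could look like. It assumes a simple round-robin schedule and cycles each task's batches so that the smaller dataset is revisited more often; the slides do not state how dataset sizes were balanced, and the helper name `alternate_tasks` is hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterable, Iterator, Tuple


def alternate_tasks(task_batches: Dict[str, Iterable], steps: int) -> Iterator[Tuple[str, object]]:
    """Yield (task_name, batch) pairs, switching to the next task after every batch."""
    # Assumption: each task's batches are repeated endlessly (round-robin over tasks).
    iterators = {name: cycle(batches) for name, batches in task_batches.items()}
    names = list(iterators)
    for step in range(steps):
        name = names[step % len(names)]
        yield name, next(iterators[name])


# Toy batches standing in for SQuAD (RC) and InsuranceQA (AS) mini-batches.
schedule = list(alternate_tasks({"RC": ["rc0", "rc1", "rc2"], "AS": ["as0"]}, steps=6))
```

In a training loop, each yielded batch would pass through the shared layers and then through the output layers of its own task, so every update to the shared layers alternates between the two objectives.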


Approach & Problem Definition of the Tasks

Word → Vector, then encode

Answer Selection: Question + Pool of Answers → Model → Answer (predict score for each answer)

- similarity (common approach in related work): encode Question and Answer, compare with a similarity function, e.g. cosine; train to make sim(Q,A+) > sim(Q,A-) by a margin (hinge margin loss) → Multitask Learning doesn’t work
- probability: encode Question and Answer, binary (+/-) classification problem (cross entropy loss) → compatible learning objective

Reading Comprehension: Paragraph + Question → Model → Answer Span (predict word index for start, end)

- encode: vector representation of words in paragraph based on question
- start probability, end probability: classification problem over the paragraph words (cross entropy loss)
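The similarity-based objective for Answer Selection can be sketched as follows. This is an illustration only: it assumes the question and answers are already encoded as fixed-size vectors, uses cosine as the similarity function, and picks a margin of 0.2 purely as a placeholder value that is not taken from the slides.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def hinge_margin_loss(q, a_pos, a_neg, margin: float = 0.2) -> float:
    """Zero once sim(Q, A+) exceeds sim(Q, A-) by at least the margin."""
    # margin=0.2 is an assumed placeholder, not a value from the presentation.
    return max(0.0, margin - cosine(q, a_pos) + cosine(q, a_neg))


q = [1.0, 0.0]
good_answer = [1.0, 0.0]   # same direction as the question vector
bad_answer = [0.0, 1.0]    # orthogonal to the question vector

loss_when_ranked_correctly = hinge_margin_loss(q, good_answer, bad_answer)
loss_when_ranked_wrongly = hinge_margin_loss(q, bad_answer, good_answer)
```

The loss only constrains the relative order of a positive and a negative answer, whereas the Reading Comprehension objective is a cross entropy over probabilities; this difference in objectives is what the slide marks as not working well for multitask learning.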


Models

Shared bi-RNN (AS similarity): Multitask Learning bad on InsuranceQA
Also variations:
- additional separate bi-GRU
- CNN in output [Tan15]

Shared bi-RNN (AS probability): Compatible learning objective → Multitask learning starts working

more advanced models: mtlRNet, mtlQANet
adapt 2 SQuAD models: RNet [Wang17] and QANet [Yu18]

[Tan15] Tan, M., Santos, C.D., Xiang, B. and Zhou, B., 2015. LSTM-based deep learning models for non-factoid answer selection. arXiv preprint arXiv:1511.04108.
[Wang16] Wang, S. and Jiang, J., 2016. A compare-aggregate model for matching text sequences. arXiv preprint arXiv:1611.01747.
[Wang17] Wang, W., Yang, N., Wei, F., Chang, B. and Zhou, M., 2017. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics
[Yu18] Yu, A.W., Dohan, D., Le, Q., Luong, T., Zhao, R. and Chen, K., 2018. Fast and accurate reading comprehension by combining self-attention and convolution. In International Conference on Learning Representations.


Multitask Learning QANet [Yu18] - mtlQANet

[Plots: InsuranceQA Accuracy and SQuAD F1]

better with multitask learning

QANet [Yu18]: RC on SQuAD, adapted for AS on InsuranceQA

- word → vector (pretrained word embeddings)
- word representation with information from context
- representation of words in (answer/paragraph) based on the question: more relevant → more weight
- further encoding
- task-specific output layer: classification probability output


Final Results

Focus on optimizing similarity function

Shift the focus to a more generic solution

similarity implicit in attention

* mtlRNet and mtlQANet with Multitask Training reach 93% and 93.3% top-5 accuracy overall, respectively


mtlQANet – Multitask Learning Ablation Study

InsuranceQA Accuracy

more sharing → better performance

few AS task-specific parameters (400)


InsuranceQA Prediction Samples – correct top answer

focused attention: only relevant words

SQuAD pinpoints answer?

single-task

multitask

Question: how do I drop my Health Insurance

Ground Truth (predicted rank=1): how drop your health insurance in every case it involve write notice of your desire cancel your coverage on an individual basis you will write a note the insurance company customer service department and send it there for a business you will write a memo your hr cancel or wave coverage please note : if marry you may be ask include your spouse signature too..

a2q attention visualization: top prediction = ground truth


multitask

InsuranceQA Prediction Samples – incorrect top answer

attention on “qualify” for criteria words

single-task

Question: how do I qualify for government Health Insurance

Ground Truth, predicted rank=8 (single/multi-task): depending on where you live your age , income and other factor - you may qualify for Medicaid , MRMIB , Medicare , etc. the 2 big enchaladas of healthcare reform guarantee insurability and subsidy come into effect 1/1/14 ( open enrollment be 10/1/13 3/31/14 so...

top prediction (single/multi-task): when you say government health insuranc if you be refer to Healthcare Reform or Obamacare the rate depend on many factor such as age who be on the policy , zip code which plan you buy , etc. there will be subsidy as well so it depend if you be eligible for a subsidy in reality the actual rate..

more words in common than in ground truth

a2q attention visualization: top prediction


multitask

InsuranceQA Prediction Samples – incorrect top answer

single-task

Question: how do I qualify for government Health Insurance

Ground Truth, predicted rank=8 (single/multi-task): depending on where you live your age , income and other factor - you may qualify for Medicaid , MRMIB , Medicare , etc. the 2 big enchaladas of healthcare reform guarantee insurability and subsidy come into effect 1/1/14 ( open enrollment be 10/1/13 3/31/14 so...

top prediction (single/multi-task): when you say government health insuranc if you be refer to Healthcare Reform or Obamacare the rate depend on many factor such as age who be on the policy , zip code which plan you buy , etc. there will be subsidy as well so it depend if you be eligible for a subsidy in reality the actual rate..

a2q attention visualization: ground truth

same effect for ground truth

not perfect


multitask

InsuranceQA Prediction Samples – incorrect top answer (only single-task)

single-task

Question: can you buy auto insurance out of state

Ground Truth, predicted rank=2 (single-task), predicted rank=1 (multitask): you must purchase your auto insurance policy in the state in which the vehicle be register due to the regulatory framework of the insurance industry every state have regulatory agency that approve rate and coverage for company operate within that state if you own multiple vehicle in multiple state you will need buy separate policy within each state..

top prediction (single-task): liability coverage on auto be typically require in all state limit be usually in the $20,000 per person $40,000 per occurrence range with additional property damage benefit of course some state have low requirement and other have high requirement the cost can vary depending on your age , your driving record , your zip code and the type of vehicle you drive monthly premium can range from $35..

a2q attention visualization: ground truth

focused attention as before


Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi’s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the “golden anniversary” with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as “Super Bowl L”), so that the logo could prominently feature the Arabic numerals 50.

● What year did the Denver Broncos secure a Super Bowl title for the third time?
ground truth: 2015 , 2016 , 2015
prediction: 2016
em=1.00 f1=1.00

● What does AFC stand for?
ground truth: American Football Conference , American Football Conference , American Football Conference
prediction: American Football Conference
em=1.00 f1=1.00

● Which NFL team represented the NFC at Super Bowl 50?
ground truth: Carolina Panthers , Carolina Panthers , Carolina Panthers
prediction: Denver Broncos
em=0.00 f1=0.00

● What color was used to emphasize the 50th anniversary of the Super Bowl?
ground truth: gold , gold , gold
prediction: golden
em=0.00 f1=0.00

SQuAD Prediction Samples
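The em and f1 values in these samples can be reproduced with a word-level sketch like the one below. It follows the normalization commonly used for SQuAD-style evaluation (lowercasing, removing punctuation and the articles a/an/the) and takes the best score over all ground-truth answers; it is written for illustration and is not the evaluation script used in the thesis.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    return float(normalize(prediction) == normalize(truth))


def f1(prediction: str, truth: str) -> float:
    """Harmonic mean of word-level precision and recall."""
    pred_words = normalize(prediction).split()
    truth_words = normalize(truth).split()
    overlap = sum((Counter(pred_words) & Counter(truth_words)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_words)
    recall = overlap / len(truth_words)
    return 2 * precision * recall / (precision + recall)


def best_over_truths(metric, prediction: str, truths) -> float:
    """Score against every annotated answer and keep the best one."""
    return max(metric(prediction, t) for t in truths)


em_year = best_over_truths(exact_match, "2016", ["2015", "2016", "2015"])
f1_color = best_over_truths(f1, "golden", ["gold", "gold", "gold"])
```

Because the comparison is on whole words, "golden" shares no word with "gold" and scores zero on both metrics, while "2016" matches one of the three annotated answers exactly.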


Conclusion

● Which Deep Neural Network model can be used for Multitask Learning of both Reading Comprehension and Answer Selection?
→ adapt 2 SQuAD models with minimal adjustment: almost fully-shared, only task-specific output layer, compatible learning objective

● How well does this network perform on the Answer Selection task on the InsuranceQA dataset if it is trained only on this task?
→ good: competitive results; more generic, no AS-specific optimization

● Would Answer Selection performance improve when the model is trained jointly on both this task and Reading Comprehension?
→ improves: more generic models perform well

[Diagram: separate Answer Selection and Reading Comprehension models vs. one multitask model; performance with single-task training vs. performance with multitask training]

? SQuAD 2.0 [Rajpurkar18]: + questions without answer; AS: A- ↔ RC: no answer?
? utilize optimized similarity functions, but more generic, e.g. add to combined representation
? other (more) tasks & multitask models, e.g. NLP Decathlon, MQAN [McCann18]
? pre-trained language representation model, e.g. BERT [Devlin18]: Multitask Learning still useful?

[Rajpurkar18] Rajpurkar, P., Jia, R. and Liang, P., 2018. Know What You Don't Know: Unanswerable Questions for SQuAD. arXiv preprint arXiv:1806.03822.
[McCann18] McCann, B., Keskar, N.S., Xiong, C. and Socher, R., 2018. The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730.
[Devlin18] Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.


QA


Backup: Models & Results

Shared bi-RNN Model

Reading Comprehension output: Classification, probability of each word to be start, end of the answer

Answer Selection output: Similarity measure S(Q,A+), S(Q,A-), used as score
Training with Margin Loss → more similarity with A+
Not compatible with RC approach?

[Plots: SQuAD F1 and InsuranceQA Accuracy]

Variations:
- additional separate bi-GRU
- CNN in output [Tan15]
Multitask learning still bad on InsuranceQA


Shared bi-RNN Model

InsuranceQA Accuracy

binary classification

Probability output trained with cross entropy loss

similar to SQuAD
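To make the "similar to SQuAD" point concrete, the sketch below writes both objectives as cross entropy over probabilities: a binary one for a (question, answer) pair and one over paragraph positions for the span start and end. Summing the start and end terms is an assumption made here for illustration, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def binary_cross_entropy(p_correct: float, label: int) -> float:
    """Answer Selection as classification: label is 1 for A+ and 0 for A-."""
    return -math.log(p_correct) if label == 1 else -math.log(1.0 - p_correct)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over paragraph positions."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def span_cross_entropy(start_logits, end_logits, start: int, end: int) -> float:
    """Reading Comprehension: classify the start and the end word index."""
    # Assumption: the two cross-entropy terms are simply added.
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    return -math.log(p_start[start]) - math.log(p_end[end])


as_loss = binary_cross_entropy(0.5, label=1)
rc_loss = span_cross_entropy([0.0] * 4, [0.0] * 4, start=1, end=2)
```

Both losses are negative log-probabilities of the correct class, which is the sense in which the probability output gives Answer Selection a learning objective compatible with Reading Comprehension.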

More advanced models

Backup: Models & Results


Multitask Learning R-Net [Wang17] - mtlRNet

InsuranceQA Accuracy

SQuAD F1

[Wang17] Wang, W., Yang, N., Wei, F., Chang, B. and Zhou, M., 2017. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics
[Wang16] Wang, S. and Jiang, J., 2016. A compare-aggregate model for matching text sequences. arXiv preprint arXiv:1611.01747.

Answer to Question Attention

Self-Attention

Aggregate CNN [Wang16]

Backup: Models & Results


Multitask Learning QANet [Yu18] - mtlQANet

InsuranceQA Accuracy

SQuAD F1

[Yu18] Yu, A.W., Dohan, D., Le, Q., Luong, T., Zhao, R. and Chen, K., 2018. Fast and accurate reading comprehension by combining self-attention and convolution. In International Conference on Learning Representations.

Encoder (CNN + Att) [Yu18]

Move away from Recurrent Nets: replace RNN with CNN + Attention

Backup: Models & Results

