Transcript

Deep Learning for Question Answering

Jordan Boyd-Graber, University of Colorado Boulder, 19 November 2014

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 1 of 49

Why Deep Learning

Plan

Why Deep Learning

Review of Logistic Regression

Can’t Somebody else do it? (Feature Engineering)

Deep Learning from Data

Tricks and Toolkits

Toolkits for Deep Learning

Quiz Bowl

Deep Quiz Bowl

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 2 of 49

Why Deep Learning

Deep Learning was once known as “Neural Networks”

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 3 of 49

Why Deep Learning

But it came back . . .

Political Ideology Detection Using Recursive Neural Networks

Mohit Iyyer1, Peter Enns2, Jordan Boyd-Graber3,4, Philip Resnik2,4

1 Computer Science, 2 Linguistics, 3 iSchool, and 4 UMIACS, University of Maryland

{miyyer,peter,jbg}@umiacs.umd.edu, [email protected]

Abstract

An individual's words often reveal their political ideology. Existing automated techniques to identify ideology from text focus on bags of words or wordlists, ignoring syntax. Taking inspiration from recent work in sentiment analysis that successfully models the compositional aspect of language, we apply a recursive neural network (RNN) framework to the task of identifying the political position evinced by a sentence. To show the importance of modeling subsentential elements, we crowdsource political annotations at a phrase and sentence level. Our model outperforms existing models on our newly annotated dataset and an existing dataset.

1 Introduction

Many of the issues discussed by politicians and the media are so nuanced that even word choice entails choosing an ideological position. For example, what liberals call the "estate tax" conservatives call the "death tax"; there are no ideologically neutral alternatives (Lakoff, 2002). While objectivity remains an important principle of journalistic professionalism, scholars and watchdog groups claim that the media are biased (Groseclose and Milyo, 2005; Gentzkow and Shapiro, 2010; Niven, 2003), backing up their assertions by publishing examples of obviously biased articles on their websites. Whether or not it reflects an underlying lack of objectivity, quantitative changes in the popular framing of an issue over time, favoring one ideologically-based position over another, can have a substantial effect on the evolution of policy (Dardis et al., 2008).

Manually identifying ideological bias in political text, especially in the age of big data, is an impractical and expensive process. Moreover, bias may be localized to a small portion of a document, undetectable by coarse-grained methods. In this paper, we examine the problem of detecting ideological bias on the sentence level. We say a sentence contains ideological bias if its author's political position (here liberal or conservative, in the sense of U.S. politics) is evident from the text.

[Figure 1: An example of compositionality in ideological bias detection (red = conservative, blue = liberal, gray = neutral) in which modifier phrases and punctuation cause polarity switches at higher levels of the parse tree. Example sentence: They dubbed it the "death tax" and created a big lie about its adverse effects on small businesses.]

Ideological bias is difficult to detect, even for humans: the task relies not only on political knowledge but also on the annotator's ability to pick up on subtle elements of language use. For example, the sentence in Figure 1 includes phrases typically associated with conservatives, such as "small businesses" and "death tax". When we take more of the structure into account, however, we find that scare quotes and a negative propositional attitude (a lie about X) yield an evident liberal bias.

Existing approaches toward bias detection have not gone far beyond "bag of words" classifiers, thus ignoring richer linguistic context of this kind and often operating at the level of whole documents. In contrast, recent work in sentiment analysis has used deep learning to discover compositional effects (Socher et al., 2011b; Socher et al., 2013b).

Building from those insights, we introduce a recursive neural network (RNN) to detect ideological bias on the sentence level. This model requires

• More data

• Better tricks (regularization)

• Faster computers

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 4 of 49

Why Deep Learning

And companies are investing . . .

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 5 of 49

Review of Logistic Regression

Plan

Why Deep Learning

Review of Logistic Regression

Can’t Somebody else do it? (Feature Engineering)

Deep Learning from Data

Tricks and Toolkits

Toolkits for Deep Learning

Quiz Bowl

Deep Quiz Bowl

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 6 of 49

Review of Logistic Regression

Map inputs to output

Input: vector $x_1 \ldots x_d$ (inputs encoded as real numbers)

Output:

$$f\left(\sum_i W_i x_i + b\right)$$

(multiply inputs by weights, add bias)

Activation:

$$f(z) \equiv \frac{1}{1 + \exp(-z)}$$

(pass through nonlinear sigmoid)

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 7 of 49
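A minimal sketch of this mapping in NumPy (the function names are illustrative, not from the lecture): multiply inputs by weights, add the bias, and pass the result through the sigmoid.

```python
import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_output(x, W, b):
    # multiply inputs by weights, add bias, pass through the sigmoid
    return sigmoid(np.dot(W, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs encoded as real numbers
W = np.array([0.1, 0.4, -0.3])   # one weight per input
b = 0.2
print(logistic_output(x, W, b))
```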

Review of Logistic Regression

What’s a sigmoid?

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 8 of 49

Review of Logistic Regression

In the shallow end

• This is still logistic regression

• Engineering features x is difficult (and requires expertise)

• Can we learn how to represent inputs into final decision?

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 9 of 49

Can’t Somebody else do it? (Feature Engineering)

Plan

Why Deep Learning

Review of Logistic Regression

Can’t Somebody else do it? (Feature Engineering)

Deep Learning from Data

Tricks and Toolkits

Toolkits for Deep Learning

Quiz Bowl

Deep Quiz Bowl

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 10 of 49

Can’t Somebody else do it? (Feature Engineering)

Learn the features and the function

$$a^{(2)}_1 = f\left( W^{(1)}_{11} x_1 + W^{(1)}_{12} x_2 + W^{(1)}_{13} x_3 + b^{(1)}_1 \right)$$

$$a^{(2)}_2 = f\left( W^{(1)}_{21} x_1 + W^{(1)}_{22} x_2 + W^{(1)}_{23} x_3 + b^{(1)}_2 \right)$$

$$a^{(2)}_3 = f\left( W^{(1)}_{31} x_1 + W^{(1)}_{32} x_2 + W^{(1)}_{33} x_3 + b^{(1)}_3 \right)$$

$$h_{W,b}(x) = a^{(3)}_1 = f\left( W^{(2)}_{11} a^{(2)}_1 + W^{(2)}_{12} a^{(2)}_2 + W^{(2)}_{13} a^{(2)}_3 + b^{(2)}_1 \right)$$

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 11 of 49
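A sketch of that forward pass for the three-input, three-hidden-unit, one-output network on the slide (the random initialization and the `forward` name are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # hidden layer: a2_i = f(sum_j W1[i, j] * x_j + b1_i)
    a2 = sigmoid(W1 @ x + b1)
    # output: h_{W,b}(x) = a3_1 = f(sum_j W2[0, j] * a2_j + b2_0)
    a3 = sigmoid(W2 @ a2 + b2)
    return a2, a3

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, -1.0])
W1, b1 = rng.normal(scale=0.01, size=(3, 3)), np.zeros(3)
W2, b2 = rng.normal(scale=0.01, size=(1, 3)), np.zeros(1)
print(forward(x, W1, b1, W2, b2)[1])
```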

Can’t Somebody else do it? (Feature Engineering)

Objective Function

• For every example x, y of our supervised training set, we want the label y to match the prediction $h_{W,b}(x)$:

$$J(W, b; x, y) \equiv \frac{1}{2} \left\| h_{W,b}(x) - y \right\|^2 \qquad (1)$$

• We want this value, summed over all of the examples, to be as small as possible
• We also want the weights not to be too large:

$$\frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \qquad (2)$$

(the sums run over all layers, all source units i, and all destination units j)

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 12 of 49

Can’t Somebody else do it? (Feature Engineering)

Objective Function

Putting it all together:

$$J(W, b) = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - y^{(i)} \right\|^2 \right] + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( W^{(l)}_{ji} \right)^2 \qquad (3)$$

• Our goal is to minimize J(W, b) as a function of W and b
• Initialize W and b to small random values near zero
• Adjust parameters to optimize J

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 13 of 49
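A sketch of Equation 3 for a one-hidden-layer network, assuming squared-error loss plus the weight-decay term above; the λ value and the `objective` name are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, W1, b1, W2, b2):
    return sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)

def objective(X, Y, W1, b1, W2, b2, lam=0.01):
    # average squared error over the m examples (Equation 1, summed and divided by m)
    m = len(X)
    data_term = sum(0.5 * np.sum((predict(x, W1, b1, W2, b2) - y) ** 2)
                    for x, y in zip(X, Y)) / m
    # weight decay: sum of squared weights over all layers (Equation 2)
    reg_term = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return data_term + reg_term
```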

Deep Learning from Data

Plan

Why Deep Learning

Review of Logistic Regression

Can’t Somebody else do it? (Feature Engineering)

Deep Learning from Data

Tricks and Toolkits

Toolkits for Deep Learning

Quiz Bowl

Deep Quiz Bowl

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 14 of 49

Deep Learning from Data

Gradient Descent

Goal: Optimize J with respect to the variables W and b.

[Figure: the objective J plotted against a parameter; starting from an initial guess, each step moves downhill toward the minimum (the "undiscovered country").]

$$W^{(l)}_{ij} \leftarrow W^{(l)}_{ij} - \alpha \frac{\partial J}{\partial W^{(l)}_{ij}}$$

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 15 of 49
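The update rule above on a toy one-dimensional objective (the quadratic, step size, and iteration count are invented for illustration):

```python
def gradient_descent(grad, w0, alpha=0.1, steps=100):
    # w <- w - alpha * dJ/dw, repeated until (approximately) converged
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

# toy objective J(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
print(gradient_descent(lambda w: 2 * (w - 3), w0=10.0))  # approaches 3
```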

Deep Learning from Data

Backpropagation

• For convenience, write the input to the sigmoid as

$$z^{(l)}_i = \sum_{j=1}^{n} W^{(l-1)}_{ij} x_j + b^{(l-1)}_i \qquad (4)$$

• The gradient is a function of a node's error $\delta^{(l)}_i$
• For output nodes, the error is obvious:

$$\delta^{(n_l)}_i = \frac{\partial}{\partial z^{(n_l)}_i} \frac{1}{2} \left\| y - h_{W,b}(x) \right\|^2 = -\left( y_i - a^{(n_l)}_i \right) \cdot f'\left( z^{(n_l)}_i \right) \qquad (5)$$

• Other nodes must "backpropagate" downstream error based on connection strength (chain rule):

$$\delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l+1)}_{ji} \delta^{(l+1)}_j \right) f'\left( z^{(l)}_i \right) \qquad (6)$$

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 16 of 49

Deep Learning from Data

Partial Derivatives

• For weights, the partial derivatives are

$$\frac{\partial}{\partial W^{(l)}_{ij}} J(W, b; x, y) = a^{(l)}_j \, \delta^{(l+1)}_i \qquad (7)$$

• For the bias terms, the partial derivatives are

$$\frac{\partial}{\partial b^{(l)}_i} J(W, b; x, y) = \delta^{(l+1)}_i \qquad (8)$$

• But this is just for a single example . . .

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 17 of 49

Deep Learning from Data

Full Gradient Descent Algorithm

1. Initialize $U^{(l)}$ and $V^{(l)}$ as zero
2. For each example $i = 1 \ldots m$:
   2.1 Use backpropagation to compute $\nabla_W J$ and $\nabla_b J$
   2.2 Update the weight shifts: $U^{(l)} = U^{(l)} + \nabla_{W^{(l)}} J(W, b; x, y)$
   2.3 Update the bias shifts: $V^{(l)} = V^{(l)} + \nabla_{b^{(l)}} J(W, b; x, y)$
3. Update the parameters:

$$W^{(l)} = W^{(l)} - \alpha \left( \frac{1}{m} U^{(l)} \right) \qquad (9)$$

$$b^{(l)} = b^{(l)} - \alpha \left( \frac{1}{m} V^{(l)} \right) \qquad (10)$$

4. Repeat until weights stop changing

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 18 of 49
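A self-contained sketch that strings Equations 4-10 together for a one-hidden-layer sigmoid network; the shapes, learning rate, and function names are illustrative, not the lecture's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_example(x, y, W1, b1, W2, b2):
    """Gradients of 0.5 * ||h(x) - y||^2 for a single example (Eqs. 4-8)."""
    # forward pass
    z2 = W1 @ x + b1; a2 = sigmoid(z2)          # hidden layer
    z3 = W2 @ a2 + b2; a3 = sigmoid(z3)         # output layer
    # output error (Eq. 5); sigmoid'(z) = a * (1 - a)
    delta3 = -(y - a3) * a3 * (1 - a3)
    # backpropagate through the hidden layer (Eq. 6)
    delta2 = (W2.T @ delta3) * a2 * (1 - a2)
    # partial derivatives (Eqs. 7-8)
    return np.outer(delta3, a2), delta3, np.outer(delta2, x), delta2

def train(X, Y, W1, b1, W2, b2, alpha=0.5, epochs=1000):
    m = len(X)
    for _ in range(epochs):
        # 1. initialize the accumulated shifts U and V to zero
        U2 = np.zeros_like(W2); V2 = np.zeros_like(b2)
        U1 = np.zeros_like(W1); V1 = np.zeros_like(b1)
        # 2. accumulate per-example gradients via backpropagation
        for x, y in zip(X, Y):
            gW2, gb2, gW1, gb1 = backprop_example(x, y, W1, b1, W2, b2)
            U2 += gW2; V2 += gb2; U1 += gW1; V1 += gb1
        # 3. update the parameters (Eqs. 9-10)
        W2 -= alpha * U2 / m; b2 -= alpha * V2 / m
        W1 -= alpha * U1 / m; b1 -= alpha * V1 / m
    return W1, b1, W2, b2
```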

Tricks and Toolkits

Plan

Why Deep Learning

Review of Logistic Regression

Can’t Somebody else do it? (Feature Engineering)

Deep Learning from Data

Tricks and Toolkits

Toolkits for Deep Learning

Quiz Bowl

Deep Quiz Bowl

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 19 of 49

Tricks and Toolkits

Tricks

• Stochastic gradient: compute gradient from a few examples

• Hardware: Do matrix computations on GPUs

• Dropout: Randomly set some inputs to zero

• Initialization: Using an autoencoder can help representation

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 20 of 49
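Two of these tricks sketched in isolation (the dropout probability, minibatch interface, and abstract `grad_fn` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, p=0.5):
    # randomly set some activations to zero; rescale so the expectation is unchanged
    mask = rng.random(a.shape) > p
    return a * mask / (1.0 - p)

def sgd_step(params, grad_fn, batch, alpha=0.1):
    # stochastic gradient: estimate the gradient from a few examples only
    grads = grad_fn(params, batch)
    return [p - alpha * g for p, g in zip(params, grads)]

print(dropout(np.ones(6)))
```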

Toolkits for Deep Learning

Plan

Why Deep Learning

Review of Logistic Regression

Can’t Somebody else do it? (Feature Engineering)

Deep Learning from Data

Tricks and Toolkits

Toolkits for Deep Learning

Quiz Bowl

Deep Quiz Bowl

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 21 of 49

Toolkits for Deep Learning

• Theano: Python package (Yoshua Bengio)

• Torch7: Lua package (Yann LeCun)

• ConvNetJS: Javascript package (Andrej Karpathy)

• All automatically compute gradients and have numerical optimization

• Working group this summer at UMD

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 22 of 49

Quiz Bowl

Plan

Why Deep Learning

Review of Logistic Regression

Can’t Somebody else do it? (Feature Engineering)

Deep Learning from Data

Tricks and Toolkits

Toolkits for Deep Learning

Quiz Bowl

Deep Quiz Bowl

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 23 of 49

Quiz Bowl

Humans doing Incremental Classification

• Game called "quiz bowl"
• Two teams play each other
  - Moderator reads a question
  - When a team knows the answer, they signal ("buzz" in)
  - If right, they get points; otherwise, rest of the question is read to the other team
• Hundreds of teams in the US alone
• Example . . .

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 24 of 49

Quiz Bowl

Sample Question 1 (read incrementally)

With Leo Szilard, he invented a doubly-eponymous refrigerator with no moving parts. He did not take interaction with neighbors into account when formulating his theory of heat capacity, so Debye adjusted the theory for low temperatures. His summation convention automatically sums repeated indices in tensor products. His name is attached to the A and B coefficients for spontaneous and stimulated emission, the subject of one of his multiple groundbreaking 1905 papers. He further developed the model of statistics sent to him by Bose to describe particles with integer spin. For 10 points, who is this German physicist best known for formulating the special and general theories of relativity?

Albert Einstein

Faster = Smarter

1. Colorado School of Mines
2. Brigham Young University
3. California Institute of Technology
4. Harvey Mudd College
5. University of Colorado

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 25 of 49

Quiz Bowl

Humans doing Incremental Classification

• This is not Jeopardy (Watson)

• There are buzzers, but players can only buzz at the end of a question

• Doesn't discriminate knowledge

• Quiz bowl questions are pyramidal

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 26 of 49

Quiz Bowl

Humans doing Incremental Classification

• Thousands of questions are written every year

• Large question databases

• Teams practice on these questions (some online, e.g. IRC)

• How can we learn from this?

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 27 of 49

Quiz Bowl

System for Incremental Classifiers

• Treat this as an MDP
• Action: buzz now or wait

1. Content Model is constantly generating guesses
2. Oracle provides examples where it is correct
3. The Policy generalizes to test data
4. Features represent our state

content model · oracle · policy · features

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 28 of 49

Quiz Bowl

Content Model

content model · oracle · policy · features

• Bayesian generative model with answers as latent state
• Unambiguous Wikipedia pages
• Unigram term weightings (naïve Bayes, BM25)
• Maintains posterior distribution over guesses
• Always has a guess of what it should answer
  - policy will tell us when to trust it

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 29 of 49
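A toy sketch of what maintaining a posterior over guesses could look like with unigram naive Bayes; the counts, smoothing constant, and answer set are invented for illustration and are not the actual system's weighting.

```python
import math
from collections import Counter

# hypothetical per-answer unigram counts built from question/Wikipedia text
counts = {
    "Albert_Einstein": Counter({"relativity": 40, "refrigerator": 3, "bose": 5}),
    "Julius_Caesar":   Counter({"gallic": 25, "alesia": 10, "pontus": 6}),
}

def posterior(tokens, prior=None, alpha=0.1):
    # log P(answer) + sum_t log P(token | answer), then renormalize
    scores = {}
    for ans, c in counts.items():
        total, vocab = sum(c.values()), len(c)
        score = math.log(prior[ans]) if prior else 0.0
        for t in tokens:
            score += math.log((c[t] + alpha) / (total + alpha * vocab))
        scores[ans] = score
    z = max(scores.values())
    exp = {a: math.exp(s - z) for a, s in scores.items()}
    norm = sum(exp.values())
    return {a: v / norm for a, v in exp.items()}

print(posterior(["gallic", "alesia"]))
```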

Quiz Bowl

Vector Space Model

[Figure: a vector space over terms such as "arabian", "persian", "gulf", "kingdom", and "expatriates", with candidate answers (Qatar, Bahrain, Gummibears, Pokemon) placed in it.]

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 30 of 49

Quiz Bowl

Oracle

content model · oracle · policy · features

Revealed text → content model's guess (correct answer: Lyndon Johnson)

• "The National Endowment for the Arts, War o…" → Martha Graham
• "The National Endowment for the Arts, War on Poverty, and Medicare were established by, for 10 points, what Texan…" → George Bush
• "The National Endowment for the Arts, War on Poverty, and Medicare were established by, for 10 points, what Texan who defeated Barry Goldwater, promoted the Great Society, and succeeded John F. Kennedy?" → Lyndon Johnson

• As each token is revealed, look at the content model's guess
• If it's right, positive instance; otherwise negative
• Nearly optimal policy to buzz whenever correct (upper bound)

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 31 of 49

Quiz Bowl

Policy

content model · oracle · policy · features

• Mapping: state ↦ action
• Use oracle as example actions
• Learned as classifier (Langford et al., 2005)
• At test time, use the same features as for training
  - Question text (so far)
  - Guess
  - Posterior distribution
  - Change in posterior

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 32 of 49

Quiz Bowl

Features (by example)

content model · oracle · policy · features

Observation: "This man won the Battle"
  Content model guesses: 0.02 Tokugawa, 0.01 Erwin Rommel, 0.01 Joan of Arc, 0.01 Stephen Crane
  State representation: idx: 05, ftp: f; text: this_man: 1, won: 1, battle: 1; guess: tokugawa; posterior: top_1: 0.02, top_2: 0.01, top_3: 0.01
  Action: Wait

Observation: "This man won the Battle of Zela over Pontus. He wrote about his victory at Alesia in his Commentaries on the"
  Content model guesses: 0.11 Mithridates, 0.09 Julius Caesar, 0.08 Alexander the Great, 0.07 Sulla
  State representation: idx: 21, ftp: f; text: this_man: 1, commentaries: 1, pontus: 1, …; guess: mithridates; posterior: top_1: 0.11, top_2: 0.09, top_3: 0.08; change: new: true, delta: +0.09
  Action: Wait

Observation: "This man won the Battle of Zela over Pontus. He wrote about his victory at Alesia in his Commentaries on the Gallic Wars. FTP, name this Roman"
  Content model guesses: 0.89 Julius Caesar, 0.02 Augustus, 0.01 Sulla, 0.01 Pompey
  State representation: idx: 55, ftp: t; text: this_man: 1, this_roman: 1, gallic: 1, …; guess: j_caesar; posterior: top_1: 0.89, top_2: 0.02, top_3: 0.01; change: new: true, delta: +0.78
  Action: Buzz → Answer: Julius Caesar

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 33 of 49
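A sketch of how such a state might be turned into a feature dictionary for the buzz/wait classifier; the field names mirror the slide, but the function itself is illustrative.

```python
def state_features(tokens, guess, posterior, prev_posterior=None, prev_guess=None):
    feats = {"idx": len(tokens), "ftp": "ftp" in tokens or "points" in tokens}
    # bag of question words revealed so far
    for tok in tokens:
        feats["word_" + tok] = 1
    # current guess and the shape of the posterior over answers
    feats["guess_" + guess] = 1
    top = sorted(posterior.values(), reverse=True)[:3]
    for rank, p in enumerate(top, start=1):
        feats[f"top_{rank}"] = p
    # change features: did the guess switch, and how much did the top score move?
    if prev_posterior is not None:
        feats["new_guess"] = guess != prev_guess
        feats["delta"] = max(posterior.values()) - max(prev_posterior.values())
    return feats

print(state_features("this man won the battle".split(), "tokugawa",
                     {"tokugawa": 0.02, "rommel": 0.01}))
```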

Quiz Bowl

Simulating a Game

• Present tokens incrementally to algorithm, see where it buzzes

• Compare to where humans buzzed in

• Payoff matrix (with respect to the computer):

    Computer             Human               Payoff
  1 first and wrong      right               −15
  2 —                    first and correct   −10
  3 first and wrong      wrong               −5
  4 first and correct    —                   +10
  5 wrong                first and wrong     +5
  6 right                first and wrong     +15

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 34 of 49

Quiz Bowl

Interface

• Users could "veto" categories or tournaments

• Questions presented in canonical order

• Approximate string matching (w/ override)

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 35 of 49

Quiz Bowl

Interface

• Started on Amazon Mechanical Turk

• 7,000 questions were answered in the first day

• Over 43,000 questions were answered in the space of two weeks

• Total of 461 unique users

• Leaderboard to encourage users

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 35 of 49

Quiz Bowl

Error Analysis

• Too slow
• Coreference and correct question type
• Not enough information / not weighting later clues higher

[Plots of the content model's posterior against tokens revealed, with the opponent's buzz position marked:
 (1) answer "maurice ravel"; predictions "pictures at an exhibition" and "maurice ravel"; observed features include "pictures", "orchestrated", "this_french", "composer", "bolero";
 (2) answer "enrico fermi"; predictions include "magnetic field" and "neutrino"; observed features include "magnetic", "paradox", "zero", "this_man", "this_italian";
 (3) answer "george washington"; predictions "charlemagne" and "yasunari kawabata"; observed features include "generals", "chill", "mistress", "language".]

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 36 of 49

Quiz Bowl

How can we do better?

• Use order of words in a sentence: "this man shot Lee Harvey Oswald" very different from "Lee Harvey Oswald shot this man"
• Use relationship between questions ("China" and "Taiwan")
• Use learned features and dimensions, not the words we start with
• Recursive Neural Networks (Socher et al., 2012)
• First time a learned representation has been applied to question answering

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 37 of 49

Deep Quiz Bowl

Plan

Why Deep Learning

Review of Logistic Regression

Can’t Somebody else do it? (Feature Engineering)

Deep Learning from Data

Tricks and Toolkits

Toolkits for Deep Learning

Quiz Bowl

Deep Quiz Bowl

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 38 of 49

Deep Quiz Bowl

Using Compositionality

[Figure: a dependency tree over the question fragment "this chinese admiral attacked kotte in ceylon", whose node representations are combined step by step into a single vector for the sentence.]

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 39 of 49

Deep Quiz Bowl

Using Compositionality

admiral

attacked

kotte

This city ’s economy depended on subjugated peasants called helots

ROOT

DET POSSESSIVE

POSSNSUBJ

PREP

POBJ

AMODVMOD DOBJ

Figure 2: Dependency parse of a sentence from a question about Sparta.

positionality over the standard rnn model bytaking into account relation identity along withtree structure. We include an additional d � dmatrix, Wv, to incorporate the word vector xw

at a node into the node vector hn.Given a parse tree (Figure 2), we first com-

pute leaf representations. For example, thehidden representation hhelots is

hhelots = f(Wv · xhelots + b), (1)

where f is a non-linear activation function suchas tanh and b is a bias term. Once all leavesare finished, we move to interior nodes withalready processed children. Continuing from“helots” to its parent, “called”, we compute

hcalled =f(WDOBJ · hhelots + Wv · xcalled

+ b). (2)

We repeat this process up to the root, which is

hdepended =f(WNSUBJ · heconomy + WPREP · hon

+ Wv · xdepended + b). (3)

The composition equation for any node n withchildren K(n) and word vector xw is hn =

f(Wv · xw + b +X

k�K(n)

WR(n,k) · hk), (4)

where R(n, k) is the dependency relation be-tween node n and child node k.

3.2 Training

Our goal is to map questions to their corre-sponding answer entities. Because there area limited number of possible answers, we canview this as a multi-class classification task.While a softmax layer over every node in thetree could predict answers (Socher et al., 2011;Iyyer et al., 2014), this method overlooks thatmost answers are themselves words (features)in other questions (e.g., a question on World

War II might mention the Battle of the Bulgeand vice versa). Thus, word vectors associatedwith such answers can be trained in the samevector space as question text,2 enabling us tomodel relationships between answers insteadof assuming incorrectly that all answers areindependent.

To take advantage of this observation, wedepart from Socher et al. (2014) by trainingboth the answers and questions jointly in asingle model, rather than training each sep-arately and holding embeddings fixed duringdt-rnn training. This method cannot be ap-plied to the multimodal text-to-image mappingproblem because text captions by definition aremade up of words and thus cannot include im-ages; in our case, however, question text canand frequently does include answer text.

Intuitively, we want to encourage the vectorsof question sentences to be near their correctanswers and far away from incorrect answers.We accomplish this goal by using a contrastivemax-margin objective function described be-low. While we are not interested in obtaining aranked list of answers,3 we observe better per-formance by adding the weighted approximate-rank pairwise (warp) loss proposed in Westonet al. (2011) to our objective function.

Given a sentence paired with its correct an-swer c, we randomly select j incorrect answersfrom the set of all incorrect answers and denotethis subset as Z. Since c is part of the vocab-ulary, it has a vector xc � We. An incorrectanswer z � Z is also associated with a vectorxz � We. We define S to be the set of all nodesin the sentence’s dependency tree, where anindividual node s � S is associated with the

2Of course, questions never contain their own answeras part of the text.

3In quiz bowl, all wrong guesses are equally detri-mental to a team’s score, no matter how “close” a guessis to the correct answer.

=

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 39 of 49

Deep Quiz Bowl

Using Compositionality

admiral

attacked

kotte

This city ’s economy depended on subjugated peasants called helots

ROOT

DET POSSESSIVE

POSSNSUBJ

PREP

POBJ

AMODVMOD DOBJ

Figure 2: Dependency parse of a sentence from a question about Sparta.

positionality over the standard rnn model bytaking into account relation identity along withtree structure. We include an additional d � dmatrix, Wv, to incorporate the word vector xw

at a node into the node vector hn.Given a parse tree (Figure 2), we first com-

pute leaf representations. For example, thehidden representation hhelots is

hhelots = f(Wv · xhelots + b), (1)

where f is a non-linear activation function suchas tanh and b is a bias term. Once all leavesare finished, we move to interior nodes withalready processed children. Continuing from“helots” to its parent, “called”, we compute

hcalled =f(WDOBJ · hhelots + Wv · xcalled

+ b). (2)

We repeat this process up to the root, which is

hdepended =f(WNSUBJ · heconomy + WPREP · hon

+ Wv · xdepended + b). (3)

The composition equation for any node n withchildren K(n) and word vector xw is hn =

f(Wv · xw + b +X

k�K(n)

WR(n,k) · hk), (4)

where R(n, k) is the dependency relation be-tween node n and child node k.

3.2 Training

Our goal is to map questions to their corre-sponding answer entities. Because there area limited number of possible answers, we canview this as a multi-class classification task.While a softmax layer over every node in thetree could predict answers (Socher et al., 2011;Iyyer et al., 2014), this method overlooks thatmost answers are themselves words (features)in other questions (e.g., a question on World

War II might mention the Battle of the Bulgeand vice versa). Thus, word vectors associatedwith such answers can be trained in the samevector space as question text,2 enabling us tomodel relationships between answers insteadof assuming incorrectly that all answers areindependent.

To take advantage of this observation, wedepart from Socher et al. (2014) by trainingboth the answers and questions jointly in asingle model, rather than training each sep-arately and holding embeddings fixed duringdt-rnn training. This method cannot be ap-plied to the multimodal text-to-image mappingproblem because text captions by definition aremade up of words and thus cannot include im-ages; in our case, however, question text canand frequently does include answer text.

Intuitively, we want to encourage the vectorsof question sentences to be near their correctanswers and far away from incorrect answers.We accomplish this goal by using a contrastivemax-margin objective function described be-low. While we are not interested in obtaining aranked list of answers,3 we observe better per-formance by adding the weighted approximate-rank pairwise (warp) loss proposed in Westonet al. (2011) to our objective function.

Given a sentence paired with its correct an-swer c, we randomly select j incorrect answersfrom the set of all incorrect answers and denotethis subset as Z. Since c is part of the vocab-ulary, it has a vector xc � We. An incorrectanswer z � Z is also associated with a vectorxz � We. We define S to be the set of all nodesin the sentence’s dependency tree, where anindividual node s � S is associated with the

2Of course, questions never contain their own answeras part of the text.

3In quiz bowl, all wrong guesses are equally detri-mental to a team’s score, no matter how “close” a guessis to the correct answer.

=

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 39 of 49

Deep Quiz Bowl

Using Compositionality

admiral

attacked

kotte

This city ’s economy depended on subjugated peasants called helots

ROOT

DET POSSESSIVE

POSSNSUBJ

PREP

POBJ

AMODVMOD DOBJ

Figure 2: Dependency parse of a sentence from a question about Sparta.

positionality over the standard rnn model bytaking into account relation identity along withtree structure. We include an additional d � dmatrix, Wv, to incorporate the word vector xw

at a node into the node vector hn.Given a parse tree (Figure 2), we first com-

pute leaf representations. For example, thehidden representation hhelots is

hhelots = f(Wv · xhelots + b), (1)

where f is a non-linear activation function suchas tanh and b is a bias term. Once all leavesare finished, we move to interior nodes withalready processed children. Continuing from“helots” to its parent, “called”, we compute

hcalled =f(WDOBJ · hhelots + Wv · xcalled

+ b). (2)

We repeat this process up to the root, which is

hdepended =f(WNSUBJ · heconomy + WPREP · hon

+ Wv · xdepended + b). (3)

The composition equation for any node n withchildren K(n) and word vector xw is hn =

f(Wv · xw + b +X

k�K(n)

WR(n,k) · hk), (4)

where R(n, k) is the dependency relation be-tween node n and child node k.

3.2 Training

Our goal is to map questions to their corre-sponding answer entities. Because there area limited number of possible answers, we canview this as a multi-class classification task.While a softmax layer over every node in thetree could predict answers (Socher et al., 2011;Iyyer et al., 2014), this method overlooks thatmost answers are themselves words (features)in other questions (e.g., a question on World

War II might mention the Battle of the Bulgeand vice versa). Thus, word vectors associatedwith such answers can be trained in the samevector space as question text,2 enabling us tomodel relationships between answers insteadof assuming incorrectly that all answers areindependent.

To take advantage of this observation, wedepart from Socher et al. (2014) by trainingboth the answers and questions jointly in asingle model, rather than training each sep-arately and holding embeddings fixed duringdt-rnn training. This method cannot be ap-plied to the multimodal text-to-image mappingproblem because text captions by definition aremade up of words and thus cannot include im-ages; in our case, however, question text canand frequently does include answer text.

Intuitively, we want to encourage the vectorsof question sentences to be near their correctanswers and far away from incorrect answers.We accomplish this goal by using a contrastivemax-margin objective function described be-low. While we are not interested in obtaining aranked list of answers,3 we observe better per-formance by adding the weighted approximate-rank pairwise (warp) loss proposed in Westonet al. (2011) to our objective function.

Given a sentence paired with its correct an-swer c, we randomly select j incorrect answersfrom the set of all incorrect answers and denotethis subset as Z. Since c is part of the vocab-ulary, it has a vector xc � We. An incorrectanswer z � Z is also associated with a vectorxz � We. We define S to be the set of all nodesin the sentence’s dependency tree, where anindividual node s � S is associated with the

2Of course, questions never contain their own answeras part of the text.

3In quiz bowl, all wrong guesses are equally detri-mental to a team’s score, no matter how “close” a guessis to the correct answer.

=

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 39 of 49

Deep Quiz Bowl

Using Compositionality

this chinese

admiral

attacked

kotte

in

ceylon

This city ’s economy depended on subjugated peasants called helots

ROOT

DET POSSESSIVE

POSSNSUBJ

PREP

POBJ

AMODVMOD DOBJ

Figure 2: Dependency parse of a sentence from a question about Sparta.

positionality over the standard rnn model bytaking into account relation identity along withtree structure. We include an additional d � dmatrix, Wv, to incorporate the word vector xw

at a node into the node vector hn.Given a parse tree (Figure 2), we first com-

pute leaf representations. For example, thehidden representation hhelots is

hhelots = f(Wv · xhelots + b), (1)

where f is a non-linear activation function suchas tanh and b is a bias term. Once all leavesare finished, we move to interior nodes withalready processed children. Continuing from“helots” to its parent, “called”, we compute

hcalled =f(WDOBJ · hhelots + Wv · xcalled

+ b). (2)

We repeat this process up to the root, which is

hdepended =f(WNSUBJ · heconomy + WPREP · hon

+ Wv · xdepended + b). (3)

The composition equation for any node n withchildren K(n) and word vector xw is hn =

f(Wv · xw + b +X

k�K(n)

WR(n,k) · hk), (4)

where R(n, k) is the dependency relation be-tween node n and child node k.

3.2 Training

Our goal is to map questions to their corre-sponding answer entities. Because there area limited number of possible answers, we canview this as a multi-class classification task.While a softmax layer over every node in thetree could predict answers (Socher et al., 2011;Iyyer et al., 2014), this method overlooks thatmost answers are themselves words (features)in other questions (e.g., a question on World


Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 39 of 49


Deep Quiz Bowl

Training

• Initialize embeddings from word2vec

• Randomly initialize composition matrices

• Update using warp (a minimal sketch of one update step follows below)

  ◦ Randomly choose an instance
  ◦ Look where it lands
  ◦ Has a correct answer
  ◦ Wrong answers may be closer
  ◦ Push away wrong answers
  ◦ Bring correct answers closer

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 40 of 49
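As promised above, here is a hedged sketch of a single warp-style update step in Python/NumPy, assuming a fixed sentence vector h_s, an answer-embedding matrix, and a plain SGD step; the sampling budget, margin, and learning rate are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_answers = 100, 5000                               # dimensions are assumptions
answers = rng.normal(scale=0.1, size=(n_answers, d))   # answer embeddings x_z

def warp_update(h_s, correct, answers, lr=0.01, margin=1.0, max_tries=50):
    """One warp-style step: sample wrong answers until one violates the
    margin, weight the update by the estimated rank, then push/pull."""
    x_c = answers[correct]
    tries, violator = 0, None
    while tries < max_tries:
        tries += 1
        z = int(rng.integers(n_answers))
        if z == correct:
            continue
        # Margin violation: a wrong answer scores too close to the correct one.
        if margin - x_c @ h_s + answers[z] @ h_s > 0:
            violator = z
            break
    if violator is None:
        return  # no violation found: this instance already ranks its answer well
    # Approximate rank of the correct answer, as in Weston et al. (2011).
    rank = max(1, (n_answers - 1) // tries)
    weight = sum(1.0 / i for i in range(1, rank + 1))
    # Push away the wrong answer, bring the correct answer closer.
    answers[violator] -= lr * weight * h_s
    answers[correct] += lr * weight * h_s
    # (In the full model, gradients also flow into h_s and the composition matrices.)

h_s = rng.normal(scale=0.1, size=d)  # a sentence/node vector from the dt-rnn
warp_update(h_s, correct=42, answers=answers)
```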

Deep Quiz Bowl

Training

• We use rnn information from parsed questions

• And bag of words features from Wikipedia

• Combine both feature sets (a sketch follows below)

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 41 of 49
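One simple way to realize this combination is to concatenate the dt-rnn sentence vectors with bag-of-words features and feed them to a linear classifier; this is a sketch under assumptions (the talk does not specify the classifier, and the vectors and labels below are placeholders).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data: question sentences and their answers.
questions = ["this city 's economy depended on subjugated peasants called helots",
             "this chinese admiral attacked kotte in ceylon"]
labels = ["Sparta", "Zheng_He"]

# Stand-in for dt-rnn sentence vectors (in practice, root vectors h_n).
rng = np.random.default_rng(0)
rnn_vectors = rng.normal(size=(len(questions), 100))

# Bag-of-words features; Wikipedia text would be vectorized the same way.
bow = TfidfVectorizer().fit_transform(questions).toarray()

# Combine both feature sets by concatenation and train a linear classifier.
features = np.hstack([rnn_vectors, bow])
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.predict(features[:1]))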

Deep Quiz Bowl

Embedding

[Figure: visualization of the learned answer embeddings; the individual answer labels are illegible in the extracted slide text.]

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 42 of 49

Deep Quiz Bowl

Comparing rnn to bow

Model        History                   Literature
             Sent 1   Sent 2   Full    Sent 1   Sent 2   Full
bow-qb         37.5     65.9   71.4      27.4     54.0   61.9
rnn            47.1     72.1   73.7      36.4     68.2   69.1
bow-wiki       53.7     76.6   77.5      41.8     74.0   73.3
combined       59.8     81.8   82.3      44.7     78.7   76.6

Percentage accuracy of different vector-space models.

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 43 of 49


Deep Quiz Bowl

Now we’re able to beat humans

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 44 of 49

Deep Quiz Bowl

Examining vectors

Thomas Mann
Joseph Conrad
Henrik Ibsen
Franz Kafka
Henry James

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 45 of 49

Deep Quiz Bowl

Examining vectors

Akbar
Shah Jahan
Muhammad
Babur
Ghana

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 45 of 49
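These neighbor lists come from looking up the closest answer vectors in the learned space; below is a minimal sketch of such a lookup by cosine similarity (the names and vectors are placeholders, not the trained embeddings).

```python
import numpy as np

rng = np.random.default_rng(0)
names = ["Akbar", "Shah Jahan", "Babur", "Ghana", "Sparta", "Zheng He"]
vectors = rng.normal(size=(len(names), 100))   # stand-in answer embeddings

def nearest(query_name, names, vectors, k=4):
    """Return the k answers whose vectors have the highest cosine similarity."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = unit[names.index(query_name)]
    sims = unit @ q
    order = np.argsort(-sims)
    return [(names[i], float(sims[i])) for i in order if names[i] != query_name][:k]

print(nearest("Akbar", names, vectors))
```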

Deep Quiz Bowl

Future Steps

• Using Wikipedia: transforming its sentences into question-like sentences

• Incorporating additional information: story plots

• New network structures: capturing coreference

• Exhibition

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 46 of 49

Deep Quiz Bowl

[Figure panels: (a) buzzes over all questions; (b) Wuthering Heights question text; (c) buzzes on the Wuthering Heights question.]

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 47 of 49

Deep Quiz Bowl

Accuracy vs. Speed

[Plot: accuracy (0.4–1.0) vs. number of tokens revealed (40–100), with a legend labeled "Total" ranging from 500 to 3000.]

Jordan Boyd-Graber | Boulder Deep Learning for Question Answering | 48 of 49

