Post on 15-Jan-2016

1

Learning with Latent Alignment Structures

Quasi-synchronous Grammar and Tree-edit CRFs for Question Answering and Textual Entailment

Mengqiu Wang

Joint work with Chris Manning, Noah Smith

2

Task definition

• At a high level:
  • Learning the syntactic and semantic relations between two pieces of text

• Application-specific definition of the relations
  • Question Answering
    Q: Who is the leader of France?
    A: Bush later met with French President Jacques Chirac
  • Machine Translation
    C: 温总理昨天会见了日本首相安培晋三。
    E: Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.
  • Summarization
    T: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.
    S: US rounded up 400 people in Iraq.
  • Textual Entailment (IE, IR, QA, SUM)
    Txt: Responding to Scheuer's comments in La Repubblica, the prime minister's office said the analysts' allegations, "beyond being false, are also absolutely incompatible with the contents of the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler."
    Hyp: Mel Sembler represents the U.S.

3

The Challenges

• Latent alignment structure

• QA: Who is the leader of France?

Bush later met with French President Jacques Chirac

• MT: 温总理昨天会见了日本首相安培晋三。

Premier Wen Jiabao met with Japanese Prime Minister Shinzo Abe yesterday.

• Sum: US rounds up 400 Saddam diehards as group claims anti-US attacks in Iraq.

US rounded up 400 people in Iraq.

• RTE: Responding to … the conversation between Prime Minister Silvio Berlusconi and U.S. Ambassador to Rome Mel Sembler."

Mel Sembler represents the U.S.

4

Other modeling challenges

Question Answer Ranking

Who is the leader of France?

1. Bush later met with French president Jacques Chirac.

2. Henri Hadjenberg, who is the leader of France’s Jewish community, …

3. …

5

Semantic Transformations

• Q: “Who is the leader of France?”

• A: Bush later met with French president Jacques Chirac.

6

Syntactic Transformations

[Figure: dependency trees of the question “Who is the leader of France?” and the answer “Bush met with French president Jacques Chirac”, with arcs labeled mod.]

7

Syntactic Variations

[Figure: dependency trees of the question “Who is the leader of France?” and the candidate “Henri Hadjenberg, who is the leader of France’s Jewish community”, with arcs labeled mod.]

8

What’s been done?

• The latent alignment problem
  • Instead of treating alignment as a latent variable, treat it as a separate task: first find the best alignment, then proceed with the rest of the task.
  • Pros: Usually simple and efficient.
  • Cons: Not very robust; no way to correct alignment errors in later steps.

• Modeling syntax and semantics
  • Extract features from syntactic parse trees and semantic resources, then throw them into a linear classifier. Syntax and semantics enrich the feature space, but there is no principled way to make use of syntax.
  • Pros: No need to worry about trees too much.
  • Cons: Ad hoc.

9

What I think an ideal model should do

• Carry alignment uncertainty into final task
  • Treat alignment as latent variables and jointly learn about proper alignment structure and the overall task
  • In other words, model the distribution over alignments and sum out all possible alignments at decoding time.

• Syntax-based and feature-rich models
  • Directly model syntax
  • Enable the use of rich semantic features and features from other world-knowledge resources.
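The difference between committing to one alignment and summing out all alignments can be illustrated with a toy example. This is only a sketch under stated assumptions: the pairwise log-scores, the one-answer-word-per-question-word alignment space, and all function names are hypothetical and are not the models presented in this talk.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def all_alignments(log_scores):
    """Yield (alignment, log-score) for every way of mapping each question
    word i to one answer word j; log_scores[i][j] is a toy pairwise score."""
    num_answer_words = len(log_scores[0])
    for alignment in product(range(num_answer_words), repeat=len(log_scores)):
        total = sum(log_scores[i][j] for i, j in enumerate(alignment))
        yield alignment, total


def best_alignment_score(log_scores):
    """Pipeline view: commit to the single highest-scoring alignment."""
    return max(score for _, score in all_alignments(log_scores))


def marginal_score(log_scores):
    """Latent-variable view: sum over every alignment (in log space)."""
    return logsumexp([score for _, score in all_alignments(log_scores)])


toy_scores = [[0.0, -1.0], [-0.5, 0.0]]  # hypothetical 2x2 log-scores
print(best_alignment_score(toy_scores), marginal_score(toy_scores))
```

The marginal score is never smaller than the best-alignment score, because it also counts the probability mass of every alternative alignment.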

10

Road map

• Present two models that address the raised issues
  • 1: A model based on Quasi-synchronous Grammar (EMNLP ’07)
    • Experiments on Question Answering task
  • 2: A tree-edit CRFs model (current work)
    • Experiments on RTE

• Discuss and compare these two models
  • Modeling power
  • Pros and cons

• Future work

11

Switching gears…

• Quasi-synchronous Grammar for Question Answering

12

Tree-edit CRFs for RTE

• Extension to McCallum et al. UAI 2005 work on CRFs for finite-state String Edit Distance

• Key attractions:
  • Models the transformation of dependency parse trees (thus directly models syntax), unlike McCallum et al. ’05, which only models word strings
  • Discriminatively trained (not a generative model, unlike QG)
  • Trained on both the positive and negative instances of sentence pairs (QG is only trained on positive Q/A pairs)
  • CRFs – the underlying graphical model is an undirected graphical model (QG is basically a Bayes Net, directed)
  • Joint model over alignments (vs. local alignment models in QG)
  • Feature rich

13

TE-CRFs model in detail

• First of all, let’s look at the correspondence between alignment (with constraints) and edit operations

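[Figure: the dependency trees of Q and A, each attached to a root node ($), aligned by tree edit operations.
Q nodes: who (WP, qword), leader (NN), is (VB), the (DT), France (NNP, location); edge labels: root, subj, obj, det, of.
A nodes: Bush (NNP, person), met (VBD), French (JJ, location), president (NN), Jacques Chirac (NNP, person); edge labels: root, subj, with, nmod, nmod.
Edit operations shown: substitute (×4), delete, insert, and a “fancy substitute”.]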

15

TE-CRFs model in detail

• Each valid tree edit operation sequence that transforms one tree into the other corresponds to an alignment. A tree edit operation sequence is modeled as a transition sequence among a set of states in an FSM.

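[Figure: an FSM with three states, S1, S2 and S3; the transitions among the states are labeled D, S, I (delete, substitute, insert).]

Edit operations: substitute delete substitute substitute insert substitute

Possible state sequences:
S1 S2 S1 S1 S1 S3 S1
S2 S3 S2 S1 S2 S1 S3
S3 S2 S1 S3 S3 S2 S2
…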
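The table above pairs six edit operations with seven states, which suggests that each operation labels one transition. Under that reading, the set of state sequences for a fixed edit sequence can be enumerated directly, or summed over with a forward pass. The sketch below is illustrative only: `toy_weight` and its numbers are hypothetical, not parameters from the talk.

```python
import math
from itertools import product

STATES = ("S1", "S2", "S3")
OPS = ["substitute", "delete", "substitute", "substitute", "insert", "substitute"]


def enumerate_state_sequences(num_ops, states=STATES):
    """k operations label k transitions, so each path visits k + 1 states."""
    return list(product(states, repeat=num_ops + 1))


def toy_weight(prev_state, op, next_state):
    """Hypothetical log-weight of one labeled transition (an assumption)."""
    base = {"substitute": 0.5, "delete": -0.3, "insert": -0.2}[op]
    return base + (0.1 if prev_state == next_state else 0.0)


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def brute_force_log_sum(ops, weight=toy_weight, states=STATES):
    """Score every state sequence explicitly and sum the scores in log space."""
    totals = [
        sum(weight(seq[i], op, seq[i + 1]) for i, op in enumerate(ops))
        for seq in enumerate_state_sequences(len(ops), states)
    ]
    return _logsumexp(totals)


def forward_log_sum(ops, weight=toy_weight, states=STATES):
    """Same quantity via the forward algorithm, linear in the number of ops."""
    alpha = {s: 0.0 for s in states}  # any start state, log-weight 0
    for op in ops:
        alpha = {
            nxt: _logsumexp([alpha[prev] + weight(prev, op, nxt) for prev in states])
            for nxt in states
        }
    return _logsumexp(list(alpha.values()))


print(len(enumerate_state_sequences(len(OPS))))
print(brute_force_log_sum(OPS), forward_log_sum(OPS))
```

The forward pass avoids listing the sequences one by one, which matters once the sum also has to range over many edit sequences.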

16

FSM

This is for one edit operation sequence

substitute delete substitute substitute insert substitute
S1 S2 S1 S1 S1 S3 S1
S2 S3 S2 S1 S2 S1 S3
S3 S2 S1 S3 S3 S2 S2
…

delete substitute substitute substitute insert substitute
S1 S2 S1 S1 S1 S3 S1
…

substitute delete substitute substitute substitute insert
S1 S2 S1 S1 S1 S3 S1
…

substitute substitute delete substitute insert substitute
S1 S2 S1 S1 S1 S3 S1
…

There are many other valid edit sequences
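To get a feel for how many edit sequences there can be, the sketch below counts them for the simpler case of two word strings (the string-edit setting that this model extends), not for dependency trees with their extra constraints. The function name and the simplification are assumptions made for illustration.

```python
def count_edit_sequences(source_len, target_len):
    """Count sequences of delete / insert / substitute operations that turn
    a string of source_len tokens into one of target_len tokens, where every
    source token is deleted or substituted exactly once, left to right."""
    table = [[0] * (target_len + 1) for _ in range(source_len + 1)]
    table[0][0] = 1  # the empty edit sequence
    for i in range(source_len + 1):
        for j in range(target_len + 1):
            if i > 0:
                table[i][j] += table[i - 1][j]      # last op: delete
            if j > 0:
                table[i][j] += table[i][j - 1]      # last op: insert
            if i > 0 and j > 0:
                table[i][j] += table[i - 1][j - 1]  # last op: substitute
    return table[source_len][target_len]


for n in range(1, 6):
    print(n, count_edit_sequences(n, n))
```

The count grows quickly with sentence length, which is why sums over edit sequences are computed by dynamic programming rather than by enumeration.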

17

FSM cont.

[Figure: two copies of the three-state FSM (S1, S2, S3, with transitions labeled D, S, I), one forming the Positive State Set and the other the Negative State Set; Start and Stop states connect to the sets through ε-transitions.]

18

FSM transitions

[Figure: the FSM transitions unrolled into a lattice of states S1, S2, S3 over the steps of an edit sequence, running from Start to Stop, with one lattice for the Positive State Set and one for the Negative State Set.]

19

Parameterization

[Figure: a transition from S1 to S2 labeled substitute. Equation annotations: “positive or negative”; “positive and negative”.]
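A sketch of what such a parameterization could look like, assuming the model keeps the general form of the string-edit CRF it extends: the numerator sums over paths that stay inside one state set (positive or negative, depending on the label z), and the normalizer sums over paths in both sets. The symbols Λ (weights), f (feature function), S_z (state set for label z) and Z_x are assumed notation, not taken from the slides.

```latex
% Sketch only: \Lambda, \mathbf{f}, S_z and Z_x are assumed notation.
% x is the input pair of trees, e an edit sequence, q = (q_0, ..., q_{|e|}) a state sequence.
\[
p(z \mid x) \;=\; \frac{1}{Z_x}
  \sum_{(e,\,q)\,:\; q \subseteq S_z}\;
  \prod_{i=1}^{|e|} \exp\!\big( \Lambda \cdot \mathbf{f}(e_i, q_{i-1}, q_i, x) \big)
\]
\[
Z_x \;=\; \sum_{z' \in \{\text{pos},\,\text{neg}\}}\;
  \sum_{(e,\,q)\,:\; q \subseteq S_{z'}}\;
  \prod_{i=1}^{|e|} \exp\!\big( \Lambda \cdot \mathbf{f}(e_i, q_{i-1}, q_i, x) \big)
\]
```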

20

Training using EM

E-step

M-step

Using L-BFGS

Jensen’s Inequality
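The role of Jensen's inequality in EM can be sketched in generic form. The notation below is assumed (q(a) is an auxiliary distribution over latent alignments a, θ the parameters); it is the standard EM lower bound, not necessarily the exact objective shown in the talk.

```latex
% Sketch only: q(a) and \theta are assumed notation; a ranges over latent
% alignments (edit and state sequences).
\[
\log \sum_{a} p(z, a \mid x; \theta)
 \;=\; \log \sum_{a} q(a)\, \frac{p(z, a \mid x; \theta)}{q(a)}
 \;\ge\; \sum_{a} q(a) \log \frac{p(z, a \mid x; \theta)}{q(a)}
\]
% E-step: q(a) \leftarrow p(a \mid z, x; \theta^{\text{old}}), which makes the bound tight at \theta^{\text{old}}.
% M-step: maximize the right-hand side over \theta with a gradient-based optimizer such as L-BFGS.
```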

21

Features for RTE

• Substitution
  • Same -- Word/WordWithNE/Lemma/NETag/Verb/Noun/Adj/Adv/Other
  • Sub/MisSub -- Punct/Stopword/ModalWord
  • Antonym/Hypernym/Synonym/Nombank/Country
  • Different -- NE/Pos
  • Unrelated words

• Delete
  • Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If

• Insert
  • Stopword/Punct/NE/Other/Polarity/Quantifier/Likelihood/Conditional/If

• Tree
  • RootAligned/RootAlignedSameWord
  • Parent,Child,DepRel triple match/mismatch

• Date/Time/Numerical
  • DateMismatch, hasNumDetMismatch, normalizedFormMismatch
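A few of the substitution features could be computed along the following lines. This is a minimal sketch: the `Node` class, the firing conditions, and the feature values are assumptions for illustration; only the feature names echo the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical tree node: surface word, lemma, POS tag, NE tag."""
    word: str
    lemma: str
    pos: str
    ne_tag: str = "O"  # "O" means the token is not a named entity


def substitution_features(source: Node, target: Node) -> dict:
    """Return indicator features for substituting source with target."""
    feats = {}
    if source.word.lower() == target.word.lower():
        feats["SameWord"] = 1.0
    if source.lemma == target.lemma:
        feats["SameLemma"] = 1.0
    if source.ne_tag != "O" and source.ne_tag == target.ne_tag:
        feats["SameNETag"] = 1.0
    if source.ne_tag != target.ne_tag:
        feats["DifferentNE"] = 1.0
    if source.pos != target.pos:
        feats["DifferentPos"] = 1.0
    return feats


france = Node("France", "france", "NNP", "location")
french = Node("French", "french", "JJ", "location")
print(substitution_features(france, french))
```

Because the model is discriminative, overlapping indicators like these can be added freely without specifying how they depend on one another.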

22

Tree-edit CRFs for Textual Entailment

• Preliminary results
  • Trained on RTE2 dev, tested on RTE2 test.
  • Model taken after 50 EM iterations
  • acc: 0.6275, map: 0.6407

• RTE2 official results
  1. Hickl (LCC) acc: 0.7538, map: 0.8082
  2. Tatu (LCC) acc: 0.7375, map: 0.7133
  3. Zanzotto (Milan & Rome) acc: 0.6388, map: 0.6441
  4. Adams (Dallas) acc: 0.6262, map: 0.6282
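The slides do not define the two scores. Assuming "acc" is plain classification accuracy and "map" is an average-precision score over pairs ranked by the system's confidence, they could be computed roughly as follows; the function names and toy inputs are hypothetical.

```python
def accuracy(gold_labels, predicted_labels):
    """Fraction of pairs whose predicted label matches the gold label."""
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)


def average_precision(gold_in_ranked_order):
    """Average precision of a ranking: gold_in_ranked_order[i] is True when
    the pair at rank i + 1 (most confident first) is a true entailment."""
    hits = 0
    precision_sum = 0.0
    for rank, is_positive in enumerate(gold_in_ranked_order, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(average_precision([True, False, True]))
```

Accuracy only looks at the final yes/no decisions, while average precision also rewards a system for ranking its correct positives ahead of its mistakes.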

23

Comparison: QG vs. TE-CRFs

QG

1. Generative
2. Directed, BayesNet, local
3. Allow arbitrary swapping in alignment
4. Allow limited use of semantic features (lexical-semantic log-linear model in mixture model)
5. Computationally cheaper

TE-CRFs

1. Discriminative
2. Undirected, CRFs, global
3. No swapping – can’t do substitutions that involve swapping (can be extended, see future work)
4. Allow arbitrary semantic features
5. Computationally more expensive

24

Future work

QG

1. Generative
  • Train discriminatively using Noah’s Contrastive Estimation
2. Directed, BayesNet, local
  • Higher-order Markovization
3. Allow arbitrary swapping in alignment
4. Allow limited use of semantic features (lexical-semantic log-linear model in mixture model)
5. Computationally cheaper
6. Run RTE experiments

TE-CRFs

1. Discriminative
2. Undirected, CRFs, global
3. No swapping
  • Constrained unordered trees
  • Fancy edit operations (e.g. substitute sub-trees)
4. Allow arbitrary semantic features
5. More expensive
6. Run QA and MT alignment experiments

25

Thank you!

Questions?