Relational Entity Linking with Cross Document Coreference

Post on 24-Feb-2016


Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth

University of Illinois at Urbana-Champaign (UI_CCG)

2

Talk Outline

Introduction
Architecture
Entity Linking Approach
  Preprocessing
  Wikification
    Formulation
    Relational Analysis
  Cross Document Coreference
  Reconciliation
Evaluation

3

Entity Linking Specification

Query

<query id="EL13_ENG_0015"> <docid>bolt-eng-DF-170-181137-9030298</docid> <name>Lightning Bolts</name> <beg>15959</beg> <end>15973</end></query>

Output

query_id  link_id
EL13_ENG_0015  NIL0006
EL13_ENG_0016  E0273299
…
EL13_ENG_0821  NIL0006

[Figure: Wikipedia articles (axis 0 to 5,000,000), split into TAC KB and non-TAC KB]

4

Entity Linking using Wikification and Cross-Doc Coref

[Figure: Wikipedia articles (axis 0 to 5,000,000), split into TAC KB and non-TAC KB]

query_id  link_id
EL13_ENG_0015  NIL0006
EL13_ENG_0016  E0273299
…
EL13_ENG_0821  NIL0006
…
EL13_ENG_0937  NIL0288
…
EL13_ENG_1914  NIL0288

Cross Document Coreference

5

Wikification

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.


6

Wikification Challenges

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

Ambiguity: Times → The New York Times, The Times
Variability: Connecticut, CT, The Nutmeg State
Concepts outside of KB (NIL): Blumenthal?
Scale: Millions of labels

7

Key Innovation

Improved Wikification for Structured EL
Relational Inference for Linking (Cheng and Roth, EMNLP’13)
No retraining

Non-trivial cross-document clustering
Best Latent Left-Linking approach (Samdani et al. ’12)


9

Entity Linking Architecture

[Diagram labels: TAC Query; Preprocessing (Query Normalization, Document Transformation, Purposeful Coreference); Linking Problem; Linking (Wikification, Cross-Doc Coreference, "Supervise"); Reconcile Linking Clusters; Query Mapping; outputs: Wikipedia Entities, TAC KB Entities, NIL, Partial NIL Clusters]


11

Preprocessing

Query normalization
Handling spelling mistakes and slang: one of the reasons we did not achieve the expected performance

In-document coreference: some coreferent mentions are easier to link than the query mention

Obomber, Obamadinejad, Osama Obama, Nobama, Obambi, Obamination, ObaMao, Owe Bama, 0bama, O-balm-a, O-bomb-a
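
A minimal sketch of query normalization by fuzzy string matching, not the system's actual method. The lexicon `CANONICAL`, the function name `normalize_query`, and the cutoff value are assumptions made for illustration. Only near misspellings are caught; creative variants such as "Obomber" are too far from any canonical form and come back unchanged, which is consistent with the difficulty noted on this slide.

```python
import difflib
import re

# Hypothetical lexicon of canonical surface forms (an assumption for this sketch);
# a real system would draw these from knowledge-base titles.
CANONICAL = ["obama", "osama", "mubarak"]

def normalize_query(name, lexicon=CANONICAL, cutoff=0.75):
    """Map a noisy query string to the closest canonical form, or return it unchanged."""
    # Lowercase and drop everything except ASCII letters and digits.
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    # get_close_matches ranks lexicon entries by SequenceMatcher ratio.
    matches = difflib.get_close_matches(key, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else name
```

For example, `normalize_query("0bama")` and `normalize_query("O-balm-a")` both map to `"obama"`, while `normalize_query("Lightning Bolts")` is returned as is.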

12

Preprocessing

Document transformation
Document can be as long as 100k characters for a single query
Need to truncate documents but minimize the loss of critical contexts

[Diagram: Original document → Opening, Query Context, Coreferent Context]
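
The truncation idea can be sketched as follows, assuming character offsets are half-open `(beg, end)` pairs (TAC query offsets appear to be inclusive, so they would need `end + 1`). The function name and the `opening` and `window` sizes are illustrative choices, not values from the talk.

```python
def truncate_document(text, query_span, coref_spans=(), opening=200, window=150):
    """Keep the opening, the query context, and the coreferent-mention contexts."""
    # Always keep the start of the document.
    keep = [(0, min(opening, len(text)))]
    # Keep a window around the query mention and around each coreferent mention.
    for beg, end in [query_span, *coref_spans]:
        keep.append((max(0, beg - window), min(len(text), end + window)))
    # Merge overlapping ranges so that no text is emitted twice.
    keep.sort()
    merged = [keep[0]]
    for beg, end in keep[1:]:
        if beg <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((beg, end))
    return " ... ".join(text[b:e] for b, e in merged)
```

The separator `" ... "` only marks where text was dropped; a real pipeline would also need to remap the query offsets into the shortened document.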


14

Wikification Bottleneck

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

State-of-the-art Wikification systems (Ratinov et al. 2011) can achieve the above with local and global statistical features
Reaches bottleneck around 70% to 85% F1 on non-wiki datasets
What is missing?

15

Motivating Example

Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …

What are we missing with Bag of Words (BOW) models?
Who is Mubarak?

Constraining interaction between concepts: (Mubarak, wife, Hosni Mubarak)

16

Relational Inference for Wikification

Our contribution
Identify key textual relations for Wikification
A global inference framework to incorporate relational knowledge

Significant improvement over state-of-the-art Wikification systems

Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …

(Mubarak, wife, Hosni Mubarak)

17

Traditional Wikification Pipeline

1. Mention Segmentation
2. Candidate Generation
3. Candidate Ranking
4. Determine NILs (NIL Linking)

18

Traditional Wikification 1 - Mention Segmentation

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Sub noun phrase chunks
NER
Capitalized phrases

19

Traditional Wikification 1 - Mention Segmentation

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Obtains nested mentions

20

Traditional Wikification 2 - Candidate Generation

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Candidates e_4^k for "Socialist Party":
k  e_4^k
1  Socialist_Party_(France)
2  Socialist_Party_(Portugal)
3  Socialist_Party_of_America
4  Socialist_Party_(Argentina)

Candidates e_3^k for "Milošević":
k  e_3^k
1  Slobodan_Milošević
2  Milošević_(surname)
3  Boki_Milošević
4  Alexander_Milošević

Approach
Collect known mappings from Wikipedia page titles, hyperlinks…
Limit to top-K candidates based on frequency of links (Ratinov et al. 2011)
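
A small sketch of frequency-based candidate generation, assuming the anchor-to-title mappings have already been harvested as `(anchor_text, target_title)` pairs. The function names and the link counts in the usage note are invented for illustration and are not Wikipedia statistics.

```python
from collections import Counter, defaultdict

def build_anchor_index(links):
    """Count how often each anchor text links to each target title."""
    index = defaultdict(Counter)
    for anchor, title in links:
        index[anchor.lower()][title] += 1
    return index

def generate_candidates(index, mention, k=4):
    """Return up to k (title, prior) pairs, ranked by link frequency."""
    counts = index.get(mention.lower(), Counter())
    total = sum(counts.values())
    # most_common(k) is empty for unseen mentions, so no division by zero occurs.
    return [(title, n / total) for title, n in counts.most_common(k)]
```

With made-up counts of 5 links to Socialist_Party_(France), 3 to Socialist_Party_(Portugal) and 1 to Socialist_Party_of_Serbia, `generate_candidates(index, "Socialist Party", k=2)` keeps only the first two titles. This is how a rare but correct concept can be cut by the top-K prior.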

21

Traditional Wikification 3 - Candidate Ranking

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Candidates for "Socialist Party":
k  e_4^k  s_4^k
1  Socialist_Party_(France)  0.23
2  Socialist_Party_(Portugal)  0.16
3  Socialist_Party_of_America  0.07
4  Socialist_Party_(Argentina)  0.06

Candidates for "Milošević":
k  e_3^k  s_3^k
1  Slobodan_Milošević  0.7
2  Milošević_(surname)  0.1
3  Boki_Milošević  0.1
4  Alexander_Milošević  0.05

Local and global statistical features

22

Traditional Wikification 4 – Determine NILs

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Top candidate for "Socialist Party":
k  e_4^k  s_4^k
1  Socialist_Party_(France)  0.23

Top candidate for "Milošević":
k  e_3^k  s_3^k
1  Slobodan_Milošević  0.7

Is the top candidate really what the text referred to?
Binary classifier

The answer for "Socialist Party" is wrong: we did not generate the correct candidate based on the top-K prior


24

Formulation (0)

Intuition: promote pairs of candidate concepts coherent with textual relations

Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …

(Mubarak, wife, Hosni Mubarak)

25

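Formulation (1)

Formulate as an Integer Linear Program (ILP):

\[
\max_{e,\,r} \; \sum_i \sum_k s_i^k e_i^k + \sum_{i,j} \sum_{k,l} w_{ij}^{(k,l)} r_{ij}^{(k,l)}
\]
\[
\text{s.t.}\quad e_i^k \in \{0,1\},\quad r_{ij}^{(k,l)} \in \{0,1\},\quad \sum_k e_i^k = 1 \;\; \forall i,\quad 2\, r_{ij}^{(k,l)} \le e_i^k + e_j^l
\]

e_i^k: whether to output the k-th candidate of the i-th mention
s_i^k: weight to output e_i^k
r_{ij}^{(k,l)}: whether a relation exists between e_i^k and e_j^l
w_{ij}^{(k,l)}: weight of a relation

If no relation exists, collapse to the unstructured decision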

26

Formulation (2)

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Candidates for "Socialist Party":
k  e_4^k  s_4^k
1  Socialist_Party_(France)  0.23
2  Socialist_Party_(Portugal)  0.16
3  Socialist_Party_of_America  0.07
4  Socialist_Party_(Argentina)  0.06

Candidates for "Milošević":
k  e_3^k  s_3^k
1  Slobodan_Milošević  0.7
2  Milošević_(surname)  0.1
3  Boki_Milošević  0.1
4  Alexander_Milošević  0.05

e_i^k: whether a concept is chosen
s_i^k: score of a concept
r_{ij}^{(k,l)}: whether a relation is present
w_{ij}^{(k,l)}: score of a relation

Example relation variables: r_{34}^{(1,2)}, r_{34}^{(4,3)}
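
The inference can be illustrated on a toy instance by exhaustive search instead of an ILP solver. This is a sketch under stated assumptions: one candidate is chosen per mention, a relation contributes its weight only when both of its candidates are chosen, and the relation weight 1.0 below is invented. The candidate scores are the ones shown on the slides, with Socialist_Party_of_Serbia added at score 0.0.

```python
from itertools import product

def relational_inference(scores, relations):
    """
    scores:    {mention: {candidate: s}} local ranker scores.
    relations: {((mention_i, cand_k), (mention_j, cand_l)): w} relation weights.
    Chooses exactly one candidate per mention, maximizing the sum of chosen
    scores plus the weights of relations whose two candidates are both chosen.
    """
    mentions = list(scores)
    best, best_value = None, float("-inf")
    # Enumerate every joint assignment (fine for a handful of mentions only).
    for choice in product(*(scores[m] for m in mentions)):
        assignment = dict(zip(mentions, choice))
        value = sum(scores[m][c] for m, c in assignment.items())
        for ((mi, ck), (mj, cl)), w in relations.items():
            # A relation variable can be 1 only if both candidates are selected;
            # an optimal solution never switches on a non-positive weight.
            if w > 0 and assignment[mi] == ck and assignment[mj] == cl:
                value += w
        if value > best_value:
            best, best_value = assignment, value
    return best, best_value

scores = {
    "Milošević": {"Slobodan_Milošević": 0.7, "Milošević_(surname)": 0.1,
                  "Boki_Milošević": 0.1, "Alexander_Milošević": 0.05},
    "Socialist Party": {"Socialist_Party_(France)": 0.23, "Socialist_Party_(Portugal)": 0.16,
                        "Socialist_Party_of_America": 0.07, "Socialist_Party_(Argentina)": 0.06,
                        "Socialist_Party_of_Serbia": 0.0},
}
# Hypothetical weight for the retrieved relation (not a value from the talk).
relations = {(("Milošević", "Slobodan_Milošević"),
              ("Socialist Party", "Socialist_Party_of_Serbia")): 1.0}
```

With the relation present, the search selects Slobodan_Milošević together with Socialist_Party_of_Serbia; with `relations = {}` it falls back to the top-scoring candidate of each mention, which is the unstructured decision.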


28

Overall Approach

Relational Wikification
  Candidate Generation
  Candidate Ranking
  Determine NILs

Relation Analysis
  Relation Identification
  Relation Retrieval
  Relational Inference

29

Relation Identification

ACE style in-document coreference (Chang et al. ‘13)
Extract named entity-only coreference relations with high precision

Syntactico-Semantic relations (Chan & Roth ‘10)
Easy to extract with high precision
Aim for high recall, as false-positives will be filtered
Sparse, but covers ~80% relation instances in ACE2004

Type  Example
Premodifier  Iranian Ministry of Defense
Possessive  NYC’s stock exchange
Formulaic  Chicago, Illinois
Preposition  President of the US

30

Relation Identification

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Argument 1 | Relation Type | Argument 2
Yugoslav President | apposition | Slobodan Milošević
Slobodan Milošević | coreference | Milošević
Milošević | possessive | Socialist Party
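
A toy, pattern-based sketch of how some of these relations could be spotted between capitalized phrases. It is not the extractor used in the system: the regular expressions are assumptions, they cover only the possessive, formulaic and preposition types, and apposition, premodifier and coreference relations would need a parser or a coreference system.

```python
import re

# One or more capitalized words; \w also matches accented letters such as "š".
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"

PATTERNS = {
    "possessive":  re.compile(rf"({NAME})'s ({NAME})"),
    "formulaic":   re.compile(rf"({NAME}), ({NAME})"),
    "preposition": re.compile(rf"({NAME}) of (?:the )?({NAME})"),
}

def identify_relations(text):
    """Return (argument 1, relation type, argument 2) triples found by the patterns."""
    triples = []
    for rel_type, pattern in PATTERNS.items():
        for arg1, arg2 in pattern.findall(text):
            triples.append((arg1, rel_type, arg2))
    return triples
```

On the running example, `identify_relations("Mr. Milošević's Socialist Party")` yields the possessive triple from the table above. Such patterns favor recall, which fits the slide's point that false positives are filtered later.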


32

Relation Retrieval

What concepts can “Socialist Party” refer to?
More robust candidate generation

Identified relations are verified against a knowledge base (DBPedia)

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

33

Relation Retrieval

Query Pruning: only 2 queries per pair are necessary due to the strong baseline.

q1 = (Socialist Party of France, ?, *Milošević*)
q2 = (Slobodan Milošević, ?, *Socialist Party*)

Top candidate for "Socialist Party":
k  e_4^k  s_4^k
1  Socialist_Party_(France)  0.23

Top candidate for "Milošević":
k  e_3^k  s_3^k
1  Slobodan_Milošević  0.7

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

34

Relation Retrieval

Argument 1 | Relation Type | Argument 2
Milošević | possessive | Socialist Party

35

Relation Retrieval

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Candidates for "Socialist Party":
k  e_4^k  s_4^k
1  Socialist_Party_(France)  0.23
2  Socialist_Party_(Portugal)  0.16
3  Socialist_Party_of_America  0.07
4  Socialist_Party_(Argentina)  0.06
21  Socialist_Party_of_Serbia  0.0

Candidates for "Milošević":
k  e_3^k  s_3^k
1  Slobodan_Milošević  0.7
2  Milošević_(surname)  0.1
3  Boki_Milošević  0.1
4  Alexander_Milošević  0.05

r_{34}^{(1,21)} = 1
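
A sketch of the retrieval step on an in-memory triple list that stands in for DBpedia. The predicate name `party`, the function names, and the substring match on titles are assumptions for illustration; the real system queries a knowledge base.

```python
# Toy triple store standing in for DBpedia; the predicate name is illustrative.
TRIPLES = [
    ("Slobodan_Milošević", "party", "Socialist_Party_of_Serbia"),
]

def retrieve_relations(cands_i, surface_j, triples=TRIPLES):
    """
    For each candidate of mention i, find KB neighbours whose title contains
    the surface form of mention j. Returns (cand_i, predicate, cand_j) triples.
    """
    needle = surface_j.lower().replace(" ", "_")
    found = []
    for subj, pred, obj in triples:
        # The relation may hold in either direction in the KB.
        for a, b in ((subj, obj), (obj, subj)):
            if a in cands_i and needle in b.lower():
                found.append((a, pred, b))
    return found

def expand_candidates(cands_j, found, score=0.0):
    """Add retrieved concepts missing from mention j's list, with a default score."""
    expanded = dict(cands_j)
    for _, _, concept in found:
        expanded.setdefault(concept, score)
    return expanded
```

Querying with the candidates of "Milošević" and the surface form "Socialist Party" retrieves Socialist_Party_of_Serbia, which is then appended to the candidate list with score 0.0, as in the table above.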


37

Relational Inference - coreference

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Candidates for "Slobodan Milošević":
k  e_2^k  s_2^k
1  Slobodan_Milošević  1.0

Candidates for "Milošević":
k  e_3^k  s_3^k
1  Slobodan_Milošević  0.7
2  Milošević_(surname)  0.1
3  Boki_Milošević  0.1
4  Alexander_Milošević  0.05

r_{23}^{(1,1)} = 1

38

Determine unknown concepts (NILs)

How to capture the fact: “Dorothy Byrne” does not refer to any concept in Wikipedia

Identify coreferent nominal mention relations
Generate better features for NIL classifier

Dorothy Byrne, a state coordinator for the Florida Green Party,…

Candidates for "Florida Green Party":
k  e_2^k  s_2^k
1  Green_Party_of_Florida  1.0

Candidates for "Dorothy Byrne":
k  e_1^k  s_1^k
1  Dorothy_Byrne_(British_Journalist)  0.6
2  Dorothy_Byrne_(mezzo-soprano)  0.4

[Diagram label: nominal mention]

39

Determine unknown concepts (NILs)

Create NIL candidate for structured inference
e.g. corrects other coreferent “Dorothy” later in the document

Dorothy Byrne, a state coordinator for the Florida Green Party,…

Candidates for "Florida Green Party":
k  e_2^k  s_2^k
1  Green_Party_of_Florida  1.0

Candidates for "Dorothy Byrne":
k  e_1^k  s_1^k
0  NIL  1.0
1  Dorothy_Byrne_(British_Journalist)  0.6
2  Dorothy_Byrne_(mezzo-soprano)  0.4

[Diagram label: nominal mention]


41

Cross Document Coreference

NILs can be viewed as KB entries with partial information
A uniform model for entity representation
Shared features with Entity Linking system
Can be supervised using existing EL systems

Cross document coreference cluster example:
Naomi Campbell to give evidence at Charles Taylor trial: spokeswoman.
Supermodel Campbell says 'nothing to gain' from Taylor trial testimony.

42

Cross Document Coreference Approach

Run document-level coreference
Aggregate all features in a document-level coreferent cluster
Use both mention-level features and document-level features

String similarity features (NESim, Do et al. ‘09)
Context TF-IDF similarity features
Document-level cluster features

Training: using both TAC data and Wikifier generated data
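
A simplified sketch of best-left-link clustering, the greedy inference idea behind left-linking models: each item links to its best-scoring predecessor or starts a new cluster. It is not the trained L³M. The token-overlap similarity `jaccard` and the threshold are stand-ins for a learned pairwise scorer built from the features listed above.

```python
def jaccard(a, b):
    """Token-overlap similarity; a stand-in for a learned pairwise scorer."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def best_left_link_clusters(items, sim=jaccard, threshold=0.2):
    """
    Process items in order. Each item links to its highest-scoring predecessor
    if that score reaches the threshold; otherwise it starts a new cluster.
    Returns one cluster id per item.
    """
    cluster_of = []
    for i, item in enumerate(items):
        best_j, best_s = None, threshold
        for j in range(i):
            s = sim(item, items[j])
            if s >= best_s:
                best_j, best_s = j, s
        if best_j is None:
            # No predecessor is similar enough: open a new cluster.
            cluster_of.append(max(cluster_of, default=-1) + 1)
        else:
            cluster_of.append(cluster_of[best_j])
    return cluster_of
```

For the items `["Naomi Campbell", "Supermodel Campbell", "Charles Taylor", "Taylor"]` this returns `[0, 0, 1, 1]`: the two Campbell mentions share a cluster and the two Taylor mentions share another.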

43

[Figure: Clustering B³ F1 on TAC Entity Linking 2012. Axes: 0% to 100% and 0 to 1. Series: Trivial, L³M No str feat., L³M, AllLink, SumLink]

44

[Figure: Clustering B³ F1 on Wikifier generated data. Axes: 0% to 100% and 0 to 1. Series: Trivial, L³M No str feat., L³M, AllLink, SumLink]


46

Query mapping reconciliation

• Max: {0.8, 0.7, 0.2} = Seattle Seahawks
• Sum: {0.8, 0.7+0.2} = Seattle
• No Threshold: NIL classifier always outputs “non-NIL”; same as Max otherwise

[Seattle] has won… → Seattle (0.7)
[Seattle] Seahawks ended the game… → Seattle Seahawks (0.8)
… cheered for [Seattle]… → Seattle (0.2)
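
The Max and Sum strategies can be written out in a few lines, using the scores from this slide. The function name `reconcile` and the input layout are assumptions for the sketch; the No Threshold variant is omitted because it depends on the NIL classifier.

```python
from collections import defaultdict

def reconcile(links, strategy="max"):
    """links: (title, score) pairs for the mentions in one coreference cluster."""
    if strategy == "max":
        # Trust the single most confident mention-level link.
        return max(links, key=lambda pair: pair[1])[0]
    if strategy == "sum":
        # Pool evidence: add up the scores of mentions that agree on a title.
        totals = defaultdict(float)
        for title, score in links:
            totals[title] += score
        return max(totals, key=totals.get)
    raise ValueError(f"unknown strategy: {strategy}")

links = [("Seattle", 0.7), ("Seattle Seahawks", 0.8), ("Seattle", 0.2)]
```

Here `reconcile(links, "max")` picks Seattle Seahawks (0.8), while `reconcile(links, "sum")` picks Seattle, since 0.7 + 0.2 exceeds 0.8.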


48

Evaluation – TAC KBP 2011 Entity Linking

Run Relational Inference (RI) Wikifier “as-is”: No retraining using TAC data

[Figure: TAC KBP 2011 Entity Linking Performance. Micro Average and B³ F1 (axis 68 to 88) by system name: RI, CogComp, LCC, MS_MLI, NUShime, *Median. *Median of top 14 systems]

49

Evaluation – TAC 2012 Entity Linking Error Analysis

[Figure: B³ F1 (axis 0 to 0.9) for Max, Sum, Perfect Ranking, Exact Max]

50

Official 2013 Performance

51

Official 2013 Performance Break-down: Link Type

52

Official 2013 Performance Break-down: Doc domain

53

Official 2013 Performance Break-down: NER type

54

Conclusion

Importance of linguistic and world knowledge
Identification of relational information benefits Wikification and Entity Linking

Future work
Robust preprocessing on noisy input/adapt to EL task requirement
“Self-supervision” on NIL clustering
Unified NIL and KB entity representation
Joint entity typing, coreference and disambiguation
Incorporate more relations

Demo: http://cogcomp.cs.illinois.edu/demo/wikify
Download: http://cogcomp.cs.illinois.edu/page/download_view/Wikifier

Thank you!

55

Backup slides

56

Applications

Knowledge Acquisition via Grounding

Coreference Resolution
Learning-based multi-sieve co-reference resolution with knowledge (Ratinov et al. 2012)

Information Extraction
Unsupervised relation discovery with sense disambiguation (Yao et al. 2012)
Automatic Event Extraction with Structured Preference Modeling (Lu and Roth, 2012)

Text Classification
Gabrilovich and Markovitch, 2007; Chang et al., 2008

57

Wikification Performance Result

[Figure: F1 Performance on Wikification datasets (ACE, MSNBC, AQUAINT, Wikipedia; axis 60 to 95) for Milne & Witten, Ratinov & Roth, and Relational Inference]