
Reading Group 2014 (Insight NUIG)

Date post: 05-Jul-2015
Upload: bianca-pereira
Description:
This presentation was given at Reading Group 2014 in NUIG. It is based on the research paper by Dai et al. (2012), "From Entity Recognition to Entity Linking: a Survey of Advanced Entity Linking Techniques".
Transcript
Page 1: Reading Group 2014 (Insight NUIG)

Bianca Pereira

From Entity Recognition to Entity Linking

07/05/2014

Based on the paper “From Entity Recognition to Entity Linking: a Survey of Advanced Entity Linking Techniques” by Dai et al. (2012)

Page 2: Reading Group 2014 (Insight NUIG)

Outline

• Motivation

• Overview of Entity Linking

• Instance-based Entity Linking Approach

• Experiments

• Conclusion

• Analysis of the Paper

• Relation with my PhD

Page 3: Reading Group 2014 (Insight NUIG)

THE BATTLE OF THE BOOGIE

Page 4: Reading Group 2014 (Insight NUIG)
Page 5: Reading Group 2014 (Insight NUIG)

Named Entity Recognition

Source: http://en.wikipedia.org/wiki/Mick_Jackson_(singer) (visited on 06/05/2014)

Page 6: Reading Group 2014 (Insight NUIG)

Named Entity Recognition

Source: http://en.wikipedia.org/wiki/Mick_Jackson_(singer) (visited on 06/05/2014)

Page 7: Reading Group 2014 (Insight NUIG)
Page 8: Reading Group 2014 (Insight NUIG)

Overview of Entity Linking

Page 9: Reading Group 2014 (Insight NUIG)

Databases

Biomedical

Natural Language Processing

AI

Page 10: Reading Group 2014 (Insight NUIG)

Databases

Biomedical

NLP

AI

Page 11: Reading Group 2014 (Insight NUIG)

Entity Linking

Source: http://en.wikipedia.org/wiki/Mick_Jackson_(singer) (visited on 20/11/2013)

http://www.discogs.com/artist/87624-Mick-Jackson

http://www.discogs.com/artist/643265-Elmar-Krohn?noanv=1

http://www.discogs.com/artist/492396-Dave-Jackson-2?noanv=1

http://www.discogs.com/artist/148391-Sylvester-Levay

http://www.discogs.com/artist/169154-Jacksons-The

Page 12: Reading Group 2014 (Insight NUIG)

Tasks that Inspired Entity Linking

Link-The-Wiki Track in INEX

Web People Search Task in SemEval

[Figure: web pages URL1, URL2, ..., URLn grouped into Person 1, Person 2, Person 3 and Person 4]

Page 13: Reading Group 2014 (Insight NUIG)

Entity Linking Tasks

Entity Linking in TAC-KBP

Gene Normalization in BioCreative

http://www.discogs.com/artist/87624-Mick-Jackson

NIL

Syntenin-1 ID:100754014

mda-9 ID:6386

Page 14: Reading Group 2014 (Insight NUIG)

Problem Definition

Article-wide Salient Entity Linking Problem

Article-wide Entity Linking Problem

Instance-based Entity Linking Problem

Page 15: Reading Group 2014 (Insight NUIG)

Article-wide Salient EL Problem

Source: http://en.wikipedia.org/wiki/Michael_jordan (visited on 20/11/2013)

Page 16: Reading Group 2014 (Insight NUIG)

Article-wide EL Problem

Source: http://en.wikipedia.org/wiki/Blame_It_on_the_Boogie (visited on 06/05/2014)

Page 17: Reading Group 2014 (Insight NUIG)

Instance-based EL Problem

Source: http://en.wikipedia.org/wiki/Blame_It_on_the_Boogie (visited on 06/05/2014)
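The three problem variants differ mainly in the granularity of their output. A minimal Python sketch, assuming a toy annotation in which each mention is identified by its surface form and a character offset; the offsets and entry names such as `mick_jackson` are hypothetical placeholders, not real Wikipedia or Discogs identifiers:

```python
from collections import Counter

# Hypothetical instance-level annotation of one article:
# (surface form, character offset) -> knowledge-base entry (placeholder names).
instance_links = {
    ("Jackson", 12): "mick_jackson",
    ("Jackson", 87): "the_jacksons",
    ("Mick Jackson", 140): "mick_jackson",
}


def instance_based(links):
    """Instance-based EL: every mention occurrence keeps its own entry."""
    return dict(links)


def article_wide(links):
    """Article-wide EL: the set of entries mentioned in the article, without positions."""
    return set(links.values())


def article_wide_salient(links):
    """Article-wide salient EL: a single entry for the whole article.
    Picking the most frequent entry is an assumed heuristic, not from the paper."""
    return Counter(links.values()).most_common(1)[0][0]


print(instance_based(instance_links)[("Jackson", 87)])  # the_jacksons
print(sorted(article_wide(instance_links)))  # ['mick_jackson', 'the_jacksons']
print(article_wide_salient(instance_links))  # mick_jackson
```

Only the instance-based output keeps the two readings of "Jackson" apart, which is why it needs evidence for every single mention.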

Page 18: Reading Group 2014 (Insight NUIG)

Instance-based Entity Linking Approach

Page 19: Reading Group 2014 (Insight NUIG)

Instance-based Entity Linking Approach

Challenges

1. Lack of a suitable corpus for developing instance-based EL systems.

2. Lack of context information for disambiguating each individual instance.

The synthetic replicate of urocortin was found to bind with high affinity to type 1 and type 2 CRF receptors and, based upon its anatomic localization within the brain, was proposed to be a natural ligand for the type 2 CRF receptors.

Page 20: Reading Group 2014 (Insight NUIG)

Classification

Local Classification

URL1 URL2 URL3 URL4

Page 21: Reading Group 2014 (Insight NUIG)

Classification

Local Classification Relational Classification

URL1 URL2 URL3 URL4

URL1 URL2 URL3 URL4

Page 22: Reading Group 2014 (Insight NUIG)

Classification

Local Classification Relational Classification

URL1 URL2 URL3 URL4

URL1 URL2 URL3 URL4

URL9
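The difference between local and relational classification can be sketched as follows. This is an illustration under assumed inputs, not the paper's classifier: the scores in `local_scores`, the mention graph `neighbors`, the `relatedness` table and the `weight` parameter are all hypothetical.

```python
# Hypothetical local-classifier scores: mention -> {candidate entry: score}.
local_scores = {
    "m1": {"URL1": 0.6, "URL2": 0.4},
    "m2": {"URL3": 0.45, "URL4": 0.55},
}
# Hypothetical mention graph and symmetric relatedness between candidate entries.
neighbors = {"m1": ["m2"], "m2": ["m1"]}
relatedness = {frozenset(("URL1", "URL3")): 1.0}


def local_classify(mention):
    """Local classification: use only the mention's own scores."""
    scores = local_scores[mention]
    return max(scores, key=scores.get)


def relational_classify(mention, known_labels, weight=0.5):
    """Relational classification: add a bonus for candidates related to the
    entries already assigned to neighboring mentions."""
    def score(candidate):
        bonus = sum(
            relatedness.get(frozenset((known_labels[n], candidate)), 0.0)
            for n in neighbors[mention]
            if n in known_labels
        )
        return local_scores[mention][candidate] + weight * bonus

    return max(local_scores[mention], key=score)


print(local_classify("m2"))  # URL4
print(relational_classify("m2", {"m1": "URL1"}))  # URL3
```

Knowing that the neighbor `m1` is linked to `URL1` is enough to flip `m2` from its locally preferred `URL4` to the related `URL3`.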

Page 23: Reading Group 2014 (Insight NUIG)

Collective Classification

[Figure: collective classification graph over candidate entries URL1, URL2, URL3, URL4, URL5 and URL9]
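One common way to perform collective classification is the iterative classification algorithm (ICA): start from local predictions, then re-label each mention using its neighbors' current labels until the assignment stops changing. ICA is used here only as a stand-in to illustrate the idea; it is not necessarily the inference procedure of Dai et al., and all scores, edges and the `weight` parameter are hypothetical.

```python
def iterative_classification(local_scores, neighbors, relatedness, weight=0.5, max_iters=10):
    """Iterative classification: bootstrap every mention with its best local
    candidate, then repeatedly re-label each mention using the current labels
    of its neighbors, until no label changes (or max_iters is reached)."""
    labels = {m: max(s, key=s.get) for m, s in local_scores.items()}

    def score(mention, candidate):
        bonus = sum(
            relatedness.get(frozenset((labels[n], candidate)), 0.0)
            for n in neighbors.get(mention, [])
        )
        return local_scores[mention][candidate] + weight * bonus

    for _ in range(max_iters):
        changed = False
        for mention, scores in local_scores.items():
            best = max(scores, key=lambda c: score(mention, c))
            if best != labels[mention]:
                labels[mention] = best  # later mentions in this pass see the update
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy input: three mentions, each with two candidate entries.
local_scores = {
    "m1": {"URL1": 0.6, "URL2": 0.4},
    "m2": {"URL3": 0.45, "URL4": 0.55},
    "m3": {"URL9": 0.4, "URL5": 0.6},
}
neighbors = {"m1": ["m2"], "m2": ["m1", "m3"], "m3": ["m2"]}
relatedness = {frozenset(("URL1", "URL3")): 1.0, frozenset(("URL3", "URL9")): 1.0}

result = iterative_classification(local_scores, neighbors, relatedness)
print(result)  # {'m1': 'URL1', 'm2': 'URL3', 'm3': 'URL9'}
```

Without the relational term, the local scores alone would pick `URL4` for `m2` and `URL5` for `m3`; the evidence propagates along the chain of neighbors over successive updates.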

Page 24: Reading Group 2014 (Insight NUIG)

Collective Entity Disambiguation

1. Discourse Salience: In a given discourse, there is precisely one entity that is the center of attention.

2. Transitivity: If two mentions refer to the same entity, and one mention has been linked to a database entry, the other should also be linked to the same entry.
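Read as a hard rule, Transitivity amounts to closing the coreference relation and sharing links within each coreference cluster. A minimal sketch with a union-find structure, assuming mentions are identified by their surface strings and that coreference pairs are given as input (both assumptions for illustration; `ID1` and `ID2` are placeholder entries):

```python
def transitivity_closure(mentions, coreference_pairs, links):
    """Every mention inherits the entries linked to any mention in its
    coreference cluster. Returns mention -> set of implied entries."""
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps the trees shallow
            m = parent[m]
        return m

    for a, b in coreference_pairs:
        parent[find(a)] = find(b)  # merge the two coreference clusters

    entries_by_cluster = {}
    for m, entry in links.items():
        entries_by_cluster.setdefault(find(m), set()).add(entry)
    return {m: set(entries_by_cluster.get(find(m), set())) for m in mentions}


mentions = ["syntenin-1", "syntenin", "mda-9", "TACIP18"]
pairs = [("syntenin-1", "syntenin"), ("syntenin", "mda-9"), ("mda-9", "TACIP18")]
print(transitivity_closure(mentions, pairs, {"syntenin-1": "ID1"})["TACIP18"])  # {'ID1'}
print(sorted(transitivity_closure(mentions, pairs, {"syntenin-1": "ID1", "mda-9": "ID2"})["TACIP18"]))
# ['ID1', 'ID2']
```

If two mentions of the same cluster carry different entries, the hard rule implies both entries for every member of the cluster.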

Page 25: Reading Group 2014 (Insight NUIG)

Markov Logic Network Formulation

Observed Features (Saliency): Precede(x,y) ^ LinkTo(x,id) ^ Candidate(y,id) => LinkTo(y,id)

ID1 ID2 ID3 ID4

ID2

…Here, we demonstrate that rat syntenin-1, previously published as syntenin-1 (syntenin), mda-9, or TACIP18 in human, is a neurofascin-binding protein that exhibits a widespread tissue expression pattern with a relative maximum in brain. …
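The Saliency formula can be illustrated by reading it as a hard rule and applying it left to right over the mentions. In a Markov Logic Network the formula is weighted and may be violated, so this deterministic version is only a simplification; the candidate sets and the initial link below are hypothetical.

```python
def apply_saliency(mentions, candidates, initial_links):
    """Hard-rule reading of
    Precede(x,y) ^ LinkTo(x,id) ^ Candidate(y,id) => LinkTo(y,id).
    Mentions are processed in document order; an unlinked mention y takes the
    entry of the closest preceding mention whose entry is one of y's candidates."""
    links = dict(initial_links)
    for j, y in enumerate(mentions):
        if y in links:
            continue
        for x in reversed(mentions[:j]):  # every x with Precede(x, y), closest first
            if x in links and links[x] in candidates[y]:
                links[y] = links[x]
                break
    return links


# Hypothetical toy input in document order; candidate IDs are placeholders.
mentions = ["syntenin-1", "syntenin", "mda-9"]
candidates = {
    "syntenin-1": {"ID1", "ID2"},
    "syntenin": {"ID2", "ID3"},
    "mda-9": {"ID2", "ID4"},
}
print(apply_saliency(mentions, candidates, {"syntenin-1": "ID2"}))
# {'syntenin-1': 'ID2', 'syntenin': 'ID2', 'mda-9': 'ID2'}
```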

Page 26: Reading Group 2014 (Insight NUIG)

Markov Logic Network Formulation

Observed Features (Saliency): Precede(x,y) ^ LinkTo(x,id) ^ Candidate(y,id) => LinkTo(y,id)

Observed Features of the Neighbors (Transitivity): Coreference(x,y) ^ LinkTo(x,id_i) => LinkTo(y,id_i)

ID1

…Here, we demonstrate that rat syntenin-1, previously published as syntenin-1 (syntenin), mda-9, or TACIP18 in human, is a neurofascin-binding protein that exhibits a widespread tissue expression pattern with a relative maximum in brain. …

Page 27: Reading Group 2014 (Insight NUIG)

Markov Logic Network Formulation

Observed Features (Saliency): Precede(x,y) ^ LinkTo(x,id) ^ Candidate(y,id) => LinkTo(y,id)

Observed Features of the Neighbors (Transitivity): Coreference(x,y) ^ LinkTo(x,id_i) => LinkTo(y,id_i)

Unobserved Features of the Neighbors (Protein-protein interaction): LinkTo(x,id_i) ^ Candidate(y,id_j) ^ PPIPartner(id_i,id_j) => LinkTo(y,id_j)

Syntenin-1 mda-9

ID1

ID2

ID9
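In a Markov Logic Network, each ground formula carries a weight, and the most probable assignment (the MAP state) maximizes the sum of the weights of the satisfied ground formulas. The sketch below grounds the three formulas above on a two-mention toy example and finds the MAP state by brute force. Everything numerical is assumed: the unit-clause `prior` weights, the rule weights, the candidate sets and the `PPIPartner` fact are hypothetical, and the exactly-one-link-per-mention constraint is a simplification. Practical MLN engines typically use approximate or ILP-based MAP inference rather than enumeration.

```python
from itertools import product

# Toy ground network (all values hypothetical): s1 = "Syntenin-1", s2 = "mda-9".
mentions = ["s1", "s2"]
candidates = {"s1": {"ID1", "ID2"}, "s2": {"ID2", "ID9"}}
ids = sorted(set().union(*candidates.values()))
precede = {("s1", "s2")}        # Precede(x, y): x occurs before y
coreference = set()             # no Coreference evidence in this toy example
ppi_partner = {("ID1", "ID9")}  # PPIPartner(id_i, id_j)
prior = {("s1", "ID1"): 1.0, ("s1", "ID2"): 0.8,  # unit-clause weights, e.g. from a
         ("s2", "ID2"): 0.6, ("s2", "ID9"): 0.5}  # local classifier (assumed)
W_SALIENCY, W_TRANSITIVITY, W_PPI = 0.3, 1.0, 0.7  # assumed rule weights


def implies(antecedent, consequent):
    return (not antecedent) or consequent


def score(world):
    """Sum of the weights of all satisfied ground formulas (MLN log-score up to a constant)."""
    def link(m, i):
        return (m, i) in world

    total = sum(w for (m, i), w in prior.items() if link(m, i))
    for x, y in product(mentions, mentions):
        if x == y:
            continue
        for i in ids:
            # Saliency: Precede(x,y) ^ LinkTo(x,id) ^ Candidate(y,id) => LinkTo(y,id)
            total += W_SALIENCY * implies(
                (x, y) in precede and link(x, i) and i in candidates[y], link(y, i))
            # Transitivity: Coreference(x,y) ^ LinkTo(x,id_i) => LinkTo(y,id_i)
            total += W_TRANSITIVITY * implies((x, y) in coreference and link(x, i), link(y, i))
            for j in ids:
                # PPI: LinkTo(x,id_i) ^ Candidate(y,id_j) ^ PPIPartner(id_i,id_j) => LinkTo(y,id_j)
                total += W_PPI * implies(
                    link(x, i) and j in candidates[y] and (i, j) in ppi_partner, link(y, j))
    return total


def map_inference():
    """Brute-force MAP state, assuming each mention links to exactly one candidate."""
    best_score, best_world = float("-inf"), None
    for choice in product(*(sorted(candidates[m]) for m in mentions)):
        world = set(zip(mentions, choice))
        s = score(world)
        if s > best_score:
            best_score, best_world = s, world
    return best_score, best_world


best_score, best_world = map_inference()
print(sorted(best_world))  # [('s1', 'ID1'), ('s2', 'ID9')]
```

With these weights the prior alone prefers `ID2` for `s2`; the grounded protein-protein interaction formula, whose evidence comes from the unobserved link of the neighbor `s1`, tips the MAP state to `ID9`.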

Page 28: Reading Group 2014 (Insight NUIG)

Collective Inference

[Figure: collective inference graph over candidate entries URL1, URL2, URL3, URL4, URL5 and URL9]

Page 29: Reading Group 2014 (Insight NUIG)

Joint Inference

…Here, we demonstrate that rat syntenin-1, previously published as syntenin-1 (syntenin), mda-9, or TACIP18 in human, is a neurofascin-binding protein that exhibits a widespread tissue expression pattern with a relative maximum in brain. …

Transitivity: Coreference(x,y) ^ LinkTo(x,id_i) => LinkTo(y,id_i)

syntenin-1, syntenin-1, syntenin, mda-9, TACIP18

Page 30: Reading Group 2014 (Insight NUIG)

Joint Inference

New Constraints

Transitivity2: Coreference(x,y) ^ LinkTo(x,id_i) ^ ¬∃id_j.LinkTo(y,id_j) => LinkTo(y,id_i)

URL5

…Here, we demonstrate that rat syntenin-1, previously published as syntenin-1 (syntenin), mda-9, or TACIP18 in human, is a neurofascin-binding protein that exhibits a widespread tissue expression pattern with a relative maximum in brain. …

?
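Transitivity2 lets y inherit x's entry only when y has no entry yet (the ¬∃id_j.LinkTo(y,id_j) condition), so existing links are never overwritten. A minimal forward-chaining sketch of that reading, assuming coreference is symmetric and mentions are identified by surface strings (both assumptions; `ID1` and `ID2` are placeholders):

```python
def apply_transitivity2(coreference_pairs, links):
    """Hard-rule reading of
    Coreference(x,y) ^ LinkTo(x,id_i) ^ ¬∃id_j.LinkTo(y,id_j) => LinkTo(y,id_i):
    y inherits x's entry only while y has no entry at all. Coreference is treated
    as symmetric (an assumption), and the rule is re-applied until nothing changes."""
    result = dict(links)
    changed = True
    while changed:
        changed = False
        for x, y in coreference_pairs:
            for a, b in ((x, y), (y, x)):
                if a in result and b not in result:
                    result[b] = result[a]
                    changed = True
    return result


pairs = [("syntenin-1", "syntenin"), ("syntenin", "mda-9"), ("mda-9", "TACIP18")]
print(apply_transitivity2(pairs, {"syntenin-1": "ID1", "mda-9": "ID2"}))
# {'syntenin-1': 'ID1', 'mda-9': 'ID2', 'syntenin': 'ID1', 'TACIP18': 'ID2'}
```

Unlike the plain Transitivity rule, no mention ends up with two entries; the trade-off is that the result can depend on the order in which pairs are processed.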

Page 31: Reading Group 2014 (Insight NUIG)

Joint Inference

New Constraints

Transitivity2: Coreference(x,y) ^ LinkTo(x,id_i) ^ ¬∃id_j.LinkTo(y,id_j) => LinkTo(y,id_i)

Coreference(x,y) => SuitablyLink(x) ^ SuitablyLink(y)

LinkTo(x,id) => SuitablyLink(x)

…Here, we demonstrate that rat syntenin-1, previously published as syntenin-1 (syntenin), mda-9, or TACIP18 in human, is a neurofascin-binding protein that exhibits a widespread tissue expression pattern with a relative maximum in brain. …
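The two new formulas derive SuitablyLink for mentions that take part in a coreference relation or already have a link. One possible way to use them, assumed here for illustration rather than taken from the paper, is to keep only the mentions for which SuitablyLink can be derived. The extra mention `rat` is a hypothetical false positive from gene mention recognition:

```python
def suitable_mentions(mentions, coreference_pairs, links):
    """Derive SuitablyLink from the two formulas and keep only mentions for which it holds.
    Treating 'not derived' as 'false' is a closed-world assumption made for this sketch."""
    suitable = set()
    for x, y in coreference_pairs:
        # Coreference(x,y) => SuitablyLink(x) ^ SuitablyLink(y)
        suitable.update((x, y))
    for x in links:
        # LinkTo(x,id) => SuitablyLink(x)
        suitable.add(x)
    return [m for m in mentions if m in suitable]


mentions = ["syntenin-1", "syntenin", "mda-9", "TACIP18", "rat"]  # "rat": hypothetical false positive
pairs = [("syntenin", "mda-9"), ("mda-9", "TACIP18")]
links = {"syntenin-1": "ID1"}
print(suitable_mentions(mentions, pairs, links))  # ['syntenin-1', 'syntenin', 'mda-9', 'TACIP18']
```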

Page 32: Reading Group 2014 (Insight NUIG)

Experiments

Page 33: Reading Group 2014 (Insight NUIG)

Corpus

IGML Corpus (Instance-based Gene Mention Linking)

Training Set Test Set

Number of articles 282 262

Number of gene mentions 2,813 3,143

Number of linked Entrez Gene IDs 2,861 3,187

Number of words per article 215.86 228.91

Number of mentions per article 10.01 12.00

Number of words per mention 1.52 1.35

Number of IDs per mention 1.02 1.01
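The "Number of IDs per mention" row is consistent with dividing the number of linked Entrez Gene IDs by the number of gene mentions. A quick check in Python; the dictionary layout is an assumption for illustration, and the counts are copied from the table above:

```python
# Counts copied from the IGML corpus table above.
corpus = {
    "Training": {"gene_mentions": 2813, "linked_ids": 2861},
    "Test": {"gene_mentions": 3143, "linked_ids": 3187},
}


def ids_per_mention(counts):
    """Average number of linked Entrez Gene IDs per gene mention."""
    return counts["linked_ids"] / counts["gene_mentions"]


for split, counts in corpus.items():
    print(f"{split}: {ids_per_mention(counts):.2f}")
# Training: 1.02
# Test: 1.01
```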

Page 34: Reading Group 2014 (Insight NUIG)

Corpus

IGML Corpus (Instance-based Gene Mention Linking)

Training Set Test Set

Number of articles 282 262

Number of gene mentions 2,813 3,143

Number of linked Entrez Gene IDs 2,861 3,187

Number of words per article 215.86 228.91

Number of mentions per article 10.01 12.00

Number of words per mention 1.52 1.35

Number of IDs per mention 1.02 1.01

Syntenin-1 URL5

Page 35: Reading Group 2014 (Insight NUIG)

Corpus

IGML Corpus (Instance-based Gene Mention Linking)

Training Set Test Set

Number of articles 282 262

Number of gene mentions 2,813 3,143

Number of linked Entrez Gene IDs 2,861 3,187

Number of words per article 215.86 228.91

Number of mentions per article 10.01 12.00

Number of words per mention 1.52 1.35

Number of IDs per mention 1.02 1.01

Human and rat syntenin-1

The mammalian syntenin-1

Page 36: Reading Group 2014 (Insight NUIG)

Corpus – Gene Mention Recognition

Set Precision Recall F-Measure

Training 55.3 83.4 66.5

Test 66.2 82.7 65.1

Page 37: Reading Group 2014 (Insight NUIG)

Corpus – Training Set

[Chart: Precision, Recall and F-measure (axis 0 to 90) for Optimal Linking, Best Linking and Worst Linking]

Page 38: Reading Group 2014 (Insight NUIG)

Corpus – Test Set

[Chart: Precision, Recall and F-Measure (axis 0 to 90) for Optimal Linking, Best Linking and Worst Linking]

Page 39: Reading Group 2014 (Insight NUIG)

Evaluation

                               Training            Test
Feature                        P     R     F       P     R     F
Discourse Saliency             79.2  50.2  61.5    79.5  59.0  67.7
Protein-protein Interaction    79.4  51.1  62.2    80.1  59.8  68.5
Transitivity                   78.5  49.5  60.7    78.6  58.8  67.2

Page 40: Reading Group 2014 (Insight NUIG)

Evaluation

                                      Training            Test
                                      P     R     F       P     R     F
Random Baseline                       68.4  51.6  58.8    68.3  59.8  63.8
Collective                            79.1  52.0  62.8    78.4  61.0  68.6
Collective + Filtering                79.3  52.0  62.9    78.8  61.0  68.8
Individual                            74.9  54.3  62.9    75.7  61.7  68.0
Collective + Individual               74.5  55.7  63.7    74.9  64.8  69.5
Collective + Individual + Filtering   79.9  54.9  65.1    77.8  65.3  71.0
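The F column is the harmonic mean of P and R (the F1 measure), so the rows can be checked, up to rounding, from their precision and recall. A small sketch; the `beta` parameter is an addition for the general F-beta case, and `beta=1` gives F1:

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta measure of precision and recall; beta=1 gives F1, their harmonic mean."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Test-set precision and recall taken from the table above.
rows = {
    "Random Baseline": (68.3, 59.8),
    "Collective + Individual + Filtering": (77.8, 65.3),
}
for name, (p, r) in rows.items():
    print(f"{name}: F = {f_measure(p, r):.1f}")
# Random Baseline: F = 63.8
# Collective + Individual + Filtering: F = 71.0
```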

Page 41: Reading Group 2014 (Insight NUIG)

Conclusion

- Overview of Entity Linking

- Why is Instance-based Entity Linking more challenging?

- Suggestion of a solution to the problem

Page 42: Reading Group 2014 (Insight NUIG)

Analysis

Page 43: Reading Group 2014 (Insight NUIG)

Cons

- The results do not lead to a clear conclusion.

- Too many abbreviations in the paper.

- Does the approach converge to an optimal solution?

- How long does it take to find a solution?

- Is there any case that could not be disambiguated by human annotators?

Page 44: Reading Group 2014 (Insight NUIG)

Pros

- Outlier

- “The instance-based EL task requires deeper linguistic analysis and domain dependent knowledge to infer each instance’s identity.”

Page 45: Reading Group 2014 (Insight NUIG)

Databases

Biomedical

Natural Language Processing

AI

Semantic Web

Page 46: Reading Group 2014 (Insight NUIG)

How is it Related to my PhD?

I am working on Entity Linking.

• Generic Approach

• Focus on Linguistic Features

• Linked Data as Knowledge Base

• Scalability

Page 47: Reading Group 2014 (Insight NUIG)

Thank you!

