Reading Group 2013 (DERI NUIG)

Date post: 05-Jul-2015
Upload: bianca-pereira
Description:
Reading Group at DERI, NUIG in 2013, based on the paper "Named Entity Recognition: Fallacies, Challenges & Opportunities" by Marrero et al. (2013).
Page 1: Reading Group 2013 (DERI NUIG)

Copyright 2011 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Enabling Networked Knowledge

Named Entity Recognition: Fallacies, Challenges & Opportunities

Authors: Mónica Marrero, Julián Urbano, Sonia Sánchez-Cuadrado, Jorge Morato, Juan Miguel Gómez-Berbís

Presented by: Bianca Pereira

Page 2: Reading Group 2013 (DERI NUIG)

Alchemy API Raises $2M

“Alchemy, which launched in 2009, processes 3 billion API calls per month. It is used in 36 countries (…)”

http://semanticweb.com/alchemy-api-raises-2m_b35276

Page 3: Reading Group 2013 (DERI NUIG)

“FOX can generate RDF out of natural language with improved accuracy. FOX has been shown to be up to 15% more accurate than other frameworks, including commercial software.”

http://semanticweb.com/aksw-announces-federated-knowledge-extraction_b21399

Page 4: Reading Group 2013 (DERI NUIG)

“There are many open-source and commercial products out there that attempt to determine sentiment in tweets, but what is interesting to find out is what entity is that sentiment attached to.”

http://semanticweb.com/introducing-semanticweb-com-innovation-spotlight-series-with-pingar_b30106

Page 5: Reading Group 2013 (DERI NUIG)

“DBPedia Spotlight’s ability (…) to support (…) faceted browsing, customized web feeds (…) enrich blog content.”

“Many (…) relationship extraction algorithms rely on entity identification beforehand (…)”

http://semanticweb.com/the-spotlight%E2%80%99s-on-dbpedia_b17942

Page 6: Reading Group 2013 (DERI NUIG)

“People and places (…) are only a small part of this wider project (…) around entities that Bing embarked on a while back.”

http://techcrunch.com/2013/03/21/bing-just-got-a-lot-smarter-now-knows-more-about-people-and-places/

Page 7: Reading Group 2013 (DERI NUIG)

Agenda

What is a (Named) Entity?

Named Entity Recognition evolution

Named Entity Recognition evaluation

Conclusions

How is it related to my PhD?

Page 9: Reading Group 2013 (DERI NUIG)

Named Entity Recognition

What is Named Entity Recognition?

“Identification of mentions of real-world entities in a natural language text.”

(my words)

Page 11: Reading Group 2013 (DERI NUIG)

Named Entity Recognition

The term “named entity” was coined for the Named Entity task at the 6th Message Understanding Conference (MUC-6).

“Unique identifiers of entities (organizations, persons, locations), times (dates, times), and quantities (monetary values, percentages).”

(http://cs.nyu.edu/faculty/grishman/NEtask20.book_2.html#HEADING1)

Page 12: Reading Group 2013 (DERI NUIG)

Named Entity Recognition

For MUC-7, the definition changed slightly.

“Named Entities (NE) were defined as proper names and quantities of interest. Person, organization, and location names were marked as well as dates, times, percentages, and monetary amounts.”

(http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/muc_7_proceedings/overview.html)

Page 13: Reading Group 2013 (DERI NUIG)

MUC-7 Results

The results for the MUC-7 Named Entity task are very promising.

(http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/muc_7_proceedings/marsh_slides.pdf)

Page 14: Reading Group 2013 (DERI NUIG)

Challenges

There were no more Message Understanding Conferences…

Page 15: Reading Group 2013 (DERI NUIG)

Challenges

But there were...

Automatic Content Extraction (ACE – 1999)

Computational Natural Language Learning (CoNLL – 2002)

INEX Entity Ranking Track (2007)

TREC Entity Track (2009)

TAC Knowledge Base Population (TAC-KBP – 2009)

Page 16: Reading Group 2013 (DERI NUIG)

ACE

“Recognition of entities, not just names. In the ACE entity detection and tracking (EDT) task, all mentions of an entity, whether a name, a description, or a pronoun, are to be found and collected into equivalence classes based on reference to the same entity.”

(Doddington et al. 2004)
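
The ACE idea of collecting mentions into equivalence classes can be pictured as a simple grouping. The sketch below is only an illustration: the `Mention` class, the `entity_id` field, and the example sentence fragments are assumptions made here, not part of the ACE specification. The identifiers are supplied by hand, as a gold annotation would; a real system has to infer them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Mention:
    text: str       # surface form found in the document
    kind: str       # "name", "description", or "pronoun"
    entity_id: str  # hypothetical identifier of the entity referred to


def group_mentions(mentions: List[Mention]) -> Dict[str, List[Mention]]:
    """Collect mentions into equivalence classes, one class per entity."""
    classes: Dict[str, List[Mention]] = defaultdict(list)
    for mention in mentions:
        classes[mention.entity_id].append(mention)
    return dict(classes)


# Hand-annotated toy example (all values are illustrative).
MENTIONS = [
    Mention("Richard Nixon", "name", "E1"),
    Mention("the president", "description", "E1"),
    Mention("he", "pronoun", "E1"),
    Mention("Washington", "name", "E2"),
]

for entity_id, members in group_mentions(MENTIONS).items():
    print(entity_id, [m.text for m in members])
```

Under a MUC-style definition only the two `name` mentions would count; under ACE all four mentions are in scope.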

Page 17: Reading Group 2013 (DERI NUIG)

CoNLL

“Named entities are phrases that contain names of persons, organizations, locations, times and quantities. (…) We will concentrate on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups. (…)”

(http://www.clips.ua.ac.be/conll2002/ner/)

Page 18: Reading Group 2013 (DERI NUIG)

INEX Entity Ranking Track

“(…) entities (such as countries, people and dates) requires the estimation of relevance of items (i.e., instances of entities) (…) we restricted candidate items to those entities that have their own Wikipedia article.”

(De Vries et al. 2007)

Page 19: Reading Group 2013 (DERI NUIG)

TREC Entity Track

“A web entity is uniquely identifiable by one of its primary homepages. Real-world entities can be represented by multiple homepages.”

(Balog et al. 2009)

Page 20: Reading Group 2013 (DERI NUIG)

TAC-KBP

“The tasks will be structured by having participants process a list of target entities. The list will contain entity types of Person, Organization and Geo-Political Entity.”

(http://apl.jhu.edu/~paulmac/kbp/090601-KBPTaskGuidelines.pdf)

Page 21: Reading Group 2013 (DERI NUIG)

What is a Named Entity?

Proper nouns

Water? Whale? Twelve o’clock?

Rigid designator

Richard Nixon (✓) vs. President of the United States (✗)

Unique identifier

“(…) virtually everything could be referred to uniquely, depending on the context or the previous knowledge of the receiver, although a unique identifier for one receiver might not be so for another one, either because of lack of shared knowledge or the ambiguity of the context.”

Purpose and domain of application

Page 22: Reading Group 2013 (DERI NUIG)

Evaluation

As the definition changes, the evaluation changes.

Each challenge has different:

– types of Named Entity to identify

– identification and annotation criteria

– valid boundaries of a Named Entity
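
To make the boundary point concrete, here is a minimal sketch of how the same system output scores differently depending on whether boundaries must match exactly or only overlap. The `Span` type, the label names, the example sentence, and the greedy one-to-one matching are all assumptions chosen for illustration; none of the challenges above is claimed to score exactly this way.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # entity type; the label names used here are assumptions


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def score(gold: List[Span], predicted: List[Span],
          exact_boundaries: bool = True) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); each gold span is matched at most once."""
    unmatched = list(gold)
    true_positives = 0
    for p in predicted:
        for g in unmatched:
            same_type = p.label == g.label
            if exact_boundaries:
                same_span = (p.start, p.end) == (g.start, g.end)
            else:
                same_span = overlaps(p, g)
            if same_type and same_span:
                unmatched.remove(g)  # greedy: first acceptable gold span wins
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


TEXT = "President Richard Nixon visited the Bank of Ireland"
# Gold annotation excludes the title "President"; the system includes it.
GOLD = [Span(10, 23, "PER"), Span(36, 51, "ORG")]
PREDICTED = [Span(0, 23, "PER"), Span(36, 51, "ORG")]

print(score(GOLD, PREDICTED, exact_boundaries=True))
print(score(GOLD, PREDICTED, exact_boundaries=False))
```

With exact boundaries the person mention counts as an error, so precision and recall are both 0.5; with overlap matching both predictions count and the scores rise to 1.0. The tool did not change, only the criterion did.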

Page 26: Reading Group 2013 (DERI NUIG)

Other problems

How to evaluate current tools with different definitions of Named Entities?

– Using only Person, Organization and Place.

– Using only those tools which work with numbers and dates.

– Using current annotated corpora (and see what happens).

How to choose the best tool?

– It depends on the application.
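
The first option, restricting the comparison to Person, Organization and Place, amounts to mapping each tool's own label vocabulary onto a shared subset and dropping everything else. The sketch below assumes two hypothetical tools (`tool_a`, `tool_b`) with invented label names; real tools use their own vocabularies, and the mapping would have to be written by hand for each one.

```python
from typing import Dict, List, Tuple

# Shared types that every tool is assumed to support in some form.
COMMON_TYPES = {"PER", "ORG", "LOC"}

# Hypothetical label vocabularies for two imaginary tools.
LABEL_MAP: Dict[str, Dict[str, str]] = {
    "tool_a": {"Person": "PER", "Organization": "ORG",
               "City": "LOC", "Country": "LOC", "Date": "DATE"},
    "tool_b": {"PERSON": "PER", "ORGANIZATION": "ORG",
               "GPE": "LOC", "MONEY": "MONEY"},
}


def normalize(tool: str,
              annotations: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map (mention, tool label) pairs to common types; drop the rest."""
    mapping = LABEL_MAP[tool]
    result = []
    for text, label in annotations:
        common = mapping.get(label)  # None for labels the map does not know
        if common in COMMON_TYPES:
            result.append((text, common))
    return result


print(normalize("tool_a", [("Galway", "City"), ("2013", "Date"),
                           ("DERI", "Organization")]))
print(normalize("tool_b", [("Galway", "GPE"), ("$2M", "MONEY")]))
```

After normalization both tools can be scored against the same gold annotations, at the cost of ignoring every type outside the shared subset (dates and amounts in this example).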

Page 28: Reading Group 2013 (DERI NUIG)

Conclusions

Is NER really solved?

Content Validity

– Reflects the needs of the real user.

Page 29: Reading Group 2013 (DERI NUIG)

Conclusions

Is NER really solved?

Content Validity

External Validity

– The experiments can be generalized to other populations and experimental settings.

Page 30: Reading Group 2013 (DERI NUIG)

Conclusions

Is NER really solved?

Content Validity

External Validity

Convergent Validity

– The results agree with other results, theoretical or experimental.

Page 31: Reading Group 2013 (DERI NUIG)

Conclusions

Is NER really solved?

Content Validity

External Validity

Convergent Validity

Conclusion Validity

– The conclusions drawn from the results are justified.

Page 32: Reading Group 2013 (DERI NUIG)

Conclusions

Is NER really solved?

Content Validity

External Validity

Convergent Validity

Conclusion Validity

“There is not enough evidence to support the statement that NER is solved: it rather suggests the opposite”

Page 33: Reading Group 2013 (DERI NUIG)

Conclusions

What about...

genes and diseases?

entities identified by the same name as their classes (ambulance, airplane, and so on)?

entities identified by their attributes and description?

entities…

Page 34: Reading Group 2013 (DERI NUIG)

Conclusions

What is an entity?

Page 36: Reading Group 2013 (DERI NUIG)

My PhD thesis

How is it related to my PhD topic?

Entity Linking is the identification and disambiguation of entities using a background knowledge base.

Entity Recognition is the first step.

What is an entity?

And further: what is an entity in different domains?
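
As a rough picture of the Entity Linking definition above, the sketch below looks up candidates for an already recognized mention in a tiny knowledge base and disambiguates by counting shared context words. Everything in it is an assumption made for illustration: the `kb:` identifiers, the two entries, and the overlap heuristic. It is not the method of any particular system.

```python
from typing import Dict, List, Optional, Set

# Toy background knowledge base; identifiers and contents are hypothetical.
KNOWLEDGE_BASE: Dict[str, Dict[str, Set[str]]] = {
    "kb:Galway_city": {
        "names": {"galway"},
        "context": {"city", "ireland", "university"},
    },
    "kb:Galway_county": {
        "names": {"galway", "county galway"},
        "context": {"county", "connacht", "region"},
    },
}


def link(mention: str, context_words: List[str]) -> Optional[str]:
    """Return the KB identifier that best fits the mention, or None."""
    # Step 1: candidate generation by name lookup.
    candidates = [
        entity_id for entity_id, entry in KNOWLEDGE_BASE.items()
        if mention.lower() in entry["names"]
    ]
    if not candidates:
        return None  # the mention has no entry in this knowledge base
    # Step 2: disambiguation by context-word overlap.
    context = {word.lower() for word in context_words}
    return max(candidates,
               key=lambda eid: len(KNOWLEDGE_BASE[eid]["context"] & context))


print(link("Galway", ["the", "university", "in", "the", "city"]))
print(link("Galway", ["a", "county", "in", "Connacht"]))
print(link("Dublin", ["capital"]))
```

The function only receives mentions that recognition has already produced, which is why the definition of an entity used in the recognition step constrains everything the linker can do.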

Page 37: Reading Group 2013 (DERI NUIG)

References

(Balog et al. 2009)

Balog, Krisztian, et al. “Overview of the TREC 2009 Entity Track.” 2009.

(Doddington et al. 2004)

Doddington, George, et al. “The automatic content extraction (ACE) program – tasks, data, and evaluation.” Proceedings of LREC. Vol. 4. 2004.

(De Vries et al. 2007)

De Vries, Arjen P., et al. “Overview of the INEX 2007 entity ranking track.” Focused Access to XML Documents. Springer Berlin Heidelberg, 2008. 245-251.