Named Entity Disambiguation on an Ontology Enriched by Wikipedia

Hien Thanh Nguyen¹, Tru Hoang Cao²

¹ Ton Duc Thang University, Vietnam
² Ho Chi Minh City University of Technology, Vietnam

International IEEE Conference - RIVF’08

Outline

• Introduction

• Background

• Approach

• Evaluation

• Conclusion

Introduction

• Most Web pages present no explicit semantic information about their data and objects.

• The Semantic Web aims to solve this problem by making semantic metadata available in web page content
– Ex: the entity “John McCarthy” pointing to the homepage of the inventor of the Lisp programming language
– Entity disambiguation

Introduction - Entity disambiguation

• Entity disambiguation is the process of identifying when different references correspond to the same real-world entity (Jorge Cardoso and Amit Sheth)

• Our work aims at detecting named entities in a text and linking them to a given ontology

Introduction - What are Named Entities?

• Named Entities (NEs) include people, organizations, locations, dates, times, monetary amounts, measures, percentages, etc.

• Example

“Ms. Washington's candidacy is being championed by several powerful lawmakers including her boss, Chairman John Dingell (D., Mich.) of the House Energy and Commerce Committee.”

Introduction – Basic problem in NE

• Many NEs share the same name
– Ambiguity of NE types
  – John Smith (company vs. person)
  – May (person vs. month)
  – Washington (person vs. location)
  – etc.
– Ambiguity of referent (e.g. Paris may be the capital of France, or a small town in Texas)

Introduction - Our contributions

• Utilizing ontological concepts, and properties of instances in a specific KB, to automatically generate a corpus of labeled training data

• Exploiting Wikipedia to enrich the training data with new and informative features.

• Exploring a range of features extracted from texts, a KB, and Wikipedia

Background - Ontology

• An ontology schema defines a taxonomy of classes and properties (relations and attributes)

• A knowledge base contains semantic descriptions, including attributes and relations, of named entities in the real world

Background - Wikipedia

• Each article defines an entity or a concept

• Four sources of information
– Title
– Redirect titles
– Categories
– Hyperlinks

• Outlinks vs. Inlinks
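
The outlink/inlink distinction can be illustrated with a small sketch: outlinks are the links an article contains, and inlinks are the links pointing to it from other articles, so one map can be derived from the other. The function name `invert_links` and the toy link graph are assumptions made for illustration, not part of the original slides.

```python
from collections import defaultdict

def invert_links(outlinks):
    """Derive inlinks (who links to me) from outlinks (whom I link to)."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            inlinks[target].add(source)
    return dict(inlinks)

# Toy link graph; the article titles are illustrative, not read from a real dump.
outlinks = {
    "John McCarthy (computer scientist)": {
        "Lisp (programming language)",
        "Stanford University",
    },
    "Lisp (programming language)": {"John McCarthy (computer scientist)"},
}
inlinks = invert_links(outlinks)
```

Both directions can then serve as features: the titles an article links to, and the titles of the articles that link to it.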

Approach

• Exploiting terms (i.e. base noun phrases) and named entities co-occurring with an ambiguous name for disambiguation

• Casting the problem as a ranking problem
– Using TF-IDF to calculate similarity and choose the candidate with the highest score
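
A minimal sketch of TF-IDF ranking, assuming each candidate is represented by a bag of tokens and similarity is the cosine between TF-IDF vectors. The slides do not give the exact weighting or similarity formula, so the smoothed IDF, the cosine measure, the function names, and the toy candidate descriptions below are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return TF-IDF weight dicts for tokenized documents, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): terms in every document keep weight 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_candidates(context_tokens, candidates):
    """Score each candidate's tokens against the context; highest score first."""
    ids = list(candidates)
    vectors, idf = tfidf_vectors([candidates[i] for i in ids])
    # Context terms unseen in any candidate description cannot contribute.
    tf = Counter(t for t in context_tokens if t in idf)
    query = {t: tf[t] * idf[t] for t in tf}
    scored = [(i, cosine(query, vec)) for i, vec in zip(ids, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy candidate descriptions, invented for illustration only.
candidates = {
    "John_McCarthy_1": ["lisp", "artificial", "intelligence", "stanford"],
    "John_McCarthy_2": ["senator", "politics", "election"],
}
context = ["inventor", "lisp", "programming", "stanford"]
ranking = rank_candidates(context, candidates)
best_id, best_score = ranking[0]
```

Here the context shares "lisp" and "stanford" with the first candidate only, so that candidate is ranked first and the second receives a score of zero.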

Approach

• Constructing a corpus
– Utilizing classes and properties to generate a snippet for each instance in an ontology
– Feature generation for enriching the representation of those instances

• Analyzing a text for disambiguation and identification of NEs occurring therein
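
The corpus construction step can be sketched as flattening each ontology instance into a token snippet and then appending Wikipedia-derived features. The record layout, the function name `build_snippet`, and the class, property, and Wikipedia values below are hypothetical; the slides do not show the real schema or the exact features used.

```python
def build_snippet(instance, wiki_features=None):
    """Flatten one KB instance (plus optional Wikipedia features) into tokens."""
    texts = [instance["label"], *instance["classes"]]
    for values in instance["properties"].values():
        texts.extend(values)
    # Assumed enrichment step: titles, redirect titles, categories, link targets.
    for values in (wiki_features or {}).values():
        texts.extend(values)
    return [token for text in texts for token in text.lower().split()]

# Hypothetical instance record; class and property names are made up
# for illustration and are not taken from a real ontology schema.
instance = {
    "label": "Georgia",
    "classes": ["Country"],
    "properties": {"hasCapital": ["Tbilisi"]},
}
# Toy Wikipedia features, also illustrative.
wiki = {"redirects": ["Republic of Georgia"], "categories": ["Countries in Europe"]}

plain = build_snippet(instance)
enriched = build_snippet(instance, wiki)
```

The enriched snippet carries every token of the plain one plus the Wikipedia terms, which gives the ranking step more evidence to match against the context of an ambiguous name.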

Approach - Construct corpus

Approach – Disambiguation process

• For each ambiguous name
– Looking up candidates
– Extracting base noun phrases in the same sentence and in the headline
– Extracting named entities in the whole text
– Using TF-IDF to rank and choose the candidate with the highest score
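
The steps above can be sketched as a small pipeline, assuming noun phrases and named entities have already been extracted by some upstream tool. All names (`build_alias_index`, `gather_context`, `disambiguate`) and the toy data are assumptions, and a plain term-overlap count stands in for the TF-IDF score to keep the sketch short.

```python
from collections import defaultdict

def build_alias_index(instances):
    """Map each lower-cased surface name to the ids of instances carrying it."""
    index = defaultdict(set)
    for inst_id, inst in instances.items():
        for name in inst["names"]:
            index[name.lower()].add(inst_id)
    return index

def gather_context(name, headline_terms, sentences, entities):
    """Headline terms, terms of sentences mentioning `name`, and all other NEs."""
    context = list(headline_terms)
    for sentence in sentences:
        if name in sentence["text"]:
            context += sentence["terms"]
    context += [e for e in entities if e != name]
    return [c.lower() for c in context]

def disambiguate(name, index, context, score):
    """Return the best-scoring candidate id, or None if the name is unknown."""
    candidates = index.get(name.lower(), set())
    if not candidates:
        return None
    return max(sorted(candidates), key=lambda cid: score(context, cid))

# Toy knowledge base and pre-extracted text features (illustrative only).
instances = {
    "Georgia_country": {"names": ["Georgia"], "snippet": {"country", "tbilisi"}},
    "Georgia_us_state": {"names": ["Georgia"], "snippet": {"state", "atlanta"}},
}
index = build_alias_index(instances)
sentences = [
    {"text": "Georgia held talks on Monday.", "terms": ["talks"]},
    {"text": "The meeting took place in Tbilisi.", "terms": ["meeting"]},
]
context = gather_context("Georgia", ["summit"], sentences, ["Georgia", "Tbilisi"])

def overlap(ctx, cid):
    # Stand-in score: number of context terms found in the candidate's snippet.
    return len(set(ctx) & instances[cid]["snippet"])

chosen = disambiguate("Georgia", index, context, overlap)
```

In this toy text the entity "Tbilisi" appears outside the sentence containing "Georgia", which shows why named entities are collected from the whole text while noun phrases come only from the same sentence and the headline.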

Approach – An example

Evaluation

• Using KIM Ontology

• 140 texts of news articles from several news agencies

• Focusing on four names: John McCarthy, John Williams, Georgia, and Columbia

• Measuring accuracy as the number of correct assignments of NEs (in text) to ontology instances, divided by the total number of assignments
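
The accuracy measure can be written out as a short sketch. The mention ids, instance ids, and the function name `accuracy` are hypothetical, and the toy numbers are not results from the evaluation.

```python
def accuracy(assignments, gold):
    """Correct assignments divided by the total number of assignments."""
    if not assignments:
        raise ValueError("no assignments to evaluate")
    correct = sum(
        1 for mention, inst in assignments.items() if gold.get(mention) == inst
    )
    return correct / len(assignments)

# Toy data: mention id -> ontology instance id (illustrative only).
gold = {"m1": "Georgia_country", "m2": "Georgia_us_state", "m3": "Columbia_1", "m4": "Columbia_2"}
predicted = {"m1": "Georgia_country", "m2": "Georgia_us_state", "m3": "Columbia_1", "m4": "Columbia_1"}
score = accuracy(predicted, gold)
```

With three of the four toy mentions assigned correctly, `score` is 0.75.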

Conclusion

• Our approach is quite natural and similar to the way humans disambiguate, relying on co-occurring NEs and terms to resolve other ambiguous entities in a given context.

• Currently Wikipedia editions are available for approximately 200 languages, so our method can be used to build NE disambiguation systems for a large number of languages

• The features from Wikipedia and the NEs in the whole text are meaningful evidence for disambiguation

• In the future: detecting NEs that are not in the ontology, and investigating other similarity metrics

Thanks for your attention!

