Named Entity Disambiguation on an Ontology Enriched by Wikipedia
Hien Thanh Nguyen (1), Tru Hoang Cao (2)
(1) Ton Duc Thang University, Vietnam
(2) Ho Chi Minh City University of Technology, Vietnam
International IEEE Conference - RIVF’08
Outline
• Introduction
• Background
• Approach
• Evaluation
• Conclusion
Introduction
• Most Web pages present no explicit semantic information about their data and objects.
• The Semantic Web aims to solve this problem by making semantic metadata available in web page content
– Ex: the entity “John McCarthy” pointing to the homepage of the inventor of the Lisp programming language
– Entity disambiguation
Introduction - Entity disambiguation
• Entity disambiguation is the process of identifying when different references correspond to the same real-world entity (Jorge Cardoso and Amit Sheth)
• Our work aims at detecting named entities in a text and linking them to a given ontology
Introduction - What are Named Entities?
• Named Entities (NEs) include people, organizations, locations, dates, times, monetary amounts, measures, percentages, etc.
• Example
“Ms. Washington's candidacy is being championed by several powerful lawmakers including her boss, Chairman John Dingell (D., Mich.) of the House Energy and Commerce Committee.”
Introduction – Basic problem in NE
• Many NEs share the same name
– Ambiguity of NE types: John Smith (company vs. person), May (person vs. month), Washington (person vs. location), etc.
– Ambiguity of referent (e.g. Paris may be the capital of France, or a small town in Texas)
Introduction - Our contributions are two-fold
• Utilizing ontological concepts, and properties of instances in a specific KB, to automatically generate a corpus of labeled training data
• Exploiting Wikipedia to enrich the training data with new and informative features.
• Exploring a range of features extracted from texts, a KB, and Wikipedia
Background - Ontology
• An ontology schema defines a taxonomy of classes and properties (relations and attributes)
• A knowledge base contains semantic descriptions, including attributes and relations, of named entities in the real world
Background - Wikipedia
• Each article defines an entity or a concept
• Four sources of information
– Title
– Redirect titles
– Categories
– Hyperlinks
• Outlinks vs. Inlinks
Approach
• Exploiting terms (i.e. base noun phrases) and named entities co-occurring with an ambiguous name for disambiguation
• Casting the problem as a ranking problem
– Using TF-IDF to calculate similarity and choose the candidate with the highest score
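The ranking idea can be illustrated with a minimal sketch. It assumes each candidate is already represented as a list of tokens and that similarity is the cosine between TF-IDF vectors; the slides name TF-IDF but not the exact weighting or similarity formula, so the smoothed IDF used here is an assumption. The function names and the toy candidate ids are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per token list."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumption: smoothed IDF, so a term found in every document keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(context_tokens, candidates):
    """Score every candidate against the context; highest score first.

    candidates: dict mapping a candidate id to its token list.
    """
    ids = list(candidates)
    vectors = tfidf_vectors([context_tokens] + [candidates[i] for i in ids])
    query = vectors[0]
    scored = [(i, cosine(query, vec)) for i, vec in zip(ids, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical candidate ids and features).
candidates = {
    "mccarthy_computer_scientist": ["lisp", "artificial", "intelligence",
                                    "stanford", "computer", "scientist"],
    "mccarthy_novelist": ["novelist", "fiction", "author"],
}
context = ["lisp", "inventor", "stanford", "professor"]
ranking = rank_candidates(context, candidates)
best_candidate = ranking[0][0]
```

The candidate sharing "lisp" and "stanford" with the context ranks first; a candidate with no overlapping terms scores zero.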
12
Approach
• Constructing corpus
– Utilizing classes and properties to generate a snippet for each instance in an ontology
– Feature generation for enriching representation of those instances
• Analyzing a text for disambiguation and identification of NEs occurring therein
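As a rough illustration of the corpus construction step, the sketch below flattens one instance into a token snippet from its label, its class and ancestor classes, and its property values, then appends extra features such as Wikipedia redirect titles or categories. The data layout (a dict per instance, a class-to-parent dict for the schema), the property name `locatedIn`, and the function name are assumptions made for this example, not the structures of any particular KB.

```python
import re


def build_snippet(instance, schema, extra_features=()):
    """Flatten one KB instance into a list of lowercase tokens.

    instance: dict with "label", "class", and optional "aliases" / "properties".
    schema: dict mapping a class name to its parent class (None at the root).
    extra_features: additional strings, e.g. Wikipedia redirect titles or categories.
    """
    parts = [instance["label"]]
    parts.extend(instance.get("aliases", []))
    # Walk up the taxonomy so ancestor class names also become features.
    cls = instance["class"]
    while cls is not None:
        parts.append(cls)
        cls = schema.get(cls)
    # Property values (attributes and related instances) add context words.
    for value in instance.get("properties", {}).values():
        parts.extend(value if isinstance(value, list) else [value])
    parts.extend(extra_features)
    return re.findall(r"\w+", " ".join(str(p) for p in parts).lower())


# Toy schema and instance (hypothetical names).
schema = {"City": "Location", "Location": "Entity", "Entity": None}
instance = {
    "label": "Columbia",
    "class": "City",
    "properties": {"locatedIn": "South Carolina"},
}
snippet = build_snippet(
    instance, schema,
    extra_features=["Columbia, SC", "Cities in South Carolina"],
)
```

Each snippet then serves as the labeled "document" for its instance, which is what makes the corpus usable without manual annotation.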
Approach - Construct corpus
Approach – Disambiguation process
• For each ambiguous name
– Looking up candidates
– Extracting base noun phrases in the same sentence and in the headline
– Extracting named entities in the whole text
– Using TF-IDF to rank and choose the candidate with the highest score
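The first three steps can be sketched as follows. This is a simplified illustration: plain word tokens from the headline and from sentences mentioning the name stand in for base noun phrase extraction, and the named entity mentions are assumed to come from an upstream recognizer. The name index, function names, and candidate ids are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def lookup_candidates(name, name_index):
    """Return ids of KB instances whose label or alias matches the name."""
    return name_index.get(name.lower(), [])


def build_context(name, headline, sentences, entity_mentions):
    """Collect context features for one ambiguous name.

    Local evidence: words from the headline and from sentences containing the
    name (a crude stand-in for base noun phrases).
    Global evidence: named entities mentioned anywhere in the text.
    """
    local = tokens(headline)
    for sentence in sentences:
        if name.lower() in sentence.lower():
            local.extend(tokens(sentence))
    global_evidence = [
        t
        for mention in entity_mentions
        if mention.lower() != name.lower()
        for t in tokens(mention)
    ]
    # Drop the name's own tokens: all candidates share them, so they cannot discriminate.
    own = set(tokens(name))
    return [t for t in local + global_evidence if t not in own]


# Toy input (hypothetical index and text).
name_index = {"georgia": ["georgia_us_state", "georgia_country"]}
headline = "Georgia governor signs budget"
sentences = [
    "The governor of Georgia signed the budget in Atlanta.",
    "Lawmakers praised the decision.",
]
mentions = ["Georgia", "Atlanta"]

candidate_ids = lookup_candidates("Georgia", name_index)
context = build_context("Georgia", headline, sentences, mentions)
```

The resulting token list is what would be compared against each candidate's snippet in the final TF-IDF ranking step.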
Approach – An example
Evaluation
• Using KIM Ontology
• 140 news articles from several news agencies
• Focusing on four names: John McCarthy, John Williams, Georgia, and Columbia
• Measuring accuracy as the number of correct assignments of NEs (in text) to ontology instances, divided by the total number of assignments
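The accuracy measure is simple enough to state in code. The sketch below assumes both the system output and the gold standard are dicts from a mention id to an ontology instance id; that representation, the ids, and the toy values are made up for illustration and are not results from the evaluation.

```python
def accuracy(predicted, gold):
    """Correct assignments divided by the total number of assignments made.

    predicted, gold: dicts mapping a mention id to an ontology instance id.
    """
    if not predicted:
        return 0.0
    correct = sum(1 for mention, inst in predicted.items() if gold.get(mention) == inst)
    return correct / len(predicted)


# Toy values only; these are not figures from the evaluation.
predicted = {"m1": "inst_a", "m2": "inst_b", "m3": "inst_c", "m4": "inst_d"}
gold = {"m1": "inst_a", "m2": "inst_b", "m3": "inst_c", "m4": "inst_x"}
score = accuracy(predicted, gold)  # 3 correct out of 4 assignments
```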
Conclusion
• Our approach is quite natural and similar to how humans disambiguate: it relies on co-occurring NEs and terms to resolve ambiguous entities in a given context.
• Currently, Wikipedia editions are available for approximately 200 languages, so our method can be used to build NE disambiguation systems for a large number of languages
• Features from Wikipedia and NEs in the whole text are meaningful evidence for disambiguation
• In the future: detecting NEs that are not in the ontology, and investigating other similarity metrics
Thanks for your attention!