Ontology based Information Extraction

Jin Mao, Postdoc, School of Information, University of Arizona

Oct. 9th, 2015

Outline

Information Extraction

The process of obtaining pertinent information (facts) from documents.

Example: “The forest area in India extended to about 75 million hectares, which in terms of geographical area is approximately 22 percent of the total land.”

What’s the relationship between forest area and geographical area?
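As a rough illustration of what “obtaining facts” means for this sentence, the Python sketch below uses one hand-written regular expression (a hypothetical rule written only for this sentence shape, not a general extractor) to pull out the country, the forest area and its share of the geographical area. The extracted share is the relationship the question asks about: the forest area is about 22 percent of the geographical area.

```python
import re
from typing import Dict, Optional

SENTENCE = ("The forest area in India extended to about 75 million hectares, "
            "which in terms of geographical area is approximately 22 percent of the total land.")

# Hypothetical hand-written rule for this one sentence shape.
FOREST_RULE = re.compile(
    r"The forest area in (?P<country>[A-Z]\w+) extended to about "
    r"(?P<amount>\d+(?:\.\d+)?) (?P<scale>million|billion) hectares"
    r".*?approximately (?P<share>\d+(?:\.\d+)?) percent of the total land"
)

SCALE = {"million": 1e6, "billion": 1e9}

def extract_forest_fact(text: str) -> Optional[Dict[str, object]]:
    """Return the facts found in `text` as a dict, or None if the rule does not fire."""
    m = FOREST_RULE.search(text)
    if m is None:
        return None
    return {
        "country": m["country"],
        "forest_area_hectares": float(m["amount"]) * SCALE[m["scale"]],
        # The relationship asked about: forest area as a share of geographical area.
        "forest_share_of_geographical_area_pct": float(m["share"]),
    }

fact = extract_forest_fact(SENTENCE)
```

A rule this specific breaks on any rephrasing, which is the usual motivation for the more general methods discussed later.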

Ontology Based Information Extraction (OBIE)

Terminology

Ontology Based Information Extraction (Wimalasuriya and Dou, 2010)

Ontology-driven Information Extraction (Yildiz and Miksch, 2007): the same as Ontology Based Information Extraction

Criterion: whether the ontology is a part of the system (Yildiz and Miksch, 2007)

Ontology Based Information Extraction (OBIE)

Key Characteristics

Process unstructured or semi-structured natural language text

Present the output using ontologies: the ontology serves as an input (Li and Bontcheva, 2007), and the output is released in terms of that ontology

Use an IE process guided by an ontology: no new IE method is invented; an existing one is oriented to identify the components of an ontology (classes, properties and instances)

Open question: do extractors, such as linguistic rules, belong to an ontology?
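To make “present the output using ontologies” and “identify classes, properties and instances” concrete, here is a minimal Python sketch. The `Ontology` class, its validation rules and the `forestArea` property are hypothetical; real systems typically express the ontology in a standard language such as OWL rather than a hand-rolled structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Ontology:
    """Hypothetical minimal ontology: classes, properties with a domain class,
    and the instances / property values an OBIE system adds as its output."""
    classes: Set[str]
    properties: Dict[str, str]  # property name -> domain class
    instances: Dict[str, str] = field(default_factory=dict)  # instance -> class
    values: List[Tuple[str, str, str]] = field(default_factory=list)  # (instance, property, value)

    def add_instance(self, name: str, cls: str) -> None:
        # Only classes already defined in the ontology may receive instances.
        if cls not in self.classes:
            raise ValueError(f"unknown class: {cls}")
        self.instances[name] = cls

    def add_property_value(self, instance: str, prop: str, value: str) -> None:
        # The property must exist and the instance must belong to its domain class.
        if prop not in self.properties:
            raise ValueError(f"unknown property: {prop}")
        domain = self.properties[prop]
        if self.instances.get(instance) != domain:
            raise ValueError(f"{instance!r} is not an instance of {domain}")
        self.values.append((instance, prop, value))

onto = Ontology(classes={"Country"}, properties={"forestArea": "Country"})
onto.add_instance("India", "Country")
onto.add_property_value("India", "forestArea", "75 million hectares")
```

The checks are where the ontology “guides” extraction: a candidate that does not fit any class or property domain is rejected rather than stored.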

Ontology Based Information Extraction (OBIE)

Why

An ontology helps to clarify a domain’s semantics, e.g., its concepts and their relationships.

This helps to alleviate a wide variety of natural language ambiguities.

Ontology Based Information Extraction (OBIE)

Applications

Business Intelligence (BI) in e-business

Social media, e.g., Twitter

Metadata generation for digital resources

……

Common Architectures

Major Challenges

Information Extraction: identify instances of the ontology in the text.

Elements to identify: classes, instances, mentions, properties, property values

Input: free text in natural language

Example 1: “Classical fried egg Mycoplasma-type colonies were not observed on 1% agar medium.”

Example 2: “The cells are not motile, are not lysed in 1% SDS (wt/vol), and stain Gram positively.”
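The slide does not spell out what makes these two sentences hard; one plausible difficulty is negation (“were not observed,” “are not motile,” “are not lysed”), since a naive extractor would assert the very properties the text denies. The Python sketch below flags the word that follows a negation cue. The cue list is a hypothetical minimum; Romano et al. (2006) discover such negation patterns automatically rather than hard-coding them.

```python
import re
from typing import List

# Hypothetical minimal list of negation cues.
NEGATION = re.compile(r"\b(?:not|no|never)\b\s+(\w+)", re.IGNORECASE)

def negated_predicates(sentence: str) -> List[str]:
    """Words immediately following a negation cue, in sentence order."""
    return [m.group(1) for m in NEGATION.finditer(sentence)]

EXAMPLE_1 = ("Classical fried egg Mycoplasma-type colonies were not observed "
             "on 1% agar medium.")
EXAMPLE_2 = ("The cells are not motile, are not lysed in 1% SDS (wt/vol), "
             "and stain Gram positively.")
```

An extractor could use this list to record “motile” and “lysed” as negative property values instead of positive ones.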

Common Architectures

Major Challenges

Ontology Enhancement / Updating: upgrade the ontology with new instances to cover the knowledge in a domain better

Not in the common architecture.

Common Architectures

General Architecture

Common Architectures

First Step

Define the semantic elements to be extracted. An example (Muller et al., 2004):

Concept (C): named entities for every part of the human body, such as heart, lung, kidney…

Name of Disease (N): words or phrases that are disease names.

Description (D): any words or phrases that describe Concepts. “Description” refers to any kind of word or phrase that relates semantically to Concepts.

Pair of Concept and Description (P): all possible combinations of Concepts and Descriptions. These combinations contain the full meaning of the relationships between C and D.
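The element types above can be sketched in Python as follows. The toy lexicons are hypothetical and cover only Concepts (C) and Descriptions (D); Names of Disease (N) would be one more lexicon handled the same way. Pairs (P) are generated as the slide defines them: all combinations of the C and D found in a sentence.

```python
from itertools import product
from typing import List, Set, Tuple

# Hypothetical toy lexicons for two of the element types.
CONCEPTS: Set[str] = {"heart", "lung", "kidney"}    # C: parts of the human body
DESCRIPTIONS: Set[str] = {"enlarged", "inflamed"}   # D: words describing Concepts

def find_elements(tokens: List[str], lexicon: Set[str]) -> List[str]:
    """Tokens that belong to the given lexicon, in sentence order."""
    return [t for t in tokens if t in lexicon]

def concept_description_pairs(tokens: List[str]) -> List[Tuple[str, str]]:
    """P: every combination of a Concept and a Description found in the sentence."""
    return list(product(find_elements(tokens, CONCEPTS),
                        find_elements(tokens, DESCRIPTIONS)))

pairs = concept_description_pairs("an enlarged heart and an inflamed lung".split())
```

Only two of the four resulting pairs are semantically correct, so P is a candidate set that later steps must filter.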

Information Extraction Methods

Linguistic rules

Using regular expressions/patterns, e.g., (watched|seen) <NP>, where <NP> is a noun phrase identified from part-of-speech tags

Implemented using finite-state transducers, which consist of a series of finite-state automata

Automatically generate regular rules: “[Ii]nteract(s|ed|ing)?” matches “interact,” “interacts,” “interacted,” “interacting,” “Interact,” “Interacts,” “Interacted,” and “Interacting.”

Simple, with surprisingly good results
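A small Python sketch of both kinds of rule follows, using only the standard `re` module. The `[Ii]nteract(s|ed|ing)?` pattern is taken from the slide; the noun-phrase test for `(watched|seen) <NP>` is a deliberately simplistic assumption (optional determiner, then adjectives, then nouns, over Penn Treebank tags), and the tagged sentence is hypothetical, standing in for real tagger output.

```python
import re
from typing import List, Tuple

# Pattern from the slide: the eight surface forms of "interact".
INTERACT = re.compile(r"[Ii]nteract(s|ed|ing)?")

def matches_interact(word: str) -> bool:
    # fullmatch, so longer words such as "interaction" are not accepted.
    return INTERACT.fullmatch(word) is not None

# Hypothetical POS-tagged sentence, as a tagger might output it.
TAGGED: List[Tuple[str, str]] = [("I", "PRP"), ("watched", "VBD"), ("the", "DT"),
                                 ("new", "JJ"), ("documentary", "NN"), (".", ".")]

def watched_or_seen_np(tagged: List[Tuple[str, str]]) -> List[str]:
    """Apply (watched|seen) <NP>, with <NP> = optional DT, any JJ, one or more NN*."""
    results = []
    for i, (word, _tag) in enumerate(tagged):
        if word.lower() not in ("watched", "seen"):
            continue
        j = i + 1
        if j < len(tagged) and tagged[j][1] == "DT":
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1
        first_noun = j
        while j < len(tagged) and tagged[j][1].startswith("NN"):
            j += 1
        if j > first_noun:  # at least one noun, so the NP is complete
            results.append(" ".join(w for w, _ in tagged[i + 1:j]))
    return results
```

To search running text rather than test single words, the pattern would need word boundaries (`\b`) and `finditer` instead of `fullmatch`.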

Information Extraction Methods

Linguistic rules

Automatically mine extraction rules from text:

A dictionary inductive learning algorithm (Vargas-Vera et al., 2001)

Solving the longest common subsequence problem (Romano et al., 2006)

Relational learning (Califf and Mooney, 1999), a bottom-up learning approach
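Romano et al. (2006) build rule discovery around the longest common subsequence (LCS) problem. The Python sketch below is only the standard dynamic-programming LCS over token sequences, not their full pattern-discovery procedure; the two report sentences are hypothetical. The shared subsequence is the kind of skeleton from which a pattern such as “no evidence of <finding>” could then be generalized.

```python
from typing import List, Sequence

def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Longest common subsequence of two token sequences (dynamic programming)."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one LCS.
    out: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

S1 = "there is no evidence of pneumonia".split()
S2 = "no evidence of acute fracture".split()
shared = lcs(S1, S2)
```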

Information Extraction Methods

Gazetteer Lists

Used to recognize individual words or phrases

Widely used in named-entity recognition, e.g., to recognize states of the US or countries of the world

Conditions:

Specify exactly what is being identified by the gazetteer.

Specify where the information for the gazetteer lists was obtained from.
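A minimal gazetteer lookup in Python, assuming greedy longest-match over whitespace tokens and case-sensitive matching. The `Gazetteer` class and its `identifies` and `source` fields are hypothetical; they simply record the two conditions listed above alongside the entries.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Gazetteer:
    identifies: str   # condition 1: exactly what the list identifies
    source: str       # condition 2: where the entries were obtained from
    entries: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Store entries as token tuples so multi-word names can be matched.
        self._phrases = {tuple(e.split()) for e in self.entries}
        self._max_len = max((len(p) for p in self._phrases), default=0)

    def annotate(self, tokens: List[str]) -> List[Tuple[int, int, str]]:
        """Greedy longest-match lookup; returns (start, end_exclusive, label) spans."""
        spans: List[Tuple[int, int, str]] = []
        i = 0
        while i < len(tokens):
            for n in range(min(self._max_len, len(tokens) - i), 0, -1):
                if tuple(tokens[i:i + n]) in self._phrases:
                    spans.append((i, i + n, self.identifies))
                    i += n
                    break
            else:  # no entry starts at token i
                i += 1
        return spans

US_STATES = Gazetteer(
    identifies="US_STATE",
    source="hypothetical hand-typed sample, not an authoritative list",
    entries=["Arizona", "New Mexico", "New York"],
)
spans = US_STATES.annotate("Flights from New York to Arizona".split())
```

Longest-match matters for multi-word entries: without it, a list that also contained “New” alone would tag only the first token of “New York.”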

Information Extraction Methods

Classification Techniques

Linguistic features such as POS tags, capitalization information and individual words

Parts of IE treated as classification problems:

Whether a word token is the start/end of an entity (Li et al., 2004)

Identifying different components of an ontology, such as instances (Li and Bontcheva, 2007) and property values (Wu and Weld, 2007)
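The Python sketch below shows how entity-boundary detection becomes a classification problem in the style described above: each token gets a feature dictionary (word, POS tag, capitalization, neighbouring tags) and two binary labels, “starts an entity” and “ends an entity.” The feature set and the tagged example are assumptions for illustration; any off-the-shelf classifier could be trained on the resulting pairs, and training is not shown.

```python
from typing import Dict, List, Tuple

Features = Dict[str, object]

def token_features(tokens: List[str], tags: List[str], i: int) -> Features:
    """Linguistic features for token i: the word, its POS tag, capitalization, neighbouring tags."""
    return {
        "word": tokens[i].lower(),
        "pos": tags[i],
        "is_capitalized": tokens[i][:1].isupper(),
        "prev_pos": tags[i - 1] if i > 0 else "<S>",
        "next_pos": tags[i + 1] if i + 1 < len(tokens) else "</S>",
    }

def boundary_datasets(tokens: List[str], tags: List[str],
                      entities: List[Tuple[int, int]]):
    """Turn entity spans (start, end_exclusive) into two binary datasets:
    is token i the start of an entity, and is it the end of one?"""
    starts = {s for s, _ in entities}
    ends = {e - 1 for _, e in entities}
    start_data: List[Tuple[Features, bool]] = []
    end_data: List[Tuple[Features, bool]] = []
    for i in range(len(tokens)):
        x = token_features(tokens, tags, i)
        start_data.append((x, i in starts))
        end_data.append((x, i in ends))
    return start_data, end_data

# Hypothetical hand-tagged training sentence with two entity spans.
TOKENS = "The School of Information is at the University of Arizona".split()
TAGS = ["DT", "NNP", "IN", "NNP", "VBZ", "IN", "DT", "NNP", "IN", "NNP"]
ENTITIES = [(1, 4), (7, 10)]
start_data, end_data = boundary_datasets(TOKENS, TAGS, ENTITIES)
```

The first token “The” is capitalized yet does not start an entity, which is why a classifier weighs several features together instead of relying on capitalization alone.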

Information Extraction Methods

Syntax/Shallow NLP

Build a semantically annotated parse tree for the text as a part of the IE process

Linguistic extraction rules with partial parse trees (Todirascu et al., 2002)
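Shallow parsing produces a partial parse (flat chunks) rather than a full tree, and extraction rules can then match over chunk sequences. The Python sketch below is a generic illustration of that idea, not the method of Todirascu et al. (2002): the chunking heuristic and the `NP VP NP` rule are hypothetical, and the POS tags are supplied by hand.

```python
from typing import List, Tuple

Chunk = Tuple[str, List[str]]  # (chunk label, words): one node of a flat partial parse

def chunk(tagged: List[Tuple[str, str]]) -> List[Chunk]:
    """Tiny shallow parser: merges runs of DT/JJ/NN* into NP and VB* into VP;
    every other token becomes its own 'O' chunk."""
    chunks: List[Chunk] = []
    for word, tag in tagged:
        if tag in ("DT", "JJ") or tag.startswith("NN"):
            label = "NP"
        elif tag.startswith("VB"):
            label = "VP"
        else:
            label = "O"
        if chunks and chunks[-1][0] == label and label != "O":
            chunks[-1][1].append(word)
        else:
            chunks.append((label, [word]))
    return chunks

def np_vp_np(chunks: List[Chunk]) -> List[Tuple[str, str, str]]:
    """Extraction rule over the partial parse: NP VP NP -> (subject, relation, object)."""
    triples = []
    for a, b, c in zip(chunks, chunks[1:], chunks[2:]):
        if (a[0], b[0], c[0]) == ("NP", "VP", "NP"):
            triples.append((" ".join(a[1]), " ".join(b[1]), " ".join(c[1])))
    return triples

TAGGED = [("Ibuprofen", "NNP"), ("relieves", "VBZ"), ("mild", "JJ"),
          ("pain", "NN"), (".", ".")]
CHUNKS = chunk(TAGGED)
```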

Ontology Construction

Two options:

Consider the ontology as an input to the system

Construct an ontology as a part of the OBIE process

Ontology Enhancement

Update the ontology by adding new classes and properties through the IE process (not instances and their property values).

Such systems include the implementations by Maedche et al. (2003) and Dung and Kameyama (2007).

Fuzzy Relationship Rule: define rules according to the relationships among semantic elements.

o Generate a suggestion list for the domain experts to extract real semantic elements.
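The fuzzy relationship rule and the expert suggestion list can be sketched in Python as below, assuming a hypothetical membership function: the degree μ(c, d) is the fraction of sentences mentioning Concept c that also mention Description d. This is not the formulation of Muhammad and Dey (2005) or of the systems named above; the threshold and sentences are likewise illustrative.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

Pair = Tuple[str, str]

def fuzzy_relation(sentences: List[List[str]], concepts: List[str],
                   descriptions: List[str]) -> Dict[Pair, float]:
    """Hypothetical membership degree mu(c, d) in [0, 1]: the fraction of
    sentences mentioning concept c that also mention description d."""
    concept_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in sentences:
        words = set(sentence)
        for c in concepts:
            if c not in words:
                continue
            concept_counts[c] += 1
            for d in descriptions:
                if d in words:
                    pair_counts[(c, d)] += 1
    return {(c, d): pair_counts[(c, d)] / concept_counts[c]
            for c, d in product(concepts, descriptions) if concept_counts[c]}

def suggestion_list(mu: Dict[Pair, float], threshold: float = 0.5) -> List[Tuple[Pair, float]]:
    """Pairs whose degree reaches the threshold, strongest first, for expert review."""
    kept = [(pair, degree) for pair, degree in mu.items() if degree >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

SENTENCES = [s.split() for s in [
    "the heart is enlarged",
    "enlarged heart observed",
    "the heart rate is normal",
    "lung is clear",
]]
mu = fuzzy_relation(SENTENCES, ["heart", "lung"], ["enlarged", "clear", "normal"])
suggestions = suggestion_list(mu)
```

The experts see only the pairs above the threshold, ranked by degree, and decide which ones are real semantic elements.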

Performance Evaluation

Measure the accuracy of identifying instances and property values.

Most IE systems face a trade-off between improving precision and recall.

When β² < 1, precision (P) is weighted more heavily than recall (R) in the F-measure.
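The slide’s equation did not survive extraction; the standard F-measure it refers to is F_β = (1 + β²)·P·R / (β²·P + R), where P is precision and R is recall. The Python sketch below computes P, R and F_β from hypothetical counts for an extractor that is precise but misses many instances, which makes the effect of β visible: with P > R, F_0.5 is higher than F_1, which is higher than F_2.

```python
from typing import Tuple

def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (P, R, F_beta) from true positives, false positives and false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * p + r
    f = (1 + b2) * p * r / denom if denom else 0.0
    return p, r, f

# Hypothetical counts: 8 correct extractions, 2 spurious, 8 missed.
P, R, F1 = precision_recall_f(tp=8, fp=2, fn=8)
_, _, F_HALF = precision_recall_f(tp=8, fp=2, fn=8, beta=0.5)
_, _, F2 = precision_recall_f(tp=8, fp=2, fn=8, beta=2.0)
```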

Performance Evaluation

Evaluation on different scales (Maynard et al., 2004): each answer is conventionally categorized as either correct or incorrect; however, different degrees of correctness should be allowed.

Learning Accuracy (LA): measures the closeness of the assigned class label to the correct class label based on the hierarchy of the ontology (Cimiano et al., 2005).

Multi-dimensional evaluation beyond Precision and Recall
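The slide does not give the Learning Accuracy formula, so the Python sketch below uses a different but related hierarchy-based score as a stand-in: Wu and Palmer style similarity, 2·depth(lcs) / (depth(assigned) + depth(correct)), where lcs is the deepest class that subsumes both labels and the root has depth 1. It shows the idea of partial credit for a nearby class; it is not the LA definition of Cimiano et al. (2005). The toy hierarchy is hypothetical and assumed to be a single-rooted tree.

```python
from typing import Dict, List, Optional

# Hypothetical toy class hierarchy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Organism": "Thing",
    "Bacterium": "Organism",
    "Mycoplasma": "Bacterium",
    "Virus": "Organism",
    "Location": "Thing",
}

def path_to_root(label: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return [label, parent(label), ..., root]."""
    path = [label]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def depth(label: str, parent: Dict[str, Optional[str]]) -> int:
    """Depth of a class, counting the root as depth 1."""
    return len(path_to_root(label, parent))

def hierarchy_closeness(assigned: str, correct: str,
                        parent: Dict[str, Optional[str]]) -> float:
    """Wu-Palmer style score in (0, 1]: 2 * depth(lcs) / (depth(assigned) + depth(correct))."""
    correct_ancestors = set(path_to_root(correct, parent))
    # The first ancestor of `assigned` that also subsumes `correct` is the deepest shared one.
    lcs = next(c for c in path_to_root(assigned, parent) if c in correct_ancestors)
    return 2 * depth(lcs, parent) / (depth(assigned, parent) + depth(correct, parent))
```

Assigning “Bacterium” to a “Mycoplasma” instance thus earns more credit than “Virus,” which earns more than “Location,” instead of all three counting as simply incorrect.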

Performance Evaluation

Cost-based metrics (Maynard et al., 2004)

A cost is typically associated with each miss and each false alarm (spurious answer)

Augmented precision (AP)

Augmented recall (AR)
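One simple cost-based scheme, sketched in Python below, charges each miss and each false alarm its own cost and sums them. This is an illustrative assumption, not necessarily the exact formulation of Maynard et al. (2004); the two systems and their error counts are hypothetical. Changing the relative cost of a miss can reverse which system looks better.

```python
def detection_cost(misses: int, false_alarms: int,
                   cost_miss: float = 1.0, cost_false_alarm: float = 1.0) -> float:
    """Total cost of a system's errors: each miss and each false alarm
    (spurious answer) is charged its own cost."""
    return cost_miss * misses + cost_false_alarm * false_alarms

# Two hypothetical systems evaluated on the same data.
SYSTEM_A = {"misses": 3, "false_alarms": 1}
SYSTEM_B = {"misses": 1, "false_alarms": 4}

equal_costs = (detection_cost(**SYSTEM_A), detection_cost(**SYSTEM_B))
misses_doubled = (detection_cost(**SYSTEM_A, cost_miss=2.0),
                  detection_cost(**SYSTEM_B, cost_miss=2.0))
```

With equal costs, system A is cheaper; once a miss costs twice as much as a false alarm, system B wins.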

Potentials

Automatically processing the information contained in natural language text

Creating semantic contents for the Semantic Web

automatic metadata generation

semantic annotation

Improving the quality of ontologies

ACKNOWLEDGEMENT

Most of the materials are adapted from:

Wimalasuriya, D. C., & Dou, D. (2010). Ontology-based information extraction: An introduction and a survey of current approaches. Journal of Information Science.

Other References (part):

• Muhammad, A., & Dey, L. (2005). Biological ontology enhancement with fuzzy relation: A text mining framework. In International Conference on Web Intelligence (WI) (Vol. 5).

• Romano, R., Rokach, L., & Maimon, O. (2006). Automatic discovery of regular expression patterns representing negated findings in medical narrative reports. In Proceedings of the 6th International Workshop on Next Generation Information Technologies and Systems. Springer, Berlin.

• Muller, H. M., Kenny, E. E., & Sternberg, P. W. (2004). Textpresso: An ontology-based information retrieval and extraction system for biological literature. PLoS Biology, 2(11), e309.

• Dung, T. Q., & Kameyama, W. (2007). Ontology-based information extraction and information retrieval in health care domain. In Data Warehousing and Knowledge Discovery (pp. 323-333). Springer Berlin Heidelberg.

Thank you!

