
Ontology learning and population from text

Date post: 24-Feb-2016
Upload: hollye
Transcript
Page 1: Ontology learning and population from text

ONTOLOGY LEARNING AND POPULATION FROM TEXT
Ch8 Population

Page 2: Ontology learning and population from text

Population
• Population of ontology:
  • Finding instances of relations as well as of concepts
  • Requires full understanding of natural language
• More modest target:
  • The extraction of a set of predefined relations
• In this chapter:
  • No acquisition of instances of relations
  • The detection of instances of concepts

Page 3: Ontology learning and population from text

Population
• Common Approaches
• Corpus-based Population
  • A standard similarity-based approach
• Learning by Googling
  • Semi-supervised approach
  • PANKOW
  • C-PANKOW

Page 4: Ontology learning and population from text

Common Approaches
• Lexico-syntactic Patterns
  • Hearst patterns
• Similarity-based Classification
  • Algorithm 12
  • Data sparseness problem
• Supervised Approaches
  • Predict the category of a given instance with a trained model
  • Requires thousands of training examples to train the model
  • Not feasible when hundreds of concepts are possible tags
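As a rough illustration of the lexico-syntactic patterns mentioned above, the sketch below matches a single Hearst-style pattern ("NPs such as NP, NP and NP") with a regular expression. The regex, the function name `hearst_such_as`, and the treatment of noun phrases as single tokens (lowercase for the concept, capitalized for the instances) are simplifying assumptions made for illustration; a real system would rely on a chunker or parser.

```python
import re

# Assumed simplification: concept = one lowercase word,
# instances = capitalized single-word names joined by commas and/or "and"/"or".
SUCH_AS = re.compile(
    r"\b(?P<concept>[a-z]+)\s+such\s+as\s+"
    r"(?P<instances>[A-Z]\w*(?:\s*,\s*[A-Z]\w*)*"
    r"(?:\s*,?\s*(?:and|or)\s+[A-Z]\w*)?)"
)

# Separators between instances: ", ", ", and ", " and ", " or ".
SEPARATOR = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def hearst_such_as(text):
    """Return (instance, concept) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        concept = match.group("concept")
        for instance in SEPARATOR.split(match.group("instances")):
            if instance:
                pairs.append((instance, concept))
    return pairs
```

Note that the concept is returned in the surface form found in the text (for example a plural), so mapping it to an ontology concept would need an additional lemmatization step that is not shown here.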

Page 5: Ontology learning and population from text
Page 6: Ontology learning and population from text

Similarity-based Classification of Named Entities
• Using different similarity measures
  • Cosine, Jaccard, L1 norm, Jensen-Shannon, Skew
• Using different feature weighting measures
  • Conditional, PMI, Resnik
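To make the idea of similarity-based classification concrete, here is a minimal sketch of three of the measures listed above over sparse context vectors, plus a nearest-concept classifier. Representing vectors as `dict` objects, computing Jaccard on the sets of non-zero features, and the function name `classify` are assumptions of this sketch, not details taken from the slides; Jensen-Shannon and skew divergence as well as the feature weighting schemes are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {feature: weight}."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(u, v):
    """Jaccard coefficient on the sets of features present in each vector."""
    a, b = set(u), set(v)
    return len(a & b) / len(a | b) if a | b else 0.0


def l1_distance(u, v):
    """L1 norm of the difference (a distance: smaller means more similar)."""
    return sum(abs(u.get(f, 0.0) - v.get(f, 0.0)) for f in set(u) | set(v))


def classify(entity_vec, concept_vecs, sim=cosine):
    """Assign the entity to the concept whose context vector is most similar."""
    return max(concept_vecs, key=lambda c: sim(entity_vec, concept_vecs[c]))
```

Because `l1_distance` is a distance rather than a similarity, it would have to be negated (or `min` used instead of `max`) before being passed to `classify`.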

Page 7: Ontology learning and population from text

Evaluation
• Goal: learn a function fs
• fa and fb: specified by two annotators
• Functions as sets:
• Measurement
  • Precision, Recall, F-measure, learning accuracy
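The slide's own formulas are not recoverable from this transcript, so the following is only a sketch of the standard set-based definitions, under the assumption that each function is treated as a set of (instance, concept) pairs. The names `system` and `gold` are placeholders for fs and for one annotator's function (fa or fb); learning accuracy is not covered.

```python
def precision_recall_f(system, gold):
    """Compare two instance -> concept mappings as sets of pairs.

    precision = |system ∩ gold| / |system|
    recall    = |system ∩ gold| / |gold|
    F         = harmonic mean of precision and recall
    """
    system_pairs = set(system.items())
    gold_pairs = set(gold.items())
    correct = len(system_pairs & gold_pairs)
    precision = correct / len(system_pairs) if system_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

With two annotators, one option is to call the function once per annotator and average the scores.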

Page 8: Ontology learning and population from text

Experiments
• Using Word Windows
  • n words to the left and right of a word of interest
  • Excluding stopwords, without crossing sentence boundaries
• Example: Mopti is the biggest city along the Niger with one of the most vibrant ports and a large bustling market. Mopti has a traditional ambience that other towns seem to have lost. It is also the center of the local tourist industry and suffers from hard-sell overload. The nearby junction towns of Gao and San offer nice views over the Niger's delta.
  • Mopti: traditional(1), biggest(1)
  • Niger: city(1), delta(1), view(1)
  • Gao: San(1), offer(1), town(1), junction(1)
  • San: offer(1), view(1), Gao(1), nice(1)
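The word-window extraction described above can be sketched as follows. This is a minimal version under several assumptions that the slides do not specify: sentences are split on terminal punctuation, stopwords are dropped before the window is taken, the stopword list is a small hand-picked one, and no lemmatization is applied (so the counts will not exactly match the slide's example, which appears to use lemmas such as "view" and "town").

```python
import re
from collections import Counter

# Assumed toy stopword list; a real system would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "of", "and", "is", "with", "to", "that", "it",
    "has", "along", "one", "most", "over", "from", "also", "other",
}


def window_features(text, target, n=1, stopwords=STOPWORDS):
    """Count the words within n positions of `target`, sentence by sentence."""
    counts = Counter()
    # Splitting into sentences first keeps windows inside sentence boundaries.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = [
            t for t in re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
            if t.lower() not in stopwords
        ]
        for i, token in enumerate(tokens):
            if token == target:
                left = tokens[max(0, i - n):i]
                right = tokens[i + 1:i + 1 + n]
                counts.update(w.lower() for w in left + right)
    return counts
```

The returned `Counter` can serve directly as a sparse context vector for the named entity.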

Page 9: Ontology learning and population from text

Experiments
• Result:

Page 10: Ontology learning and population from text

Experiments
• Result:

Page 11: Ontology learning and population from text

Experiments
• Using Pseudo-syntactic Dependencies
  • Object-attribute pair
• Example: Mopti is the biggest city along the Niger with one of the most vibrant ports and a large bustling market. Mopti has a traditional ambience that other towns seem to have lost. It is also the center of the local tourist industry and suffers from hard-sell overload. The nearby junction towns of Gao and San offer nice views over the Niger's delta.
  • Mopti: is_city(1), has_ambience(1)
  • Niger: has_delta(1)
  • Gao: junction_of(1)
  • San: offer_subj(1)
• Result:

Page 12: Ontology learning and population from text

Experiments
• Dealing with Data Sparseness
  • Using Conjunctions
    • Applies when two named entities are linked by a conjunction
  • Result:

Page 13: Ontology learning and population from text

Experiments
• Dealing with Data Sparseness
  • Exploiting the Taxonomy
    • Compute the context vector of a term by considering the context vectors of its subconcepts
    • Take into account only the context vectors of direct subconcepts
    • Normalizing aggregated vectors:
      • Standard normalization of the vector
      • Calculating its centroid
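The aggregation step above can be sketched as follows. The data layout (`own_vectors` mapping each concept to its sparse context vector, `children` mapping each concept to its direct subconcepts) and the reading of "standard normalization" as scaling to unit Euclidean length are assumptions of this sketch; the slides do not say which normalization was used.

```python
import math
from collections import Counter


def aggregate(concept, own_vectors, children):
    """Add the vectors of the direct subconcepts to the concept's own vector."""
    total = Counter(own_vectors.get(concept, {}))
    for child in children.get(concept, []):
        total.update(own_vectors.get(child, {}))  # update() adds the weights
    return dict(total)


def l2_normalize(vec):
    """Scale a vector to unit length (assumed meaning of 'standard normalization')."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()} if norm else {}


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    if not vectors:
        return {}
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {f: w / len(vectors) for f, w in total.items()}
```

Either `l2_normalize(aggregate(...))` or `centroid([...])` over the concept's own vector and those of its direct subconcepts keeps concepts with many subconcepts from dominating the similarity computation simply because their summed vectors are larger.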

Page 14: Ontology learning and population from text

Experiments
• Dealing with Data Sparseness
  • Exploiting the Taxonomy
  • Result:

Page 15: Ontology learning and population from text

Experiments
• Dealing with Data Sparseness
  • Anaphora Resolution
    • Replace each anaphoric reference with its corresponding antecedent
    • Before: The port capital of Vathy is dominated by its fortified Venetian harbor.
    • After: The port capital of Vathy is dominated by Vathy's fortified Venetian harbor.
  • Result:

Page 16: Ontology learning and population from text

Experiments
• Dealing with Data Sparseness
  • Downloading Documents from the Web
    • Download 20 additional documents Di for each named entity i
    • Keep only those documents d in Di whose similarity is above a threshold of 0.2
  • Result:

Page 17: Ontology learning and population from text

Experiments
• Dealing with Data Sparseness
  • Post-processing
    • The k best answers of the system are checked for their statistical plausibility on the web
  • Result:

Page 18: Ontology learning and population from text

PANKOW
• Pattern-based Annotation through Knowledge on the Web
• Certain lexico-syntactic patterns, as defined by Hearst, can be matched not only in a corpus but also on the World Wide Web

Page 19: Ontology learning and population from text

PANKOW
• The Process of PANKOW
  • Step 1: Iterate over the set of entities to be classified and generate instances of patterns, one for each concept in the ontology.
    • For example: the instance South Africa and the concepts country and hotel result in the pattern instances "South Africa is a country" and "South Africa is a hotel", or "countries such as South Africa" and "hotels such as South Africa".
    • Result 1: A set of pattern instances
  • Step 2: Google is queried for the pattern instances through its Web service API
    • Result 2: The counts for each pattern instance
  • Step 3: Sum up the query results to a total for each concept.
    • Result: The statistical web fingerprint for each entity, that is, the result of aggregating, for each entity, the number of Google counts for all pattern instances conveying the relation of interest.
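The three steps above can be sketched as follows. The search backend is deliberately abstracted away: `hit_count` is a hypothetical callable supplied by the caller that maps a query string to a number of hits, and it does not correspond to any real Google API. Using only two pattern schemas, the naive `pluralize` helper, and picking the concept with the highest total are likewise assumptions of this sketch.

```python
def pluralize(noun):
    """Very naive English plural, enough for 'country' and 'hotel'."""
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def pattern_instances(entity, concept):
    """Step 1: instantiate the pattern schemas for one entity/concept pair."""
    return [
        f"{entity} is a {concept}",
        f"{pluralize(concept)} such as {entity}",
    ]


def web_fingerprint(entity, concepts, hit_count):
    """Steps 2 and 3: query each pattern instance and sum the counts per concept."""
    return {
        concept: sum(hit_count(q) for q in pattern_instances(entity, concept))
        for concept in concepts
    }


def pankow_classify(entity, concepts, hit_count):
    """Return the concept with the highest total, plus the full fingerprint."""
    fingerprint = web_fingerprint(entity, concepts, hit_count)
    return max(fingerprint, key=fingerprint.get), fingerprint
```

Note that every entity requires one query per pattern and per concept, so the number of calls to `hit_count` grows with the size of the ontology.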

Page 20: Ontology learning and population from text

PANKOW
• The Process of PANKOW

Page 21: Ontology learning and population from text

PANKOW
• Evaluation
  • From the two annotators
    • Reference standards for subjects A and B
  • Measurement:
    • Precision, recall, and F-measure

Page 22: Ontology learning and population from text

PANKOW
• Evaluation
  • Measurement:
    • Average the results for both annotators

Page 23: Ontology learning and population from text

PANKOW
• Result:

Page 24: Ontology learning and population from text

C-PANKOW
• Shortcomings of PANKOW
  • Many actual instances of the pattern schema are not found
  • A large number of queries is sent to the Google Web API
  • Does not scale to larger ontologies

Page 25: Ontology learning and population from text

C-PANKOW
• C-PANKOW Process
  • The web page to be annotated is scanned for candidate instances.
  • For each instance i discovered and for each clue-pattern pair in the pattern library P, an automatically generated query is issued to Google, and the abstracts (snippets) of the first n hits are downloaded.
  • Then the similarity between the document to be annotated and each downloaded abstract is calculated. If the similarity is above a given threshold t, the actual pattern found in the abstract reveals a phrase that may describe the concept the instance belongs to in the context in question.
  • The pattern matched in a given Google abstract is only considered if the similarity between the original page and this abstract is above the threshold. In this way the pattern-matching process is contextualized.
  • Finally, the instance i is annotated with the concept c that has the largest number of hits as well as the most contextually relevant hits.
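The contextualization step above can be sketched as follows, assuming the snippets have already been downloaded and are passed in as plain strings (no search API is called). Bag-of-words cosine similarity, the single "is a" pattern, and simple vote counting over the snippets that pass the threshold are assumptions of this sketch; the slides do not state how the number of hits and their contextual relevance are combined.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts as a sparse vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    dot = sum(w * v.get(f, 0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def c_pankow_annotate(page_text, instance, snippets, threshold):
    """Pick the concept most often matched in snippets similar to the page."""
    page_vec = bag_of_words(page_text)
    # Assumed single clue pattern: "<instance> is a/an <concept>".
    pattern = re.compile(
        re.escape(instance) + r"\s+is\s+an?\s+([a-z]+)", re.IGNORECASE
    )
    votes = Counter()
    for snippet in snippets:
        # Only snippets whose context resembles the page are allowed to vote.
        if cosine(page_vec, bag_of_words(snippet)) <= threshold:
            continue
        for match in pattern.finditer(snippet):
            votes[match.group(1).lower()] += 1
    return votes.most_common(1)[0][0] if votes else None
```

In contrast to the PANKOW sketch of one query per concept, the concept here is read off the matched snippet, so the number of queries no longer depends on the size of the ontology.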

Page 26: Ontology learning and population from text

C-PANKOW
• C-PANKOW Process

Page 27: Ontology learning and population from text

C-PANKOW
• Evaluation
  • Same dataset and evaluation measures as PANKOW
  • But C-PANKOW uses the 682 concepts of the pruned Tourism ontology as possible tags
  • Added learning accuracy

Page 28: Ontology learning and population from text

C-PANKOW
• Result:

Page 29: Ontology learning and population from text

C-PANKOW
• Result:

Page 30: Ontology learning and population from text

C-PANKOW
• Result:

