Automatic Extraction and Incorporation of Purpose Data into PurposeNet
P. Kiran Mayee
Rajeev Sangal
Soma Paul
SCONLI3 JNU NEW DELHI
INTRODUCTION
Purpose
Need for a knowledge base of objects and actions in which the knowledge is organized around purpose.
PurposeNet
PurposeNet is an intelligent knowledge-based system dealing with specialized attributes of artifacts, namely their purpose, the purpose of their types, components and accessories, as well as data about their birth, processes, side-effects, maintenance and result on destruction.
Building the PurposeNet
Template Designing
Revision & Refinement of template
Selection of Domain
Information Retrieval from Web
Ontology population
Testing
Need for Automation
Acquisition bottleneck
Massive availability of text
Availability of purpose cues
Purpose data required
Artifact -- garage
Purpose
  Action -- store
  Upon -- vehicle
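The garage example can be written down as a small record type. This is only a sketch: the class name `PurposeRecord` and its field names are hypothetical and are not taken from PurposeNet itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurposeRecord:
    """One purpose entry: an artifact, the action it serves, and what the action is applied upon.

    Hypothetical structure for illustration; PurposeNet's real schema is not shown in the slides.
    """
    artifact: str
    action: str
    upon: str


# The example from the slide: a garage is used to store a vehicle.
garage = PurposeRecord(artifact="garage", action="store", upon="vehicle")
print(garage)
```

A frozen dataclass is used so that records are hashable and could be kept in a set, which is one simple way to drop repeated entries.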
Purpose Cues
Word(s)
Lexical entities in a particular order
Classification
  Sentences beginning with artifact name
  Sentences ending with artifact name
  Sentences containing artifact name
  Hidden Cues
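The classification above depends on where the artifact name sits in the sentence. A minimal sketch of that check follows. The function name `cue_class`, the skipping of a leading article, and the reading of "hidden" as "artifact name not found" are all assumptions made for illustration.

```python
import re

ARTICLES = {"a", "an", "the"}


def cue_class(sentence: str, artifact: str) -> str:
    """Return 'begin', 'end', 'embedded' or 'hidden' by the position of the artifact name."""
    words = re.findall(r"[A-Za-z0-9+'-]+", sentence.lower())
    name = artifact.lower().split()
    # Assumption: a leading article does not stop a sentence from "beginning with" the artifact.
    if words and words[0] in ARTICLES:
        words = words[1:]
    n = len(name)
    if n == 0 or len(words) < n:
        return "hidden"
    if words[:n] == name:
        return "begin"
    if words[-n:] == name:
        return "end"
    # Any remaining occurrence is strictly inside the sentence.
    if any(words[i:i + n] == name for i in range(1, len(words) - n)):
        return "embedded"
    # Assumption: "hidden" means the artifact name does not appear at all.
    return "hidden"


print(cue_class("We cut trees with an axe.", "axe"))
print(cue_class("Use the air+pump to fill the tyre.", "air+pump"))
```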
Sentences commencing with artifact name
Sentences ending with artifact name
We cut trees with an axe.
action upon artifact
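As an illustration of an ending cue, the sentence above can be matched with one regular expression. The pattern below is an assumption: it treats "with a/an/the" as the cue words and takes the two words before them as action and upon. The slides do not give the actual cue table.

```python
import re

# Hypothetical ending cue: "<action> <upon> with a/an/the <artifact>" at the end of a sentence.
ENDING_CUE = re.compile(
    r"(?P<action>\w+)\s+(?P<upon>\w+)\s+with\s+(?:an?|the)\s+(?P<artifact>[\w+-]+)\W*$",
    re.IGNORECASE,
)

match = ENDING_CUE.search("We cut trees with an axe.")
if match:
    print(match.group("artifact"), match.group("action"), match.group("upon"))
```

A word-level pattern like this is brittle: a multi-word object such as "tall trees" would be split wrongly, which is one reason parsed input is a natural next step.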
Sentences containing artifact name
Use the air+pump to fill the tyre.
Use the <artifact> to <action> the <upon>
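A template of this form can be turned into a matcher mechanically. The sketch below converts each `<slot>` into a named regular-expression group. The helper name `compile_cue` is hypothetical, and each slot is assumed to be a single token (letters, digits, `+` or `-`).

```python
import re


def compile_cue(template: str) -> re.Pattern:
    """Build a regex from a cue template such as 'Use the <artifact> to <action> the <upon>'."""
    pieces = []
    for token in template.split():
        slot = re.fullmatch(r"<(\w+)>", token)
        if slot:
            # Slot: capture one token under the slot's name.
            pieces.append(rf"(?P<{slot.group(1)}>[\w+-]+)")
        else:
            # Literal cue word: match it as written (case-insensitively).
            pieces.append(re.escape(token))
    return re.compile(r"\s+".join(pieces), re.IGNORECASE)


cue = compile_cue("Use the <artifact> to <action> the <upon>")
found = cue.search("Use the air+pump to fill the tyre.")
if found:
    print(found.groupdict())
```

Keeping cues as templates rather than hand-written expressions would let a cue table be stored as plain text and extended without touching the code.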
Methodology for purpose data extraction
Algorithm for Purpose Data Extraction
Algorithm PurpDataExtract(corpus)
Step 1: Read the first sentence in the corpus.
Step 2: Repeat until end-of-corpus:
  2a. If contains(sentence, artifact) and match(sentence, cuetable) then
        extract(sentence, artifact)
        extract(sentence, to_action)
        extract(sentence, to_upon)
        add_to_ontology(artifact, to_action, to_upon)
  2b. Read the next sentence.
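The loop above might be sketched in Python as follows. This is not the authors' implementation: the two cue patterns, the artifact lookup set and the list used as the "ontology" are stand-ins chosen so the example runs on the two sentences quoted earlier.

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical cue table with one embedded cue and one ending cue.
CUE_TABLE = [
    re.compile(
        r"use\s+the\s+(?P<artifact>[\w+-]+)\s+to\s+(?P<action>\w+)\s+the\s+(?P<upon>\w+)",
        re.IGNORECASE,
    ),
    re.compile(
        r"(?P<action>\w+)\s+(?P<upon>\w+)\s+with\s+(?:an?|the)\s+(?P<artifact>[\w+-]+)\W*$",
        re.IGNORECASE,
    ),
]


def purp_data_extract(corpus: Iterable[str], artifacts: Set[str]) -> List[Tuple[str, str, str]]:
    """Collect (artifact, action, upon) triples from sentences that match a cue."""
    ontology = []
    for sentence in corpus:                      # Steps 1 and 2b: read sentence by sentence
        for cue in CUE_TABLE:                    # Step 2a: match against the cue table
            m = cue.search(sentence)
            # The matched artifact must be in the lookup list (the contains() check).
            if m and m.group("artifact").lower() in artifacts:
                ontology.append((
                    m.group("artifact").lower(),
                    m.group("action").lower(),
                    m.group("upon").lower(),
                ))
                break                            # one triple per sentence in this sketch
    return ontology


sentences = [
    "We cut trees with an axe.",
    "Use the air+pump to fill the tyre.",
    "The weather was fine.",
]
triples = purp_data_extract(sentences, {"axe", "air+pump"})
print(triples)
```

Stopping at the first matching cue mirrors the one-to-one mapping between artifact and purpose that the approach assumes.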
Data
Wikipedia: 249 files
WordNet: 81,837 descriptions
Princeton noun-artifact corpus: 82,115 sentences
Observations: summary results

Corpus Name   Corpus size   purpsen   PurpData Density (%)
WordNet       81837         1251      1.53
Princeton     82115         1023      1.25
Wikipedia     243           109       44.86
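The density column can be checked from the other two columns, assuming that "purpsen" counts the sentences carrying purpose data and that density is that count as a percentage of corpus size. The function name `density` is chosen here for illustration.

```python
# (corpus size, purpsen) as given in the summary table.
corpora = {
    "WordNet": (81837, 1251),
    "Princeton": (82115, 1023),
    "Wikipedia": (243, 109),
}


def density(corpus_size: int, purpose_sentences: int) -> float:
    """Purpose sentences as a percentage of the corpus size."""
    return 100.0 * purpose_sentences / corpus_size


for name, (size, purpsen) in corpora.items():
    print(f"{name}: {density(size, purpsen):.2f}%")
```

Under that assumption the computed values round to 1.53, 1.25 and 44.86, matching the table.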
Purpose Data Extraction Misses

Corpus Name   PurpHits   Purpmiss (artifact name absent)   Purpmiss (action_upon absent)
WordNet       1251       nil                               4
Princeton     1023       41                                17
Wikipedia     109        44                                3
IE Metrics for Extraction

Corpus Name   Precision   F-measure
WordNet       99.6        99.79
Princeton     94.6        97.22
Wikipedia     69.8        82.21
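The slides do not report recall. The F-measure values are numerically consistent with the usual harmonic-mean formula when recall is taken as 100%, so the sketch below treats that recall figure as an assumption rather than a reported result.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall), both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ASSUMED_RECALL = 100.0  # assumption: not stated in the slides

# Precision and F-measure as reported in the table.
reported = {
    "WordNet": (99.6, 99.79),
    "Princeton": (94.6, 97.22),
    "Wikipedia": (69.8, 82.21),
}

for name, (precision, reported_f) in reported.items():
    computed = f_measure(precision, ASSUMED_RECALL)
    print(f"{name}: computed {computed:.2f}, reported {reported_f}")
```

With that assumption the computed values agree with the reported ones to within about 0.01.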
Result BreakUp per Cue Class

Corpus Name   Class1 (begin cue)   Class2 (ending cue)   Class3 (embedded cue)
WordNet       70.19                0.01                  24.7
Princeton     71.4                 1.21                  21.22
Wikipedia     84.2                 1.6                   12.21
Comparison with manually built Ontology
Exponential increase in speed
High Error Rate
Issues
Redundancy
Primary purpose not always obtained
Pronouns and brand names
Correctness and consistency not guaranteed
One-to-one mapping assumed
Other sentence manifestations
Further Enhancements
Parsed input
Cues for hidden case
Better artifact lookup list
Multipage lookup for consistency
Cloud computing
Automating other attributes of PurposeNet
Conclusions
A methodology was proposed for automated ontology population of PurposeNet.
The methodology was implemented on three corpora.
The time taken to populate the PurposeNet 'purpose' ontology was a fraction of that required by manual methods.
The error rate was found to be high.
Thank You