
PIKES - Marco Rospocher · 2015-10-22 · Processing large document corpora (Simple English...


References:
● Corcoglioniti, F., Rospocher, M., Palmero Aprosio, A. Extracting Knowledge from Text with PIKES. In ISWC Posters & Demonstrations, 2015.
● Corcoglioniti, F., Rospocher, M., Cattoni, R., Magnini, B., Serafini, L. The KnowledgeStore: a Storage Framework for Interlinking Unstructured and Structured Knowledge. International Journal on Semantic Web and Information Systems, volume 11, 2015.

Powered By:

PIKES: PIKES Is a Knowledge Extraction Suite

PIKES

In a Nutshell: a 2-phase Frame-based Approach

pikes.fbk.eu

RDF Data Model for Information Extraction

SPARQL-based Knowledge Distillation

[Figure: the three-layer RDF data model used by PIKES.]
● Resource layer: x:Resource, described with dct:title, dct:creator and dct:created.
● Mention layer: x:Mention (nif:beginIndex, nif:endIndex, nif:anchorOf, and x:mentionOf linking it to its resource), specialized as x:InstanceMention (x:synset, x:linkedTo), x:AttributeMention (x:normalizedValue), x:TimeMention (x:normalizedValue), x:NameMention (x:nercType), x:FrameMention (x:roleset), x:ParticipationMention (x:role, x:frame, x:argument), x:CoreferenceMention (x:coreferential, x:coreferentialConjunct) and x:RelationMention (x:target).
● Instance layer: x:Instance (rdf:type, rdfs:label, foaf:name), with x:Attribute, x:Frame and x:Time (OWL time properties).
● Cross-layer links: a mention x:denotes or x:implies instances and x:expresses an assertion (graph); other relations shown are subject/object, owl:sameAs, rdfs:seeAlso, x:include and frame/argument relations.
● Prefixes: nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>, foaf: <http://xmlns.com/foaf/0.1/>, x: <namespace blinded>.
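To make the layering concrete, here is a minimal Python sketch of the three layers as plain dataclasses. It is an illustration, not PIKES code: the class and attribute names, the example IRIs, and the NIF-style "#char=begin,end" mention IRI are assumptions, and nif:anchorOf is recomputed from the offsets rather than stored.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Resource layer (x:Resource): a source document and its text."""
    iri: str
    title: str  # dct:title
    text: str


@dataclass
class Instance:
    """Instance layer (x:Instance): an entity, frame, attribute or time."""
    iri: str
    types: set[str] = field(default_factory=set)  # rdf:type
    label: str = ""                                # rdfs:label


@dataclass
class Mention:
    """Mention layer (x:Mention): a character span of a resource."""
    resource: Resource  # x:mentionOf
    begin: int          # nif:beginIndex
    end: int            # nif:endIndex
    denotes: list[str] = field(default_factory=list)  # x:denotes (instance IRIs)

    @property
    def anchor_of(self) -> str:
        # nif:anchorOf: the covered text, recomputed here from the offsets
        return self.resource.text[self.begin:self.end]

    @property
    def iri(self) -> str:
        # Hypothetical NIF-style IRI identifying the span inside its resource
        return f"{self.resource.iri}#char={self.begin},{self.end}"


# Usage with hypothetical IRIs
doc = Resource("http://example.org/doc1", "Example",
               "G. W. Bush and Bono are very strong supporters of the fight of HIV in Africa.")
bono = Instance("http://dbpedia.org/resource/Bono", {"x:Instance"}, "Bono")
mention = Mention(doc, 15, 19, [bono.iri])
print(mention.anchor_of, mention.iri)
```

Keeping offsets on the mention and meaning on the instance is what lets several mentions (e.g., "Bono" and "their") point to the same instance later on.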

Graphical Rendering of Extracted Knowledge

Performance

Detecting and representing frames and frame-role relations:
→ precision: 0.716
→ recall: 0.494

Processing large document corpora (Simple English Wikipedia):
→ 110K pages in about 507 core hours
→ processing time scales linearly with the size of the text
→ 0.85 accuracy in extracting triples about DBpedia entities
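The reported figures can be turned into a few derived numbers. This Python sketch computes the F1 score implied by the precision and recall (F1 is not reported on the poster) and the average per-page cost. The extrapolation helper is an assumption: it applies the stated linear scaling to page counts, which only holds for pages of similar average length.

```python
# Figures reported on the poster
PRECISION = 0.716   # frame and frame-role detection
RECALL = 0.494
PAGES = 110_000     # Simple English Wikipedia
CORE_HOURS = 507    # "about 507 core hours"


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def per_page_cost(pages: int, core_hours: float) -> float:
    """Average core seconds spent per page."""
    return core_hours * 3600 / pages


def estimate_core_hours(pages: int) -> float:
    """Linear extrapolation; assumes pages of similar average length."""
    return pages * per_page_cost(PAGES, CORE_HOURS) / 3600


print(f"F1 ~ {f1(PRECISION, RECALL):.3f}")
print(f"{PAGES / CORE_HOURS:.0f} pages per core hour, "
      f"{per_page_cost(PAGES, CORE_HOURS):.1f} core seconds per page")
print(f"1M pages ~ {estimate_core_hours(1_000_000):.0f} core hours")
```

Run as is, it reports an F1 of about 0.585, roughly 217 pages per core hour, and about 16.6 core seconds per page.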

Various types of SPARQL rules: instance creation, typing, naming, DBpedia linking, frame-role linking, coreference resolution

Example (Instance Creation for Argument Nominalization):

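PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
# The ks: prefix is not spelled out on the poster; ks:mint is a PIKES-specific
# function (not standard SPARQL) that mints a new IRI from its arguments.
INSERT {
  ?m ks:denotes ?i ; ks:implies ?if ; ks:expresses ?g .
  GRAPH ?g { ?i a ks:Instance . ?if a ks:Instance , ks:Frame }
}
WHERE {
  # a frame mention whose roleset is an argument nominalization (e.g., "supporters")
  ?m a ks:FrameMention ; nif:anchorOf ?a ; ks:roleset ?s .
  ?s a ks:ArgumentNominalization .
  BIND (ks:mint(?m) AS ?g)                        # assertion graph of the mention
  BIND (ks:mint(concat(?a, " pred"), ?m) AS ?if)  # implied frame instance
  BIND (ks:mint(?a, ?m) AS ?i)                    # denoted argument instance
}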
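To show what the rule does, here is a minimal Python sketch that evaluates the same INSERT/WHERE pattern over an in-memory set of triples. The mint() function is a hypothetical stand-in for ks:mint (a hash-based IRI), and the mention and roleset IRIs in the example are made up; this mirrors the rule's logic, not the PIKES rule engine.

```python
import hashlib


def mint(*parts: str) -> str:
    """Hypothetical stand-in for ks:mint: a deterministic IRI from the arguments."""
    digest = hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()[:12]
    return f"http://example.org/minted/{digest}"


def argument_nominalization(triples: set[tuple[str, str, str]]) -> set[tuple]:
    """Evaluate the INSERT/WHERE rule over (s, p, o) triples.

    Returns (s, p, o, g) quads; g is None for the default graph.
    """
    def objects(s: str, p: str) -> list[str]:
        return [o for (s2, p2, o) in triples if s2 == s and p2 == p]

    quads = set()
    for (m, p, o) in triples:
        if p != "rdf:type" or o != "ks:FrameMention":
            continue
        for a in objects(m, "nif:anchorOf"):
            for s in objects(m, "ks:roleset"):
                if (s, "rdf:type", "ks:ArgumentNominalization") not in triples:
                    continue
                g = mint(m)                   # BIND (ks:mint(?m) AS ?g)
                frame = mint(a + " pred", m)  # ?if: the implied frame
                arg = mint(a, m)              # ?i: the denoted argument
                quads |= {
                    (m, "ks:denotes", arg, None),
                    (m, "ks:implies", frame, None),
                    (m, "ks:expresses", g, None),
                    (arg, "rdf:type", "ks:Instance", g),
                    (frame, "rdf:type", "ks:Instance", g),
                    (frame, "rdf:type", "ks:Frame", g),
                }
    return quads


# Hypothetical mention graph for "supporters" (roleset IRI made up)
mentions = {
    ("ex:m1", "rdf:type", "ks:FrameMention"),
    ("ex:m1", "nif:anchorOf", "supporters"),
    ("ex:m1", "ks:roleset", "ex:supporter.01"),
    ("ex:supporter.01", "rdf:type", "ks:ArgumentNominalization"),
}
for quad in sorted(argument_nominalization(mentions), key=str):
    print(quad)
```

One mention thus yields two instances: the supporters themselves (denoted) and the support frame they take part in (implied).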

Post-processing: inference, smushing, redundancy elimination, compaction.
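Smushing merges nodes that owl:sameAs declares equivalent into a single node. A common way to do this is union-find, sketched below in Python; the choice of the lexicographically smallest IRI as the canonical one is an assumption for illustration, as the poster does not say which IRI PIKES keeps.

```python
def smush(triples: set[tuple[str, str, str]],
          same_as: str = "owl:sameAs") -> set[tuple[str, str, str]]:
    """Merge owl:sameAs-equivalent nodes into one canonical IRI (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Canonical choice (assumption): the lexicographically smallest IRI
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for s, p, o in triples:
        if p == same_as:
            union(s, o)

    def canon(x: str) -> str:
        # Only nodes that took part in an owl:sameAs statement are rewritten
        return find(x) if x in parent else x

    # Rewriting into a set also drops triples that became duplicates
    return {(canon(s), p, canon(o)) for s, p, o in triples if p != same_as}


graph = {
    ("ex:e1", "owl:sameAs", "dbpedia:Bono"),
    ("ex:e2", "owl:sameAs", "ex:e1"),
    ("ex:e1", "rdfs:label", "Bono"),
    ("ex:e2", "rdfs:label", "Bono"),
    ("ex:e2", "rdf:type", "foaf:Person"),
}
print(sorted(smush(graph)))
```

The two rdfs:label triples collapse into one after merging, which is a small instance of the redundancy elimination listed alongside smushing.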

[Figure panel "Mentions": the mention graph of the example text below, with mentions such as "G. W. Bush", "Bono", "very strong", "supporters", "fight", "HIV", "Africa", "their meeting", "March 2002", "resulted" and "5 billion dollar aid", connected by ks:arg., ks:pred., ks:coreferential and ks:coref.Conjunct edges.]

Phase 1 – Linguistic Feature Extraction
PIKES performs several standard NLP tasks and builds a mention-based structured representation of the input text, organizing all the annotations produced by NLP tools (e.g., NERC, EL, TERN, SRL) in an RDF graph of mentions (i.e., spans of text denoting entities or facts).
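The key step of Phase 1 is that annotations from different tools attach to shared mention nodes. The Python sketch below groups annotations by character span so that, for example, a time expression's type and normalized value end up on the same mention. The annotation tuple format, the "#char=" mention IRIs and the roleset IRI are hypothetical; real tool outputs would first need converting to this shape.

```python
from collections import defaultdict


def build_mention_graph(doc_iri: str, text: str, annotations):
    """Merge span annotations from several NLP tools into one mention graph.

    annotations: iterable of (begin, end, property, value); annotations on the
    same character span end up on the same mention node.
    """
    by_span: dict[tuple[int, int], set] = defaultdict(set)
    for begin, end, prop, value in annotations:
        by_span[(begin, end)].add((prop, value))

    triples = set()
    for (begin, end), props in by_span.items():
        m = f"{doc_iri}#char={begin},{end}"  # hypothetical mention IRI scheme
        triples |= {
            (m, "rdf:type", "x:Mention"),
            (m, "x:mentionOf", doc_iri),
            (m, "nif:beginIndex", begin),
            (m, "nif:endIndex", end),
            (m, "nif:anchorOf", text[begin:end]),
        }
        triples |= {(m, p, v) for p, v in props}
    return triples


text = "Their March 2002 meeting resulted in a 5 billion dollar aid."
annotations = [
    (6, 16, "rdf:type", "x:TimeMention"),    # e.g. from a TERN tool
    (6, 16, "x:normalizedValue", "2002-03"),
    (17, 24, "rdf:type", "x:FrameMention"),  # e.g. from an SRL tool
    (17, 24, "x:roleset", "ex:meeting.01"),  # hypothetical roleset
]
graph = build_mention_graph("http://example.org/doc2", text, annotations)
for triple in sorted(graph, key=str):
    print(triple)
```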

[Figure panel "Instances": the distilled knowledge graph, with nodes dbpedia:Bush, dbpedia:Bono, :support, :fight, dbpedia:HIV, dbpedia:Africa, :meeting, time:2002-03, :result, :aid, money:5B_USD and attr:very-1r_strong-1a, linked by roles such as verbnet:agent, verbnet:actor, verbnet:theme, verbnet:topic, verbnet:beneficiary, verbnet:location, propbank:tmp, propbank:loc, propbank:mnr and ks:amount.]

Text: G. W. Bush and Bono are very strong supporters of the fight of HIV in Africa. Their March 2002 meeting resulted in a 5 billion dollar aid.

Phase 2 – Knowledge Distillation
The mention graph is processed via SPARQL rules to distill a knowledge graph, where each node uniquely identifies an entity, event or situation of the world, and arcs represent relations between them (e.g., the participation and role of an entity in an event).
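Distillation rules can depend on each other's output, so one simple way to run them is to apply all rules repeatedly until nothing new is derived (a fixpoint). The Python sketch below does this with two hypothetical rules, instance creation and DBpedia linking, written as plain functions; whether PIKES orders or iterates its SPARQL rules this way is not stated on the poster.

```python
from typing import Callable, Iterable

Triple = tuple[str, str, str]
Rule = Callable[[set[Triple]], Iterable[Triple]]


def distill(triples: set[Triple], rules: list[Rule]) -> set[Triple]:
    """Apply rules repeatedly until no rule derives a new triple (a fixpoint)."""
    kb = set(triples)
    while True:
        new: set[Triple] = set()
        for rule in rules:
            new.update(rule(kb))
        new -= kb
        if not new:
            return kb
        kb |= new


def instance_creation(kb: set[Triple]) -> Iterable[Triple]:
    # Hypothetical rule: every instance mention denotes its own minted instance
    for m, p, o in kb:
        if p == "rdf:type" and o == "x:InstanceMention":
            yield (m, "x:denotes", f"{m}_instance")


def dbpedia_linking(kb: set[Triple]) -> Iterable[Triple]:
    # Hypothetical rule: a linked mention makes its instance sameAs the DBpedia entity
    links = {(m, e) for m, p, e in kb if p == "x:linkedTo"}
    for m, p, i in kb:
        if p == "x:denotes":
            for m2, e in links:
                if m2 == m:
                    yield (i, "owl:sameAs", e)


mentions = {
    ("ex:m1", "rdf:type", "x:InstanceMention"),
    ("ex:m1", "x:linkedTo", "dbpedia:Bono"),
}
kb = distill(mentions, [instance_creation, dbpedia_linking])
for triple in sorted(kb):
    print(triple)
```

The linking rule only fires in the second round, after instance creation has produced the x:denotes triple; iterating to a fixpoint makes the result independent of the order in which the rules are listed.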
