Transcript
Page 1: PIKES - Marco Rospocher · 2015-10-22

References:

● Corcoglioniti, F., Rospocher, M., Palmero Aprosio, A. Extracting Knowledge from Text with PIKES. In ISWC Posters & Demonstrations, 2015.

● Corcoglioniti, F., Rospocher, M., Cattoni, R., Magnini, B., Serafini, L. The KnowledgeStore: a Storage Framework for Interlinking Unstructured and Structured Knowledge. International Journal on Semantic Web and Information Systems, volume 11, 2015.


PIKES: PIKES Is a Knowledge Extraction Suite

In a Nutshell: a 2-phase Frame-based Approach

pikes.fbk.eu

RDF Data Model for Information Extraction

SPARQL-based Knowledge Distillation

[Figure: diagram of the RDF data model, organized in three layers. Only the labels survive; each class is followed by the property labels that appear next to it in the extracted text.]

Resource layer: x:Resource (dct:title, dct:creator, dct:created); x:mentionOf

Mention layer: x:Mention (nif:beginIndex, nif:endIndex, nif:anchorOf); x:InstanceMention (x:synset, x:linkedTo); x:AttributeMention (x:normalizedValue); x:TimeMention (x:norm.Value); x:NameMention (x:nercType); x:FrameMention (x:roleset); x:ParticipationMention (x:role); x:CoreferenceMention; x:RelationMention (x:target)

Other mention-related labels: x:coreferential, x:coreferentialConjunct, x:argument, x:frame, x:denotes, x:implies, x:expresses

Instance layer: x:Instance (rdf:type, rdfs:label, foaf:name); x:Attribute; x:Frame; x:Time (OWL time props.); Assertion (graph)

Relation labels in the instance layer: subject/object, owl:sameAs, rdfs:seeAlso, x:include, frame/arg rel.

Prefixes:

nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>

foaf: <http://xmlns.com/foaf/0.1/>

x: <namespace blinded>

Graphical Rendering of Extracted Knowledge

Performances

Detecting and representing frames and frame-role relations:
→ precision: 0.716
→ recall: 0.494

Processing large document corpora (Simple English Wikipedia):
→ 110K pages in about 507 core hours
→ processing time scales linearly with the size of the text
→ 0.85 accuracy in extracting triples about DBpedia entities
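To put the reported figures in perspective, the short sketch below derives two numbers that the poster does not state: the F1 score implied by the reported precision and recall, and the average processing cost per page. Both are plain arithmetic on the poster's figures; reading "110K" as exactly 110,000 pages is an assumption.

```python
# Figures reported on the poster.
precision = 0.716
recall = 0.494
pages = 110_000      # "110K pages", assumed to mean exactly 110,000
core_hours = 507     # "about 507 core hours"

# F1 is the harmonic mean of precision and recall
# (derived here, not reported on the poster).
f1 = 2 * precision * recall / (precision + recall)

# Average processing cost per page, in core-seconds.
core_seconds_per_page = core_hours * 3600 / pages

print(f"F1 = {f1:.3f}")                                        # F1 = 0.585
print(f"{core_seconds_per_page:.1f} core-seconds per page")    # 16.6 core-seconds per page
```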

Various types of SPARQL rules: instance creation, typing, naming, DBpedia linking, frame-role linking, coreference resolution

Example (Instance Creation for Argument Nominalization):

INSERT { ?m ks:denotes ?i ; ks:implies ?if ; ks:expresses ?g .
         GRAPH ?g { ?i a ks:Instance . ?if a ks:Instance , ks:Frame } }
WHERE  { # Match frame mentions whose roleset is an argument nominalization
         ?m a ks:FrameMention ; nif:anchorOf ?a ; ks:roleset ?s .
         ?s a ks:ArgumentNominalization .
         # Mint URIs for the named graph, the implied frame, and the denoted instance
         BIND (ks:mint(?m) AS ?g)
         BIND (ks:mint(concat(?a, " pred"), ?m) AS ?if)
         BIND (ks:mint(?a, ?m) AS ?i) }
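For readers less familiar with SPARQL Update, the rule above can be mimicked over an in-memory set of triples. This is a minimal sketch under stated assumptions, not how PIKES executes its rules: the `mint` function is a hypothetical stand-in for `ks:mint` (whose actual behavior is not described on the poster), the `ex:` and `inst:` identifiers are invented for illustration, and each mention is assumed to have a single anchor and roleset.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def mint(*parts: str) -> str:
    """Hypothetical stand-in for ks:mint: build a deterministic URI from the arguments."""
    local = "_".join(p.split(":")[-1].replace(" ", "_") for p in parts)
    return "inst:" + local


def apply_rule(mention_graph: Set[Triple]) -> Tuple[Set[Triple], Dict[str, Set[Triple]]]:
    """Mimic the instance-creation rule for argument nominalizations."""
    types = {(s, o) for s, p, o in mention_graph if p == "rdf:type"}
    # Assumption: one anchor and one roleset per mention.
    anchors = {s: o for s, p, o in mention_graph if p == "nif:anchorOf"}
    rolesets = {s: o for s, p, o in mention_graph if p == "ks:roleset"}

    new_triples: Set[Triple] = set()
    graphs: Dict[str, Set[Triple]] = {}
    for m, s in rolesets.items():
        # WHERE clause: a frame mention with an anchor and a nominalization roleset.
        if (m, "ks:FrameMention") not in types or m not in anchors:
            continue
        if (s, "ks:ArgumentNominalization") not in types:
            continue
        a = anchors[m]
        g = mint(m)                      # named graph for the assertion
        frame = mint(a + " pred", m)     # implied frame instance
        inst = mint(a, m)                # denoted instance
        # INSERT clause: mention-level links plus typing triples in graph g.
        new_triples |= {(m, "ks:denotes", inst),
                        (m, "ks:implies", frame),
                        (m, "ks:expresses", g)}
        graphs[g] = {(inst, "rdf:type", "ks:Instance"),
                     (frame, "rdf:type", "ks:Instance"),
                     (frame, "rdf:type", "ks:Frame")}
    return new_triples, graphs


# Illustrative input: mention1 matches the rule, mention2 does not
# (its roleset is not typed as an argument nominalization).
mention_graph = {
    ("ex:mention1", "rdf:type", "ks:FrameMention"),
    ("ex:mention1", "nif:anchorOf", "supporters"),
    ("ex:mention1", "ks:roleset", "ex:roleset1"),
    ("ex:roleset1", "rdf:type", "ks:ArgumentNominalization"),
    ("ex:mention2", "rdf:type", "ks:FrameMention"),
    ("ex:mention2", "nif:anchorOf", "resulted"),
    ("ex:mention2", "ks:roleset", "ex:roleset2"),
}

new_triples, graphs = apply_rule(mention_graph)
for triple in sorted(new_triples):
    print(triple)
```

The point of the sketch is the shape of the rule: one mention yields two instances (the entity and the frame it implies), and the typing triples live in a named graph tied back to the mention via `ks:expresses`.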

Post-processing: inference, smushing, redundancy elimination, compaction.
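Smushing is commonly understood as merging nodes that are declared identical via owl:sameAs into a single canonical node. The poster does not say how PIKES implements this step, so the following is only a sketch of the general idea using a union-find structure. The triples and the `ex:` identifier are invented, and picking the lexicographically smallest URI as the canonical one is an arbitrary choice made for the example.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def smush(triples: Set[Triple]) -> Set[Triple]:
    """Rewrite triples so that owl:sameAs-equivalent nodes share one canonical URI."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Arbitrary choice: the lexicographically smallest URI becomes canonical.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)

    # Drop the sameAs links and rewrite every other triple on canonical nodes;
    # using a set removes the duplicates this creates.
    return {(find(s), p, find(o)) for s, p, o in triples if p != "owl:sameAs"}


triples = {
    ("ex:bono_1", "owl:sameAs", "dbpedia:Bono"),
    ("ex:bono_1", "rdfs:label", "Bono"),
    ("ex:bono_1", "rdf:type", "ks:Instance"),
    ("dbpedia:Bono", "rdf:type", "ks:Instance"),
}

smushed = smush(triples)
for triple in sorted(smushed):
    print(triple)
```

After smushing, the two typing triples collapse into one, which also illustrates why redundancy elimination naturally follows this step.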

[Figure, Mentions layer of the running example. Mention spans: "G. W. Bush", "Bono", "supporters", "fight", "HIV", "Africa", "very strong", "very strong supporters", "supporters of [...] fight", "fight of HIV", "fight [...] in Africa", "their", "meeting", "March 2002", "resulted", "5 billion dollar aid", "March 2002 meeting", "resulted in [...] aid", "G. W. Bush and Bono [...] supporters [...] their", "meeting resulted", "their [...] meeting". Edge labels: ks:arg., ks:pred., ks:coreferential, ks:coref.Conjunct.]

Phase 1 – Linguistic Feature Extraction

By performing several standard NLP tasks, a mention-based structured representation of the input text is built, organizing all the annotations produced by NLP tools (e.g., NERC, EL, TERN, SRL) in an RDF graph of mentions (i.e., spans of text denoting some entities or facts).
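As an illustration of what a mention looks like in such a graph, the sketch below records a few spans of the poster's example sentence with their character offsets and renders them as triples using the nif:beginIndex, nif:endIndex and nif:anchorOf properties from the data model. The `Mention` class, the `ex:` identifiers, the `ks:` prefix for mention types, and the type assigned to each span are assumptions made for this example; real mentions would come from the NLP tools, not from string search.

```python
from dataclasses import dataclass
from typing import List, Tuple

TEXT = ("G. W. Bush and Bono are very strong supporters of the fight of HIV in Africa. "
        "Their March 2002 meeting resulted in a 5 billion dollar aid.")


@dataclass(frozen=True)
class Mention:
    begin_index: int   # offset of the first character of the span
    end_index: int     # offset just past the last character of the span
    anchor_of: str     # the text covered by the span
    mention_type: str  # illustrative type label, e.g. "NameMention"


def make_mention(text: str, span: str, mention_type: str) -> Mention:
    """Locate the first occurrence of span in text and wrap it as a Mention."""
    begin = text.index(span)
    return Mention(begin, begin + len(span), span, mention_type)


def to_triples(mention_id: str, m: Mention) -> List[Tuple[str, str, object]]:
    """Render a mention as (subject, predicate, object) triples."""
    return [
        (mention_id, "rdf:type", "ks:" + m.mention_type),
        (mention_id, "nif:beginIndex", m.begin_index),
        (mention_id, "nif:endIndex", m.end_index),
        (mention_id, "nif:anchorOf", m.anchor_of),
    ]


# The type chosen for each span is an assumption for illustration.
mentions = [
    make_mention(TEXT, "G. W. Bush", "NameMention"),
    make_mention(TEXT, "Bono", "NameMention"),
    make_mention(TEXT, "March 2002", "TimeMention"),
    make_mention(TEXT, "fight", "FrameMention"),
]

for n, m in enumerate(mentions, start=1):
    for triple in to_triples(f"ex:mention{n}", m):
        print(triple)
```

Keeping the offsets alongside the anchor text means every mention can be traced back to the exact span of the source document.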

[Figure, Instances layer of the running example. Nodes: dbpedia:Bono, dbpedia:Bush, :meeting, time:2002-03, :result, money:5B_USD, :aid, dbpedia:Africa, dbpedia:HIV, :fight, :support, attr:very-1r_strong-1a. Edge labels: ks:amount, verbnet:location, verbnet:theme, propbank:tmp, verbnet:topic, propbank:loc, propbank:mnr, verbnet:agent, verbnet:beneficiary, verbnet:actor.]

Text: G. W. Bush and Bono are very strong supporters of the fight of HIV in Africa. Their March 2002 meeting resulted in a 5 billion dollar aid.

Phase 2 – Knowledge Distillation

The mention graph is processed via SPARQL rules to distill a knowledge graph, where each node uniquely identifies an entity of the world, event or situation, and arcs represent relations between them (e.g., the participation and role of an entity in an event).
