References:
● Corcoglioniti, F., Rospocher, M., Palmero Aprosio, A. Extracting Knowledge from Text with PIKES. In ISWC Posters & Demonstrations, 2015.
● Corcoglioniti, F., Rospocher, M., Cattoni, R., Magnini, B., Serafini, L. The KnowledgeStore: a Storage Framework for Interlinking Unstructured and Structured Knowledge. International Journal on Semantic Web and Information Systems, volume 11, 2015.
PIKES: PIKES Is a Knowledge Extraction Suite
In a Nutshell: a 2-phase Frame-based Approach
pikes.fbk.eu
RDF Data Model for Information Extraction
SPARQL-based Knowledge Distillation
[Figure: RDF data model, organized in three layers]
Resource layer classes: x:Resource.
Mention layer classes: x:Mention, x:InstanceMention, x:AttributeMention, x:TimeMention, x:NameMention, x:FrameMention, x:ParticipationMention, x:CoreferenceMention, x:RelationMention.
Instance layer classes: x:Instance, x:Attribute, x:Frame, x:Time, Assertion (graph).
Properties shown: dct:title, dct:creator, dct:created, x:mentionOf, nif:beginIndex, nif:endIndex, nif:anchorOf, x:synset, x:linkedTo, x:normalizedValue, x:nercType, x:roleset, x:role, x:target, x:argument, x:frame, x:coreferential, x:coreferentialConjunct, x:denotes, x:implies, x:expresses, rdf:type, rdfs:label, foaf:name, owl:sameAs, rdfs:seeAlso, x:include, OWL time properties, subject/object and frame/argument relations.
Prefixes:
nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
foaf: <http://xmlns.com/foaf/0.1/>
x: <namespace blinded>
Graphical Rendering of Extracted Knowledge
Performance
Detecting and representing frames and frame-role relations:
→ precision: 0.716
→ recall: 0.494
Processing large document corpora (Simple English Wikipedia):
→ 110K pages in about 507 core hours
→ processing time scales linearly with the size of the text
→ 0.85 accuracy in extracting triples about DBpedia entities
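The figures above can be turned into two derived numbers that are easier to compare across systems: an F1 score for frame extraction and an average per-page cost for corpus processing. The sketch below is plain arithmetic on the reported values; it assumes precision and recall were measured on the same evaluation set, and neither derived number is reported on the poster itself.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def core_seconds_per_page(core_hours: float, pages: int) -> float:
    """Average processing cost of one page, in core-seconds."""
    return core_hours * 3600 / pages


# Values reported in the Performance section.
f1 = f1_score(precision=0.716, recall=0.494)
secs = core_seconds_per_page(core_hours=507, pages=110_000)

print(f"F1 = {f1:.3f}")
print(f"{secs:.1f} core-seconds per page")
```

Under these assumptions the F1 is about 0.585 and the average cost is about 16.6 core-seconds per page.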
Various types of SPARQL rules: instance creation, typing, naming, DBpedia linking, frame-role linking, coreference resolution
Example (Instance Creation for Argument Nominalization):
INSERT { ?m ks:denotes ?i ; ks:implies ?if ; ks:expresses ?g .
         GRAPH ?g { ?i a ks:Instance . ?if a ks:Instance , ks:Frame } }
WHERE  { ?m a ks:FrameMention ; nif:anchorOf ?a ; ks:roleset ?s .
         ?s a ks:ArgumentNominalization . BIND (ks:mint(?m) AS ?g)
         BIND (ks:mint(concat(?a, " pred"), ?m) AS ?if)
         BIND (ks:mint(?a, ?m) AS ?i) }
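To make the effect of this rule concrete, the following Python sketch mimics it on an in-memory list of mentions. It is an illustration only, not the PIKES implementation: the mint function is a hypothetical stand-in for ks:mint (assumed here to derive a deterministic URI from its arguments), and the mention identifier, the ex: prefix and the roleset name are invented for the example.

```python
import hashlib


def mint(*parts: str) -> str:
    """Hypothetical stand-in for ks:mint: deterministic URI from the arguments."""
    digest = hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()[:8]
    return f"ex:{digest}"


def instance_creation_rule(mentions, nominalization_rolesets):
    """Mimic the INSERT/WHERE rule for argument nominalizations.

    Returns (default_graph_triples, {graph_uri: triples}).
    """
    default_graph = []
    named_graphs = {}
    for m in mentions:
        # WHERE: ?m a ks:FrameMention ; ks:roleset ?s . ?s a ks:ArgumentNominalization
        if m["type"] != "ks:FrameMention":
            continue
        if m["roleset"] not in nominalization_rolesets:
            continue
        g = mint(m["id"])                                  # ?g
        frame = mint(m["anchor"] + " pred", m["id"])       # ?if
        inst = mint(m["anchor"], m["id"])                  # ?i
        # INSERT: links from the mention to the instance layer
        default_graph += [
            (m["id"], "ks:denotes", inst),
            (m["id"], "ks:implies", frame),
            (m["id"], "ks:expresses", g),
        ]
        # INSERT: typing triples stored in the named graph ?g
        named_graphs[g] = [
            (inst, "rdf:type", "ks:Instance"),
            (frame, "rdf:type", "ks:Instance"),
            (frame, "rdf:type", "ks:Frame"),
        ]
    return default_graph, named_graphs


# Hypothetical input: one frame mention whose roleset is an argument nominalization.
mentions = [
    {"id": "ex:m1", "type": "ks:FrameMention",
     "anchor": "supporters", "roleset": "ex:supporter.01"},
    {"id": "ex:m2", "type": "ks:NameMention",
     "anchor": "Bono", "roleset": None},
]
triples, graphs = instance_creation_rule(mentions, {"ex:supporter.01"})
```

Each matching mention yields two instances: one for the argument denoted by the nominalization and one for the frame it implies, with the typing triples kept in a graph minted from the mention.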
Post-processing: inference, smushing, redundancy elimination, compaction.
[Figure, Mentions layer: mention graph extracted from the example text]
Single mentions: G. W. Bush; Bono; supporters; fight; HIV; Africa; very strong; their; meeting; March 2002; resulted; 5 billion dollar aid.
Mentions spanning several of the above: very strong supporters; supporters of [...] fight; fight of HIV; fight [...] in Africa; March 2002 meeting; their [...] meeting; meeting resulted; resulted in [...] aid; G. W. Bush and Bono [...] supporters [...] their.
Edge labels: ks:arg., ks:pred., ks:coreferential, ks:coref.Conjunct.
Phase 1 – Linguistic Feature Extraction
By performing several standard NLP tasks, a mention-based structured representation of the input text is built, organizing all the annotations produced by NLP tools (e.g., NERC, EL, TERN, SRL) in an RDF graph of mentions (i.e., spans of text denoting some entities or facts).
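As a rough illustration of what a mention looks like in such a graph, the sketch below builds the triples for a few spans of the poster's example sentence using the nif:beginIndex, nif:endIndex and nif:anchorOf properties of the data model. The mention URI scheme (a character-offset fragment on a document URI), the ex:doc identifier and the choice of mention types are assumptions made for the example, not the output of PIKES.

```python
TEXT = (
    "G. W. Bush and Bono are very strong supporters of the fight of HIV "
    "in Africa. Their March 2002 meeting resulted in a 5 billion dollar aid."
)


def mention_triples(text: str, span: str, mention_type: str, doc_uri: str = "ex:doc"):
    """Describe the first occurrence of `span` in `text` as a mention."""
    begin = text.index(span)          # raises ValueError if the span is absent
    end = begin + len(span)
    mention = f"{doc_uri}#char={begin},{end}"   # assumed URI scheme
    return [
        (mention, "rdf:type", mention_type),
        (mention, "nif:beginIndex", begin),
        (mention, "nif:endIndex", end),
        (mention, "nif:anchorOf", span),
    ]


# Illustrative annotations, as a NERC, TERN or SRL tool might produce them.
annotations = [
    ("G. W. Bush", "ks:NameMention"),
    ("Bono", "ks:NameMention"),
    ("March 2002", "ks:TimeMention"),
    ("meeting", "ks:FrameMention"),
]

mention_graph = []
for span, mention_type in annotations:
    mention_graph.extend(mention_triples(TEXT, span, mention_type))
```

Keeping the character offsets on every mention is what lets annotations from different tools be aligned on the same span of text.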
[Figure, Instances layer: knowledge graph distilled from the example text]
Nodes: dbpedia:Bono; dbpedia:Bush; :meeting; time:2002-03; :result; money:5B_USD; :aid; dbpedia:Africa; dbpedia:HIV; :fight; :support; attr:very-1r_strong-1a.
Edge labels: ks:amount, verbnet:location, verbnet:theme, verbnet:topic, verbnet:agent, verbnet:beneficiary, verbnet:actor, propbank:tmp, propbank:loc, propbank:mnr.
Text: G. W. Bush and Bono are very strong supporters of the fight of HIV in Africa. Their March 2002 meeting resulted in a 5 billion dollar aid.
Phase 2 – Knowledge Distillation
The mention graph is processed via SPARQL rules to distill a knowledge graph, where each node uniquely identifies an entity of the world, event or situation, and arcs represent relations between them (e.g., the participation and role of an entity in an event).
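One ingredient of this step is making sure that mentions referring to the same thing end up on a single node. PIKES does this with SPARQL rules; the Python sketch below only illustrates the idea with a union-find over mentions connected by coreference links. The mention identifiers, the ex:instance/ URI pattern and the input pairs are hypothetical.

```python
def distill_instances(mentions, coreferential_pairs):
    """Map every mention to one instance URI, merging coreferential mentions."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in coreferential_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller identifier as representative for stable output.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Analogue of ks:denotes: mention -> instance node.
    return {m: f"ex:instance/{find(m)}" for m in mentions}


# Hypothetical input: m1 and m3 are coreferential, m2 stands alone.
denotes = distill_instances(["m1", "m2", "m3"], [("m1", "m3")])
```

Here m1 and m3 denote the same instance node, while m2 keeps its own, so the resulting graph has one node per entity rather than one per mention.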