inteSearch: An Intelligent Linked Data Information Access Framework
Md-Mizanur Rahoman, Ryutaro Ichise
November 11, 2014
Introduction Proposed Retrieval Framework: inteSearch Experiment Conclusion
Outline
Introduction
  Background of Linked Data Information Access
  Problem and Probable Solution
Proposed Retrieval Framework: inteSearch
  Pre-processing of Linked Data
  Framework Details
Experiment
Conclusion
Linked Data (LD)
are structured data
represent knowledge with tuples of the form <Subject, Predicate, Object>, which are called RDF triples
can be represented as a graph
can be queried with an expressive, SQL-like query language
are openly available: 2,122 datasets with 61 billion RDF triples (as of Apr. 2014)
[Figure: example LD graph. Schema/Ontology: properties :birthPlace, :supervisor, :spouse (labels: Birth Place, Supervisor, Spouse; type Property) with domain and range over the classes :Country and :Person (labels: Country, Person; type Class). Instances: :barl, :amnd, :clra, :dnld (labels: Berlusconi, Amanda, Cleyra, Donald) and :grmn, :uk, :grce (labels: Germany, United Kingdom, Greece), connected by :spouse, :supervisor and :birthPlace edges.]
Information Access over LD
It requires
  sub-graph finding over the LD graph
    imposes substantial execution cost as the graph gets bigger
  know-how of the (dataset-specific) vocabulary, schema, and LD query language (i.e., linked data semantics)
    demands domain-level expertise
    calls for an automated tool that understands linked data semantics
[Figure: the example LD graph (schema/ontology and instances), with a sub-graph highlighted: :dnld (label Donald), :grce (label Greece), :spouse, :birthPlace.]
Contemporary LD Information Access Systems
Language-Tool-Based-Systems (PowerAqua’06, TBSL’12, FREyA’11, SemSek’12, CASIA’13, etc.)
  use language tools (e.g., parser, POS tagger) to predict possible sub-graphs (over the LD graph)
  convert the sub-graphs into SPARQL queries
Pivot-Point-Based-Systems (Treo’11, NLP-Reduce’07, etc.)
  pick a query word (i.e., the pivot point), then pick the other query words w.r.t. the pivot point and predict a possible sub-graph (over the LD graph)
  convert the sub-graph into a SPARQL query
Language-Tool-Based-Systems
Problem
generate many improper parse trees - different parsers give different parse trees, with different parsing tags
tag with improper semantics (e.g., mis-tagging of query words, such as whether the query word “spouse” should be tagged as Object or Predicate)
generate empty or improper results - caused by choosing an incorrect sub-graph
Pivot-Point-Based-Systems
Problem
depend heavily on picking the correct pivot point - in most cases, systems pick NE (named entity) related pivot points first, then other pivot points
impose a huge cost if the pivot point needs to change - one pivot point can have multiple LD resources
miss the attachment of contextual information - e.g., choosing pivot points at random could generate very different results
Problem Statement & Probable Solution
Problem Statement
  For LD information access, how can we find the required sub-graph (over the LD graph) with minimum execution cost, such that it
    will not generate an empty result
    will not miss the contextual information of the query
Solution
  To find the correct sub-graph - check as many sub-graph generation possibilities as possible
  To achieve minimum execution cost - prepare pre-processed LD statistics, which give insight into sub-graph generation possibilities
  To not lose the contextual information of the query - adapt a sub-graph joining technique called the Progressive Joining Approach (Rahoman & Ichise’14)
inteSearch - Overview
Pre-processed data statistics
  store LD resources so that they can be picked easily
  store patterns of LD resources so that they give insight into possible sub-graphs
Development of framework
  generate a graph for each single query word (called a Basic Graph)
  merge all Basic Graphs to predict all possible sub-graphs (called Keyword Graphs)
  rank all possible Keyword Graphs using the pre-processed data statistics
  generate a SPARQL query for the best-ranked Keyword Graphs
Pre-processed data statistics
Label Extractor - extracts and stores the labels of each LD resource
  lv(r) = {o | ∃ <r, p, o> ∈ RDF triples of dataset ∧ p ∈ rrp}
  where rrp is the set of resource-representing Predicates, e.g., label, title, etc.
Pattern-wise Resource Frequency Generator - computes and stores LD resource pattern frequencies
  sf(r) = |{<r, p, o> | <r, p, o> ∈ RDF triples of dataset}|
  pf(r) = |{<s, r, o> | <s, r, o> ∈ RDF triples of dataset}|
  of(r) = |{<s, p, r> | <s, p, r> ∈ RDF triples of dataset}|
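The four statistics defined above can be computed in a single pass over the triples. The following is a minimal sketch, not the authors' implementation: the toy triples, the `RRP` set and the function name `build_statistics` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy triples (hypothetical data, loosely modeled on the example LD graph).
TRIPLES = [
    (":Country", "label", "Country"),
    (":Country", "type", "Class"),
    (":grce", "type", ":Country"),
    (":grce", "label", "Greece"),
    (":dnld", ":birthPlace", ":grce"),
    (":birthPlace", "range", ":Country"),
]

# Assumed set of resource-representing predicates (rrp).
RRP = {"label", "title"}


def build_statistics(triples, rrp=RRP):
    """Return (lv, sf, pf, of) for a collection of (s, p, o) triples."""
    lv = defaultdict(set)                 # resource -> set of label values
    sf, pf, of = Counter(), Counter(), Counter()
    for s, p, o in set(triples):          # a dataset is a set of triples
        sf[s] += 1                        # s occurs in Subject position
        pf[p] += 1                        # p occurs in Predicate position
        of[o] += 1                        # o occurs in Object position
        if p in rrp:
            lv[s].add(o)                  # o is a label of s
    return lv, sf, pf, of


lv, sf, pf, of = build_statistics(TRIPLES)
print(lv[":Country"], sf[":Country"], pf[":Country"], of[":Country"])
```

Because the counts depend only on the dataset, they can be computed once offline and looked up at query time, which is the point of pre-processing.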
Example of Pre-processed Data Statistics
Exemplary LD graph
[Figure: the example LD graph (schema/ontology and instances), with the resource :Country highlighted: label Country, type Class.]
Pre-processed data statistics
r | lv(r) | sf(r) | pf(r) | of(r)
:Country | Country | 2 | ... | ...
:... | ... | ... | ... | ...
Development of Framework
Basic Graph Generator - generates the Basic Graphs
Keyword Graph Generator - merges all Basic Graphs to predict the Keyword Graphs
Ranker - ranks all possible Keyword Graphs using the pre-processed data statistics
SPARQL Query Generator - generates a SPARQL query for the best-ranked Keyword Graphs
Basic Graph Generator
Choose one of the three Basic Graphs for each query word k
  <k, ?p, ?o> (k in Subject position), or
  <?s, k, ?o> (k in Predicate position), or
  <?s, ?p, k> (k in Object position)
decided by the LD resources that are similar to the query word and their pattern frequencies
  e.g., if the similar LD resources are {R} and the Predicate Pattern-wise Resource Frequency of an LD resource (e.g., pf(ri)) is bigger than all Subject and Object Pattern-wise Resource Frequencies, then we select the Basic Graph <?s, k, ?o>
weight computed from the highest pattern frequencies of the LD resources {R}
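The selection rule above can be sketched as follows: take the highest Subject, Predicate and Object frequency among the similar resources, and pick the triple pattern for the position that wins. This is only an illustration under assumptions: how similar resources are found is not covered here, the names `choose_basic_graph` and `Stats` are hypothetical, using the raw highest frequency as the weight is one reading of the slide, and ties are broken in Subject, Predicate, Object order.

```python
from typing import Dict, List, Tuple

# resource -> (sf, pf, of), taken from the pre-processed statistics
Stats = Dict[str, Tuple[int, int, int]]
Triple = Tuple[str, str, str]


def choose_basic_graph(keyword: str, similar: List[str],
                       stats: Stats) -> Tuple[Triple, int]:
    """Return (basic_graph, weight) for one query word.

    `similar` must be a non-empty list of LD resources similar to `keyword`.
    """
    # Highest frequency per position over all similar resources.
    best = [max(stats[r][i] for r in similar) for i in range(3)]
    patterns = [
        (keyword, "?p", "?o"),   # keyword in Subject position
        ("?s", keyword, "?o"),   # keyword in Predicate position
        ("?s", "?p", keyword),   # keyword in Object position
    ]
    # Position with the largest frequency (first one wins on ties).
    position = max(range(3), key=lambda i: best[i])
    return patterns[position], best[position]


# Hypothetical statistics for two resources similar to "spouse".
stats = {":spouse": (3, 40, 1), ":spouseOf": (2, 5, 0)}
graph, weight = choose_basic_graph("spouse", [":spouse", ":spouseOf"], stats)
print(graph, weight)
```

Here the predicate frequency dominates, so the query word is placed on the edge of the Basic Graph.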
Keyword Graph Generator
Merge all Basic Graphs in all their possible merging options, following the Progressive Joining Approach
  e.g., merging the 1st and 2nd Basic Graphs at all possible options
[Figure: the 1st Basic Graph (for k1), the 2nd Basic Graph (for k2), and the graphs obtained by merging them at each possible option.]
Progressive Joining Approach - if the query words have the order {k1, k2, k3, ..., km}, then
  join the Basic Graph of k1 and the Basic Graph of k2 to obtain an Intermediate-version Keyword Graph, then
  progressively join the Basic Graph of each remaining query word and update the Intermediate-version Keyword Graph, until no query word remains
The Progressive Joining Approach maintains the attachment of contextual information
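The progressive order of joining can be sketched in a few lines. The slides do not spell out the exact merging options, so this sketch assumes one simple rule: a variable node (Subject or Object) of the next Basic Graph is unified with a Subject or Object node of the last joined Basic Graph. The function names and the triple encoding are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def join_options(last: Triple, nxt: Triple) -> List[Triple]:
    """Copies of `nxt` with one variable node unified with a node of `last`.

    Assumed merging rule: only Subject/Object positions are join points.
    """
    options = []
    for pos in (0, 2):                      # Subject and Object of `nxt`
        if not nxt[pos].startswith("?"):
            continue                        # keyword nodes are not renamed
        for node in (last[0], last[2]):     # Subject and Object of `last`
            joined = list(nxt)
            joined[pos] = node
            options.append(tuple(joined))
    return options


def progressive_join(basic_graphs: List[Triple]) -> List[List[Triple]]:
    """Join Basic Graphs in query-word order, keeping every option."""
    keyword_graphs = [[basic_graphs[0]]]    # start from the graph of k1
    for nxt in basic_graphs[1:]:
        # Each intermediate graph is extended at its last joined Basic Graph.
        keyword_graphs = [kg + [joined]
                          for kg in keyword_graphs
                          for joined in join_options(kg[-1], nxt)]
    return keyword_graphs


basic_graphs = [("?s1", "k1", "?o1"), ("?s2", "?p2", "k2"), ("?s3", "?p3", "k3")]
keyword_graphs = progressive_join(basic_graphs)
for kg in keyword_graphs:
    print(kg)
```

Because each new Basic Graph is attached only to the most recently joined one, neighbouring query words stay connected in the resulting Keyword Graph, which is how the word order of the query is carried into the sub-graph.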
Progressive Joining Approach - an Example
[Figure: an Intermediate-version Keyword Graph (built from k1 and k2), the Basic Graph of the next query word (k3), and all possible contextually feasible Keyword Graphs. Columns: Intermediate-version KG | Next BG | Joining between last joined BG and next BG | Increase of KG.]
Ranker
Rank Keyword Graphs by
  Weight - the minimum weight of the constituent Basic Graphs
  Depth level - how many edges a Keyword Graph holds
Keyword Graphs with a lower depth level are ranked higher than Keyword Graphs with a higher depth level
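The two ranking criteria can be combined into a single sort key. In the sketch below, depth level comes first, as stated above; using a higher weight as the tie-breaker within the same depth level is an assumption, as are the function name and the data layout.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
# A candidate is (keyword graph as a list of edges, weights of its Basic Graphs).
Candidate = Tuple[List[Triple], List[int]]


def rank_keyword_graphs(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates: fewer edges first, then higher minimum weight."""
    def key(candidate: Candidate):
        edges, weights = candidate
        depth = len(edges)            # depth level = number of edges
        weight = min(weights)         # weight = weakest constituent Basic Graph
        return (depth, -weight)
    return sorted(candidates, key=key)


t1, t2, t3 = ("?s1", "k1", "?o1"), ("?o1", "?p2", "k2"), ("k2", "?p3", "k3")
candidates = [
    ([t1, t2, t3], [5, 9, 7]),   # depth 3, weight 5
    ([t1, t2], [2, 8]),          # depth 2, weight 2
    ([t1, t2], [6, 4]),          # depth 2, weight 4
]
ranked = rank_keyword_graphs(candidates)
print([(len(edges), min(weights)) for edges, weights in ranked])
```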
SPARQL Query Generator
Construct the SPARQL query
  for the higher-ranked Keyword Graphs first, until the first non-empty result is obtained
  by direct conversion of the Keyword Graph
    putting the Variables in the SELECT clause
    merging the resources that correspond to each keyword in a UNION clause
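A direct conversion of this kind might look like the sketch below. It is an illustration under assumptions, not the authors' generator: the Keyword Graph is encoded as a list of triples, `resources` maps each keyword to its candidate URIs, and `run_query` is a hypothetical callable that sends a query string to a triple store and returns the result rows. The two DBpedia property URIs are used only as example candidates.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


def to_sparql(keyword_graph: List[Triple], resources: Dict[str, List[str]]) -> str:
    """Variables go to SELECT; candidate resources of a keyword go to UNION."""
    variables = sorted({term for triple in keyword_graph
                        for term in triple if term.startswith("?")})
    groups = []
    for triple in keyword_graph:
        # A keyword expands to its candidate resources, a variable to itself.
        choices = [resources.get(term, [term]) for term in triple]
        alternatives = ["{ " + " ".join(combo) + " . }"
                        for combo in product(*choices)]
        groups.append("{ " + " UNION ".join(alternatives) + " }")
    return ("SELECT DISTINCT " + " ".join(variables)
            + " WHERE { " + " ".join(groups) + " }")


def first_non_empty(ranked_graphs: List[List[Triple]],
                    resources: Dict[str, List[str]],
                    run_query: Callable[[str], list]) -> list:
    """Try Keyword Graphs in rank order; stop at the first non-empty result."""
    for graph in ranked_graphs:
        rows = run_query(to_sparql(graph, resources))
        if rows:
            return rows
    return []


resources = {"spouse": ["<http://dbpedia.org/ontology/spouse>",
                        "<http://dbpedia.org/property/spouse>"]}
query = to_sparql([("?s1", "spouse", "?o1")], resources)
print(query)
```

The sketch assumes every Keyword Graph contains at least one variable, since an empty SELECT list would not be valid SPARQL.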
Experimental Setup
Question setup
  Questions: Question Answering over Linked Data test question set 3 (QALD-3)
    consists of natural language questions
    Dataset | Total Qs | QALD-3
    DBpedia | 99 | 99
  Keywords: constructed manually w.r.t. the word order of the question words
Evaluation metrics
  Recall, Precision & F1-Measure
Evaluated for
  detailed performance analysis, execution complexity measurement, and comparison with other systems
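For reference, the three metrics can be computed per question from the returned and the expected answer sets and then averaged over questions. The sketch below assumes this per-question averaging scheme, which the slides do not state explicitly; the function names and the toy answer sets are hypothetical.

```python
from typing import Iterable, List, Tuple


def recall_precision_f1(system_answers: Iterable[str],
                        gold_answers: Iterable[str]) -> Tuple[float, float, float]:
    """Recall, precision and F1 of one question's answer set."""
    system, gold = set(system_answers), set(gold_answers)
    correct = len(system & gold)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(system) if system else 0.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0
    return recall, precision, f1


def average(scores: List[Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Average each metric over all questions (assumed averaging scheme)."""
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Toy example: one of two returned answers is among three expected answers.
r, p, f = recall_precision_f1({"a", "b"}, {"a", "c", "d"})
print(round(r, 2), round(p, 2), round(f, 2))
```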
Detailed performance analysis
Analyzed by the number of keywords each question holds
Keyword Group | No of Qs | Recall (Avg) | Precision (Avg) | F1 Measure (Avg)
One Keyword Group | 1 | 1.00 | 1.00 | 1.00
Two Keyword Group | 45 | 0.90 | 0.96 | 0.92
Three Keyword Group | 13 | 0.77 | 0.77 | 0.77
Four Keyword Group | 8 | 0.75 | 0.75 | 0.75
Five Keyword Group | 3 | 1.000 | 1.000 | 1.000
Overall | | 0.87 | 0.90 | 0.88
Observation
  according to the “One/Two/Three” Keyword Group questions, the selection of the Basic Graph works well
  according to the more-than-one Keyword Group questions, the merging-based Keyword Graph construction and ranking work well
  pre-processed data statistics help in efficient sub-graph finding over the linked data graph
Execution-time-wise performance analysis
Environment
  Machine: Intel® Core™ i7-4770K central processing unit (CPU), 3.50 GHz, based system with 16 GB memory
  Triple Store: network-connected Virtuoso (version 06.01.3127)
Keyword Group | One | Two | Three | Four | Five
Execution time (ms) | 710 | 2441 | 2774 | 3585 | 3720
Observation
  execution cost increases linearly with the number of keywords
  pre-processed data statistics support faster execution
Performance Comparison
Compared with the QALD-3 challenge participant systems
System | # of Questions | Processed | Right | Partially | Recall | Precision | F1-Measure
squall2sparql | 99 | 99 | 80 | 13 | 0.88 | 0.93 | 0.90
CASIA | 99 | 52 | 29 | 8 | 0.36 | 0.35 | 0.36
Scalewelis | 99 | 70 | 32 | 1 | 0.33 | 0.33 | 0.33
inteSearch | 99 | 70 | 60 | 1 | 0.87 | 0.90 | 0.88
Observation: pre-processed data statistics help in efficient sub-graph finding over the linked data graph
Conclusion
IA over LD requires finding the proper sub-graph over the LD graph
We contributed an LD IA framework that
  does not generate empty results
  maintains the attachment of contextual information
  retrieves rich information with low execution cost
The single-query-word-based Basic Graph can be extended to multiple query words, which could further increase efficiency
Questions?
Md-Mizanur Rahoman, Ryutaro Ichise