
inteSearch: An Intelligent Linked Data Information Access Framework

Date post: 09-Jul-2015
Upload: national-inistitute-of-informatics-nii-tokyo-japann
Description:
Information access over linked data requires determining the subgraph(s) of the linked data's underlying graph that correspond to the required information need. Usually, an information access framework is able to retrieve richer information by checking a large number of possible subgraphs. However, checking a large number of possible subgraphs increases information access complexity, which makes information access frameworks less effective. Many contemporary linked data information access frameworks reduce the complexity by introducing different heuristics, but they suffer in retrieving richer information. Other frameworks do not address the complexity at all. However, a practically usable framework should retrieve richer information with lower complexity. For linked data information access, we hypothesize that pre-processed data statistics of linked data can be used to efficiently check a large number of possible subgraphs. This will help to retrieve comparatively richer information with lower data access complexity. Preliminary evaluation of our proposed hypothesis shows promising performance.
inteSearch: An Intelligent Linked Data Information Access Framework Md-Mizanur Rahoman, Ryutaro Ichise November 11, 2014
Transcript
Page 1: inteSearch: An Intelligent Linked Data Information Access Framework

inteSearch: An Intelligent Linked Data Information Access Framework

Md-Mizanur Rahoman, Ryutaro Ichise

November 11, 2014

Page 2: inteSearch: An Intelligent Linked Data Information Access Framework

Introduction Proposed Retrieval Framework: inteSearch Experiment Conclusion

Outline

Introduction
  Background of Linked Data Information Access
  Problem and Probable Solution

Proposed Retrieval Framework: inteSearch
  Pre-processing of Linked Data
  Framework Details

Experiment

Conclusion

Md-Mizanur Rahoman, Ryutaro Ichise | 2

Page 3: inteSearch: An Intelligent Linked Data Information Access Framework


Linked Data (LD)

are structured data
represent knowledge as tuples of the form <Subject, Predicate, Object>, which are called RDF triples
can be represented as a graph
can be queried with an expressive, SQL-like query language
are openly available as 2122 datasets holding 61 billion RDF triples (as of Apr. 2014)

[Figure: exemplary LD graph in two layers. Schema/Ontology: the classes :Country and :Person (type Class) and the properties :birthPlace, :supervisor and :spouse (type Property), each with a label, and with domain and range edges from the properties to the classes. Instances: the persons :barl, :amnd, :clra and :dnld (labels Berlusconi, Amanda, Cleyra, Donald) and the countries :grmn, :uk and :grce (labels Germany, United Kingdom, Greece), connected by :birthPlace, :spouse and :supervisor edges and typed by the classes.]

Page 4: inteSearch: An Intelligent Linked Data Information Access Framework

Information Access over LD

It requires
  sub-graph finding over the LD graph
    imposes substantial execution cost as the graph size gets bigger
  know-how of the (dataset-specific) vocabulary, schema and LD query language (i.e., linked data semantics)
    demands domain-level expertise
    calls for an automated tool that understands linked data semantics

[Figure: the exemplary LD graph (Schema/Ontology and Instances layers), with one sub-graph singled out next to it: the resources :dnld (label "Donald") and :grce (label "Greece") together with :spouse and :birthPlace edges.]
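To make the cost of sub-graph finding concrete, here is a minimal sketch of naive triple-pattern matching over a list of triples. It is an illustration only, not the authors' method: the function names `match` and `find_subgraph` are hypothetical, and the two non-label edges in the toy data are assumptions added so that the query has an answer. Every pattern scans every triple, which is why the execution cost grows with the size of the graph.

```python
# Toy data. Only the two label triples are taken from the slide; the
# :birthPlace and :spouse edges are assumed purely for illustration.
TRIPLES = [
    (":dnld", "label", "Donald"),
    (":grce", "label", "Greece"),
    (":dnld", ":birthPlace", ":grce"),  # assumed edge
    (":amnd", ":spouse", ":dnld"),      # assumed edge
]


def match(pattern, triples):
    """Yield variable bindings for one triple pattern ('?x' marks a variable)."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated inside one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def find_subgraph(patterns, triples):
    """Join the patterns left to right; each pattern scans all triples."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            # Substitute the variables that are already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, triples):
                extended.append({**solution, **binding})
        solutions = extended
    return solutions


# "Where was Donald born?" expressed as a three-edge sub-graph.
solutions = find_subgraph(
    [("?p", "label", "Donald"), ("?p", ":birthPlace", "?c"), ("?c", "label", "?name")],
    TRIPLES,
)
```

Writing such patterns by hand also requires knowing that the dataset uses `label` and `:birthPlace`, which is the vocabulary know-how the slide refers to.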

Page 5: inteSearch: An Intelligent Linked Data Information Access Framework

Contemporary LD Information Access Systems

Language-Tool-Based-Systems (PowerAqua’06, TBSL’12, FREyA’11, SemSek’12, CASIA’13, etc.)
  use language tools (e.g., parser, POS tagger, etc.) to predict possible sub-graphs (over the LD graph)
  convert the sub-graphs into SPARQL queries

Pivot-Point-Based-Systems (Treo’11, NLP-Reduce’07, etc.)
  pick a query word (i.e., the pivot point), then try to pick the other query words w.r.t. the pivot point and predict a possible sub-graph (over the LD graph)
  convert the sub-graph into a SPARQL query

Page 6: inteSearch: An Intelligent Linked Data Information Access Framework

Language-Tool-Based-Systems

Problem

generate many improper parse trees - different parsers give different parse trees, with different parsing tags
tag with improper semantics (e.g., mis-tagging of query words, such as whether the query word “spouse” should be tagged as Object or Predicate)
generate an empty or improper result - caused by choosing an incorrect sub-graph

Page 7: inteSearch: An Intelligent Linked Data Information Access Framework

Pivot-Point-Based-Systems

Problem

depend heavily upon picking the correct pivot point - in most cases, systems pick NE (named entity) related pivot points first, then other pivot points
impose a huge cost if the pivot point needs to change - one pivot point can have multiple LD resources
miss contextual information attachment, e.g., randomly choosing pivot points could generate very different results

Page 8: inteSearch: An Intelligent Linked Data Information Access Framework

Problem Statement & Probable Solution

Problem Statement
For LD information access, how can we find the required sub-graph (over the LD graph) with minimum execution cost, in a way that
  will not generate an empty result
  will not miss the contextual information of the query

Solution
  To find the correct sub-graph - check as many sub-graph generation possibilities as possible
  To achieve minimum execution cost - prepare pre-processed LD statistics that give insight into sub-graph generation possibilities
  To not lose the contextual information of the query - adapt a sub-graph joining technique called the Progressive Joining Approach (Rahoman & Ichise’14)

Page 9: inteSearch: An Intelligent Linked Data Information Access Framework

inteSearch - Overview

Pre-processed data statistics
  store LD resources in a way that lets them be picked easily
  store patterns of LD resources so that they give insight into possible sub-graphs

Development of framework
  generate a single-query-word-based graph (called a Basic Graph)
  merge all Basic Graphs to predict all possible sub-graphs (called Keyword Graphs)
  rank all possible Keyword Graphs using the pre-processed data statistics
  generate SPARQL queries for the best-ranked Keyword Graphs

Page 10: inteSearch: An Intelligent Linked Data Information Access Framework

Pre-processed data statistics

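Label Extractor - extracts and stores the labels of each LD resource
  $lv(r) = \{\, o \mid \exists \langle r, p, o \rangle \in \text{RDF triples of dataset} \,\wedge\, p \in rrp \,\}$
  where $rrp$ is the set of resource-representing Predicates, e.g., label, title, etc.

Pattern-wise Resource Frequency Generator - computes and stores the LD resource pattern frequencies
  $sf(r) = |\{\, \langle r, p, o \rangle \mid \exists \langle r, p, o \rangle \in \text{RDF triples of dataset} \,\}|$
  $pf(r) = |\{\, \langle s, r, o \rangle \mid \exists \langle s, r, o \rangle \in \text{RDF triples of dataset} \,\}|$
  $of(r) = |\{\, \langle s, p, r \rangle \mid \exists \langle s, p, r \rangle \in \text{RDF triples of dataset} \,\}|$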
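The four statistics can be computed in a single pass over the triples. The sketch below is one possible reading of the definitions, not the authors' implementation: the function name `preprocess` and the contents of `RRP` beyond "label" and "title" are assumptions, and the triples are held in memory rather than in a triple store.

```python
from collections import Counter, defaultdict

# Resource-representing predicates (rrp). The slide names "label" and "title";
# a real dataset would use full predicate IRIs (assumption).
RRP = {"label", "title"}


def preprocess(triples, rrp=RRP):
    """Return (lv, sf, pf, of) for a collection of (s, p, o) triples."""
    lv = defaultdict(set)                 # lv(r): label values of resource r
    sf, pf, of = Counter(), Counter(), Counter()
    for s, p, o in set(triples):          # a dataset is a set: ignore duplicates
        sf[s] += 1                        # r occurs in the Subject position
        pf[p] += 1                        # r occurs in the Predicate position
        of[o] += 1                        # r occurs in the Object position
        if p in rrp:
            lv[s].add(o)
    return lv, sf, pf, of


triples = [
    (":Country", "label", "Country"),
    (":Country", "type", "Class"),
]
lv, sf, pf, of = preprocess(triples)
```

For these two triples, `lv[":Country"]` is `{"Country"}` and `sf[":Country"]` is 2; a resource that never appears in a position simply has frequency 0 there.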

Page 11: inteSearch: An Intelligent Linked Data Information Access Framework

Example of Pre-processed Data Statistics

Exemplary LD graph

[Figure: the exemplary LD graph (Schema/Ontology and Instances layers), with the sub-graph around :Country singled out: :Country has the label "Country" and the type Class.]

Pre-processed data statistics

r         lv(r)    sf(r)  pf(r)  of(r)
:Country  Country  2      ...    ...
...       ...      ...    ...    ...

Page 12: inteSearch: An Intelligent Linked Data Information Access Framework


Development of Framework

Basic Graph Generator - generates the Basic Graphs

Keyword Graph Generator - merges all Basic Graphs to predict the Keyword Graphs

Ranker - ranks all possible Keyword Graphs using the pre-processed data statistics

SPARQL Query Generator - generates SPARQL queries for the best-ranked Keyword Graphs

Page 13: inteSearch: An Intelligent Linked Data Information Access Framework

Development of Framework

Page 14: inteSearch: An Intelligent Linked Data Information Access Framework

Basic Graph Generator

Choose one of the three Basic Graphs for each query word k
  <k, ?p, ?o> (k in the Subject position), or
  <?s, k, ?o> (k in the Predicate position), or
  <?s, ?p, k> (k in the Object position)

The choice is decided by the (particular) LD resources that are similar to the query word and by their pattern frequencies
  e.g., if the similar LD resources are {R} and the Predicate Pattern-wise Resource Frequency of an LD resource (e.g., pf(r_i)) is bigger than all Subject and Object Pattern-wise Resource Frequencies, then we select the Basic Graph <?s, k, ?o>

The weight is computed from the highest pattern frequencies of the LD resources {R}
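The selection rule can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' code: `choose_basic_graph` is a hypothetical name, the frequency tables are plain dictionaries, ties are broken in Subject, Predicate, Object order (the slides do not say how ties are resolved), and the weight is taken to be the single highest pattern frequency found among the similar resources {R}.

```python
def choose_basic_graph(keyword, similar_resources, sf, pf, of):
    """Pick the Basic Graph template for one query word.

    similar_resources: the LD resources {R} whose labels are similar to the keyword.
    sf, pf, of: mappings from resource to its Subject / Predicate / Object frequency.
    Returns (basic_graph, weight).
    """
    if not similar_resources:
        raise ValueError(f"no LD resource is similar to {keyword!r}")

    best_position, weight = None, -1
    for resource in similar_resources:
        for position, freq in (("subject", sf), ("predicate", pf), ("object", of)):
            count = freq.get(resource, 0)
            if count > weight:  # strict '>' keeps the first position on ties (assumption)
                best_position, weight = position, count

    templates = {
        "subject": (keyword, "?p", "?o"),
        "predicate": ("?s", keyword, "?o"),
        "object": ("?s", "?p", keyword),
    }
    return templates[best_position], weight


# ":spouse" is mostly used as a predicate, so the predicate template is chosen.
graph, weight = choose_basic_graph(
    "spouse", [":spouse"], sf={":spouse": 1}, pf={":spouse": 3}, of={}
)
```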

Page 15: inteSearch: An Intelligent Linked Data Information Access Framework

Development of Framework

Page 16: inteSearch: An Intelligent Linked Data Information Access Framework

Keyword Graph Generator

Merge all Basic Graphs in all of their possible merging options by following the Progressive Joining Approach

e.g., merging the 1st and 2nd Basic Graphs at all possible options

[Figure: the 1st Basic Graph (for query word k1) and the 2nd Basic Graph (for query word k2), followed by the graphs obtained by merging them at each possible option.]

Progressive Joining Approach - if the query words come in the order {k1, k2, k3, ..., km}, then
  join the Basic Graph of k1 and the Basic Graph of k2 to obtain an Intermediate-version Keyword Graph, then
  progressively join the next Basic Graph for each remaining query word and update the Intermediate-version Keyword Graph, until no query word remains

The Progressive Joining Approach maintains contextual information attachment
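The control flow of the Progressive Joining Approach can be sketched as a fold over the Basic Graphs in query-word order. The slides do not spell out the merge rules, so they are passed in as a hypothetical callback `join_options`; `toy_join` below is only a stand-in invented for this example (it attaches the next Basic Graph to each variable node of the last joined one) and should not be read as the authors' rule.

```python
def progressive_join(basic_graphs, join_options):
    """Return all candidate Keyword Graphs for Basic Graphs given in query-word order.

    A Keyword Graph is a tuple of triples; its last triple is the last joined
    Basic Graph. join_options(keyword_graph, next_bg) returns every merge of
    next_bg into keyword_graph that the caller considers feasible.
    """
    if not basic_graphs:
        return []
    candidates = [(basic_graphs[0],)]  # Intermediate-version Keyword Graphs
    for next_bg in basic_graphs[1:]:
        candidates = [
            merged
            for graph in candidates
            for merged in join_options(graph, next_bg)
        ]
    return candidates


def toy_join(graph, next_bg):
    """Stand-in merge rule (assumption): reuse each variable node of the last
    joined Basic Graph as the subject of the next Basic Graph."""
    last_s, _, last_o = graph[-1]
    _, p, o = next_bg
    return [graph + ((node, p, o),) for node in (last_s, last_o) if node.startswith("?")]


keyword_graphs = progressive_join(
    [("?s1", "k1", "?o1"), ("?s2", "?p2", "k2"), ("?s3", "?p3", "k3")],
    toy_join,
)
```

Because each new Basic Graph is joined against the last joined one, the word order of the query is carried into the structure of every candidate, which is how the approach keeps contextual information attached.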

Page 17: inteSearch: An Intelligent Linked Data Information Access Framework

Progressive Joining Approach - an Example

Given an Intermediate-version Keyword Graph (built from k1 and k2) and the Basic Graph corresponding to the next query word (k3), the slide lists all possible contextually feasible Keyword Graphs.

[Table of graph drawings. Columns: Intermediate-version KG | Next BG | Joining between last joined BG and next BG | Increase of KG.]

Page 18: inteSearch: An Intelligent Linked Data Information Access Framework

Development of Framework

Page 19: inteSearch: An Intelligent Linked Data Information Access Framework

Ranker

Rank Keyword Graphs by
  Weight - the minimum weight of the constituent Basic Graphs
  Depth level - how many edges a Keyword Graph holds

Keyword Graphs with a lower depth level are ranked higher than Keyword Graphs with a higher depth level
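A minimal sketch of such a ranker is shown below. The slide states that lower depth outranks higher depth; using the weight as a tie-breaker among graphs of equal depth (higher weight first) is an assumption of this example, as are the names `KeywordGraph` and `rank`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordGraph:
    triples: tuple              # the edges of the Keyword Graph
    basic_graph_weights: tuple  # weights of the constituent Basic Graphs

    @property
    def weight(self):
        # Weight of a Keyword Graph: minimum weight of its Basic Graphs.
        return min(self.basic_graph_weights)

    @property
    def depth(self):
        # Depth level: number of edges the Keyword Graph holds.
        return len(self.triples)


def rank(keyword_graphs):
    """Best first: lower depth, then (assumed tie-breaker) higher weight."""
    return sorted(keyword_graphs, key=lambda g: (g.depth, -g.weight))


a = KeywordGraph((("?s", "k1", "?o"), ("?o", "?p", "k2")), (5, 3))
b = KeywordGraph((("?s", "k1", "k2"),), (2,))
c = KeywordGraph((("?s", "k1", "?o"), ("?s", "?p", "k2")), (9, 4))
ranked = rank([a, b, c])
```

Here `b` comes first because it has a single edge, and `c` precedes `a` because its weight (4) is higher than that of `a` (3).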

Page 20: inteSearch: An Intelligent Linked Data Information Access Framework

Development of Framework

Page 21: inteSearch: An Intelligent Linked Data Information Access Framework

SPARQL Query Generator

Construct SPARQL queries
  for the higher-ranked Keyword Graphs first, until the first non-empty result is obtained
  converted directly from the Keyword Graph by
    putting the Variables in the SELECT clause
    merging the keyword-corresponding resources in UNION clauses
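The conversion can be sketched as plain string building. This is an assumption-laden illustration rather than the authors' generator: `to_sparql` is a hypothetical name, every keyword is assumed to map to a list of resource IRIs (literals are not handled), and the example.org IRIs are made up. Running the query against a triple store and moving on to the next ranked Keyword Graph when the result is empty is not shown.

```python
from itertools import product


def to_sparql(keyword_graph, resources):
    """Build a SPARQL SELECT query from one Keyword Graph.

    keyword_graph: tuple of (s, p, o) terms; '?x' terms are variables and
                   every other term is a query keyword.
    resources:     mapping keyword -> list of candidate resource IRIs.
    """
    variables = sorted(
        {term for triple in keyword_graph for term in triple if term.startswith("?")}
    )
    groups = []
    for triple in keyword_graph:
        # Each keyword expands to all of its candidate resources.
        alternatives = [
            [term] if term.startswith("?") else [f"<{iri}>" for iri in resources[term]]
            for term in triple
        ]
        branches = ["{ " + " ".join(combo) + " }" for combo in product(*alternatives)]
        groups.append(" UNION ".join(branches))
    select = " ".join(variables) or "*"
    return "SELECT " + select + " WHERE { " + " ".join(groups) + " }"


query = to_sparql(
    (("?s1", "spouse", "?o1"),),
    {"spouse": ["http://example.org/ontology/spouse", "http://example.org/property/spouse"]},
)
```

Wrapping every alternative in its own group keeps the query valid even when a keyword maps to a single resource, at the price of a slightly more verbose query.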

Page 22: inteSearch: An Intelligent Linked Data Information Access Framework

Experimental Setup

Question setup
  Questions: Question Answering over Linked Data test question set (QALD-3)
    consists of natural language questions

    Dataset  Total Qs  QALD-3
    DBpedia  99        99

  Keywords: constructed manually w.r.t. the word order of the question words

Evaluation metrics
  Recall, Precision & F1-Measure

Evaluated for
  detailed performance analysis, execution complexity measurement, comparison with other systems
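For reference, the three metrics can be computed per question from the gold-standard answer set and the returned answer set, and then averaged over the questions. The sketch below uses the standard set-based definitions; the function name is hypothetical, and how the evaluation treats questions with an empty gold or returned set is not stated in the slides, so returning 0.0 in those cases is an assumption.

```python
def precision_recall_f1(gold, returned):
    """Set-based precision, recall and F1 for a single question."""
    gold, returned = set(gold), set(returned)
    correct = len(gold & returned)
    precision = correct / len(returned) if returned else 0.0  # 0.0 is an assumption
    recall = correct / len(gold) if gold else 0.0             # 0.0 is an assumption
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Two of the four gold answers are returned, and nothing else.
p, r, f = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b"})
```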

Page 23: inteSearch: An Intelligent Linked Data Information Access Framework

Detail performance analysis

Analyzed by the number of keywords each question holds

Group                No. of Qs  Recall (Avg)  Precision (Avg)  F1 Measure (Avg)
One Keyword Group    1          1.00          1.00             1.00
Two Keyword Group    45         0.90          0.96             0.92
Three Keyword Group  13         0.77          0.77             0.77
Four Keyword Group   8          0.75          0.75             0.75
Five Keyword Group   3          1.00          1.00             1.00
Overall                         0.87          0.90             0.88

Observation

for the “One/Two/Three” Keyword Group questions, the selection of the Basic Graph works well
for the more-than-one Keyword Group questions, merging-based Keyword Graph construction and ranking work well
pre-processed data statistics help in efficient sub-graph finding over the linked data graph

Page 24: inteSearch: An Intelligent Linked Data Information Access Framework

Execution time wise performance analysis

Environment
  Machine: Intel® Core™ i7-4770K central processing unit (CPU), 3.50 GHz, with 16 GB memory
  Triple Store: network-connected Virtuoso (version 06.01.3127)

Keyword Group        One  Two   Three  Four  Five
Execution time (ms)  710  2441  2774   3585  3720

Observation
  execution cost increases linearly with the number of keywords
  pre-processed data statistics support faster execution

Page 25: inteSearch: An Intelligent Linked Data Information Access Framework

Performance Comparison

Compared with QALD-3 challenge participant systems

System         # of Questions  Processed  Right  Partially  Recall  Precision  F1-Measure
squall2sparql  99              99         80     13         0.88    0.93       0.90
CASIA          99              52         29     8          0.36    0.35       0.36
Scalewelis     99              70         32     1          0.33    0.33       0.33
inteSearch     99              70         60     1          0.87    0.90       0.88

Observation: pre-processed data statistics help in efficient sub-graph finding over the linked data graph

Page 26: inteSearch: An Intelligent Linked Data Information Access Framework

Conclusion

IA over LD requires finding the proper sub-graph over the LD graph

We contributed an LD IA framework that
  does not generate empty results
  maintains contextual information attachment
  retrieves rich information with low execution cost

The single-query-word-based Basic Graph can be extended to multiple query words, which can further increase efficiency


Page 27: inteSearch: An Intelligent Linked Data Information Access Framework

Questions?

Md-Mizanur Rahoman, [email protected]
Ryutaro Ichise, [email protected]
