inteSearch: An Intelligent Linked Data Information Access Framework


Md-Mizanur Rahoman, Ryutaro Ichise

November 11, 2014


Outline

Introduction

Background of Linked Data Information Access

Problem and Probable Solution

Proposed Retrieval Framework: inteSearch

Pre-processing of Linked Data

Framework Details

Experiment

Conclusion



Linked Data (LD)

are structured data

represent knowledge as tuples of the form <Subject, Predicate, Object>, called RDF triples

can be represented as a graph

can be queried with an expressive, SQL-like query language (SPARQL)

are openly available: 2,122 datasets with 61 billion RDF triples (as of Apr. 2014)

[Figure: example LD graph in two layers. Schema/Ontology: the classes :Person ("Person") and :Country ("Country"), and the properties :birthPlace ("Birth Place"), :supervisor ("Supervisor"), and :spouse ("Spouse"), connected by type, label, domain, and range edges. Instances: the persons :amnd ("Amanda"), :barl ("Berlusconi"), :clra ("Cleyra"), and :dnld ("Donald"), and the countries :grmn ("Germany"), :uk ("United Kingdom"), and :grce ("Greece"), connected by type, label, :birthPlace, :supervisor, and :spouse edges.]
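As a minimal sketch of how RDF triples form a graph, the snippet below stores a few triples as Python tuples and indexes them by subject. The prefixed names follow the example figure. The instance-level edge (who was born where) is an illustrative assumption, and `outgoing_edges` is a hypothetical helper, not part of inteSearch.

```python
from collections import defaultdict

# Each RDF triple is a (subject, predicate, object) tuple.
# The schema-level triples mirror the figure; the :birthPlace edge between
# :dnld and :grce is assumed for illustration only.
TRIPLES = [
    (":Person", "type", "Class"),
    (":Person", "label", "Person"),
    (":Country", "type", "Class"),
    (":Country", "label", "Country"),
    (":birthPlace", "type", "Property"),
    (":birthPlace", "label", "Birth Place"),
    (":birthPlace", "domain", ":Person"),
    (":birthPlace", "range", ":Country"),
    (":dnld", "type", ":Person"),
    (":dnld", "label", "Donald"),
    (":grce", "type", ":Country"),
    (":grce", "label", "Greece"),
    (":dnld", ":birthPlace", ":grce"),
]


def outgoing_edges(triples):
    """Index triples by subject: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


graph = outgoing_edges(TRIPLES)
print(graph[":dnld"])
```

Every subject and object becomes a node and every predicate becomes a labeled edge, which is why the same data can be read either as a list of triples or as a graph.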



Information Access over LD

It requires

sub-graph finding over the LD graph

imposes substantial execution cost as the graph grows

know-how of the (dataset-specific) vocabulary, schema, and LD query language (i.e., linked data semantics)

demands domain-level expertise

calls for an automated tool that understands linked data semantics


[Figure: the example LD graph (Schema/Ontology and Instances layers) with a highlighted sub-graph involving :dnld ("Donald"), :grce ("Greece"), and the properties :spouse and :birthPlace.]
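To make the cost of sub-graph finding concrete, here is a naive sketch in which every triple pattern scans every triple. It is not how inteSearch or a triple store works internally. The data, the instance-level edges, and the helper names `match_pattern` and `find_subgraphs` are all assumptions made for illustration.

```python
# Toy data; the :birthPlace edges are assumed for illustration only.
TRIPLES = [
    (":dnld", "label", "Donald"),
    (":grce", "label", "Greece"),
    (":dnld", ":birthPlace", ":grce"),
    (":amnd", "label", "Amanda"),
    (":uk", "label", "United Kingdom"),
    (":amnd", ":birthPlace", ":uk"),
]


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on failure."""
    new_binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to earlier.
            if new_binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new_binding


def find_subgraphs(patterns, triples):
    """Naive search: each pattern is tried against every triple."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# "Birth place of Donald" expressed as three joined triple patterns.
query = [
    ("?person", "label", "Donald"),
    ("?person", ":birthPlace", "?place"),
    ("?place", "label", "?name"),
]
print(find_subgraphs(query, TRIPLES))
```

Writing the query already requires knowing that the dataset uses `label` and `:birthPlace`, and the scan touches every triple for every pattern. Both issues are the ones listed on this slide.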



Contemporary LD Information Access Systems

Language-Tool-Based-Systems (PowerAqua’06, TBSL’12, FREyA’11, SemSek’12, CASIA’13, etc.)

use language tools (e.g., parser, POS tagger) to predict possible sub-graphs over the LD graph

convert the sub-graphs into SPARQL queries

Pivot-Point-Based-Systems (Treo’11, NLP-Reduce’07, etc.)

pick a query word (i.e., the pivot point), then pick the other query words w.r.t. the pivot point to predict a possible sub-graph over the LD graph

convert the sub-graph into a SPARQL query



Language-Tool-Based-Systems

Problem

generate many improper parse trees: different parsers give different parse trees with different parsing tags

tag query words with improper semantics (e.g., mis-tagging whether the query word “spouse” should be an Object or a Predicate)

generate empty or improper results by choosing an incorrect sub-graph



Pivot-Point-Based-Systems

Problem

depend heavily on picking the correct pivot point: in most cases, systems pick named-entity (NE) related pivot points first, then other pivot points

impose a huge cost if the pivot point needs to change, since one pivot point can have multiple LD resources

miss the attachment of contextual information (e.g., choosing pivot points at random can generate very different results)



Problem Statement & Probable Solution

Problem Statement

For LD information access, how can we find the required sub-graph (over the LD graph) with minimum execution cost, such that it

will not generate an empty result

will not miss the contextual information of the query

Solution

To find the correct sub-graph: check the maximum possible number of sub-graph generation possibilities

To achieve minimum execution cost: prepare pre-processed LD statistics that give insight into the sub-graph generation possibilities

To not lose the contextual information of the query: adapt a sub-graph joining technique called the Progressive Joining Approach (Rahoman & Ichise’14)



inteSearch - Overview

Pre-processed data statistics

store LD resources in a way that lets them be picked easily

store the patterns of LD resources so that they give insight into the possible sub-graphs

Development of framework

generate a graph for each single query word (called a Basic Graph)

merge all Basic Graphs to predict all possible sub-graphs (called Keyword Graphs)

rank all possible Keyword Graphs using the pre-processed data statistics

generate SPARQL queries for the best-ranked Keyword Graphs




Label Extractor - extracts and stores the labels of each LD resource

$$lv(r) = \{\, o \mid \langle r, p, o \rangle \in \text{RDF triples of dataset} \land p \in rrp \,\}$$

where rrp is the set of resource-representing Predicates (e.g., label, title).

Pattern-wise Resource Frequency Generator - computes and stores the pattern frequencies of each LD resource

$$sf(r) = |\{\, \langle r, p, o \rangle \mid \langle r, p, o \rangle \in \text{RDF triples of dataset} \,\}|$$

$$pf(r) = |\{\, \langle s, r, o \rangle \mid \langle s, r, o \rangle \in \text{RDF triples of dataset} \,\}|$$

$$of(r) = |\{\, \langle s, p, r \rangle \mid \langle s, p, r \rangle \in \text{RDF triples of dataset} \,\}|$$
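The four statistics can be computed in a single pass over the triples. The following is a minimal sketch, assuming the dataset fits in memory as a list of tuples. The function name `preprocess` and the contents of `RRP` are assumptions for illustration, and the slides do not say how inteSearch stores these values.

```python
from collections import Counter, defaultdict

# Assumed set of resource-representing predicates (rrp).
RRP = {"label", "title"}


def preprocess(triples, rrp=RRP):
    """Return (lv, sf, pf, of) for a list of (s, p, o) triples."""
    lv = defaultdict(set)  # resource -> its label values
    sf = Counter()         # how often a resource appears as Subject
    pf = Counter()         # how often a resource appears as Predicate
    of = Counter()         # how often a resource appears as Object
    for s, p, o in triples:
        if p in rrp:
            lv[s].add(o)
        sf[s] += 1
        pf[p] += 1
        of[o] += 1
    return lv, sf, pf, of


# Toy triples around :Country (instance-level edges are illustrative).
triples = [
    (":Country", "label", "Country"),
    (":Country", "type", "Class"),
    (":grce", "type", ":Country"),
    (":birthPlace", "range", ":Country"),
]
lv, sf, pf, of = preprocess(triples)
print(lv[":Country"], sf[":Country"], pf[":Country"], of[":Country"])
```

Because each triple contributes one count to exactly one subject, one predicate, and one object, the pass is linear in the number of triples and can be done once, offline.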



Example of Pre-processed Data Statistics

Exemplary LD graph

[Figure: the example LD graph (Schema/Ontology and Instances layers), with the resource :Country highlighted together with its label ("Country") and its type (Class).]

r           lv(r)      sf(r)   pf(r)   of(r)
:Country    Country    2       ...     ...
:...        ...        ...     ...     ...


Development of Framework

Basic Graph Generator - generates the Basic Graphs

Keyword Graph Generator - merges all Basic Graphs to predict the Keyword Graphs

Ranker - ranks all possible Keyword Graphs using the pre-processed data statistics

SPARQL Query Generator - generates SPARQL queries for the best-ranked Keyword Graphs






Basic Graph Generator

Choose one of the three Basic Graphs for each query word k

<k, ?p, ?o> (k as Subject), or

<?s, k, ?o> (k as Predicate), or

<?s, ?p, k> (k as Object)

decided by the LD resources {R} that are similar to the query word and by their pattern frequencies

e.g., if the Predicate Pattern-wise Resource Frequency of an LD resource in {R} (e.g., pf(ri)) is bigger than all Subject and Object Pattern-wise Resource Frequencies, then we select the Basic Graph <?s, k, ?o>

weight is computed from the highest pattern frequencies of the LD resources {R}
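The selection rule above can be sketched as a comparison of the highest sf, pf, and of values among the similar resources. This is a hedged illustration: `choose_basic_graph` is a hypothetical name, the similar resources {R} are assumed to have been found beforehand, ties are broken in favour of Subject, and using the raw highest frequency as the weight is an assumption, since the slides do not give the exact weight formula.

```python
def choose_basic_graph(keyword, resources, sf, pf, of):
    """Pick the Basic Graph whose pattern has the highest frequency in {R}.

    resources: LD resources similar to the keyword (assumed given).
    sf, pf, of: dict-like frequency tables from pre-processing.
    Returns (triple_pattern, weight).
    """
    highest = {
        "subject": max((sf.get(r, 0) for r in resources), default=0),
        "predicate": max((pf.get(r, 0) for r in resources), default=0),
        "object": max((of.get(r, 0) for r in resources), default=0),
    }
    # On ties, max() keeps the first key in insertion order (Subject first).
    role = max(highest, key=highest.get)
    templates = {
        "subject": (keyword, "?p", "?o"),
        "predicate": ("?s", keyword, "?o"),
        "object": ("?s", "?p", keyword),
    }
    return templates[role], highest[role]


# "spouse" is mostly used as a Predicate in this toy frequency table.
pattern, weight = choose_basic_graph(
    "spouse", [":spouse"], sf={":spouse": 3}, pf={":spouse": 4}, of={}
)
print(pattern, weight)
```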






Keyword Graph Generator

Merge all Basic Graphs in all of their possible merging options by following the Progressive Joining Approach

e.g., merging the 1st and 2nd Basic Graphs in all possible ways

[Figure: the 1st Basic Graph (for k1) and the 2nd Basic Graph (for k2), followed by the Keyword Graphs obtained by merging them at each possible joining option.]

Progressive Joining Approach - for query words in the order {k1, k2, k3, ..., km}

join the Basic Graph of k1 and the Basic Graph of k2 to obtain an Intermediate-version Keyword Graph, then

progressively join the Basic Graph of each remaining query word and update the Intermediate-version Keyword Graph, until no query word remains

The Progressive Joining Approach maintains the attachment of contextual information
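The control flow of progressive joining can be sketched as a fold over the keyword order. The actual joining rule comes from Rahoman & Ichise’14 and is not reproduced in these slides, so `join_options` below is a stand-in assumption: it attaches the next Basic Graph to the last joined one by replacing one of its node variables with a node of the last joined Basic Graph. All names are hypothetical, and the number of options it produces need not match the figure.

```python
def is_var(term):
    return term.startswith("?")


def join_options(last, nxt):
    """Stand-in join rule (assumption): substitute a node variable of `nxt`
    with a Subject or Object node of the last joined Basic Graph."""
    options = []
    for pos in (0, 2):  # Subject and Object positions of the next graph
        if not is_var(nxt[pos]):
            continue
        for node in (last[0], last[2]):
            joined = list(nxt)
            joined[pos] = node
            if joined[0] != joined[2]:  # skip self-loops
                options.append(tuple(joined))
    return list(dict.fromkeys(options))  # de-duplicate, keep order


def progressive_join(basic_graphs):
    """Join Basic Graphs in keyword order; each new graph is attached only
    to the most recently joined one, so word order is preserved."""
    candidates = [(basic_graphs[0],)]
    for nxt in basic_graphs[1:]:
        candidates = [
            kg + (joined,)
            for kg in candidates
            for joined in join_options(kg[-1], nxt)
        ]
    return candidates


# Basic Graphs for k1 (Object), k2 (Predicate), k3 (Object).
bgs = [("?s1", "?p1", "k1"), ("?s2", "k2", "?o2"), ("?s3", "?p3", "k3")]
for keyword_graph in progressive_join(bgs[:2]):
    print(keyword_graph)
```

Attaching each Basic Graph only to its predecessor is what keeps neighbouring query words connected in the resulting Keyword Graph, which is the sense in which context is preserved here.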



Progressive Joining Approach - an Example

Given the Intermediate-version Keyword Graph (built from k1 and k2) and the Basic Graph of the next query word (k3), generate all possible contextually feasible Keyword Graphs

[Figure: table with the columns "Intermediate-version KG", "Next BG", "Joining between last joined BG and next BG", and "Increase of KG", showing the Basic Graph of k3 joined to the Intermediate-version Keyword Graph of k1 and k2 in each feasible way.]






Ranker

Rank Keyword Graphs by

Weight - the minimum weight of the constituent Basic Graphs

Depth level - the number of edges a Keyword Graph holds

Keyword Graphs with a lower depth level are ranked higher than Keyword Graphs with a higher depth level
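A minimal ranking sketch under stated assumptions: depth is taken as the number of triple patterns (edges), lower depth comes first, and among equal depths a higher weight comes first. The slides state the depth rule but not how weight and depth are combined, so the tie-breaking order and the name `rank_keyword_graphs` are assumptions.

```python
def rank_keyword_graphs(candidates):
    """candidates: list of (keyword_graph, basic_graph_weights) pairs.

    keyword_graph is a tuple of triple patterns; basic_graph_weights holds
    the weight of each constituent Basic Graph.
    """
    def sort_key(candidate):
        graph, weights = candidate
        depth = len(graph)      # number of edges in the Keyword Graph
        weight = min(weights)   # minimum weight of constituent Basic Graphs
        return (depth, -weight)  # lower depth first, then higher weight

    return sorted(candidates, key=sort_key)


shallow_weak = ((("?s", "k1", "?o"),), [2])
shallow_strong = ((("?s", "?p", "k1"),), [5])
deep = ((("?s1", "?p1", "k1"), ("k1", "k2", "?o2")), [4, 3])

ranked = rank_keyword_graphs([deep, shallow_weak, shallow_strong])
print([len(graph) for graph, _ in ranked])
```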






SPARQL Query Generator

Construct SPARQL queries

for the higher-ranked Keyword Graphs, in rank order, until the first non-empty result is obtained

directly converted from the Keyword Graph by

putting the Variables in the SELECT clause

merging the resources that correspond to each keyword in a UNION clause
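The conversion can be sketched as plain string construction. This is an illustration under assumptions, not the authors' generator: keywords appear in the Keyword Graph as placeholders such as `k1`, `resources` maps each placeholder to candidate resource URIs (the `example.org` URIs are made up), and `run_query` is a hypothetical callable that sends a query to a SPARQL endpoint and returns its rows.

```python
from itertools import product


def term_options(term, resources):
    """A variable stays as-is; a keyword expands to its candidate URIs."""
    if term.startswith("?"):
        return [term]
    return [f"<{uri}>" for uri in resources[term]]


def to_sparql(keyword_graph, resources):
    """Variables go to SELECT; keyword alternatives are merged with UNION."""
    variables = sorted(
        {t for triple in keyword_graph for t in triple if t.startswith("?")}
    )
    groups = []
    for triple in keyword_graph:
        alternatives = [
            " ".join(combo) + " ."
            for combo in product(*(term_options(t, resources) for t in triple))
        ]
        if len(alternatives) == 1:
            groups.append(alternatives[0])
        else:
            groups.append(" UNION ".join("{ " + a + " }" for a in alternatives))
    return (
        "SELECT DISTINCT " + " ".join(variables)
        + " WHERE { " + " ".join(groups) + " }"
    )


def first_non_empty(ranked_graphs, resources, run_query):
    """Try Keyword Graphs in rank order; stop at the first non-empty result."""
    for graph in ranked_graphs:
        rows = run_query(to_sparql(graph, resources))
        if rows:
            return graph, rows
    return None, []


resources = {"k1": ["http://example.org/a", "http://example.org/b"]}
print(to_sparql([("?s1", "?p1", "k1")], resources))
```

Stopping at the first non-empty result is what lets the framework avoid returning an empty answer as long as some ranked Keyword Graph matches the data.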



Experimental Setup

Question setup

Questions: Question Answering over Linked Data test question set 3 (QALD-3)

consists of natural language questions

Dataset    Total Qs    QALD-3
DBpedia    99          99

Keywords: constructed manually w.r.t. word order of question words

Evaluation metrics

Recall, Precision & F1-Measure

Evaluated for

detailed performance analysis, execution complexity measure, and comparison with other systems
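For reference, the three metrics can be computed per question and then averaged. The sketch below uses the standard set-based definitions. That QALD-3 scores were obtained exactly this way is an assumption, and the function names and answer sets are made up for illustration.

```python
def precision_recall_f1(returned, gold):
    """Set-based precision, recall, and F1 for one question."""
    returned, gold = set(returned), set(gold)
    correct = len(returned & gold)
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_scores(per_question):
    """Average each metric over a non-empty list of (P, R, F1) tuples."""
    n = len(per_question)
    return tuple(sum(scores[i] for scores in per_question) / n for i in range(3))


# One fully correct answer set and one that finds half of the gold answers.
scores = [
    precision_recall_f1({"a"}, {"a"}),
    precision_recall_f1({"a", "b"}, {"a", "b", "c", "d"}),
]
print(average_scores(scores))
```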



Detailed performance analysis

Analyzed by the number of keywords each question holds

Group                  No of Qs   Recall (Avg)   Precision (Avg)   F1 Measure (Avg)
One Keyword Group      1          1.00           1.00              1.00
Two Keyword Group      45         0.90           0.96              0.92
Three Keyword Group    13         0.77           0.77              0.77
Four Keyword Group     8          0.75           0.75              0.75
Five Keyword Group     3          1.00           1.00              1.00

Overall: Recall 0.87, Precision 0.90, F1 Measure 0.88

Observation

for the “One/Two/Three” Keyword Group questions, the selection of the Basic Graph works well

for the questions with more than one keyword, the merging-based Keyword Graph construction and ranking work well

pre-processed data statistics help in finding sub-graphs efficiently over the linked data graph



Execution-time-wise performance analysis

Environment

Machine: Intel(R) Core(TM) i7-4770K central processing unit (CPU), 3.50 GHz, with 16 GB memory

Triple Store: network-connected Virtuoso (version 06.01.3127)

Keyword Group          One    Two     Three   Four    Five
Execution time (ms)    710    2441    2774    3585    3720

Observation

execution cost increases linearly with the number of keywords

pre-processed data statistics support faster execution



Performance Comparison

Compared with the QALD-3 challenge participant systems

System           # of Questions   Processed   Right   Partially   Recall   Precision   F1-Measure
squall2sparql    99               99          80      13          0.88     0.93        0.90
CASIA            99               52          29      8           0.36     0.35        0.36
Scalewelis       99               70          32      1           0.33     0.33        0.33
inteSearch       99               70          60      1           0.87     0.90        0.88

Observation: pre-processed data statistics help in finding sub-graphs efficiently over the linked data graph



Conclusion

Information access (IA) over LD requires finding the proper sub-graph over the LD graph

We contributed an LD IA framework that

does not generate empty results

maintains the attachment of contextual information

retrieves rich information at low execution cost

The single-query-word Basic Graph can be extended to cover multiple query words, which could further increase efficiency


Questions?

Md-Mizanur Rahoman, [email protected]

Ryutaro Ichise, [email protected]


Date post:	09-Jul-2015
Category:	Engineering
Upload:	national-inistitute-of-informatics-nii-tokyo-japann