Invited Talk University of Athens
October 21, 2008
Towards Data Mashups and Pipes
Dr. Mustafa Jarrar
HPCLab, University of Cyprus
MashQL
Reading: Mustafa Jarrar and Marios D. Dikaiakos: MashQL: A Query-by-Diagram Topping SPARQL - Towards Semantic Data Mashups. In ONISW'08 workshop, part of the CIKM'08 conference, ACM, 2008. http://www.jarrar.info/publications/JD08.pdf
Imagine we are in 3008.
The Internet is a database: information about every little thing.
Structured, granular data
Semantics, linked data
How will we yahoo/google this knowledge? (Oracle?)
Outline
• The Data Web and the role of Mashups
• Mashup Challenges
• MashQL (A new Mashup Language)
• Conclusions and Discussion
Jarrar-University of Cyprus
Web 2.0 and the phenomenon of APIs
Moving to the Data Web, in parallel with the Web of documents.
Mashups
An application that combines data from multiple sources (APIs).
Athens Tourism Portal
SOA
(API1 + API2) + API3 = money
(A puzzle of APIs)
Mashups (Example)
A unified and comprehensive view of the current global state of infectious diseases and their effect on human and animal health
Google News, ProMED
World Health Organization
How can I build a mashup?
What do you want to do? Which data do you need? Are APIs/RSS feeds available? How good are your programming skills?
• Geek: start coding.
• Semi-technical skills: use mashup editors and start configuring.
Microsoft Popfly, Yahoo! Pipes, QEDWiki by IBM, Google Mashup Editor (coming), Serena Business Mashups, Dapper, JackBe Presto Wires
Sign up for a developer token: http://aws.amazon.com/, http://www.google.com/apis/maps/, http://api.search.yahoo.com/webservices/re
Mashup Editors: Limitations
• They focus only on providing encapsulated access to (some) public APIs and feeds, rather than on querying data sources.
• They still require programming skills.
• They cannot serve as general-purpose data retrieval tools, as mashups are sophisticated applications.
• They lack a formal framework for pipelining mashups.
Vision and Challenges
Instead of calling a method in an API programmatically, can APIs act as query end-points over HTTP (i.e., a URL is a query)?
Regard the Internet as a database, where a data source is seen as a table and a mashup is a query.
A mashup can be a simple inquiry (e.g., Hacker's articles after 2000).
In short, allow (casual) users to search and consume the Data Web intuitively, as we use search engines (or at least the "advanced search" in search engines).
But then users need to know the schema and technical details of the data sources they want to query.
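The idea of a URL as a query has a concrete counterpart in the W3C SPARQL Protocol, where a query can be sent as the "query" parameter of an HTTP GET request. The Python sketch below only builds and decodes such a URL. The endpoint http://example.org/sparql and the IRIs inside the query are placeholders, not real services.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint: example.org is a placeholder, not a real SPARQL service.
ENDPOINT = "http://example.org/sparql"


def query_url(endpoint: str, sparql: str) -> str:
    """Encode a SPARQL query as an HTTP GET URL using the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": sparql})


def query_from_url(url: str) -> str:
    """Recover the query text from such a URL."""
    return parse_qs(urlparse(url).query)["query"][0]


q = ('SELECT ?title WHERE { ?a <http://example.org/Author> "Hacker B" . '
     '?a <http://example.org/Title> ?title }')
url = query_url(ENDPOINT, q)
print(url)
assert query_from_url(url) == q  # the URL carries the whole query
```

Sending this URL to a real endpoint would return the query results. That is the sense in which a data source can behave like a queryable table on the Web.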
How can a user query a source without knowing its schema, structure, and vocabulary?
SELECT S.Title FROM Google.Scholar S WHERE S.Author = 'Hacker'
UNION
SELECT P.PatentTitle FROM Google.Patent P WHERE P.Inventor = 'Hacker'
UNION
SELECT A.Title FROM Citeseer A WHERE A.Author = 'Hacker'
Data Sources
Vision and Challenges
Data source 1 (http://www.site1.com/rdf):
<:a1> <:Title> "Web 2.0"
<:a1> <:Author> "Hacker B."
<:a1> <:Year> 2007
<:a1> <:Publisher> "Springer"
<:a2> <:Title> "Web 3.0"
<:a2> <:Author> "Smith B."
<:a2> <:Cites> <:a1>
Data source 2 (http://www.site2.com/rdf):
<:4> <:Title> "Semantic Web"
<:4> <:Author> "Tom Lara"
<:4> <:PubYear> 2005
<:5> <:Title> "Web Services"
<:5> <:Author> "Bob Hacker"
PREFIX S1: <http://site1.com/rdf>
PREFIX S2: <http://site2.com/rdf>
SELECT ?ArticleTitle
FROM <http://site1.com/rdf>
FROM <http://site2.com/rdf>
WHERE {
  {?X S1:Title ?ArticleTitle} UNION {?X S2:Title ?ArticleTitle}
  {?X S1:Author ?X1} UNION {?X S2:Author ?X1}
  {?X S1:Year ?X2} UNION {?X S2:PubYear ?X2}
  FILTER regex(?X1, "^Hacker")
  FILTER (?X2 > 2000)
}
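The following Python sketch makes the meaning of this query concrete by evaluating it directly over the sample triples. Year and PubYear are treated as one predicate (a union), and every restriction is required (a join). The source-prefixed subject names (s1:a1, s2:4) are an assumption that keeps the two sources' identifiers apart. The sketch only illustrates the semantics; it is not how MashQL executes queries.

```python
import re

# Sample data from the two sources, as (subject, predicate, object) tuples.
site1 = [
    ("s1:a1", "Title", "Web 2.0"), ("s1:a1", "Author", "Hacker B."),
    ("s1:a1", "Year", 2007), ("s1:a1", "Publisher", "Springer"),
    ("s1:a2", "Title", "Web 3.0"), ("s1:a2", "Author", "Smith B."),
    ("s1:a2", "Cites", "s1:a1"),
]
site2 = [
    ("s2:4", "Title", "Semantic Web"), ("s2:4", "Author", "Tom Lara"),
    ("s2:4", "PubYear", 2005),
    ("s2:5", "Title", "Web Services"), ("s2:5", "Author", "Bob Hacker"),
]


def values(triples, subject, predicates):
    """All objects of `subject` under any predicate in `predicates` (a predicate union)."""
    return [o for s, p, o in triples if s == subject and p in predicates]


def hackers_articles_after(triples, year):
    results = []
    for subject in sorted({s for s, _, _ in triples}):
        titles = values(triples, subject, {"Title"})
        authors = values(triples, subject, {"Author"})
        years = values(triples, subject, {"Year", "PubYear"})  # Year\PubYear
        # All restrictions are required, so each one must produce a binding (a join).
        for t in titles:
            for a in authors:
                for y in years:
                    if re.search("^Hacker", a) and y > year:
                        results.append(t)
    return results


print(hackers_articles_after(site1 + site2, 2000))  # ['Web 2.0']
```

Only "Web 2.0" qualifies. "Bob Hacker" fails the anchored pattern ^Hacker, and neither article from the second source satisfies both the author and the year restrictions.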
Query: Hacker's articles after 2000
Some data sources, like the two above, may come without any schema at all.
Programmers usually explore such sources by eye and memorize their vocabulary and structure. What about casual users?
MashQL
A simple query language for the Data Web, in a mashup style.
MashQL allows querying one or more dataspaces without any prior knowledge of their schema, vocabulary, or technical details (a source may not have a schema at all): users explore an unknown graph.
It does not assume any knowledge of RDF, SPARQL, XML, or any other technology to get started.
Users only use drop-down lists to formulate queries (query-by-diagram/interaction).
MashQL Example 1
Hacker’s Articles after 2000?
MashQL editor
From (RDF input): http://www.site1.com/rdf, http://www.site2.com/rdf
Everything
  Title ArticleTitle
  Author "^Hacker"
  Year\PubYear > 2000
MashQL Example 1
Step 1 (interactive query formulation): with http://www.site1.com/rdf and http://www.site2.com/rdf as RDF input, the subject drop-down list offers Everything, the types, and the instances (a1, a2, 4, 5). The user selects Everything.
MashQL Example 1
Step 2: the predicate drop-down list offers Author, Cites, Publisher, PubYear, Title, and Year. The user selects Title and names its output ArticleTitle.
MashQL Example 1
Step 3: the user adds the predicate Author. The filter drop-down list offers Equals, Contains, OneOf, Not, Between, LessThan, and MoreThan, and the user selects Contains with the value Hacker.
MashQL Example 1
Step 4: the user adds Year with the filter MoreThan and the value 2000, then adds PubYear as an alternative predicate (Year\PubYear).
MashQL Example 1
Hacker's Articles after 2000? (final query)
From (RDF input): http://www.site1.com/rdf, http://www.site2.com/rdf
Everything
  Title ArticleTitle
  Author "^Hacker"
  Year\PubYear > 2000
This MashQL query is translated into the SPARQL query shown earlier under Vision and Challenges.
MashQL Example 2
The recent articles from Cyprus
Retrieve every article that has a title, is written by an author who has an address whose country is Cyprus, and was published after 2008.
MashQL editor
From (RDF input URL): http://www4.wiwiss.fu-berlin.de/dblp/
Article
  Title ArticleTitle
  Author
    Address
      Country "Cyprus"
  Year > 2008
The Intuition of MashQL
A query is a tree:
• The root is called the query subject.
• Each branch is a restriction.
• Branches can be expanded (information paths).
• Object values can be filtered.
Def. A query Q with a subject S, denoted Q(S), is a set of restrictions on S: Q(S) = R1 AND … AND Rn.
Def. A subject $S \in I \cup V$, where I is the set of identifiers and V is the set of variables.
Def. A restriction R = <Rx, P, Of>, where Rx is an optional restriction prefix (maybe | without), P is a predicate ($P \in I \cup V$), and Of is an object filter.
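Together, these definitions describe a small recursive data structure. The Python sketch below encodes Example 2 as such a tree. The class names (Var, ObjectFilter, Restriction, Query) are hypothetical and not part of MashQL. The subquery field models an information path, the recursive form of an object filter.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Var:
    name: str  # a variable in V, written ?name in SPARQL


@dataclass
class ObjectFilter:
    obj: Union[str, Var]               # O: an identifier or a variable
    function: Optional[str] = None     # e.g. "Equals", "MoreThan"; None adds no restriction
    args: tuple = ()
    subquery: Optional["Query"] = None  # information path: a sub-query rooted at O


@dataclass
class Restriction:
    predicate: Union[str, Var]         # P in I or V
    of: ObjectFilter
    prefix: Optional[str] = None       # Rx: None (required), "maybe" or "without"


@dataclass
class Query:
    subject: Union[str, Var]           # S in I or V
    restrictions: List[Restriction] = field(default_factory=list)  # R1 AND ... AND Rn


example2 = Query("Article", [
    Restriction("Title", ObjectFilter(Var("ArticleTitle"))),
    Restriction("Author", ObjectFilter(Var("X1"), subquery=Query(Var("X1"), [
        Restriction("Address", ObjectFilter(Var("X11"), subquery=Query(Var("X11"), [
            Restriction("Country", ObjectFilter(Var("X111"), "Equals", ("Cyprus",))),
        ]))),
    ]))),
    Restriction("Year", ObjectFilter(Var("X2"), "MoreThan", (2008,))),
])


def depth(q: Query) -> int:
    """Length of the longest information path in the query tree."""
    return 1 + max((depth(r.of.subquery) for r in q.restrictions if r.of.subquery), default=0)


print(depth(example2))  # 3: Article -> Author -> Address
```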
Article
  Title ?ArticleTitle
  Author ?X1
    Address ?X11
      Country ?X111 = "Cyprus"
  Year ?X2 > 2008
The Intuition of MashQL
An object filter is one of:
• Equals
• Contains
• MoreThan
• LessThan
• Between
• OneOf
• Not(f)
• Information path (sub-query)
Def. An object filter Of = <O, f>, where O is an object and f is a filtering function, takes one of these forms:
Of = <O>, where $O \in V \cup I$.
Of = <O, Equals(X, T, Lt)>, where X is a variable or a constant, T is a datatype, and Lt is a language tag.
Of = <O, Contains(X, T, Lt)>, where O is an object variable, X is a regex literal, T is a datatype, and Lt is a language tag.
Of = <O, MoreThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype.
Of = <O, LessThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype identifier.
Of = <O, Between(X, Y, T)>, where X and Y are variables or constants and T is a datatype identifier.
Of = <O, OneOf(V)>, where O is an object variable and V = {v1, …, vn} is a set of values, each vi a variable or a constant.
Of = <O, Not(f)>, where f is one of the functions defined above.
Of = <O, Qi(O)>, where $O \in V \cup I$ and Qi(O) is a sub-query with O as its query subject.
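Each object filter can be read as a test on an object value. The Python sketch below models the filters as closures; the function names are hypothetical, and datatypes and language tags are ignored. Between is inclusive, and Contains takes a regular expression, as in the definitions above.

```python
import re
from typing import Any, Callable

# A filter is modelled as a test on a single object value.
Filter = Callable[[Any], bool]


def equals(x) -> Filter:
    return lambda o: o == x


def contains(pattern: str) -> Filter:
    # Contains takes a regex literal, so the value must match the pattern.
    return lambda o: isinstance(o, str) and re.search(pattern, o) is not None


def more_than(x) -> Filter:
    return lambda o: o > x


def less_than(x) -> Filter:
    return lambda o: o < x


def between(x, y) -> Filter:
    # Inclusive on both ends: O >= X and O <= Y.
    return lambda o: x <= o <= y


def one_of(*values) -> Filter:
    return lambda o: o in values


def not_(f: Filter) -> Filter:
    return lambda o: not f(o)


authors = ["Hacker B.", "Smith B.", "Bob Hacker"]
print([a for a in authors if contains("^Hacker")(a)])                  # ['Hacker B.']
print([y for y in (1999, 2005, 2010) if between(2000, 2008)(y)])       # [2005]
print([c for c in ("Cyprus", "Greece") if not_(equals("Cyprus"))(c)])  # ['Greece']
```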
More MashQL Constructs
Restriction operators {Required, Maybe, Without}
All restrictions are required (i.e., AND) unless they are prefixed with "maybe" or "without".
SELECT ?PersonName ?University
WHERE {
  ?Person :Name ?PersonName .
  ?Person :WorkFor :Yahoo .
  OPTIONAL { ?Person :StudyAt ?University }
  OPTIONAL { ?Person :Salary ?X1 }
  FILTER (!bound(?X1))
}
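The three prefixes correspond to an inner join (required), a left outer join (maybe), and a left outer join followed by a !bound test (without). The Python sketch below mirrors the query above. It assumes hypothetical data held as one dictionary of single-valued properties per person, rather than as triples.

```python
# Hypothetical data: one dictionary of single-valued properties per person.
people = {
    "p1": {"Name": "Ann", "WorkFor": "Yahoo", "StudyAt": "UCY"},
    "p2": {"Name": "Bob", "WorkFor": "Yahoo", "Salary": 50000},
    "p3": {"Name": "Eve", "WorkFor": "Yahoo"},
    "p4": {"Name": "Dan", "WorkFor": "Google", "StudyAt": "NTUA"},
}


def yahoo_staff_without_salary(data):
    rows = []
    for props in data.values():
        # Required restrictions: both properties must be present (inner join).
        if "Name" not in props or props.get("WorkFor") != "Yahoo":
            continue
        # "without": OPTIONAL {?Person :Salary ?X1} FILTER (!bound(?X1)) drops anyone with a salary.
        if "Salary" in props:
            continue
        # "maybe": OPTIONAL {?Person :StudyAt ?University} leaves the value unbound (None).
        rows.append((props["Name"], props.get("StudyAt")))
    return rows


print(yahoo_staff_without_salary(people))  # [('Ann', 'UCY'), ('Eve', None)]
```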
More MashQL Constructs
Union operator (denoted "\") between objects, predicates, subjects, and queries:
Objects (Google\Yahoo):
SELECT ?Person WHERE { {?Person :WorkFor :Google} UNION {?Person :WorkFor :Yahoo} }
Predicates (Surname\Firstname):
SELECT ?FName WHERE { {?Person :Surname ?FName} UNION {?Person :Firstname ?FName} }
Subjects (Person\Company):
SELECT ?AgentName ?AgentPhone WHERE { {?Person rdf:type :Person . ?Person :Name ?AgentName . ?Person :Phone ?AgentPhone} UNION {?Company rdf:type :Company . ?Company :Name ?AgentName . ?Company :Phone ?AgentPhone} }
Queries:
SELECT ?CustName WHERE { {?Person :Name ?CustName} UNION {?Company :Title ?CustName . ?Company :City ?X1 . FILTER regex(?X1, "Paris")} }
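SPARQL's UNION concatenates the solutions of its branches. The Python sketch below shows an object union and a predicate union over a hypothetical list of triples.

```python
# Hypothetical triples (identifiers made up).
triples = [
    ("p1", "WorkFor", "Google"), ("p2", "WorkFor", "Yahoo"), ("p3", "WorkFor", "IBM"),
    ("p1", "Surname", "Smith"), ("p2", "Firstname", "Maria"),
]


def match(s=None, p=None, o=None):
    """Triples matching a pattern; None plays the role of a variable."""
    return [t for t in triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)]


# Object union Google\Yahoo: {?Person :WorkFor :Google} UNION {?Person :WorkFor :Yahoo}
workers = [s for org in ("Google", "Yahoo") for s, _, _ in match(p="WorkFor", o=org)]

# Predicate union Surname\Firstname: {?Person :Surname ?FName} UNION {?Person :Firstname ?FName}
names = [o for pred in ("Surname", "Firstname") for _, _, o in match(p=pred)]

print(workers)  # ['p1', 'p2']
print(names)    # ['Smith', 'Maria']
```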
More MashQL Constructs
And several other constructs, including types and reverse predicates, datatypes and language tags, …
Formal Syntax and Semantics
Def. 1 (Dataset): A dataset D is a set of triples. Each triple t has the form <S, P, O>, where $S \in I$, $P \in I$, and $O \in I \cup L$.
Def. 2 (Typed literals): Every object literal must have a datatype: if $O \in L$, then O is typed with a datatype.
Def. 3 (Language tags): An object literal ($O \in L$) may have a language tag Lt.
Def. 4 (Query): A query Q with a subject S, denoted Q(S), is a set of restrictions on S: Q(S) = R1 AND … AND Rn.
Def. 5 (Subject): A subject $S \in I \cup V$, where I is the set of identifiers and V is the set of variables.
Def. 6 (Restriction): A restriction R = <Rx, P, Of>, where Rx is an optional restriction prefix (maybe | without), P is a predicate ($P \in I \cup V$), and Of is an object filter.
Def. 7 (Object filter): An object filter Of = <O, f>, where O is an object and f is a filtering function. An object filter has one of the following nine forms:
1. Of = <O>, where $O \in V \cup I$. This is the simplest object filter: it adds no restriction on the object value of the retrieved triples.
2. Of = <O, Equals(X, T, Lt)>, where X is a variable or a constant, T is a datatype, and Lt is a language tag. The object value O must equal X, with datatype T and language Lt.
3. Of = <O, Contains(X, T, Lt)>, where O is an object variable, X is a regex literal (a literal containing a regular-expression matching pattern), T is a datatype, and Lt is a language tag. The object value O must match regex(X), with datatype T and language Lt.
4. Of = <O, MoreThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype. The object value O must be greater than X, with datatype T.
5. Of = <O, LessThan(X, T)>, where O is an object variable, X is a variable or a constant, and T is a datatype identifier. The object value O must be less than X, with datatype T (see Rule 9).
6. Of = <O, Between(X, Y, T)>, where X and Y are variables or constants and T is a datatype identifier. The object value O must be greater than or equal to X and less than or equal to Y, with datatype T.
7. Of = <O, OneOf(V)>, where O is an object variable and V = {v1, …, vn} is a set of values, each vi a variable or a constant. The object value O must equal one of the values in V.
8. Of = <O, Not(f)>, where f is one of the functions defined above. This form extends each of those functions with simple negation; for example, Not applied to Equals gives Not Equal.
9. Of = <O, Qi(O)>, where $O \in V \cup I$ and Qi(O) is a sub-query with O as its query subject. The restrictions of Qi(O) must be satisfied as well. This definition is recursive, but the query itself is not recursive.
Def. 8 (Types): A subject ($S \in I$) or an object ($O \in I$) can be prefixed with "a" or "an" to denote the instances of that type, instead of the subject/object itself.
Def. 9 (Union): A union can be declared between objects, predicates, subjects, and/or queries, in the following forms:
1. On = <O1\O2\…\On>, for unions between objects, where $O_i \in I$.
2. Pn = <P1\P2\…\Pn>, for unions between predicates, where $P_i \in I$.
3. Sn = <S1\S2\…\Sn>, for unions between subjects, where $S_i \in I$.
4. Qn = <Q1\Q2\…\Qn>, for unions between queries.
Def. 10 (Reverse): <~P> denotes the reverse of the predicate P. If R1 is a restriction on S such that <S P O>, and R2 is <O ~P S>, then R1 and R2 have the same meaning.
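Types and reverse predicates both reduce to simple lookups over triples. The Python sketch below illustrates them over hypothetical data, where rdf:type is written as a plain string.

```python
RDF_TYPE = "rdf:type"

# Hypothetical triples: two articles and a book, where a2 and b1 cite a1.
triples = [
    ("a1", RDF_TYPE, "Article"), ("a2", RDF_TYPE, "Article"), ("b1", RDF_TYPE, "Book"),
    ("a2", "Cites", "a1"), ("b1", "Cites", "a1"),
]


def instances_of(type_name):
    """Def. 8: 'an Article' denotes the instances of Article, i.e. {?S rdf:type :Article}."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == type_name)


def objects(subject, predicate):
    """Objects O such that <subject predicate O> holds."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def reverse_objects(subject, predicate):
    """Def. 10: <subject ~predicate X> holds exactly when <X predicate subject> holds."""
    return sorted(s for s, p, o in triples if o == subject and p == predicate)


print(instances_of("Article"))         # ['a1', 'a2']
print(reverse_objects("a1", "Cites"))  # ['a2', 'b1'] (who cites a1)
```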
MashQL Queries
In the background, MashQL queries are translated into, and executed as, SPARQL queries.
At the moment, we focus on RDF (/RDFa) as the data format and SPARQL (/Oracle's SPARQL) as the backend query language. However, MashQL can easily be mapped to other query languages.
MashQL is not merely a user interface but also a query language with its own intuition: it focuses on path patterns rather than triple patterns.
MashQL-SPARQL Mapping Rules (MashQL is also mapped into SQL and Oracle's SPARQL)
Rule 1: The symbol before a variable means that it will be returned in the results, i.e., included in the SELECT part of the SPARQL query. If the output of the query is the input to another query, use "CONSTRUCT *".
Rule 2: In all of the following rules, an italicized subject, predicate, or object is a SPARQL variable, i.e., prefixed with "?".
Rule 3: If S is a subject and R = < , P, Of>, the mapping is {S P O}.
Rule 4: If S is a subject and R = <maybe, P, Of>, the mapping is {OPTIONAL{S P O}}.
Rule 5: If S is a subject and R = <without, P, Of>, the mapping is {OPTIONAL{S P O} FILTER (!bound(?O))}.
Rule 6: If Of = <O, Equals(X, T, Lt)>: append FILTER(?O = X). If $T \neq$ Null, append FILTER(datatype(?O) = T). If $Lt \neq$ Null, append FILTER(lang(?O) = Lt).
Rule 7: If Of = <O, Contains(X, T, Lt)>: append FILTER regex(?O, X). If $T \neq$ Null, append FILTER(datatype(?O) = T). If $Lt \neq$ Null, append FILTER(lang(?O) = Lt).
Rule 8: If Of = <O, MoreThan(X, T)>: append FILTER(?O > X). If $T \neq$ Null, append FILTER(datatype(?O) = T).
Rule 9: If Of = <O, LessThan(X, T)>: append FILTER(?O < X). If $T \neq$ Null, append FILTER(datatype(?O) = T).
Rule 10: If Of = <O, Between(X, Y, T)>: append FILTER(?O >= X && ?O <= Y). If $T \neq$ Null, append FILTER(datatype(?O) = T).
Rule 11: If Of = <O, OneOf(V)>: append FILTER(?O = V1 || … || ?O = Vn). If Vi is a regex literal, replace the ith disjunct with regex(?O, Vi).
Rule 12: If Of = <O, Not(f)>: generate the f filter as above, but negated.
Rule 13: If Of = <O, Qi(O)>: apply all mapping rules recursively to generate Qi(O).
Rule 14: If a subject S is prefixed with "a" or "an": append {?S rdf:type :S}.
Rule 15: If an object O is prefixed with "a" or "an": append {?O rdf:type :O}.
Rule 16: Given On, if n > 1 and $O_i \in I$, the mapping in Rules 3-4 becomes {{S P :O1} UNION … UNION {S P :On}}.
Rule 17: Given Pn, if n > 1 and $P_i \in I$, the mapping in Rules 3-4 becomes {{S :P1 O} UNION … UNION {S :Pn O}}.
Rule 18: Given Sn, if n > 1 and $S_i \in I$, regenerate the query n times, each time with Si as the root, with a UNION between the queries.
Rule 19: Given Qn, if n > 1, add UNION between the n queries.
Rule 20: If S is a subject and R = <~P, O>, the mapping is {O P S}.
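Rules 1 to 11 are mechanical enough to sketch as a small translator. The Python code below is a hypothetical illustration, not the MashQL compiler. It handles a single subject with flat restrictions and skips unions, types, sub-queries, datatypes, and language tags. Rule 5 is rendered as OPTIONAL plus !bound, as in the "without" example under More MashQL Constructs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Restriction:
    predicate: str                  # e.g. ":Title"
    obj: str                        # object variable name without "?", e.g. "X1"
    prefix: Optional[str] = None    # None (required), "maybe" or "without"
    function: Optional[str] = None  # "Equals", "Contains", "MoreThan", "LessThan", "Between", "OneOf"
    args: tuple = ()


def lit(value) -> str:
    """Render a Python value as a SPARQL literal: strings quoted, numbers as-is."""
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(value)


def filter_expr(var: str, function: str, args: tuple) -> str:
    """Rules 6-11, without the optional datatype and language tests."""
    if function == "Equals":
        return f"{var} = {lit(args[0])}"
    if function == "Contains":
        return f"regex({var}, {lit(args[0])})"
    if function == "MoreThan":
        return f"{var} > {lit(args[0])}"
    if function == "LessThan":
        return f"{var} < {lit(args[0])}"
    if function == "Between":
        return f"{var} >= {lit(args[0])} && {var} <= {lit(args[1])}"
    if function == "OneOf":
        return " || ".join(f"{var} = {lit(a)}" for a in args)
    raise ValueError(f"unsupported filter: {function}")


def to_sparql(subject: str, restrictions: list, select: list) -> str:
    """Translate one query subject with flat restrictions into a SPARQL SELECT query."""
    s = f"?{subject}"  # Rule 2: variables get a "?" prefix
    lines = []
    for r in restrictions:
        pattern = f"{s} {r.predicate} ?{r.obj} ."  # Rule 3
        if r.function:
            pattern += f" FILTER ({filter_expr('?' + r.obj, r.function, r.args)})"
        if r.prefix is None:
            lines.append(pattern)
        elif r.prefix == "maybe":  # Rule 4
            lines.append(f"OPTIONAL {{ {pattern} }}")
        elif r.prefix == "without":  # Rule 5
            lines.append(f"OPTIONAL {{ {pattern} }} FILTER (!bound(?{r.obj}))")
        else:
            raise ValueError(f"unknown prefix: {r.prefix}")
    head = "SELECT " + " ".join(f"?{v}" for v in select)  # Rule 1
    return head + "\nWHERE {\n  " + "\n  ".join(lines) + "\n}"


query = to_sparql("X", [
    Restriction(":Title", "ArticleTitle"),
    Restriction(":Author", "X1", function="Contains", args=("^Hacker",)),
    Restriction(":Year", "X2", function="MoreThan", args=(2000,)),
], select=["ArticleTitle"])
print(query)
```

For the Example 1 restrictions on a single source (so without the Year\PubYear union), the sketch produces SELECT ?ArticleTitle with the patterns ?X :Author ?X1 . FILTER (regex(?X1, "^Hacker")) and ?X :Year ?X2 . FILTER (?X2 > 2000). For a "maybe" restriction, the filter is placed inside the OPTIONAL group, so it does not discard solutions where the value is missing.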
MashQL Compilation
• MashQL Markup: an XML Schema to represent pipes in XML.
• The reference grammar (technical specification).
Depending on the pipeline structure, MashQL generates either SELECT or CONSTRUCT queries:
• SELECT returns the results in tabular form (e.g., ArticleTitle, Author).
• CONSTRUCT returns the results as triples (Subject, Predicate, Object), so they can feed the next query in the pipeline.
CONSTRUCT *
WHERE {
  ?Job :JobIndustry ?X1 .
  ?Job :Type ?X2 .
  ?Job :Currency ?X3 .
  ?Job :Salary ?X4 .
  FILTER (?X1 = "Education" || ?X1 = "HealthCare")
  FILTER (?X2 = "Full-Time" || ?X2 = "Fulltime" || ?X2 = "Contract")
  FILTER (regex(?X3, "^Euro") || regex(?X3, "^€"))
  FILTER (?X4 >= 75000 && ?X4 <= 120000)
}
SELECT ?Job ?Firm
WHERE {
  ?Job :Location ?X1 .
  ?X1 :Country ?X2 .
  FILTER (?X2 = "Italy" || ?X2 = "Spain" || ?X2 = "Greece" || ?X2 = "Cyprus")
  OPTIONAL { {?Job :Organization ?Firm} UNION {?Job :Employer ?Firm} }
}
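The difference between the two output forms is what makes pipelines possible. A CONSTRUCT stage yields triples that the next stage can query again, while a SELECT stage yields a table. The Python sketch below shows a two-stage pipeline over hypothetical data. Its filters are deliberately simplified: the first stage checks only the job type.

```python
# Hypothetical job triples (made-up identifiers), loosely following the queries above.
jobs = [
    ("j1", "Type", "Full-Time"), ("j1", "Country", "Cyprus"), ("j1", "Employer", "UCY"),
    ("j2", "Type", "Contract"), ("j2", "Country", "France"),
    ("j3", "Type", "Fulltime"), ("j3", "Country", "Greece"),
]


def subjects_where(triples, predicate, allowed):
    """Subjects having `predicate` with a value in `allowed` (a OneOf filter)."""
    return {s for s, p, o in triples if p == predicate and o in allowed}


def construct_stage(triples):
    """CONSTRUCT-like stage: the output is again a list of triples, so it can be piped on."""
    keep = subjects_where(triples, "Type", {"Full-Time", "Fulltime"})
    return [t for t in triples if t[0] in keep]


def select_stage(triples):
    """SELECT-like stage: a table of (job, firm) rows; a missing firm stays None (OPTIONAL)."""
    keep = subjects_where(triples, "Country", {"Italy", "Spain", "Greece", "Cyprus"})
    firms = {s: o for s, p, o in triples if p == "Employer"}
    return sorted((job, firms.get(job)) for job in keep)


print(select_stage(construct_stage(jobs)))  # [('j1', 'UCY'), ('j3', None)]
```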
System Model (Online Mashup Editor)
[Figure: the Client and the Mashup Server (Query Loader, Results Render). Data is downloaded over HTTP from remote sites (Site1, Site2, Site3) and bulk-loaded. Data sources and background queries are exchanged via AJAX; queries are run and results returned over HTTP.]
Example (Wikipedia titles: 28 MB zipped, 316 MB N-Triples, 2.7 M triples): download 37 s (600 KB/s); bulk load into Oracle RDF 70 s (about 40K triples per second); query time one to a few seconds.
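As a quick consistency check on the bulk-load figure, dividing the triple count by the load time gives:

$$\frac{2.7\times 10^{6}\ \text{triples}}{70\ \text{s}} \approx 3.86\times 10^{4}\ \text{triples/s}$$

This matches the stated rate of about 40K triples per second.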
The output of a mashup can be the input to another (enabling people to collaborate, innovate, and build on each other's results).
Use Case: Job Seeking
A mashup of job vacancies based on Google Base and on Jobs.ac.uk.
CONSTRUCT *
WHERE {
  {?Job :Category :Health} UNION {?Job :Category :Medicine}
  ?Job :Role ?X1 .
  ?Job :Salary ?X2 .
  ?X2 :Currency :UPK .
  ?X2 :Minimum ?X3 .
  FILTER (?X1 = "Research" || ?X1 = "Academic")
  FILTER (?X3 > 50000)
}
The other stages of this pipeline are the CONSTRUCT query (job industry, type, currency, salary) and the SELECT query (location, employer) shown under MashQL Compilation.
Use Case: My Citations
A mashup of Hacker's cited articles (excluding self-citations), over Scholar and CiteSeer.
Use Case: eHealth Research
A mashup over an eHealth database to find what causes prostate cancer.
Add or remove restrictions until you retrieve all and only the people with prostate cancer (the restrictions are then the symptoms).
Use Case: Retailers
A retailer mashup of three RDF data sources, with some barcode numbers as user input.
When a product is scanned, its English and French titles are retrieved directly from the manufacturer's online catalog.
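The retailer scenario relies on language tags, the lang(?O) = Lt test of Rule 6. The Python sketch below assumes hypothetical barcodes and catalog triples whose objects are (text, language tag) pairs.

```python
# Hypothetical catalog triples: (barcode, predicate, (text, language tag)).
catalog = [
    ("5012345678900", "Title", ("Olive oil", "en")),
    ("5012345678900", "Title", ("Huile d'olive", "fr")),
    ("5012345678900", "Title", ("Aceite de oliva", "es")),
    ("4006381333931", "Title", ("Pencil", "en")),
]


def titles(barcodes, languages=("en", "fr")):
    """For each scanned barcode, the titles whose language tag is wanted (lang(?O) = Lt)."""
    return {
        code: sorted(text for s, p, (text, lang) in catalog
                     if s == code and p == "Title" and lang in languages)
        for code in barcodes
    }


print(titles(["5012345678900"]))  # {'5012345678900': ["Huile d'olive", 'Olive oil']}
```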
Use Case: Car Rental Business Auditing
A government connects to the databases of car rental companies to audit whether they comply with local regulations.
(Each query is a business rule; if its results are not empty, the rule is violated.)
Vehicles rented without being insured
Rentals to people without licenses
Rentals to people without proper licenses
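Each business rule is a query with a "without" restriction, and auditing means checking which rules return a non-empty result. The Python sketch below uses hypothetical records and an assumed license category; these are not the actual regulations.

```python
# Hypothetical rental records and registries (all identifiers made up).
rentals = [
    {"id": "r1", "vehicle": "v1", "customer": "c1"},
    {"id": "r2", "vehicle": "v2", "customer": "c2"},
    {"id": "r3", "vehicle": "v1", "customer": "c3"},
]
insured_vehicles = {"v1"}
licenses = {"c1": "B", "c3": "A"}  # customer -> license category
REQUIRED_CATEGORY = "B"            # assumption: category B is needed to rent a car

# Each business rule returns True for a violating rental (a "without" restriction).
RULES = {
    "rented without insurance": lambda r: r["vehicle"] not in insured_vehicles,
    "rented without a license": lambda r: r["customer"] not in licenses,
    "rented without a proper license":
        lambda r: licenses.get(r["customer"], REQUIRED_CATEGORY) != REQUIRED_CATEGORY,
}


def audit(records):
    """Run every rule; a non-empty result means the rule is violated."""
    report = {}
    for name, violates in RULES.items():
        hits = [r["id"] for r in records if violates(r)]
        if hits:
            report[name] = hits
    return report


print(audit(rentals))
```

Here r2 violates both the insurance rule and the license rule, and r3 violates the proper-license rule. A customer with no license at all is reported only by the license rule.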
Evaluation
First: query execution
• The performance of executing a MashQL query is bounded by the performance of its backend language (i.e., SPARQL/SQL).
• A query of medium complexity takes one or a few seconds (Oracle's SPARQL, [Chong et al 2007]).
Evaluation
Second: background queries
• These are the queries that the MashQL editor performs in the background (to generate drop-down lists) while a user formulates his/her query.
• Background queries must execute fast enough to allow efficient query formulation.
• Experiments over:
  • DBLP data (12 million triples, 700 MB)
  • DBpedia data (25 million triples, 2.x GB)
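The background queries compute distinct, sorted lists (GROUP BY … ORDER BY …). The Python sketch below shows how the drop-down lists could be derived from a hypothetical miniature of the DBLP data. It rests on two assumptions: the predicate lists are read as the predicates used by the instances of the chosen type, and rdf:type is hidden from those lists.

```python
RDF_TYPE = "rdf:type"

# Hypothetical miniature of the DBLP data (identifiers and values are made up).
triples = [
    ("a1", RDF_TYPE, "Article"), ("a1", "Title", "Article A"),
    ("a1", "Creator", "p1"), ("a1", "Year", 1999),
    ("b1", RDF_TYPE, "Book"), ("b1", "Publisher", "Springer"),
    ("p1", RDF_TYPE, "Person"), ("p1", "Name", "Tim Berners-Lee"),
]


def instances(type_name):
    return {s for s, p, o in triples if p == RDF_TYPE and o == type_name}


def type_list():
    """Q1: distinct types, sorted (GROUP BY O ORDER BY O)."""
    return sorted({o for s, p, o in triples if p == RDF_TYPE})


def predicate_list(subjects):
    """Q2/Q3/Q5: distinct predicates used by `subjects`, sorted (GROUP BY P ORDER BY P)."""
    return sorted({p for s, p, o in triples if s in subjects and p != RDF_TYPE})


def follow(subjects, predicate):
    """Objects reached from `subjects` through `predicate` (one step of an information path)."""
    return {o for s, p, o in triples if s in subjects and p == predicate}


print(type_list())                                             # ['Article', 'Book', 'Person']
print(predicate_list(instances("Article")))                    # ['Creator', 'Title', 'Year']
print(predicate_list(follow(instances("Article"), "Creator")))  # ['Name'] (Q4)
```

Q4 is the only list that follows a path (Article, then Creator). In the measurements below it is also the slowest background query.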
Evaluation
MashQL editor
From (RDF input): http://www.informatik.uni-trier.de/~ley (12 million triples)
Article
  Title ArticleTitle
  Creator
    Name "^Berners-Lee^"
  Year > 1993
Evaluation of the Background Queries
Input: http://www.informatik.uni-trier.de/~ley (12 million triples)
Step 1, choosing the subject [00.00]: the list offers Everything and the types Article, Book, Incollection, Inproceedings, Masterthesis, Person, Phdthesis, Proceedings, www. The user selects Article.
Background query: Select O FROM …(?S <rdf:type> ?O)… Group by O Order by O;
Step 2, choosing a predicate of Article [00.03]: the list offers BookTitle, CDRom, Cite, Creator, Date, Editor, Journal, Month, Number, Pages, Publisher, Title, Volume, Year. The user selects Title (output ArticleTitle).
Background query: Select P FROM …(?S <rdf:type> ?O)(?O ?P ?O1)… Group by P Order by P;
Step 3, choosing another predicate [00.03]: from the same list the user selects Creator.
Background query: Select P FROM …(?S <rdf:type> ?O)(?O ?P ?O1)… Group by P Order by P;
Step 4, choosing a predicate of Creator [00.43]: the list offers Name and Type. The user selects Name, then the Contains filter (from Equals, Contains, OneOf, Not, Between, LessThan, MoreThan) with the value Berners-Lee.
Background query: Select P FROM …(?S <rdf:type> ?O)(?O <:Creator> ?O1)(?O1 ?P ?O2)… Group by P Order by P;
Step 5, adding Year [00.03]: from the predicate list of Article the user selects Year and the MoreThan filter.
Background query: Select P FROM …(?S <rdf:type> ?O)(?O ?P ?O1)… Group by P Order by P;
Resulting query, with the time of each background query:
Article [00.00]
  Title [00.03] ArticleTitle
  Creator [00.03]
    Name [00.43] "^Berners-Lee^"
  Year [00.03] > 1993
Evaluation of the Background Queries (time per background query, by dataset size)
Background query   12 M triples   6 M triples   3 M triples   1.5 M triples
Q1                 <00.00         <00.00        <00.00        <00.00
Q2                 <00.03         <00.01        <00.01        <00.00
Q3                 <00.03         <00.01        <00.01        <00.00
Q4                 <00.43         <00.20        <00.13        00.08
Q5                 <00.03         <00.01        <00.01        <00.00
Summary
Our goal is not to benchmark whether Oracle is fast and scalable, but to know whether Oracle's speed is sufficient for MashQL interactivity. Answer: yes.
Conclusions
• A formal yet simple query language for the Data Web, in a mashup and declarative style.
• Allows people to discover and navigate unknown data spaces (graphs) without prior knowledge of the schema or technical details.
• Can be used for general-purpose data retrieval and filtering (not only for sophisticated mashups).
• Query cursors: to cache the history of information paths.
• A formal framework for query pipelines: caching, materialization.
• Query distribution and scheduling.